
Use the API to spellcheck the whole site and clean up spacing


Macrura


Hi,

I was wondering if anyone has come up with a way to use the API to scan through, say, the body fields sitewide, check them for spelling errors, and either correct them or produce a report of the errors. I'm trying to launch a site where much of the content was ported from an older site with a mass of spelling errors and typos.

Also, on this same site the original editors typed everything with two spaces after every period. Is there some easy way to use the API to run a preg_replace or trim that turns those double spaces into single spaces?

-marc


Running these actions without supervision in particular could lead to even worse results, so I wouldn't suggest it (or at least, try it first on some test data and generate a thorough report of what changed).
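A minimal sketch of that dry-run idea in plain PHP, independent of ProcessWire: the hypothetical `dryRunReport()` helper applies a fix to a set of texts, saves nothing, and returns only the before/after pairs that would change. The `$testData` array is made up for illustration and stands in for real field values.

```php
<?php
// Dry run: apply a text fix without saving anything and collect a report.
// dryRunReport() and $testData are hypothetical, for illustration only.

/**
 * @param array<string, string> $records original texts keyed by an identifier
 * @param callable $fix                  function(string): string
 * @return array<string, array{0: string, 1: string}> id => [before, after]
 */
function dryRunReport(array $records, callable $fix): array
{
    $report = [];
    foreach ($records as $id => $text) {
        $fixed = $fix($text);
        // Only record texts the fix would actually change.
        if ($fixed !== $text) {
            $report[$id] = [$text, $fixed];
        }
    }
    return $report;
}

$testData = [
    '/about/ body'   => 'We started in 2001.  Since then we have grown.',
    '/contact/ body' => 'Call us today.',
];

// Collapse two or more spaces after a literal period; fall back to the
// original text if preg_replace() reports an error (returns null).
$report = dryRunReport($testData, function (string $s): string {
    return preg_replace('/\. {2,}/', '. ', $s) ?? $s;
});

foreach ($report as $id => [$before, $after]) {
    echo "{$id}:\n  - {$before}\n  + {$after}\n";
}
```

Only `/about/ body` appears in the report here, because the contact text has no double space to collapse.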

I don't really know much about spellchecking tools, but I'd expect a couple of them to pop up once you Google a bit. PHP has some native features for this (the pspell and Enchant extensions, though I'm not sure how commonly installed those are), and there are apparently various custom methods available too. You'll probably also have to split the field data into single words and work out which items actually resemble proper words.
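As a rough sketch of the split-and-check step, assuming a plain word list rather than a real spellchecking engine: the hypothetical `findUnknownWords()` strips tags, decodes entities, splits the text on anything that isn't a letter or apostrophe, and counts the words missing from the list. The tiny `$dictionary` here is invented for the example; where the pspell extension is available, `pspell_check()` could replace the array lookup.

```php
<?php
// Report words that are not in a known-word list.
// findUnknownWords() and $dictionary are hypothetical, for illustration only.

/**
 * @param string $html                    field value, possibly containing HTML
 * @param array<string, bool> $dictionary lowercase known words as keys
 * @return array<string, int>             unknown word => number of occurrences
 */
function findUnknownWords(string $html, array $dictionary): array
{
    // Remove tags and decode entities so only the visible text is checked.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Split on any run of characters that are not letters or apostrophes.
    $words = preg_split("/[^\\p{L}']+/u", $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($words === false) {
        return []; // e.g. invalid UTF-8 input
    }

    $unknown = [];
    foreach ($words as $word) {
        // strtolower() only folds ASCII letters, which is enough for English text.
        $key = strtolower(trim($word, "'"));
        if ($key === '' || isset($dictionary[$key])) {
            continue;
        }
        $unknown[$key] = ($unknown[$key] ?? 0) + 1;
    }
    return $unknown;
}

$dictionary = array_fill_keys(
    ['the', 'site', 'was', 'ported', 'from', 'an', 'older', 'one'],
    true
);

// Lists "sitee" and "portd" as unknown, each with a count of 1.
print_r(findUnknownWords('<p>The sitee was portd from an older one.</p>', $dictionary));
```

Running this over every body field and merging the results would give the kind of error report asked about in the first post. Correct words that are simply missing from the list (names, jargon) will be flagged too, so the report still needs a human pass.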

Anyway, the API part would be quite simple, something along these lines:

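foreach (wire('pages')->find('template!=admin') as $p) {
    // turn off output formatting so raw field values can be modified and saved
    $p->setOutputFormatting(false);
    foreach ($p->template->fields as $f) {
        // include fieldtypes you want to check (or check for $f->name == "body" if you prefer that)
        if (in_array($f->type->className(), array('FieldtypePageTitle', 'FieldtypeText', 'FieldtypeTextarea'))) {
            $name = $f->name;
            $original = $p->$name;
            // do some magic here, such as collapsing two or more spaces after a period
            // (the period is escaped so it only matches a literal ".")
            $p->$name = preg_replace('/\. {2,}/', '. ', $p->$name);
            // some logging would be nice:
            if ($p->isChanged($name)) {
                echo "{$p->url} {$name} changed: {$original} => {$p->$name}\n\n";
            }
        }
    }
    // save once per page, after all of its fields have been processed
    if ($p->isChanged()) $p->save();
}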

Fields with HTML in them could be somewhat problematic, though. :)
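One way to handle the HTML case, sketched under the assumption that `<` only appears in the markup as the start of a tag (i.e. the text content is properly entity-encoded): split the value into tags and text and collapse spaces only in the text parts. It also treats `&nbsp;`, `&#160;` and a literal U+00A0 as spaces, since rich-text editors often store the second space that way. `collapseSentenceSpaces()` is a hypothetical helper name; it could stand in for the `preg_replace()` call in the loop above.

```php
<?php
// Collapse runs of spaces after . ! or ? in HTML text, leaving tags untouched.
// collapseSentenceSpaces() is a hypothetical helper, for illustration only.
// Assumption: "<...>" reliably marks a tag in the stored markup.

function collapseSentenceSpaces(string $html): string
{
    // Split into text and tag pieces; the captured tags are kept in the array.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        if ($part === '' || $part[0] === '<') {
            continue; // leave tags (and empty pieces) alone
        }
        // One space-like unit: a plain space, &nbsp;, &#160;, or a literal U+00A0.
        $space = '(?: |&nbsp;|&#160;|\x{00A0})';
        // Punctuation followed by two or more space-like units becomes one plain space.
        // If preg_replace() fails (e.g. invalid UTF-8), keep the piece unchanged
        // instead of losing content.
        $parts[$i] = preg_replace('/([.!?])' . $space . '{2,}/u', '$1 ', $part) ?? $part;
    }
    return implode('', $parts);
}

echo collapseSentenceSpaces('<p>First one.  Second one!&nbsp; Third.</p>'), "\n";
// prints: <p>First one. Second one! Third.</p>
```

Splitting on tags first means spacing inside attributes (for example a `title` or `alt` value) is never rewritten; the trade-off is that a double space split across a tag boundary, such as `.</em>  Next`, is left as-is.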


@teppo, many thanks for the reply. For the spelling, they might ultimately have to hire someone to go through the relevant pages and spellcheck all of the body fields manually; I know there are some database spellcheck tools, but in this case there isn't an overwhelming number of pages.

I appreciate the code for the space fixing, and I'm going to test it on a few fields!


One thing you could do is use phpMyAdmin (or mysqldump) to export the whole database (or just the relevant field_[name] text field tables) to an SQL/text file. Load it in a quality editor that won't mess it up (like BBEdit/TextWrangler) and let it run a spellcheck; it should underline in red everything it thinks is misspelled. Making manual corrections in this one file and then re-importing the whole thing will be simpler than going through everything online.

As for the space fixing, I wouldn't worry about it. Assuming they aren't non-breaking space characters, consecutive spaces get collapsed to one when HTML is rendered, so it doesn't really matter how many spaces are typed after a sentence: they will always be displayed as one.
