
How to import multi-language content using the ProcessWire API



I have a database where most of the content has versions in two languages. Suppose I want to import that content into a ProcessWire instance where I have configured multi-language support for those languages. How could I write an import script that iterates over each article and pairs it with its partner, so that they both appear together in the article's language tabs?

Here's what I'm thinking so far:

// Get the data from the old database
$articles = database()->query("
    SELECT *
    FROM textpattern
    WHERE section='" . IMPORT_SECTION . "'
    -- TODO: join to l10n_articles
    ORDER BY posted desc
");

// Create each article in ProcessWire
foreach($articles as $article) {

    if ($article['Status'] == 4) {

        $newArticle = new Page();
        $newArticle->template = 'article';
        $newArticle->parent = pages()->get("/" . IMPORT_SECTION);
        $newArticle->name = $article['url_title'];

        // Import each field into the appropriate ProcessWire field here.
        // I understand how to do this already.
        
        // Determine this article's language from the l10n_articles table here.
        // Determine the ID of the article's other-language partner here.
        // This should be pretty straightforward.

        // TBD: Insert this content into the appropriate language tab in ProcessWire here.
        // I have no idea how to do this.
        
        $newArticle->save();

    }

}

I imagine the API for this works according to principles I already understand, but I don't know where to look.

Thanks in advance for any guidance you can offer! Also, any notes about expected obstacles and challenges would be great.

Has anyone here solved this problem before?


  • johnstephens changed the title to How to import multi-language content using the ProcessWire API
<?php namespace ProcessWire;

// get your languages
$default = $languages->get("default"); // retrieve default (German)
$english = $languages->get("english"); // retrieve English

// imported pages with content in different fields
// needed to move content in the correct fields there
$importedPages = $pages->findMany("template=product, parent=4372, fkImportId=import44242");

foreach ($importedPages as $importedPage) {
    $importedPage->of(false); // outputFormatting must be OFF

    // title field
    $importedPage->title->setLanguageValue($default, $importedPage->fkTitleDe); // set in default
    $importedPage->title->setLanguageValue($english, $importedPage->fkTitleEn); // set in english
	
    // textarea field
    $importedPage->textarea->setLanguageValue($default, $importedPage->fkDescShortDe); // set in default
    $importedPage->textarea->setLanguageValue($english, $importedPage->fkDescShortEn); // set in english

    $importedPage->save(); // save the page with the new language values
}

I had a similar task a while back: the content was already stored in pages, but deliberately in the wrong fields. We imported from CSV with ImportPagesCSV first and moved the content into the correct fields afterwards with this little snippet, which was much easier and faster.

Hope this helps a bit.


Another thing I see here... as you are moving content from Textpattern (I loved to use it!), you might want to double-check that you import the HTML version of each page/article rather than the Textile version. Otherwise it might end up looking funny.
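As a rough sketch of that tip: Textpattern keeps the raw Textile source in Body and Excerpt, and the rendered markup in Body_html and Excerpt_html. The query below assumes, like the script in the question, that the textpattern table is reachable through ProcessWire's database() connection and that IMPORT_SECTION is defined; adjust the table name if your install uses a prefix.

```php
<?php namespace ProcessWire;

// Assumption: the old `textpattern` table lives in the database that
// ProcessWire's database() connection points at, as in the question.
$query = database()->prepare("
    SELECT ID, Title, Body_html, Excerpt_html, url_title
    FROM textpattern
    WHERE Section = :section
      AND Status = 4
    ORDER BY Posted DESC
");
$query->bindValue(':section', IMPORT_SECTION);
$query->execute();

while ($article = $query->fetch(\PDO::FETCH_ASSOC)) {
    // Body_html holds the rendered HTML; Body would be raw Textile.
    $bodyHtml = $article['Body_html'];
    $excerptHtml = $article['Excerpt_html'];
    // ... assign these to the matching ProcessWire fields here.
}
```

Binding the section name as a parameter also avoids concatenating it into the SQL string.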

Another thing that might cause a headache could be image management. The last time I used Textpattern images were placed just by referencing them by ID. At least that's what I did back then. In this case you could (if possible) import those nowadays with Import External Images which looks up full URLs to an image and imports them.
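If you would rather stay in the API than use a module, an images field can also fetch a file from a full URL. This is only a minimal sketch: the field name images, the page path, and the example URL are all hypothetical, and the page has to be saved already (it needs an ID) before files can be added to it.

```php
<?php namespace ProcessWire;

// Hypothetical values: replace with a real page and a real image URL.
$article = pages()->get('/articles/my-imported-article/');
$imageUrl = 'https://old-site.example/images/42.jpg';

if ($article->id) {
    $article->of(false);              // output formatting must be off
    $article->images->add($imageUrl); // ProcessWire downloads the file
    $article->save('images');         // save just the images field
}
```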


Thank you, @wbmnfktr! This looks doable and helped me estimate the work involved.

Thanks for the Textpattern import tips, too! I became intimately familiar with Textpattern's database schema before writing my first TXP content importer for another ProcessWire site migration, and I remembered to import from the body_html field then. But I did not know about the "Import External Images" plugin, and wound up writing my own image import scripts, which got a little hairy.


  • 6 months later...

I wanted to update this thread because I had to solve some unexpected problems when importing my content. Maybe documenting this will save someone else the struggle—even me, if my memory fails and I search the forum for answers to the same question.

Firstly, my site requires multi-language URLs and titles. This requires some deliberate configuration of the multi-language modules, which must be taken care of BEFORE attempting to import data. This seems obvious in hindsight, but I had assumed that these features were enabled by default once the core Multi-language module was active.
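A pre-flight check along these lines could catch a missing prerequisite before any data is written. It is a sketch under two assumptions that are not spelled out in this thread: that multi-language URLs come from the core LanguageSupportPageNames module, and that a multi-language title means the title field uses the FieldtypePageTitleLanguage type.

```php
<?php namespace ProcessWire;

// Collect configuration problems before starting the import.
$problems = [];

// Assumption: per-language page names (URLs) need this core module.
if (!modules()->isInstalled('LanguageSupportPageNames')) {
    $problems[] = 'LanguageSupportPageNames is not installed.';
}

// Assumption: a multi-language title uses FieldtypePageTitleLanguage.
$title = fields()->get('title');
if (!$title || !($title->type instanceof FieldtypePageTitleLanguage)) {
    $problems[] = 'The title field is not a multi-language title field.';
}

if (count($problems)) {
    throw new WireException('Import aborted: ' . implode(' ', $problems));
}
```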

One thing I noticed when attempting to import data was that the setLanguageValue() method would not work for the name field, even after enabling multi-language URLs. I had to do a lot of searching the forum before finding out that the language-specific name fields have a different way of assigning values via the API than what I could find in the Multi-language documentation.

In the end, using the API to assign appropriate data to the fields in ProcessWire required two incantations that I failed to find in the Multi-language docs:

  1. In order to make the non-default language(s) "active", I had to use the page()->set() method to establish a hidden property "status$language". This property did not show up at all, not even as unset, when I dumped the page() using TracyDebugger.
  2. Assigning the URL title (commonly called the "name" in ProcessWire) in the non-default language(s) required using the same set method to establish a "name$language" property.

Here's a stripped down, abstracted example of my working import script, so far.

<?php namespace ProcessWire;

$en = languages()->get("default");
$es = languages()->get("es-es");

$articles = database()->query("
    -- Magic SQL query here
");

foreach ($articles as $article) {

    $parent = pages()->get('/' . IMPORT_SECTION);
    $template = templates()->get('my-article-template');

    $newArticle = pages()->add($template, $parent);
    // Activate the Spanish version: "$es" is cast to a string (the language's ID)
    $newArticle->set("status$es", 1);

    $newArticle->title->setLanguageValue($en, $article['Title_en']);
    $newArticle->title->setLanguageValue($es, $article['Title_es']);

    $newArticle->name = sanitizer()->pageName($article['url_title_en']);

    $newArticle_name_es = sanitizer()->pageName($article['url_title_es']);
    $newArticle->set("name$es", $newArticle_name_es);

    $newArticle->body->setLanguageValue($en, $article['Body_en']);
    $newArticle->body->setLanguageValue($es, $article['Body_es']);

    // Etc., setting each field's value using setLanguageValue() method

    $newArticle->save();

}

I have moved on to other aspects of the project, but I wanted to post this before I forget. It makes me wonder what else I'm missing out on, and how I might learn to do this better.

Thank you!
