
Import/Export module(s)


Nico Knoll

Hi,

I guess the import and export functions are some of the most important functions of a CMS. And when I talk about an import function, I don't just mean importing other ProcessWire backups but also importing backups from other CMSs or blogging systems like WordPress, Joomla, Drupal, Typo3, etc. I know it is a big task and of course I haven't finished it yet. So far I have only done some research on the different backup formats and thought about the best way to create a module for this. Or several modules. So here are the two ways I would recommend:

The one for all way:

My first idea was to find a way to convert all backups to .csv files so that the CSV importer written by Ryan could import all of the different backups, and you could switch to ProcessWire more easily with all of your old data. (It doesn't have to be the .csv format. Another good format could be a custom .xml format, like the one WordPress uses, but then we/I would have to write a special .xml import module.)

The all for one way:

The other way would be to write a custom module for each backup file format. That would be the WordPress way (yeah, maybe you've already noticed that I worked with WordPress for a long time).

I myself would prefer the first way because I think it's more elegant, and if every file is converted to a special ProcessWire backup format, or maybe a standard backup format, that format could also be used for backups of your own ProcessWire site. But what do you think about this, and how would YOU solve this problem?


Nice consideration. I'm not sure how easy all of this would be; it's surely a big task to build. Of course it would help people migrate, and that would be a plus on the feature list.

Not sure which version I would go for.

Are you planning on doing this, or do you want someone to do it?

Either of the two ways would be possible and wouldn't require much work on the PW side. Though I'm not sure what Ryan has planned in this area. I'm sure he will make the CSV importer much more clever and versatile. Maybe there could be some collaboration.

Since I've barely used WordPress and the other systems (I don't like them at all), I can't really help with this, and I will be spending my spare time on other helpful modules.


If there were to be a standard format for importing ProcessWire-to-ProcessWire, I'd probably choose JSON. Like XML, but without unnecessary verbosity for when you don't need it.
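To make the JSON idea a bit more concrete, here is a minimal sketch of what such an export could look like. The structure (the keys `name`, `template`, `fields` and `children`) is purely an assumption for illustration, not an existing ProcessWire format, and the template name `basic-page` is made up. The code uses only PHP's built-in json_encode() and json_decode().

```php
<?php
// Hypothetical export structure: each page is an array with its name,
// template, field values and nested children. None of these keys are
// defined by ProcessWire; they are assumptions for this sketch.
$export = array(
    'name' => 'example',
    'template' => 'basic-page',
    'fields' => array('title' => 'Example'),
    'children' => array(
        array(
            'name' => 'impressum',
            'template' => 'basic-page',
            'fields' => array('title' => 'Impressum'),
            'children' => array(),
        ),
    ),
);

// Encode to JSON; the flags keep the output readable and non-ASCII text intact.
$json = json_encode($export, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE);

// Decoding with "true" gives back plain associative arrays.
$restored = json_decode($json, true);

// Walk the tree and collect page paths to show that the nesting survives the round trip.
function collectPaths(array $page, $parentPath = '/') {
    $path = $parentPath . $page['name'] . '/';
    $paths = array($path);
    foreach ($page['children'] as $child) {
        $paths = array_merge($paths, collectPaths($child, $path));
    }
    return $paths;
}

$paths = collectPaths($restored);
```

Because the hierarchy is expressed by nesting, no parent IDs are needed in a format like this, which is one thing a flat CSV cannot express directly.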

However, this isn't a great way for going to/from other CMSs because every system is different and there isn't going to be a universal way to reconstruct everything from one CMS into another with 1 click. That is, unless the importer is written specifically for importing from a specific CMS. Given that, I tend to think CSV is a good way to go. CSV is just data and lacks any real metadata (unless you decide to bundle it in). In my experience, when switching platforms (importing data from another CMS), the job is easiest if you can just deal with the data and not the platform-specific metadata. In fact, I find it sometimes easiest to ignore the existing CMS and just screen-scrape the data right off the site.

However, the main reason I like CSV is just because I can bring it into a spreadsheet very easily. Once I've got data in a spreadsheet, I know that anything can be done with it. I can modify it, save it, and bring it into ProcessWire. It's not nearly as straightforward trying to bring a JSON or XML into a spreadsheet.

CSV is such a stupidly-simple format that there is little question about the scope of it. You know that when you import a CSV, you are going to have to figure out where those fields should go, and create the fields if necessary. I think it creates the right balance of involvement between data and user and leaves out any ambiguity.
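The "figure out where those fields should go" step can be sketched in plain PHP as a column-to-field map. The column headers and field names below are hypothetical examples, and this is not how the CSV importer module is implemented; it only illustrates the decision the user has to make.

```php
<?php
// Hypothetical mapping from CSV column headers to ProcessWire field names.
// Both sides are made up for this sketch; columns not listed are ignored.
$columnToField = array(
    'Product Name' => 'title',
    'Price (EUR)'  => 'price',
);

// Return only the mapped values of one CSV row, keyed by field name.
function mapRow(array $row, array $columnToField) {
    $mapped = array();
    foreach ($columnToField as $column => $field) {
        if (array_key_exists($column, $row)) {
            $mapped[$field] = $row[$column];
        }
    }
    return $mapped;
}

// One row as it might come out of a spreadsheet export, keyed by header.
$row = array(
    'Product Name'  => 'Blue Widget',
    'Price (EUR)'   => '9.90',
    'Internal Note' => 'not imported',
);

$values = mapRow($row, $columnToField);
```

Inside ProcessWire, the resulting pairs could then be assigned to a page with `$page->set($field, $value)` before saving, assuming the fields already exist on the page's template.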

I wrote the CSV importer primarily because I deal with so much data that comes from spreadsheets. So having a CSV importer was the obvious choice there. Previously I'd always just used the API for doing imports (and still do half the time), as that's also quite easy and you can work with any format there. Ultimately I'd like to expand the importer to support JSON and XML too, but I suspect that CSV will be the most used.

As for making safety backups, a MySQL dump file can't be beat for that. I don't think that's what we're talking about here, but just wanted to mention that one should always use mysqldump files for their regular site database backups.


Well, I really like your CSV importer, but I need a way to preserve the page structure because I want to import a lot of parent pages AND their children (around 1000). And if I had to sort them manually it would be really annoying...


OK, here's an example of my code. But first a short description: I exported this code from Typo3 with a plugin as a .csv file (so there is no content, only the pages themselves). There are a lot of needless columns in it. I guess the most important ones are "uid, pid, title, hidden" (uid = unique ID, pid = parent ID).

And here's finally the code ;) :

"uid","pid","t3ver_oid","t3ver_id","t3ver_wsid","t3ver_label","t3ver_state","t3ver_stage","t3ver_count","t3ver_tstamp","t3ver_swapmode","t3_origuid","tstamp","sorting","deleted","perms_userid","perms_groupid","perms_user","perms_group","perms_everybody","editlock","crdate","cruser_id","title","doktype","TSconfig","storage_pid","is_siteroot","php_tree_stop","tx_impexp_origuid","url","hidden","starttime","endtime","urltype","shortcut","shortcut_mode","no_cache","fe_group","subtitle","layout","target","media","lastUpdated","keywords","cache_timeout","newUntil","description","no_search","SYS_LASTCHANGED","abstract","module","extendToSubpages","author","author_email","nav_title","nav_hide","content_from_pid","mount_pid","mount_pid_ol","alias","l18n_cfg","fe_login_mode","tx_rlmptmplselector_main_tmpl","tx_rlmptmplselector_ca_tmpl","t3ver_move_id"

"10","1","0","0","0","","0","0","0","0","0","0","1161721085","2816","0","1","1","31","27","0","0","1159038030","1","Impressum","1","","0","0","0","10","","0","0","0","1","0","0","0","","und rechtliche Hinweise","0","","","0","","0","0","","0","1235477424","","","0","","","","0","0","0","0","impressum","0","0","layout_2col_left_vlines.html","0","0"
"68","1","0","0","0","First draft version","0","0","0","0","0","0","1161721085","2560","0","1","1","31","27","0","0","1159038030","1","Sitemap","1","","0","0","0","68","","0","0","0","1","0","0","0","","","0","","","0","","0","0","","0","1234729230","","","0","","","","0","0","0","0","sitemap","0","0","layout_2col_left_vlines.html","0","0"

"34","1","0","0","0","","0","0","0","0","0","0","1259098298","2304","0","1","1","31","27","0","0","1159038030","1","Logout alt","1","","2","0","0","34","","0","0","0","1","0","0","0","-2","","0","","","0","","0","0","","1","1259098298","","","0","","","","0","0","0","0","","0","0","layout_3col_vlines_v1.html","0","0"

"26","1","0","0","0","","0","0","0","0","0","0","1259098321","2048","0","1","1","31","27","0","0","1159038030","1","Login alt","1","","2","0","0","26","","0","0","0","1","0","0","0","-1","","0","","","0","","0","0","","1","1259098321","","","0","","","","0","0","0","0","","0","0","layout_3col_vlines_v1.html","0","0"

"37","34","0","0","0","","0","0","0","0","0","0","1161721085","1536","0","1","1","31","27","0","0","1159038030","1","Suche","1","","0","0","0","37","","0","0","0","1","0","0","0","","innerhalb dieser Website","0","","","0","","0","0","","1","1170148658","","","0","","","","0","0","0","0","suche","0","0","layout_2col_left_vlines.html","0","0"

"115","34","0","0","0","","0","0","0","0","0","0","1172681574","250","0","1","1","31","27","0","0","1162201214","1","Schüler","1","","0","0","0","0","","0","0","0","1","0","0","0","","","0","","","0","","0","0","","0","1313429676","","","0","","","","0","0","0","0","","0","0","layout_2col_left_vlines.html","0","0"

It would be great if there were a way to import it properly :)


This is some rather heavy CSV :) But to get you started, here is some pseudocode I would use for the tree itself (this is without using the CSV importer, just pure PHP, assuming you can access the CSV records via $record['field_name'] notation).

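<?php
// assumes $csv is an array of rows keyed by the CSV column names
$parentArray = array('1' => '/'); //typo3 uid -> PW path translation; typo3 uid 1 maps to the PW root
foreach ($csv as $csvRecord) {
 //parent not imported yet: sort the CSV parents-first or handle these rows in a second pass
 if (!isset($parentArray[$csvRecord['pid']])) continue;

 $newPage = new Page();
 //in the sample CSV the layout file name is in 'tx_rlmptmplselector_main_tmpl' (the 'layout' column only holds "0");
 //a template with the resulting name must already exist
 $newPage->template = str_replace('.html', '', $csvRecord['tx_rlmptmplselector_main_tmpl']);
 $newPage->parent = $pages->get($parentArray[$csvRecord['pid']]);

 $newPage->title = $csvRecord['title'];
 //'alias' holds the page's url string in the sample CSV; create the name from the title if it is empty
 $newPage->name = $csvRecord['alias'] ? $csvRecord['alias'] : $sanitizer->pageName($csvRecord['title'], true);

 $parentArray[$csvRecord['uid']] = $newPage->parent->path . $newPage->name . '/'; //add new 'parent array' record for next pages

 //continue with data import

 $newPage->save();
}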

Also, for the layout -> template translation, you can either use the approach from my example (that is, initially give your templates the same names the layouts had in typo3 and rename them afterwards) or, alternatively, create another translation array the same way as for the parents, but this time pre-populated, like so:

<?php
  $layoutToTemplate = array(
    'layout_3_v_col1.html' => 'page',
    'layout_3_v_col2.html' => 'product'
  );

(names of typo3 layouts are examples only)
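The pseudocode above assumes the CSV is already available as rows keyed by field name. A minimal sketch of how that array could be built with PHP's fgetcsv() follows, together with a translation lookup that falls back to a default template. The CSV text is a trimmed-down version of the sample rows, and the template names `page` and `basic-page` as well as the helper names readCsvRows() and templateFor() are assumptions for illustration.

```php
<?php
// Read an open CSV stream into an array of rows keyed by the header line.
function readCsvRows($handle) {
    $rows = array();
    $labels = null;
    while (($data = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($data === array(null)) continue;           // fgetcsv() returns array(null) for blank lines
        if ($labels === null) { $labels = $data; continue; } // first line holds the column names
        if (count($data) !== count($labels)) continue; // skip malformed rows
        $rows[] = array_combine($labels, $data);
    }
    return $rows;
}

// Trimmed-down sample data; a memory stream stands in for the real export file.
$csvText = <<<'CSV'
"uid","pid","title","tx_rlmptmplselector_main_tmpl"

"10","1","Impressum","layout_2col_left_vlines.html"
"34","1","Logout alt","layout_3col_vlines_v1.html"
CSV;

$handle = fopen('php://memory', 'w+');
fwrite($handle, $csvText);
rewind($handle);
$csv = readCsvRows($handle);
fclose($handle);

// Pre-populated layout -> template translation (template names are examples only).
$layoutToTemplate = array(
    'layout_2col_left_vlines.html' => 'page',
);

// Look up the template for a layout, falling back to a default for unknown layouts.
function templateFor($layout, array $map, $default = 'basic-page') {
    return isset($map[$layout]) ? $map[$layout] : $default;
}

$first = templateFor($csv[0]['tx_rlmptmplselector_main_tmpl'], $layoutToTemplate);
$second = templateFor($csv[1]['tx_rlmptmplselector_main_tmpl'], $layoutToTemplate);
```

With a real export you would open the file with fopen('your-file.csv', 'r') instead of the memory stream. The fallback keeps the import from stopping on a layout that has not been added to the translation array yet.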


Will take a closer look when I'm at the computer tomorrow, but wanted to mention that when importing data where you need to maintain the previous system's unique ID, create new field(s) for it (like an integer field) and name it something like source_id, old_id, old_parent_id, etc. Then use that where needed to create your structure. PW won't be happy if you try to tell it what numbers it should use for its own unique IDs, in part because it reserves several number ranges for internal use.


I think that it'll be hard for us to tell you exactly what to do without having all the data and doing it ourselves, but I'm going to try my best to give you an applicable example that should roughly fall in with the approach you'll want to take. Things like setting a different template based on parent or the like are going to be very case specific, so I've kept it simple here. Also, I'm doing this as a command line script, but the same example should translate to populating in a template or bootstrapping the API from another web script. Lastly, I've assumed you already have a template in your system called 'example-template' that has the fields 'typo3_uid' and 'typo3_pid' (FieldtypeInteger) added to it, and that you already have a page in your system called '/example/' which will serve as the root parent where we start populating your imported pages and build the structure from.

#!/usr/local/bin/php -q
<?php // change the line above to reflect your system path or remove if not command-line

include("./index.php"); // bootstrap ProcessWire's API, if applicable

// the example template and parent we will be using, change for your use
$template = wire('templates')->get('example-template'); 
$parent = wire('pages')->get('/example/'); 

// open the CSV file with the data in it
$fp = fopen('your-csv-file.csv', 'r'); 
$n = 0;

// we'll keep track of the CSV labels on line 1 so we can use those labels to make an associative array
$labels = array();

// loop through each line in the CSV
while(($data = fgetcsv($fp)) !== false) {

    if(++$n == 1) {
        // if we're on the first line, save the CSV labels and skip over the line
        $labels = $data; 
        continue; 
    }

    // use the labels we found from line 1 to make the $data array associative
    foreach($data as $key => $value) {
        $label = $labels[$key]; 
        $data[$label] = $value; // now $data[1] is $data[pid], for example
        unset($data[$key]); // don't need the numbered index anymore
    }

    // create the new page to import
    $page = new Page();
    $page->template = $template;
    $page->parent = $parent; 
    $page->typo3_uid = $data['uid']; 
    $page->typo3_pid = $data['pid'];
    $page->name = "typo3-$data[uid]"; // i.e. typo3-123, you may find a better field for this
    $page->title = $data['title']; 
    // add any other fields you want to import from the CSV here
    $page->save(); 
}

// with all the pages imported, now lets give them structure according to the typo3 'pid' field. 
// we didn't do this before just in case the children imported before the parent, which may or may
// not be the case in your CSV, but I didn't want to risk it... 

// locate all the typo3 pages that have a parent greater than 1 (i.e. greater than /example/, our import root) 
$matches = wire('pages')->find("parent=$parent, typo3_pid>1"); 

// loop through the pages that were found
foreach($matches as $page) {
    // find the parent they are supposed to have
    $newParent = wire('pages')->get("typo3_uid=" . $page->typo3_pid); 
    if($newParent->id) {
        // give them the proper parent and then save
        $page->parent = $newParent; 
        $page->save(); 
    }
}
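One alternative to the second pass, not used in the script above, is to reorder the rows before importing so that every parent comes before its children. The sketch below does this in plain PHP with a breadth-first walk starting at the import root. The function name orderParentsFirst() and the extra orphan row are hypothetical; the other rows are shortened from the sample CSV.

```php
<?php
// Order rows so that every row comes after the row whose uid equals its pid.
// $rootId is the typo3 uid that maps to the import root (1 in the sample data).
// Rows whose parent is never reached are returned separately as orphans.
function orderParentsFirst(array $rows, $rootId = '1') {
    // group rows by their parent id
    $childrenOf = array();
    foreach ($rows as $row) {
        $childrenOf[$row['pid']][] = $row;
    }

    // walk the tree breadth-first, starting at the root
    $ordered = array();
    $seen = array();
    $queue = array((string) $rootId);
    while ($queue) {
        $parentId = array_shift($queue);
        if (isset($seen[$parentId])) continue; // guard against duplicate uids
        $seen[$parentId] = true;
        if (!isset($childrenOf[$parentId])) continue;
        foreach ($childrenOf[$parentId] as $row) {
            $ordered[] = $row;
            $queue[] = (string) $row['uid'];
        }
    }

    // anything whose parent was never visited cannot be placed in the tree
    $orphans = array();
    foreach ($rows as $row) {
        if (!isset($seen[$row['pid']])) $orphans[] = $row;
    }
    return array($ordered, $orphans);
}

// Child listed before its parent, plus one hypothetical row with a missing parent.
$rows = array(
    array('uid' => '37', 'pid' => '34',  'title' => 'Suche'),
    array('uid' => '34', 'pid' => '1',   'title' => 'Logout alt'),
    array('uid' => '10', 'pid' => '1',   'title' => 'Impressum'),
    array('uid' => '99', 'pid' => '500', 'title' => 'Orphan'),
);

list($ordered, $orphans) = orderParentsFirst($rows);
$titles = array_column($ordered, 'title');
```

The two-pass approach in the script above needs no reordering and uses ProcessWire itself to resolve the parents. Presorting mainly helps if you want to assign the final parent at creation time, and it makes rows with missing parents visible before anything is saved.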
