Jump to content

How to import table of data into ProcessWire


Tony Carnell
 Share

Recommended Posts

Hi,

I'll confess to having only just discovered ProcessWire, and am currently getting my head around the fundamental differences between it and other CMSs/CMFs I've used in the past. I'm very impressed with its potential though.

I have a question however that I hope you guys can help me with.

I'm involved in a project to implement a searchable database of dams from across the world (in fact something akin to the Skyscraper demo except that the dams will be searchable on many more criteria).

We already have a database table containing all 600+ dams and I'm wondering how to import those into ProcessWire in a way that it can handle.

Is it possible simply to add that table to the ProcessWire database and tell it how to interact with it, or will that complicate things further down the line?

I don't fancy having to manually add all the dams to ProcessWire, since quite apart from the time and hassle it would also introduce human error.

All advice is gratefully received :)

Tony.

Link to comment
Share on other sites

Hi Tony! Welcome to the forum :)

I think one of the fundamental things to consider with projects like these, is to actually look and analyse the existing data - the structure, the format, the quality, and any relationships there may be. It's almost going through the normalisation process but perhaps not as strict.

It would be a good idea to take every column in the existing database to decide how that will be represented in the ProcessWire site. Probably, some of these will be simple translations - Varchars will become Text fields, and Ints will still be Integer fields. But what will be more important is the data in these fields - does it vary with just about every single dam? Is it a value representing Yes/No? Are there only a few values for that column that all dams can potentially use? If so, do the values all use the same spelling of words, or are some mixed case or have weird formatting or spacing issues? When you have a set list of values to choose from, this is where you can decide how to store the data in ProcessWire - Page fields, PageTables, Options field, etc.

If your source data quality is good, your job will be easier. If it's not, there will be some work involved in preparing it so that it is clean to work with.

You may already have a good idea of how your data looks or is structured, but I usually like to do some form of planning on paper or using a spreadsheet to map out the source data to the ProcessWire templates and fields. Doing the important work now will save you hassle later on down the line and will really help you to make the search feature more functional.

As an example:

In ProcessWire, I imagine there would be a template created called 'dam'. You might add the following fields to this template (just for starters):

  • title
  • description
  • location (FieldtypeMapMarker)
  • height (integer/float)
  • length (integer/float)
  • capacity/volume (integer/float)
  • country (Page field/Options field?)
  • built (date)
  • opened (date)

For each dam, a Page would exist somewhere in the page tree, using the 'dam' template.

As you've already seen the Skyscraper profile, you can probably look at that for more ideas on how to structure things though :)

  • Like 5
  • Thanks 1
Link to comment
Share on other sites


$myDB = new Database('localhost', 'username', 'password', 'db_name', 3306);

$result = $myDB->query("SELECT * FROM my_table");

while($row = $result->fetch_assoc()) {

$p = new Page();

$p->template = "dam";

$p->parent = "something";

$p->title = $row['title'];

$p->save();

}

  • Like 8
Link to comment
Share on other sites

@LostKobrakai, thanks for this example. I like that kind of approach. Never thought this could be done

with so little lines of code. Nice to see this pdo/php directly mixed with processwire api.

Edit:

There is more to find about this in the forum like here:

https://processwire.com/talk/topic/2242-interact-with-external-database-mysql/#entry20969

But are $myDB and new Database reserved words in Processwire, and if so where can I find them declared or explained, or can you chose just any other word ? I can't find anything about Database in the api cheatsheet

or with google.

Link to comment
Share on other sites

Wow, thanks for all your comments and suggestions guys.

The ProcessWire community sure seems a friendly and helpful lot! :)

I'll read through all the pointers you've given me and take it from there.

You've certainly all given me plenty of food for thought.

Thanks again, it's much appreciated.

Tony.

  • Like 2
Link to comment
Share on other sites

@pwired.

You can find out about these things, when looking into the wire/core (and wire/modules/) folders. These classes are just things the core itself uses and are not providing an "api" like $page or $pages. About the length of the snippet: Even raw php wouldn't need that much more lines to read from an mysql table.

  • Like 1
Link to comment
Share on other sites

There is one thing that needs a bit attention when importing data. We need to check if a record was also imported by a previous run of the script, otherwise you may end up with numerous duplicates.

while($row = $result->fetch_assoc()) {
  $title = wire("sanitizer")->text($row['title'], array("maxLength"=>128));  // fetch and sanitize the title
  $p = wire('pages')->get("title={$title}");                                 // check if it already exists
  if(0 < $p->id) continue;                                                   // move on to the next record if it exists
  $p = new Page();
  $p->template = "dam";
  $p->parent = "something";
  $p->title = $title;                                                        // use the sanitized title
  $p->save();
}
  • Like 7
Link to comment
Share on other sites

When I'm importing data, I usually add a ProcessWire field called legacy_id. Any import script will set this value to the actual ID from the item in the old dataset. It helps the script easily and accurately find existing imported items, and you can keep this around for a little while after so you can trace it back if you need to.

  • Like 3
Link to comment
Share on other sites

Craig is saying that he adds a field for each created page that contains the original id in the old database. If something goes wrong, or you need to adjust something in the import, or you need some more info from the old database that you didn't import originally, or you simply need to check if the items were imported correctly, you can check that field against the ID in the old database.

$p->legacy_id = $row['id'];
  • Like 4
Link to comment
Share on other sites

  • 1 month later...
  • 2 years later...
On 9/2/2015 at 3:31 PM, horst said:

There is one thing that needs a bit attention when importing data. We need to check if a record was also imported by a previous run of the script, otherwise you may end up with numerous duplicates.


while($row = $result->fetch_assoc()) {
  $title = wire("sanitizer")->text($row['title'], array("maxLength"=>128));  // fetch and sanitize the title
  $p = wire('pages')->get("title={$title}");                                 // check if it already exists
  if(0 < $p->id) continue;                                                   // move on to the next record if it exists
  $p = new Page();
  $p->template = "dam";
  $p->parent = "something";
  $p->title = $title;                                                        // use the sanitized title
  $p->save();
}

how would I do this for multilingual data?

if in the external DB in each row we have some fields for both languages

example:

$row['title_en'] $row['title_fr']

$row['summary_en'] $row['summary_fr']

$row['other_en'] $row['other_fr']

 

 

Link to comment
Share on other sites

17 minutes ago, bernhard said:

@bernhard   thanks, this should work I guess

while($row = $result->fetch_assoc()) {
  $title_en = wire("sanitizer")->text($row['title_en'], array("maxLength"=>128));
  $title_fr = wire("sanitizer")->text($row['title_fr'], array("maxLength"=>128));
  $p = wire('pages')->get("title={$title_en}");                                 
  if(0 < $p->id) continue;                                                   
  $p = new Page();
  $p->title->setLanguageValue($languages->get('en'), $title_en);
  $p->title->setLanguageValue($languages->get('fr'), $title_fr);                                                       
  $p->save();
}

 

Link to comment
Share on other sites

  • 1 year later...
On 9/2/2015 at 2:34 AM, Craig A Rodway said:

Hi Tony! Welcome to the forum :)

I think one of the fundamental things to consider with projects like these, is to actually look and analyse the existing data - the structure, the format, the quality, and any relationships there may be. It's almost going through the normalisation process but perhaps not as strict.

It would be a good idea to take every column in the existing database to decide how that will be represented in the ProcessWire site. Probably, some of these will be simple translations - Varchars will become Text fields, and Ints will still be Integer fields. But what will be more important is the data in these fields - does it vary with just about every single dam? Is it a value representing Yes/No? Are there only a few values for that column that all dams can potentially use? If so, do the values all use the same spelling of words, or are some mixed case or have weird formatting or spacing issues? When you have a set list of values to choose from, this is where you can decide how to store the data in ProcessWire - Page fields, PageTables, Options field, etc.

If your source data quality is good, your job will be easier. If it's not, there will be some work involved in preparing it so that it is clean to work with.

You may already have a good idea of how your data looks or is structured, but I usually like to do some form of planning on paper or using a spreadsheet to map out the source data to the ProcessWire templates and fields. Doing the important work now will save you hassle later on down the line and will really help you to make the search feature more functional.

As an example:

In ProcessWire, I imagine there would be a template created called 'dam'. You might add the following fields to this template (just for starters):

  • title
  • description
  • location (FieldtypeMapMarker)
  • height (integer/float)
  • length (integer/float)
  • capacity/volume (integer/float)
  • country (Page field/Options field?)
  • built (date)
  • opened (date)

For each dam, a Page would exist somewhere in the page tree, using the 'dam' template.

As you've already seen the Skyscraper profile, you can probably look at that for more ideas on how to structure things though :)

thanks for the input

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
  • Similar Content

    • By ngrmm
      I would like to create pages from a json feed.
      So i decode my json and create them via API.
      $jsonData = json_decode($jsonFromOtherWebsite); foreach($jsonData as $jsonDataItem) { $pageTitle = $jsonDataItem->name; $p = new Page(); $p->template = 'import_page'; $p->parent = $pages->get(xxxx); $p->title = $pageTitle; $p->save(); } Let's say the source (json) changes and i have to do another import.
      Then I want to compare the new json with the existing pages to see if there are new ones and if there some aren't there anymore.

      Is there a way to compare the new JsonData with my existing pw-pages with the API.
      Something like
      foreach($jsonData as $jsonDataItem) { // check if a page with this title exist if($pages->find("template=import_page, title=$jsonDataItem->name") { // update existing field values $getExistingPage = $pages->find("template=import_page, title=$jsonDataItem->name"); // update value $getExistingPage->setAndSave('field', $jsonDataItem->x); } else { // create new page $pageTitle = $jsonDataItem->name; $p = new Page(); $p->template = 'import_page'; $p->parent = $pages->get(xxxx); $p->title = $pageTitle; $p->save(); } } // search for pages wich are not anymore in the json and hide/delete them // …  
    • By Vineet Sawant
      Hi,
      I'm trying to import some heavy data into Processwire, but I'm not sure what would be the best way to do it.
      Usually I use CSV to Pages plugin, but this time the data is too heavy(~40k rows with 10+ columns of excel sheet), thus this plugin can't help.
      I also tried Tasker plugin but I can't seem to go through the setup itself, it requires some template setup but I'm totally clueless about how to do it, so that plugin is not of any use either.
      I wanted to know from you guys how you do it and in future what would be the best way to migrate thousands of rows of data in to PW.
       
      Thanks.
       
       
    • By Rodd
      Hi everyone!
      I have a website in a production environment and I want to duplicate it in a local environment. I exported the content of the website (with the 'Site Profile Exporter' module) but I cannot use it actually. I've got an issue with the database. I imported this one in MAMP then.

      I also exported the pages (with the 'ProcessPagesExportImport' module), but I cannot import it to my local website because the fields don't exist. So I created this fields, but I have this error :
      How can I use the elements that already exist and are presents in my database? How can I duplicate correctly the templates, fields and pages?
      Thanks by advance
      PS: Sorry if my english is bad
       
    • By hellerdruck
      Hi all
      I need help with something. Situation: We have let's say 2'000 Files (Excel) that should be displayed (list with links) on a page. We'd need to filter these files by given Keywords or a tree structure or both. Now, I'm looking for a solution whereas our customer can synchronise the files from his local computer with the folder on the webserver. They will update and upload files on a daily basis. Therefore, it would need to synchronise rather than load the files manually in pages or repeaters. Maybe indexing would be an idea, too.
      Are there any modules for Processwire that would help achieving this? Could anyone point me in the right direction?
      Thanks in advance.
    • By iNoize
      Hello, need some help for an RealEstate project. It have to use the OnOffice to import the objects. 
      https://apidoc.onoffice.de/
       
×
×
  • Create New...