Jump to content
Tony Carnell

How to import table of data into ProcessWire

Recommended Posts

Hi,

I'll confess to having only just discovered ProcessWire, and am currently getting my head around the fundamental differences between it and other CMSs/CMFs I've used in the past. I'm very impressed with its potential though.

I have a question however that I hope you guys can help me with.

I'm involved in a project to implement a searchable database of dams from across the world (in fact something akin to the Skyscraper demo except that the dams will be searchable on many more criteria).

We already have a database table containing all 600+ dams and I'm wondering how to import those into ProcessWire in a way that it can handle.

Is it possible simply to add that table to the ProcessWire database and tell it how to interact with it, or will that complicate things further down the line?

I don't fancy having to manually add all the dams to ProcessWire, since quite apart from the time and hassle it would also introduce human error.

All advice is gratefully received :)

Tony.

Share this post


Link to post
Share on other sites

You don't even necessarily need the csv step. You could also create a php file which connects to the existing database and then imports them to pw via the api. 

  • Like 1

Share this post


Link to post
Share on other sites

Hi Tony! Welcome to the forum :)

I think one of the fundamental things to consider with projects like these, is to actually look and analyse the existing data - the structure, the format, the quality, and any relationships there may be. It's almost going through the normalisation process but perhaps not as strict.

It would be a good idea to take every column in the existing database to decide how that will be represented in the ProcessWire site. Probably, some of these will be simple translations - Varchars will become Text fields, and Ints will still be Integer fields. But what will be more important is the data in these fields - does it vary with just about every single dam? Is it a value representing Yes/No? Are there only a few values for that column that all dams can potentially use? If so, do the values all use the same spelling of words, or are some mixed case or have weird formatting or spacing issues? When you have a set list of values to choose from, this is where you can decide how to store the data in ProcessWire - Page fields, PageTables, Options field, etc.

If your source data quality is good, your job will be easier. If it's not, there will be some work involved in preparing it so that it is clean to work with.

You may already have a good idea of how your data looks or is structured, but I usually like to do some form of planning on paper or using a spreadsheet to map out the source data to the ProcessWire templates and fields. Doing the important work now will save you hassle later on down the line and will really help you to make the search feature more functional.

As an example:

In ProcessWire, I imagine there would be a template created called 'dam'. You might add the following fields to this template (just for starters):

  • title
  • description
  • location (FieldtypeMapMarker)
  • height (integer/float)
  • length (integer/float)
  • capacity/volume (integer/float)
  • country (Page field/Options field?)
  • built (date)
  • opened (date)

For each dam, a Page would exist somewhere in the page tree, using the 'dam' template.

As you've already seen the Skyscraper profile, you can probably look at that for more ideas on how to structure things though :)

  • Like 5

Share this post


Link to post
Share on other sites

You don't even necessarily need the csv step. You could also create a php file which connects to the existing database and then imports them to pw via the api.

It would be very interesting to see some php example of this.

Share this post


Link to post
Share on other sites

$myDB = new Database('localhost', 'username', 'password', 'db_name', 3306);

$result = $myDB->query("SELECT * FROM my_table");

while($row = $result->fetch_assoc()) {

$p = new Page();

$p->template = "dam";

$p->parent = "something";

$p->title = $row['title'];

$p->save();

}

  • Like 8

Share this post


Link to post
Share on other sites

@LostKobrakai, thanks for this example. I like that kind of approach. Never thought this could be done

with so little lines of code. Nice to see this pdo/php directly mixed with processwire api.

Edit:

There is more to find about this in the forum like here:

https://processwire.com/talk/topic/2242-interact-with-external-database-mysql/#entry20969

But are $myDB and new Database reserved words in Processwire, and if so where can I find them declared or explained, or can you chose just any other word ? I can't find anything about Database in the api cheatsheet

or with google.

Share this post


Link to post
Share on other sites

I guess $myDB could be anything else.

And that Database() is like Page(), something related to ProcessWire.

But I could be wrong :).

  • Like 1

Share this post


Link to post
Share on other sites

Wow, thanks for all your comments and suggestions guys.

The ProcessWire community sure seems a friendly and helpful lot! :)

I'll read through all the pointers you've given me and take it from there.

You've certainly all given me plenty of food for thought.

Thanks again, it's much appreciated.

Tony.

  • Like 2

Share this post


Link to post
Share on other sites

@pwired.

You can find out about these things, when looking into the wire/core (and wire/modules/) folders. These classes are just things the core itself uses and are not providing an "api" like $page or $pages. About the length of the snippet: Even raw php wouldn't need that much more lines to read from an mysql table.

  • Like 1

Share this post


Link to post
Share on other sites

There is one thing that needs a bit attention when importing data. We need to check if a record was also imported by a previous run of the script, otherwise you may end up with numerous duplicates.

while($row = $result->fetch_assoc()) {
  $title = wire("sanitizer")->text($row['title'], array("maxLength"=>128));  // fetch and sanitize the title
  $p = wire('pages')->get("title={$title}");                                 // check if it already exists
  if(0 < $p->id) continue;                                                   // move on to the next record if it exists
  $p = new Page();
  $p->template = "dam";
  $p->parent = "something";
  $p->title = $title;                                                        // use the sanitized title
  $p->save();
}
  • Like 6

Share this post


Link to post
Share on other sites

Or by some other field/way, if title might not be unique.

For instance, when adding children to previously empty page for the first time, you simply check for $parent->children->count :)

  • Like 1

Share this post


Link to post
Share on other sites

When I'm importing data, I usually add a ProcessWire field called legacy_id. Any import script will set this value to the actual ID from the item in the old dataset. It helps the script easily and accurately find existing imported items, and you can keep this around for a little while after so you can trace it back if you need to.

  • Like 3

Share this post


Link to post
Share on other sites

hi craig,

i'm not sure if i understand you. can you please go a little bit more into detail, thanks :)

Share this post


Link to post
Share on other sites

Craig is saying that he adds a field for each created page that contains the original id in the old database. If something goes wrong, or you need to adjust something in the import, or you need some more info from the old database that you didn't import originally, or you simply need to check if the items were imported correctly, you can check that field against the ID in the old database.

$p->legacy_id = $row['id'];
  • Like 4

Share this post


Link to post
Share on other sites

To select/ provide any data of any table in your database which does not belong to processwire use module FieldtypeSelectExtOption. No need to import or transform data in a processwire format.

  • Like 1

Share this post


Link to post
Share on other sites
On 9/2/2015 at 3:31 PM, horst said:

There is one thing that needs a bit attention when importing data. We need to check if a record was also imported by a previous run of the script, otherwise you may end up with numerous duplicates.


while($row = $result->fetch_assoc()) {
  $title = wire("sanitizer")->text($row['title'], array("maxLength"=>128));  // fetch and sanitize the title
  $p = wire('pages')->get("title={$title}");                                 // check if it already exists
  if(0 < $p->id) continue;                                                   // move on to the next record if it exists
  $p = new Page();
  $p->template = "dam";
  $p->parent = "something";
  $p->title = $title;                                                        // use the sanitized title
  $p->save();
}

how would I do this for multilingual data?

if in the external DB in each row we have some fields for both languages

example:

$row['title_en'] $row['title_fr']

$row['summary_en'] $row['summary_fr']

$row['other_en'] $row['other_fr']

 

 

Share this post


Link to post
Share on other sites
17 minutes ago, bernhard said:

@bernhard   thanks, this should work I guess

while($row = $result->fetch_assoc()) {
  $title_en = wire("sanitizer")->text($row['title_en'], array("maxLength"=>128));
  $title_fr = wire("sanitizer")->text($row['title_fr'], array("maxLength"=>128));
  $p = wire('pages')->get("title={$title_en}");                                 
  if(0 < $p->id) continue;                                                   
  $p = new Page();
  $p->title->setLanguageValue($languages->get('en'), $title_en);
  $p->title->setLanguageValue($languages->get('fr'), $title_fr);                                                       
  $p->save();
}

 

Share this post


Link to post
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.


  • Recently Browsing   0 members

    No registered users viewing this page.

  • Similar Content

    • By maba
      Hello,
      I need to import regularly - every 15 or 30 days - a big .xslx file into my PW installation.
      This file now has 14 columns, 5.000 rows and grows every month.
      I'll need to group, order and work with these data to:
      analyse User monthly costs analyse User costs per Asset ... User (real AD account) has to match with a PW user - I can't join to the domain - but as you can see I have some services users (start with sca_*) or no user at all. Those rows have to be assigned to a specific user, e.g. account100.
      And:
      I would like to be able to have a kind of diff function to compare User assets between this and last month (and so on) other request is to have a notification when something change for a User between actual and latest import First request: which is the best solution to store those data in your opinion? Page, Table, Repeater Matrix, ...?
      Those are very repetitive data and I think a page reference is better than to import all the data every time but I have to understand how to manage those "dynamic" groups of software (AccType Det), hardware (Asset), ... For example Price will be imported and not stored with the description because it could be change in the future and I'll not have any control on it.
      Thanks!
      User,OE,productNmr,AccType1,AccType Det,Count,Price (€),Sum,ASNA,CC,AccType Info,Asset,AccGroup,,,,,,,,,,,,,
    • By dragan
      Is it by design that a site/ready.php is not included when creating a new site profile? Is it possible to include it with a hook? Or are there any security thoughts? (I don't want to redistribute it in public, it's just so I have my own boilerplate)
    • By karian
      I don't know why multiple instances (repeater_repeat_columns1, repeater_repeat_columns2, ...) of my repeater field are displayed inside Template field (see image).
      Is there a way to clean/reset it ?
       

    • By psy
      I'm combining two PW sites into one, Site A into Site B.
      At each step, I did it bit by bit as the 'all at once' approach failed.
       
      First, I exported all the fields from Site A and imported into Site B. Any field types not supported by import/export, eg FieldtypeOptions I manually recreated. All good.
      Next I exported all the templates from Site A and imported them into Site B and copied across their associated template files. All good.
      Finally I exported the pages I needed from Site A into Site B - again, bit by bit to ensure it all went smoothly.
      From the admin side, it all looked and worked perfectly.
      Front end was a totally different story. All existing pages in Site B worked as expected. NONE of the pages imported from Site A displayed. They all ended in a redirect loop with no errors in the PW logs or Tracy Debugger.
      After some trial-and-error, I finally got it working with:
      - create a new template in Site B admin with no associated template file and just a title field
      - import the fields from the imported Site A template into the newly created template (both on Site B)
      - copy the Site A php template file into a new file that matched the new PW Site B template name and save in Site B site/templates
      I can deal with the above workaround. Just curious to know if I did something wrong or if the template import/export feature is problematic?
       
      ### Solution:
      While the export/import was a slow process, turned out the front end redirecting issue was unrelated. For reasons unknown, all templates marked as HTTPS only were the ones redirecting, ie all templates from Site A. Finally solved it by changing the $config->https to true in site/config.php
      Now the pages display correctly as https whether the template forces the issue or not.
       
    • By rareyush
      i am receiving and error whenever I try to run my processwire on localhost,

       
       
      sql code
       
      -- -- Table structure for table `field_fieldset_meta_end` -- CREATE TABLE `field_fieldset_meta_end` ( `pages_id` int(10) UNSIGNED NOT NULL, `data` int(11) NOT NULL ) ENGINE=MyISAM DEFAULT CHARSET=utf8; -- -------------------------------------------------------- -- -- Table structure for table `field_fieldset_meta_END` -- CREATE TABLE `field_fieldset_meta_END` ( `pages_id` int(10) UNSIGNED NOT NULL, `data` int(11) NOT NULL ) ENGINE=MyISAM DEFAULT CHARSET=utf8; anyone ?
      whenever I make a new database and upload it there, database get imported without errors.
×
×
  • Create New...