Jump to content

help with dbase import/synchronisation/direction


lickny2001
 Share

Recommended Posts

dear processwire!

i’ve been on this channel for a while, this is my first reach out for help. i have created simple pw driven sites from the ground up, now i have taken on a challenge that involves an existing (wine) dbase … and it freaks me out!

here goes:
by design, wines are presented like so: 
country
- region
- - supplier
- - - wine | color | year | info | etc

we have +/-: 10 countries, 60 regions, 150 suppliers, 600 wines

our data is connected to a system the company also uses for stock, pricing, invoicing, etc. for now i have managed to import the dbase via CSVImport. to get the page structure that a want & understand (above) i first created - via CSV import - the country pages, then the region pages, supplier, and wine … i manually! (batcher module) changed the parent pages of each group of pages. now that this is done, i want to learn how to do it smarter. this is also necessary as we will need automatic updates when the source data changes. wines may come and go, prices change, etc.

the CSV file i have lists all the wine data per row (country, region, supplier, etc), i can also get a json file, or json api calls directly to the data.

i would appreciate help / pointers on a number of questions:
1. how to automate data synchronisation between system source & web data? or should we just make calls to the source? how does that impact performance, SEO? frankly if we take this route, I’m not sure i can build it, but i would like to …

2. i tried importing all wines (CSV import) with country, region, supplier as page fields related to (hidden) pages in a separate branch. this seems to work fine, however:
a. I’m not comfortable with pages selecting based on 3 dynamic page fields
b. one big branch with 600 wine records makes individual items difficult to manage

3. so the way i see it, i would like to automate synchronisation of data (cron or manual when data has updated). then i would need import script(s) to create,edit,delete pages based on the CSV or json data? how to do this?

your thoughts, help are very much appreciated. i understand that I’m asking many questions in one and believe me, i have read most of the posts / topics on each individual question on the forum. regarding my last point 3. if theres any one out there who wants to take on this task, perhaps we can negotiate a price for your help/work, but i would also like to learn myself.

thanks again!

Link to comment
Share on other sites

Welcome to the forums.

As first questions go, that's a doozy. And I'm not even going to try for a definitive answer, but let's just think out loud a bit.

2 hours ago, lickny2001 said:

json api calls directly to the data

I like the sound of this. Once you've got your initial structure could you maybe query individual products on page load? (You probably wouldn't want to do this on every page load, just after a certain time since the last call, kind of like a cache.) Then return the info or an 'out of stock' or 'discontinued' message. That saves deleting pages and preserves any link juice that individual products might have garnered over time. You could also set up some kind of alternative product suggestion for those pages ('We're sorry, but X is no longer available, what about Y?') - good for internal linking and keeping visitors on-site, then after a period (6 months or a year maybe) turning that suggestion into a 301.

Is it possible to ask the host system just for new or updated records? Either way, a cron or lazy cron is probably going to be involved. The good news is that iterating over all 600 or so records isn't going to take forever, even if that is the only way forward.

2 hours ago, lickny2001 said:

I’m not sure i can build it

Break it down into bite sized chunks. Solve one small aspect at a time.

Not a complete answer, I know, but I'm sure others will chime in.

  • Like 1
Link to comment
Share on other sites

That makes life a bit easier as it doesn't necessarily need critical business logic as validation, calculation, payments or whatever.

Data structure

My personal preference would be a data structure that consists of branches for countries, regions, years and other repetitive data that will later be referenced in each wine. And a branch with all wines.

Combined with modules like PageFieldPairs or Connect Page Fields you could easily sync those datasets and references from one page to another.

But every other structure might work here as well. It's a personal preference thing here.

Update: You should use a more unique ID for each wine than the PW page ID. The SKU or EAN or whatever the data source can deliver. This can make updates easier in the future.

Data sync

This depends.

  • How often new wines will arrive?
  • How often a wine needs to be removed?
  • How often details will change? What will change?
  • Do details need attention (for example: price formatting)?

A manual CSV update might work well, but JSON makes things easier sometimes.

Maybe I can find something about structure and sync in my files and projects that I can post here.

 

Link to comment
Share on other sites

try this https://modules.processwire.com/modules/process-data-import/ ..although it's old and still in alpha version, I use it this week in PW 3.0.84 importing products from amazon feeds and it works perfect. Very simple to use, it has a built-in cron job, accepts xml, json and csv and allows easy mappings between the source data and the template fields.. or maybe you can fork the module and customize it for you.. check

Link to comment
Share on other sites

thank you all, 

we decided to go for an automated page creation through a json import script. a dear friend helped me with this. perhaps some one in the future stumbles across this topic and finds this helpful, below the import script

  <?php

  $TEMPLATE_COUNTRY = "country";
  $TEMPLATE_REGION = "region";
  $TEMPLATE_SUPPLIER = "supplier";
  $TEMPLATE_WINE = "wine";

  $rootid = wire('pages')->get('title=wines')->id;

    // All Wines:
  $file = "./lijst.json";
  $raw = file_get_contents($file);

  // clean: replaces newlines in strings with \n and some other
  $raw2 = preg_replace('/\n(\w*)([^ \t\]\[{}"])/', '\\n$1$2', $raw);

  $data = json_decode($raw2, TRUE);

  $wines = $data['data'];

  // trash country pages + children before import
  $pages = wire('pages')->find('template=' . $TEMPLATE_COUNTRY);
  foreach($pages as $p) {
    $p->trash();
  }


  foreach($wines as $wine) {

      // do the country pages
      $country = $wine['country'];


      // create new page is not exists
      $country_pages = wire('pages')->find('title="' . $country . '"');
      if ($country_pages->count == 0) {

        $country_page = new Page();
        $country_page->template = $TEMPLATE_COUNTRY;

        $country_page->title = $country;
        $country_page->parent = $rootid;
        $country_page->country_text = $wine["country_text"];
        $country_page->save();
      }

      // update exsiting
      else {
        $country_page = $country_pages[0];
      }

      // do region pages
      $region = $wine['region'];
      if (!$region) { $region = "Other"; }
      $region_pages = wire('pages')->find('title="' . $region . '"');

      if ($region_pages->count == 0) {
        $region_page = new Page();
        $region_page->template = $TEMPLATE_REGION;
        $region_page->parent = $country_page->id;
        $region_page->title = $region;
        $region_page->region_text = $wine['region_text'];
        $region_page->save();

      }
      else {
        $region_page = $region_pages[0];
      }

      // do supplier pages
      $supplier = $wine['supplier'];

      // outlyer: wine without region ... should fix in source json
      if (!$supplier) { $supplier = "Other"; }
      $supplier_pages = wire('pages')->find('title="' . $supplier.'", parent='.$region_page->id);

      if ($supplier_pages->count == 0) {
        $supplier_page = new Page();
        $supplier_page->template = $TEMPLATE_SUPPLIER;
        $supplier_page->parent = $region_page->id;
        $supplier_page->title = $supplier;
        $supplier_page->supplier_text = $wine['supplier_description'];
        $supplier_page->save();

      }
      else {
        $supplier_page = $supplier_pages[0];
      }

      // do wine pages
      $wine_page = new Page();
      $wine_page->template = $TEMPLATE_WINE;
      $wine_page->parent = $supplier_page->id;
      $wine_page->title = $wine['wine'];
      $wine_page->wine_text = $wine['wijnomschrijving'];

      // same wine ... multiple years
      $year = $wine['year'];
      if (!$year) { $year = ''; }
      $wine_page->year = $year;

      // add 00s after comma if not there
      $price = preg_replace('/\.[0-9]$/', '${0}0', ''.$wine['price']);
      $price = preg_replace('/^[0-9]+$/', '${0}.00', ''.$price);
      $wine_page->prijs = $price;
      $wine_page->save();

  }
?>

 

curious to hear if any one wants to improve/comment on this

thanks again!

  • Like 4
Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...