Automating a lengthy PHP process


Guy Verville


Hi,

I am building a site containing around 50,000 products, which will be updated on a regular basis. We are far from putting this site into production; I am currently working out the future functionality.

The current update script uses an AJAX process to avoid timeouts. However, the update process will have to be fired automatically, with something like curl or PhantomJS.

What I have coded at the moment probably isn't the best solution, although it works pretty well. A PW template, called from an address such as http://example.com/start-ajax, starts a simple jQuery AJAX process on load. The 50,000 products are listed in a CSV file that is previously broken down into small pieces (200 lines per file). The process is smooth and takes 40 minutes to read the data and update or create ProcessWire pages accordingly.
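To give a rough idea, the endpoint that each jQuery AJAX call hits does something like this per chunk (a simplified sketch with placeholder paths, parameters and file names, not the actual code; the per-line page logic is elided here):

<?php
// Simplified sketch (placeholder paths and parameters), not the actual code.
// Each jQuery AJAX call passes a chunk number; the template processes the
// corresponding 200-line piece of the CSV and reports back.

$chunk = (int) $input->get('chunk');
$file  = $config->paths->assets . "import/products-{$chunk}.csv";

if (!is_file($file)) {
    // No more chunks: tell the jQuery loop to stop.
    header('Content-Type: application/json');
    echo json_encode(array('done' => true));
    return;
}

$handle = fopen($file, 'r');
while (($row = fgetcsv($handle)) !== false) {
    // ... update or create the matching ProcessWire page here ...
}
fclose($handle);

header('Content-Type: application/json');
echo json_encode(array('done' => false, 'next' => $chunk + 1));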

I would like to hear your advice and your experience in this matter. What would you do to create an automated batch process?


Hello Guy,

I'd be tempted to look at running your script directly from a cron task, rather than using a trigger URL. Is there a particular reason you couldn't take the cron route?
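For example (just a rough sketch; the paths and schedule are placeholders you'd adapt to your install), the import could bootstrap PW from the command line and be driven entirely by cron:

<?php
// import.php - a command-line import, run by cron instead of a trigger URL.
// Example crontab entry (placeholder schedule, nightly at 02:00):
//   0 2 * * * /usr/bin/php /path/to/site/import.php >> /var/log/pw-import.log 2>&1

include '/path/to/site/index.php';   // bootstrap ProcessWire; the API is then
$pages = wire('pages');              // available via wire(), e.g. wire('pages')

set_time_limit(0);                   // no browser involved, so no timeout to dodge

$handle = fopen('/path/to/products.csv', 'r');   // placeholder path
$count  = 0;
while (($row = fgetcsv($handle)) !== false) {
    // ... same create/update logic as in the AJAX template ...
    $count++;
}
fclose($handle);
echo "Processed $count lines\n";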

Also, with that many products, it might be beneficial to pre-process the data so that PW only processes the lines from the CSV file that represent changes from the previous import - i.e. products that have been added, removed or updated. I suspect many lines of your 50,000-line file will be the same between runs of your update script, but I could be wrong, as your situation may be different and your data might change more rapidly than I've previously experienced (for example, very fluid stock levels).


Hi,

The data changes daily (mainly prices, but also additions or removals of products).

The cron route would mean running a lengthy PHP script. To preprocess the data, I would have to read from PW to get the information. I don't have a created/modified date in the CSV file, which is built from many sources, so I have to read everything and compare it against a number of fields in the PW pages.

Making a quick dump of the 50,000 products and preprocessing would be an option, but PW is not that fast (I tried a $pages->find() of every product…). Since we are testing Elasticsearch, that might be a way to get it done.

The current script is based on Ryan's Import CSV module. It does essentially the same thing with some extras: read a line and compare it with a page; if that page exists, check whether the relevant data is the same and, if so, continue to the next line. If a new SKU is present (new SKUs aren't necessarily at the end of the CSV file…), create the page, etc.
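In simplified form, the per-line logic looks something like this (the field names here are only examples, not the real ones):

<?php
// Simplified per-line logic (example field names: sku, price, stock).
foreach ($rows as $row) {
    $selector = "template=product, sku=" . $sanitizer->selectorValue($row['sku']);
    $page = $pages->get($selector);

    if ($page->id) {
        // Existing product: skip the line if nothing relevant has changed.
        if ($page->price == $row['price'] && $page->stock == $row['stock']) {
            continue;
        }
    } else {
        // New SKU: create the page.
        $page = new Page();
        $page->template = 'product';
        $page->parent   = $pages->get('/products/');
        $page->name     = $sanitizer->pageName($row['sku']);
    }

    $page->of(false);
    $page->title = $row['name'];
    $page->sku   = $row['sku'];
    $page->price = $row['price'];
    $page->stock = $row['stock'];
    $page->save();
}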

Going the cron route would require a stepping mechanism… I already know that 6-7 entries are processed per second… The PhantomJS way is elegant because I can test it in a browser at any time without porting anything.
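By a stepping mechanism I mean something along these lines (just an illustration, with placeholder paths):

<?php
// Illustration of a stepping mechanism for a cron-driven import:
// each invocation handles one 200-line chunk and remembers where it stopped.
$stateFile = '/path/to/import-state.txt';              // placeholder path
$chunk = is_file($stateFile) ? (int) file_get_contents($stateFile) : 0;

$file = "/path/to/chunks/products-{$chunk}.csv";       // placeholder chunk naming
if (!is_file($file)) {
    file_put_contents($stateFile, 0);                  // finished: reset for the next import
    exit;
}

// ... process this chunk ...

file_put_contents($stateFile, $chunk + 1);             // the next run picks up the next chunk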

 


I just made a test with PhantomJS. It works like a charm. The script below is incomplete, because you still have to exit PhantomJS once the AJAX script is done, but it gives you an idea. My local script resides at http://example.local/script-ajax, the same script I can fire from a browser. To make it run from the server, you create the little script below:

"use strict";
var page = require('webpage').create(),
    system = require('system');

if (system.args.length < 2) {
    console.log('Usage: loadurltest.js URL');
    phantom.exit();
}

var address = system.args[1];

page.open(address, function(status) {
    if (status === 'success') {
        console.log('The script goes here');
    } else {
        console.log('Unable to load the address!');
        phantom.exit();
    }
});

Where you see console.log('The script goes here'); is where you should wait for the AJAX script to finish (because you must then exit PhantomJS).

So, the crontab just calls this: phantomjs yourscript.js http://example.com/script-ajax

 


You may well be right, Guy. You know your application better than anyone else.

If you design your cron script to process a small part of the larger picture on each invocation, no single run has to handle the entire input set.

Also, as you already have an exact record of the previous input set in the form of the previously imported CSV file, it should be fairly easy to compare the newly generated file you want to import into PW with the previous one, in order to quickly generate your addition, deletion and change lines. This approach works if the external data set is the master copy and is in sync with any sales made through the PW site. If stock levels are independently tracked in PW and/or take some time to propagate back to the external data source, then this pre-processing approach can't be used for updates, just for additions and deletions. Also, if a large percentage of the data set differs between imports, pre-processing may not gain you anything.
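Something as simple as keying both files by SKU would probably do it - a rough sketch, assuming the SKU sits in the first column:

<?php
// Rough sketch: diff the previous and the new export, keyed by SKU
// (assumes the SKU is the first CSV column; adjust to your layout).
function loadCsvBySku($path) {
    $rows = array();
    $h = fopen($path, 'r');
    while (($row = fgetcsv($h)) !== false) {
        $rows[$row[0]] = $row;
    }
    fclose($h);
    return $rows;
}

$old = loadCsvBySku('/path/to/previous.csv');   // placeholder paths
$new = loadCsvBySku('/path/to/current.csv');

$added   = array_diff_key($new, $old);
$removed = array_diff_key($old, $new);
$changed = array();
foreach (array_intersect_key($new, $old) as $sku => $row) {
    if ($row !== $old[$sku]) {
        $changed[$sku] = $row;
    }
}
// Only $added, $removed and $changed need to be pushed into PW.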

Best wishes


The previous CSV file could indeed be the basis of the comparison. The site will not be a store, and the stock inventory will be accessible on demand. We will have six months to see the extent of the daily changes.

There will be site administrators in charge of completing the missing information for each product (the external database contains only prices and basic information: SKU, number of items per pallet, etc.).

Thank you for your input!

 

