ryan Posted May 20, 2016

ProcessWire 3.0.19 lets you work with thousands of pages at once, with a new $pages->findMany() API method! https://processwire.com/blog/posts/find-and-iterate-many-pages-at-once/
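For those who want to jump right in, usage looks just like $pages->find(). A minimal sketch (the "blog-post" template name is just an example):

// Iterate over every matching page. Pages are loaded in chunks behind
// the scenes, so memory use stays modest even with thousands of results.
// "blog-post" is a made-up template name for this example.
foreach($pages->findMany("template=blog-post, sort=-created") as $page) {
    echo $page->title . "\n";
}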
adrian Posted May 20, 2016

That makes such a huge difference - I have a query (which I was obviously caching) that was taking around 4-5 minutes and now completes in 45 seconds! Obviously I still need to optimize some other components, but this is a huge improvement - thanks to everyone involved in this!
apeisa Posted May 20, 2016

Ryan, big thanks for putting this into core! This is all magic by sforsman. For us, the use cases are things like looping over and sending newsletters, big exports, etc. It really makes a big difference in performance, memory usage, and code simplicity when working with more than 10,000 pages.
pwired Posted May 20, 2016

Pageboggling! ProcessWire seems to have BIG friends.
joer80 Posted June 1, 2016

Nice! When I've been working with thousands of pages, I've been using cron jobs to pull a limited group at a time, adding a status field or a last-run timestamp field that I update so I know whether each page has been processed yet. Basically, breaking the job into many small batches (sketch below). This may be another way for me to go about it.
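Something along these lines (a rough sketch; the "customer" template and "processed" checkbox field are names I'd add to my own site, not anything built into PW):

// Cron script: grab a limited batch of pages that haven't been handled yet.
// "customer" and "processed" are hypothetical names for this example.
$batch = $pages->find("template=customer, processed=0, limit=100");

foreach($batch as $page) {
    // ... do the actual work for this page ...

    // Stamp the page so the next cron run skips it
    $page->of(false);
    $page->processed = 1;
    $page->save('processed'); // save just this one field
}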
sforsman Posted June 2, 2016

Excellent write-up, Ryan! Here's a quick tip: if you are processing large amounts of data, you should always use PW's field-joining options to achieve maximum performance. Even if you only use a single field, you should get around 50% faster execution when the field is joined straight away. With two fields joined, the execution time is cut in half (i.e. a 100% increase in speed).

Let's say you need to export the e-mail addresses of thousands of customers. Here's a simplified example using an "on-demand" join:

// Prepare a pointer for writing the output
$fp = fopen('customers.csv', 'w') or die('Failed opening the file for writing');

// Ask PW to join two fields (the regular find() also supports this)
$options = ['loadOptions' => ['joinFields' => ['customer_email', 'customer_number']]];

// Searching for imaginary customers
$selector = 'template=customer';

// Find the pages using the new method
foreach($pages->findMany($selector, $options) as $page) {
    // Write the data in CSV format
    fputcsv($fp, [$page->customer_number, $page->customer_email]);
}

// Free the pointer
fclose($fp);

As a reminder, you can also force some fields to be always joined (through the PW admin; a programmatic sketch follows at the end of this post).

@Joer: That is pretty much what the implementation of findMany() does behind the scenes. However, splitting the job into smaller batches still makes sense if you use multiple workers to process the data (e.g. for MapReduce or whatever) or if the execution time of your script is limited.
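If you'd rather set the autojoin flag from the API instead of through the admin, something like this should do it (an untested sketch; 'customer_email' is the example field from above):

// Load the field and add the autojoin flag so it is always joined
// whenever its pages are loaded. Untested sketch - verify before relying on it.
$field = $fields->get('customer_email');
$field->flags = $field->flags | Field::flagAutojoin;
$field->save();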
LostKobrakai Posted June 2, 2016

Thanks for the tip about the on-demand field join. I'm always a bit hesitant to use autojoin, but I'll certainly give this a try on all my reporting/listing pages.