ryan

PW 3.0.19: Something special from our friends at Avoine

That makes such a huge difference. I have a query (which I was obviously caching) that was taking around 4-5 minutes, and it now completes in 45 seconds!

Obviously I still need to optimize some other components, but this is a huge improvement - thanks to everyone involved in this!



Ryan, big thanks for putting this into the core! This is all magic by sforsman. For us, the use cases are things like looping over and sending newsletters, big exports, etc. It makes a really big difference in performance, memory usage, and code simplicity when working with more than 10,000 pages.



Pageboggling! ProcessWire seems to have BIG friends.


Nice!

When I have been working with thousands of pages, I have been using cron jobs to pull a limited group at a time, adding a status field or a last-run timestamp field that I update so I know whether the job has run on a given page yet. Basically, breaking the job into many small batches. This may be another way for me to go about this.
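For anyone curious, the cron-batch approach described above can be sketched roughly like this. This is only a minimal sketch: the `processed` checkbox field, the `customer` template, and the bootstrap path are assumptions for illustration, not part of the original post.

```php
<?php
// Rough sketch of batch processing via cron. Adjust the bootstrap
// path, template name, and status field name to your own site.

// Bootstrap ProcessWire when running outside a template (e.g. from cron)
include '/path/to/your/site/index.php';

// Pull a limited batch of pages that have not been processed yet
$batch = $pages->find('template=customer, processed=0, limit=200');

foreach($batch as $page) {
    // ... do the actual work for this page here ...

    // Flag the page as done so the next cron run skips it
    $page->of(false);          // turn off output formatting before saving
    $page->processed = 1;
    $page->save('processed');  // save only the one field that changed
}
```

Each cron run then chews through at most 200 unprocessed pages and exits, so a long job never hits the PHP execution time limit.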


Excellent write-up Ryan!

Here's a quick tip: if you are processing large amounts of data, you should always use PW's field-joining options to achieve maximum performance. Even if you only use a single field, you should see roughly 50% faster execution when the field is joined straight away. With two fields joined, the execution time is cut in half (i.e. a 100% increase in speed).

 
Let's say you need to export the e-mail addresses of thousands of customers. Here's a simplified example using an "on-demand" join:
 
// Prepare a pointer for writing the output
$fp = fopen('customers.csv', 'w') or die('Failed opening the file for writing');

// Ask PW to join two fields (the regular find() also supports this).
$options = ['loadOptions' => ['joinFields' => ['customer_email', 'customer_number']]];

// Searching for imaginary customers
$selector = 'template=customer';

// Find the pages using the new method
foreach($pages->findMany($selector, $options) as $page) {
  // Write the data in CSV format
  fputcsv($fp, [$page->customer_number, $page->customer_email]);
}

// Free the pointer
fclose($fp);

As a reminder, you can also force some fields to always be joined (through the PW admin).

@Joer: That is pretty much what the implementation of findMany() does behind the scenes. However, splitting the job into smaller batches still makes sense if you use multiple workers to process the data (e.g. for MapReduce or similar) or if the execution time of your script is limited.


Thanks for the tip about the on-demand field join. I'm always a bit hesitant to use autojoin, but I'll certainly give this a try on all my reporting/listing pages.

