ryan Posted May 20, 2016

ProcessWire 3.0.19 lets you work with thousands of pages at once, with a new $pages->findMany() API method! https://processwire.com/blog/posts/find-and-iterate-many-pages-at-once/
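For those who want to jump right in, usage looks just like $pages->find(). A minimal sketch (the "blog-post" template name is just an example):

// Iterate over every matching page. Pages are loaded in chunks behind
// the scenes, so memory use stays modest even with thousands of results.
// "blog-post" is a made-up template name for this example.
foreach($pages->findMany("template=blog-post, sort=-created") as $page) {
    echo $page->title . "\n";
}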
adrian Posted May 20, 2016

That makes such a huge difference - I have a query (which I was obviously caching) that was taking around 4-5 minutes and now completes in 45 seconds! Obviously I still need to optimize some other components, but this is a huge improvement - thanks to everyone involved in this!
apeisa Posted May 20, 2016

Ryan, big thanks for putting this into core! This is all magic by sforsman. For us, the use cases are things like looping over and sending newsletters, big exports, etc. It really makes a big difference in performance, memory usage, and code simplicity when working with more than 10,000 pages.
pwired Posted May 20, 2016

Pageboggling! ProcessWire seems to have BIG friends.
joer80 Posted June 1, 2016

Nice! When I've been working with thousands of pages, I've been using cron jobs to pull a limited group at a time, adding a status field or a last-run timestamp field that I update so I know whether each page has been processed yet. Basically, breaking the job into many small batches (sketch below). This may be another way for me to go about it.
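Something along these lines (a rough sketch; the "customer" template and "processed" checkbox field are names I'd add to my own site, not anything built into PW):

// Cron script: grab a limited batch of pages that haven't been handled yet.
// "customer" and "processed" are hypothetical names for this example.
$batch = $pages->find("template=customer, processed=0, limit=100");

foreach($batch as $page) {
    // ... do the actual work for this page ...

    // Stamp the page so the next cron run skips it
    $page->of(false);
    $page->processed = 1;
    $page->save('processed'); // save just this one field
}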
sforsman Posted June 2, 2016

Excellent write-up, Ryan! Here's a quick tip: if you are processing large amounts of data, you should always use PW's field-joining options to achieve maximum performance. Even if you only use a single field, you should get around 50% faster execution when the field is joined straight away. With two fields joined, the execution time is cut in half (i.e. a 100% increase in speed).

Let's say you need to export the e-mail addresses of thousands of customers. Here's a simplified example using an "on-demand" join:

// Prepare a pointer for writing the output
$fp = fopen('customers.csv', 'w') or die('Failed opening the file for writing');

// Ask PW to join two fields (the regular find() also supports this)
$options = ['loadOptions' => ['joinFields' => ['customer_email', 'customer_number']]];

// Searching for imaginary customers
$selector = 'template=customer';

// Find the pages using the new method
foreach($pages->findMany($selector, $options) as $page) {
    // Write the data in CSV format
    fputcsv($fp, [$page->customer_number, $page->customer_email]);
}

// Free the pointer
fclose($fp);

As a reminder, you can also force some fields to be always joined (through the PW admin; a programmatic sketch follows at the end of this post).

@Joer: That is pretty much what the implementation of findMany() does behind the scenes. However, splitting the job into smaller batches still makes sense if you use multiple workers to process the data (e.g. for MapReduce or whatever) or if the execution time of your script is limited.
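If you'd rather set the autojoin flag from the API instead of through the admin, something like this should do it (an untested sketch; 'customer_email' is the example field from above):

// Load the field and add the autojoin flag so it is always joined
// whenever its pages are loaded. Untested sketch - verify before relying on it.
$field = $fields->get('customer_email');
$field->flags = $field->flags | Field::flagAutojoin;
$field->save();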
LostKobrakai Posted June 2, 2016

Thanks for the tip about the on-demand field join. I'm always a bit hesitant to use autojoin, but I'll certainly give this a try on all my reporting/listing pages.