Subfield selectors failing with large data sets

thetuningspoon · October 18, 2017

We are putting PW through its paces on one of our projects. Currently we have 3.5 million pages and counting. Everything has been scaling well and is running great, except for some of our selectors that use subfields.

Our project involves physical units (represented by PW pages) that transmit their status to our application each day, resulting in a new page for each transmission.

We are using a page field on the unit to store a reference to its latest transmission. We then need to filter the units by their last transmission date, so we are using the following selector:

$pages->find('template=unit, last_transmission.date_field<-1 day, last_transmission.date_field>=-7 day, sort=created, limit=10');

Once we had several thousand unit pages in the system, this selector began to fail.

After doing some debugging to see what SQL PW was actually producing, we discovered that the problem was that the selector was generating three separate sql queries. One returned the ids of all of the units that had a last_transmission date greater than a given timestamp, one returned the ids of all the units that had a date less than a given timestamp, and the last took the results of the other two queries and applied any remaining selectors. Since there were several thousand units, the first two queries caused a memory error.

We resolved this issue by using the created date instead of a custom date field on the last transmission, so the selector changed to:

$pages->find('template=unit, last_transmission.created<-1 day, last_transmission.created>=-7 day, sort=created, limit=10');

For some reason PW was able to combine this into a single SQL query, whereas it was unable to do so with the custom field.

Not sure if this is the intended behavior or not, but it looks like this is an area of PW core that might be improved upon. Has anyone else come across this issue and are there any other workarounds I might not be aware of?

dragan · October 18, 2017

Have you tried using one of these?

https://processwire.com/blog/posts/find-and-iterate-many-pages-at-once/

https://modules.processwire.com/modules/pages-sum/

thetuningspoon · October 24, 2017

@dragan Yes. Unfortunately, findMany() has the same issue in that the queries are still separate and the initial queries still have to return all of the matching page IDs. Pages sum doesn't fit this use case, I don't think.

dragan · October 24, 2017

Well, I guess Ryan could shed some light on the inner workings of selectors / optimizations. I never had to deal with millions of pages...

It would be interesting to replicate the same query with vanilla mysql, and compare the time it takes to fetch results "directly" vs. inside PW.

Perhaps some kind of time(stamp) calculations are not optimized (your original first two queries your selector produced).

Creating DB indexes normally speed up queries, but I have no clue whether PW is already doing that somehow, or if it is possible to add one on your own.

thetuningspoon · November 9, 2017

@ryan Could you provide any insight on this? Thanks

Sign In

Subfield selectors failing with large data sets

Recommended Posts

thetuningspoon

Link to comment

Share on other sites

dragan

Link to comment

Share on other sites

thetuningspoon

Link to comment

Share on other sites

dragan

Link to comment

Share on other sites

thetuningspoon

Link to comment

Share on other sites

Create an account or sign in to comment

Create an account

Sign in

Recently Browsing 0 members

Browse

Activity

My Activity Streams

Support

Store

My Details