Jump to content

Subfield selectors failing with large data sets


thetuningspoon
 Share

Recommended Posts

We are putting PW through its paces on one of our projects. Currently we have 3.5 million pages and counting. Everything has been scaling well and is running great, except for some of our selectors that use subfields.

Our project involves physical units (represented by PW pages) that transmit their status to our application each day, resulting in a new page for each transmission.

We are using a page field on the unit to store a reference to its latest transmission. We then need to filter the units by their last transmission date, so we are using the following selector:

$pages->find('template=unit, last_transmission.date_field<-1 day, last_transmission.date_field>=-7 day, sort=created, limit=10');

Once we had several thousand unit pages in the system, this selector began to fail.

After doing some debugging to see what SQL PW was actually producing, we discovered that the problem was that the selector was generating three separate sql queries. One returned the ids of all of the units that had a last_transmission date greater than a given timestamp, one returned the ids of all the units that had a date less than a given timestamp, and the last took the results of the other two queries and applied any remaining selectors. Since there were several thousand units, the first two queries caused a memory error.

We resolved this issue by using the created date instead of a custom date field on the last transmission, so the selector changed to:

$pages->find('template=unit, last_transmission.created<-1 day, last_transmission.created>=-7 day, sort=created, limit=10');

For some reason PW was able to combine this into a single SQL query, whereas it was unable to do so with the custom field.

Not sure if this is the intended behavior or not, but it looks like this is an area of PW core that might be improved upon. Has anyone else come across this issue and are there any other workarounds I might not be aware of?

  • Like 2
Link to comment
Share on other sites

Well, I guess Ryan could shed some light on the inner workings of selectors / optimizations. I never had to deal with millions of pages...

It would be interesting to replicate the same query with vanilla mysql, and compare the time it takes to fetch results "directly" vs. inside PW.

Perhaps some kind of time(stamp) calculations are not optimized (your original first two queries your selector produced).

Creating DB indexes normally speed up queries, but I have no clue whether PW is already doing that somehow, or if it is possible to add one on your own.

Link to comment
Share on other sites

  • 3 weeks later...

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...