thetuningspoon Posted October 18, 2017 Share Posted October 18, 2017 We are putting PW through its paces on one of our projects. Currently we have 3.5 million pages and counting. Everything has been scaling well and is running great, except for some of our selectors that use subfields. Our project involves physical units (represented by PW pages) that transmit their status to our application each day, resulting in a new page for each transmission. We are using a page field on the unit to store a reference to its latest transmission. We then need to filter the units by their last transmission date, so we are using the following selector: $pages->find('template=unit, last_transmission.date_field<-1 day, last_transmission.date_field>=-7 day, sort=created, limit=10'); Once we had several thousand unit pages in the system, this selector began to fail. After doing some debugging to see what SQL PW was actually producing, we discovered that the problem was that the selector was generating three separate sql queries. One returned the ids of all of the units that had a last_transmission date greater than a given timestamp, one returned the ids of all the units that had a date less than a given timestamp, and the last took the results of the other two queries and applied any remaining selectors. Since there were several thousand units, the first two queries caused a memory error. We resolved this issue by using the created date instead of a custom date field on the last transmission, so the selector changed to: $pages->find('template=unit, last_transmission.created<-1 day, last_transmission.created>=-7 day, sort=created, limit=10'); For some reason PW was able to combine this into a single SQL query, whereas it was unable to do so with the custom field. Not sure if this is the intended behavior or not, but it looks like this is an area of PW core that might be improved upon. Has anyone else come across this issue and are there any other workarounds I might not be aware of? 2 Link to comment Share on other sites More sharing options...
dragan Posted October 18, 2017 Share Posted October 18, 2017 Have you tried using one of these? https://processwire.com/blog/posts/find-and-iterate-many-pages-at-once/ https://modules.processwire.com/modules/pages-sum/ 1 Link to comment Share on other sites More sharing options...
thetuningspoon Posted October 24, 2017 Author Share Posted October 24, 2017 @dragan Yes. Unfortunately, findMany() has the same issue in that the queries are still separate and the initial queries still have to return all of the matching page IDs. Pages sum doesn't fit this use case, I don't think. Link to comment Share on other sites More sharing options...
dragan Posted October 24, 2017 Share Posted October 24, 2017 Well, I guess Ryan could shed some light on the inner workings of selectors / optimizations. I never had to deal with millions of pages... It would be interesting to replicate the same query with vanilla mysql, and compare the time it takes to fetch results "directly" vs. inside PW. Perhaps some kind of time(stamp) calculations are not optimized (your original first two queries your selector produced). Creating DB indexes normally speed up queries, but I have no clue whether PW is already doing that somehow, or if it is possible to add one on your own. Link to comment Share on other sites More sharing options...
thetuningspoon Posted November 9, 2017 Author Share Posted November 9, 2017 @ryan Could you provide any insight on this? Thanks Link to comment Share on other sites More sharing options...
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now