$pages->find("selector") response time


dfunk006

Hi,

I've recently moved from Modx Revolution to ProcessWire and so far my experience with the platform has been great! The jQuery UI admin is much faster than Modx's ExtJS manager.

kongondo's article for those transitioning from Modx has been very helpful. Link - http://processwire.com/talk/topic/3691-tutorial-a-quick-guide-to-processwire-for-those-transitioning-from-modx/

One of the things that really bugged me about Modx was the slowness of getResources. Even after optimising and caching the getResources call, it would still produce slow response times if the number of resources was high. Now, I understand that the equivalent of getResources in ProcessWire is $pages->find("selector").

My question is: has anyone had any experience with running the $pages->find("selector") API call on over 10,000 pages? What are the response times like?

Thanks


Do you want to find something in a set of 10,000 pages, or do you mean a find result of 10,000 items? Searching through 10,000 pages isn't a problem; PW is really fast. But if there are a lot of possible results, it's generally advised to limit the $pages->find() call to 25, 50 or another sane value: http://processwire.com/api/selectors/#limit . If you want, you can combine this with pagination.

The reason for this (others can probably explain it better) is that the $pages->find() call will return a PageArray that gets put in memory, and if there are a lot of pages in the array things will inevitably slow down. Of course, depending on the situation, 'slow' in PW terms may still be acceptable. I have grabbed hundreds or even thousands of pages in one go and it still performed pretty well.
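To make that concrete, here's a minimal sketch of a limited find with pagination -- the 'product' template is just a made-up example, not something from this thread:

// Limit the result set; with page numbers enabled on the template this
// gives you pagination more or less for free.
$results = $pages->find("template=product, sort=title, limit=25");

foreach ($results as $product) {
    echo "<li>{$product->title}</li>";
}

// MarkupPagerNav renders prev/next/page links for the limited result set
echo $results->renderPager();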


experience with running $pages->find("selector") api call on over 10,000 pages?

I don't think you will notice any slowness with that amount of pages to be searched. You can't compare getResources with the ProcessWire way of finding pages. As far as I understand, Modx loads all the fields (plus the custom fields, if needed) with getResources in your results. ProcessWire will only load fields when you access them, so there's no field loading with a page find.

The result: searching 100,000 pages is not a problem, and searching 10,000 pages takes just a blink of an eye.
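To illustrate what that lazy loading looks like in practice -- the template and field names here are made-up examples, and note that some fields (typically title) may be autojoined depending on their settings:

$articles = $pages->find("template=article, sort=-created"); // loads the pages, not every field

foreach ($articles as $article) {
    echo $article->body; // a non-autojoined field is only queried here, when you access it
    // fields you never touch ('images', 'summary', ...) are never loaded at all
}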


Thank you for all your inputs. Your help is much appreciated!

So from what I understand, searching is not an issue, but if there are lots of results then it might cause a slowdown since there will be lots of pages in memory. In a scenario like this, increasing the memory on the server should ideally fix this, right?

@Martijn Geerts, in Modx there is an option to turn off the loading of fields by setting processTVs to 0 in the getResources call. This makes it a bit faster, but overall it's still slow!

@kongondo, thanks for the tip, and for the awesome tutorial!


@dfunk006: working with a ton of pages is going to be slow, it's as simple as that. Sure, you can always add more muscle to your server, but honestly, how often do you really need to show something like 10,000+ results simultaneously? How is that ever going to be useful for end users? :)

Adding a sensible limit and using pagination, infinite scroll, etc. makes sense not only resource-wise, but also from a usability point of view.

To expand on Martijn's reply a bit: PW also loads fields defined as "autoload" (via field settings) automatically, so you'll want to be careful with that if you're expecting to handle huge numbers of pages. Unless, of course, you actually always need those fields :)


@teppo, you're absolutely right - in most cases there won't be any need for it. :rolleyes: However, if I want to create an auto-complete search engine in which a user can search for any page on the site by its title, I would have to pass a JSON array of all the pages as its source, for which I'll need to get all the pages on the site along with their titles. This might slow things down - unless there is a better way of doing this.

Also, I wasn't able to find the "autoload" setting. Do you mean "autojoin" by any chance?

One more thing, apart from adding more muscle to the server, adding sensible limits and using pagination, are there any other optimizations that can be achieved when using $page->find("selector")?


However, if I want to create an auto-complete search engine in which a user can search for any page on the site by its title, I would have to pass a JSON array of all the pages as its source, for which i'll need to get all pages in the site along with their titles. This might slow things down - unless there is a better way of doing this.

Usually you wouldn't do this in real time.

I'd suggest a cron job that periodically updates the source JSON. ProcessWire even provides a nifty LazyCron module you could use, but for tasks like this I prefer a proper cron job -- that way there's not even that (rare) slowdown for end users, and you don't have to worry at all about the process getting interrupted halfway.

I've recently been working on converting some pretty large and active sites to PW, and that's exactly what I did there. In my case the JSON file is generated once a day, but of course you could rebuild it much more often. It depends on how often your data changes, etc.
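In case it helps, here's a rough sketch of what such a cron-run script could look like -- the file paths, the selector and the JSON structure are all just assumptions for the example:

// bootstrap ProcessWire outside of a template file
include '/path/to/site/index.php';

$data = array();

// on a really big site you'd walk through this in chunks (limit/start)
// rather than grabbing everything in one go
foreach (wire('pages')->find("template!=admin, include=hidden") as $p) {
    $data[] = array('title' => $p->title, 'url' => $p->url);
}

file_put_contents('/path/to/site/assets/cache/search-source.json', json_encode($data));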

Also, I wasn't able to find the "autoload" setting. Do you mean "autojoin" by any chance?

That sounds just about right :)

One more thing, apart from adding more muscle to the server, adding sensible limits and using pagination, are there any other optimizations that can be achieved when using $page->find("selector")?

There are certain things that are slower than others, but the PW selector engine is pretty well optimized already.

What you can and should do is mostly about keeping it simple: fewer fields in a selector string is usually faster (take a look at FieldtypeCache, by the way), searching with $page->children("selector") is faster than $page->find("selector") but only finds direct children, and comparisons using "=" should be faster than "*=" (which in turn should be faster than "%="), etc.
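A few of those comparisons side by side, just as a sketch (the template and field names are made up):

// exact match is generally the cheapest comparison
$fast = $pages->find("template=article, title=Hello World, limit=25");

// word match (*=) and "contains" match (%=) get progressively more expensive
$slower  = $pages->find("template=article, title*=Hello, limit=25");
$slowest = $pages->find("template=article, title%=ello, limit=25");

// children() only looks at direct children, so it's cheaper than a recursive find()
$direct = $page->children("template=article, limit=25");
$deep   = $page->find("template=article, limit=25"); // searches all descendants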

I'm pretty sure you could find quite a few posts about keeping queries fast around here; I'm definitely not the most qualified person to comment on this. As a general tip, forget the native forum search function -- it's not very helpful. Do a Google search with site:processwire.com/talk and you'll get much better results. :)


There is no need for an autocomplete search to return all the pages -- only the 10 or 100 most relevant. The PW admin has an autocomplete search across all pages, and also templates, fields and users (well, they are pages too).

Of course there are cases where all pages, or at least more than can fit in memory, need to be processed. Then it's again a cron job (or possibly paginated results and a memory flush, but I haven't ever had a need for that).
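For what it's worth, a rough sketch of an autocomplete endpoint along those lines -- the 'q' parameter and the JSON shape are just assumptions:

// e.g. /site/templates/autocomplete.php, called via AJAX
$q = $sanitizer->selectorValue($input->get->q);

// return only the most relevant matches, never the whole site
$matches = $pages->find("title%=$q, limit=10, sort=title");

$out = array();
foreach ($matches as $m) {
    $out[] = array('label' => $m->title, 'url' => $m->url);
}

header('Content-Type: application/json');
echo json_encode($out);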


I'd suggest a cron job that periodically updates the source JSON. ProcessWire even provides a nifty LazyCron module you could use, but for tasks like this I prefer a proper cron job -- that way there's not even that (rare) slowdown for end users, and you don't have to worry at all about the process getting interrupted halfway.

Thanks for the suggestion @teppo - I'll definitely try that out! However, instead of a cron job, I was thinking of updating the source JSON every time a page is updated (created/modified/deleted) using hooks. That should pretty much give me the same results, right?
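Something like this is what I have in mind -- just a rough sketch, where rebuildSearchJson() is a hypothetical helper that would regenerate the JSON file:

// e.g. in a file where the ProcessWire API variables are available,
// such as an autoload module or a shared include
$pages->addHookAfter('saved', function($event) {
    rebuildSearchJson(); // regenerate the source JSON whenever a page is saved
});

$pages->addHookAfter('deleted', function($event) {
    rebuildSearchJson(); // ...and whenever a page is deleted
});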


That sounds good and you'll definitely have more up-to-date results with that approach.

I used cron mostly because the sites in question were really large, content is constantly being updated, and generating the JSON file could take (relatively speaking) a long time. I didn't want each page save to take that long.

