elabx

Paginating 5 million pages.


Hi everyone! I've been handed the task of paginating 5 million pages and things are getting a bit slow here, lol.

I found out that you can disable counting within the selector, and that brings huge performance benefits, basically making the queries instant.

I'm just wondering because I'd still like to have pagination, especially since the sets of data won't change often, so I really only need the count once in a while, yet the find() call seems to hit the database and run the count on every request.

Maybe I'm missing something and there is a way for the count value to stay cached? Would you recommend hacking a bit into the pagination module?
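
For reference, this is roughly what I mean (a minimal sketch; "item" is a placeholder template and get_count=0 is the selector option I'm referring to):

// Counts the total matches on every request, which gets slow with millions of pages:
$items = $pages->find("template=item, sort=-created, limit=50");

// Skips the counting, so the query comes back almost instantly:
$items = $pages->find("template=item, sort=-created, limit=50, get_count=0");

// ...but then the pager no longer knows the real total:
echo $items->renderPager();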


I guess you could disable counting in your selector and get the count of matching pages separately via $pages->count() on the first page only. Then pass the count in the query string (or store it in $session) and use PaginatedArray::setTotal() to set the total count on the PageArray on each pagination. And if necessary you can fake the pagination entirely, as shown by Ryan here:
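
In code it could look something like this (an untested sketch; the "item" template and the session key are placeholders, and get_count=0 is the counting option discussed in this thread):

// Skip the automatic total count on the find() itself
$results = $pages->find("template=item, sort=-created, limit=50, get_count=0");

// Count once (on the first page, or whenever the session doesn't have it yet)
if($input->pageNum == 1 || !$session->get('itemTotal')) {
    $session->set('itemTotal', $pages->count("template=item"));
}

// Put the real total back on the PageArray so MarkupPagerNav renders the page numbers
$results->setTotal((int) $session->get('itemTotal'));

echo $results->renderPager();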

 


With 5 million pages, do people actually click through who knows how many pages you get? Generally, offset/limit pagination scales quite badly because the database always has to scan past the offset rows as well, so the higher the page number (offset), the slower the query gets. For big datasets, cursor-based pagination is usually advised, but in ProcessWire you'd need custom SQL for that. It also no longer gives you pagination in the sense of "you're on page 6043 of 20383"; you can only do next/prev. But from a UX point of view, page numbers that big aren't useful in the first place. Having a means of filtering down to a more manageable result set is what I would rather strive for.
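
Very roughly, cursor-based pagination with custom SQL could look something like this (just a sketch, not tested; the templates_id value and the "after" GET parameter are placeholders, and it assumes the page id is an acceptable sort order):

$limit  = 50;
$lastId = (int) $input->get('after'); // id of the last item on the previous page, 0 on the first request

$sql = "SELECT id FROM pages WHERE templates_id=:tpl " .
       ($lastId ? "AND id < :after " : "") .
       "ORDER BY id DESC LIMIT $limit"; // cursor on the primary key instead of OFFSET

$query = $database->prepare($sql);
$query->bindValue(':tpl', 29, \PDO::PARAM_INT); // 29 = placeholder template id
if($lastId) $query->bindValue(':after', $lastId, \PDO::PARAM_INT);
$query->execute();

$ids   = $query->fetchAll(\PDO::FETCH_COLUMN);
$items = $pages->getById($ids); // turn the ids back into Page objects

// Cursor for the "Next" link: ?after=<id of the last item on this page>
$nextCursor = count($ids) ? (int) end($ids) : null;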

7 hours ago, LostKobrakai said:

With 5 million pages, do people actually click through who knows how many pages you get? [...] Having a means of filtering down to a more manageable result set is what I would rather strive for.

Agree on all your points! Thanks for the feedback!


Hey @elabx, what solution did you come up with in the end?

I am also paginating 1.2 million pages (when no filters have been selected), and the response time is about 15 seconds, which is not usable.

Let me know if you found a good solution - thank you!!


I'm basically removing pagination, because I feel @LostKobrakai's comments on UX make a lot of sense, and I'll instead prioritize search, which is actually fast enough with simple selectors. As a pagination replacement I'll just go with a "Next" button.

How is search speed looking for you (if you're doing it)?


@elabx thanks for the fast response!

Search is fast, as long as one picks at least one filter/selector. Displaying all 1.2 million results takes about 14-16 seconds, even with the limit and pagination. I guess it's because of the count?

5 minutes ago, Erik Richter said:

Displaying all 1.2 million results takes about 14-16 seconds, even with the limit and pagination. I guess it's because of the count?

You can pass an option in the selector like this: get_count=0, so it doesn't do the counting; that will get the find() to perform much better.

Check this topic: 

 

I'd still wonder if there isn't a way MySQL could cache that count? Just out of curiosity.

