hheyne

Site with millions of „pages”


Hello,

I have a question about a recent project. In this project I use PW as a framework (a really amazing framework) to build a web app. In this app users can log in (via PW users and group access) and post something. I expect about one million posts per year (200 users, each posting 20 times per day). Every post is of course a „page”. Has anyone had trouble with the database in such cases? Is there anything (in the database or in ProcessWire) to keep a closer eye on?

Best Regards

Henning


There are people using that many pages (https://processwire.com/talk/topic/9336-need-help-deleting-an-empty-field-from-a-template-with-2-million-pages/?hl=%2Bmillion+%2Bfield), and the only thing that does not automatically scale to that many pages is files/images. I can't recall where I read about it, but basically it's a filesystem limitation: a folder can only hold a certain number of subfolders/items. ProcessWire needs to be set to use multiple site/assets/files/ folders to prevent an "overflow" of the standard single folder.


@lostkobrakai There's a setting in config.php for that problem:

$config->pagefileExtendedPaths = true;

Apart from that there's no limit in PW, but going for millions of pages may require a special and different strategy depending on what the site is built for. With millions of pages, the time for a fulltext search on large text fields grows roughly linearly and can quickly reach several seconds. You also have to take care with the code you write: it's easy to run into a timeout or the memory limit if you aren't careful. It also depends on whether there's a lot going on, such as many users posting at once. PW handles all of that well, but it also depends on the server.


I am currently managing a PW site with 2 million+ pages. It's admirably fast, and much, much faster than any other CMS we tested. Searching is also ridiculously fast when done on single fields like title. (I also just did a test search using the page finder and it took < 4 seconds to find pages which had a particular field empty from a template which has 1.63 million pages.)

The site doesn't deal with many image or file uploads (yet), but two optimizations I have applied so far are to 1) always, always use limits when calling $pages->find(), and 2) cache the sitemaps (which contain thousands of links each) using ProCache (https://processwire.com/api/modules/procache/).
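To illustrate point 1, here is a rough sketch of walking a large set of pages in batches rather than loading them all at once. It assumes it runs inside a ProcessWire template file or bootstrapped script where $pages is available; the helper name eachPageInBatches, the "template=post" selector and the batch size of 500 are made up for the example, not taken from my site.

```php
<?php namespace ProcessWire;

/**
 * Walk a large result set in fixed-size batches.
 * Hypothetical helper for illustration, not part of the ProcessWire core.
 */
function eachPageInBatches(Pages $pages, string $selector, int $batchSize, callable $fn): int {
    $start = 0;
    $seen = 0;
    do {
        // start= and limit= keep each query, and the PageArray held in memory, small
        $batch = $pages->find("$selector, start=$start, limit=$batchSize");
        $count = $batch->count();
        foreach ($batch as $page) {
            $fn($page);
            $seen++;
        }
        $start += $batchSize;
        // release the pages loaded so far before fetching the next batch
        $pages->uncacheAll();
    } while ($count === $batchSize);
    return $seen;
}

// Usage: the template name is an assumption; sort=id keeps paging stable
$total = eachPageInBatches($pages, "template=post, sort=id", 500, function (Page $p) {
    echo $p->id . ": " . $p->title . "\n";
});
```

If the callback changes pages so that they no longer match the selector, the offsets shift between batches, so a loop like this is best kept to read-only work or to changes that don't affect the selector.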

Once you know where specifically your site is using the most resources, you can apply more selective caching / database optimizations.

Thanks for starting this topic, I learned about pagefileExtendedPaths... there's always some cool feature I didn't know about and now must have!


Thank you very much for all the answers and hints. I will proceed with the project and report back later (in a few months) on the results and what I have learned from it.


 Searching is also ridiculously fast when done on single fields like title. (I also just did a test search using the page finder and it took < 4 seconds to find pages which had a particular field empty from a template which has 1.63 million pages.)

Have you tried using the cache fieldtype to combine multiple fields' data into one for things such as searching? It's in the core and Teppo explains it well here: https://processwire.com/talk/topic/5513-fieldtype-cache-please-elaborate/
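As a rough sketch of what that could look like in a search template: the field name search_cache, the "template=post" selector and the q GET variable are assumptions for the example, and it presumes a Cache field has already been set up in the admin to combine the fields you want to search.

```php
<?php namespace ProcessWire;

// "search_cache" is a hypothetical Cache field combining e.g. title and body.
// Sanitize the user's query before placing it in a selector.
$q = $sanitizer->selectorValue($sanitizer->text($input->get('q')));

if ($q !== '') {
    // One field to match against instead of several, and always with a limit
    $matches = $pages->find("template=post, search_cache%=$q, limit=25");
    foreach ($matches as $match) {
        echo $match->title . "\n";
    }
}
```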


Thanks for the suggestion and the link, Pete. So far our front-end search needs have been limited to the title field, but this is going to change soon as lots of social features are being implemented, so I will definitely need to learn all about the cache fieldtype.

enricob, the site is currently in open beta and lacking some functionality and server tuning we want to have in place before sharing (especially as an example of what PW can do). A lot of new features are still being implemented and hopefully the site should be well-tested and ready to showcase within about a month. I am eagerly looking forward to posting a case-study about it in the forums then, and will drop you a note as a heads up.


@nickie I'm also looking forward to seeing the site.

Towards the end of the year I will share the results of my project. Hopefully it will be in a nearly finished state by then.
