
Web App with millions of pages


formulate

I'm developing a web app in PW and have a lot of PW experience. However, this particular project will be very large scale and I have some questions.

The "app" will create approximately 3,000 pages per day at launch and keep growing, with an expected 300,000 pages per day after a few months. As you can see, even after half a year I will be at over 100 million pages. Frankly, this seems ridiculous, but that's the case.
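As a rough sanity check on these figures, here is a minimal sketch that totals the pages under one assumed growth shape: the daily rate rises linearly from 3,000 to 300,000 over 90 days and then stays flat. The 90-day ramp and the function name `cumulative_pages` are assumptions for illustration, not something stated in the post.

```python
def cumulative_pages(days, start_rate, end_rate, ramp_days):
    """Total pages created after `days` days.

    Assumption: the daily rate climbs linearly from start_rate to end_rate
    over ramp_days, then stays at end_rate.
    """
    total = 0
    for day in range(days):
        if day < ramp_days:
            # Linear interpolation between the launch rate and the target rate.
            rate = start_rate + (end_rate - start_rate) * day // ramp_days
        else:
            rate = end_rate
        total += rate
    return total


half_year = cumulative_pages(days=180, start_rate=3_000, end_rate=300_000, ramp_days=90)
full_year = cumulative_pages(days=365, start_rate=3_000, end_rate=300_000, ramp_days=90)
print(f"after 180 days: {half_year:,} pages")
print(f"after 365 days: {full_year:,} pages")
```

Under this assumed ramp the total is roughly 40 million pages after half a year and close to 100 million after a full year. A faster ramp or a higher peak rate would shift those numbers, but the order of magnitude is the same.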

1. Can ProcessWire even handle this?

2. Does it just come down to server capabilities?

3. Should I consider trying to break this down to separate multiple databases?

4. Alternatively, instead of Pages, should I look at using fields? Maybe storing JSON in a text field? This would reduce the number of pages to fewer than 100 per day, even after half a year. I presume the database itself would still get very large.

Thoughts?

Thanks.


For storing data that no longer changes once it is saved (that is, data you don't need to edit and re-save constantly), I use the Fieldtype YAML module. Reading and saving the data is very easy. You can store simple objects or an entire structure in the field, however you like, using just an associative array. For the saving part, check this post:

 


I think it depends a bit on how you display the data. I'm running 2+ million pages and I'm struggling a lot with queries, but displaying pages of the data is not that bad. Searching on one field, such as a title field, is somewhat fast (5-10 seconds). Something like the following can be really slow:

$pages->find("template=blog_post, page_reference_field1.title|title|page_reference_field2.title%='something I want to match'");

I'm using a Lightsail server with 2 vCPUs and 4 GB of RAM. I don't really know whether a dedicated DB server would help.
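One likely reason the multi-field `%=` selector is slow: as far as I know, `%=` translates to an SQL `LIKE '%...%'` match, and a pattern with a leading wildcard generally cannot use an ordinary index. The sketch below illustrates that general behaviour with Python's built-in SQLite rather than ProcessWire's MySQL tables; the `posts` table and index name are made up for the demonstration.

```python
import sqlite3

# In-memory SQLite database standing in for a real MySQL install (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE INDEX idx_posts_title ON posts (title)")


def plan(sql):
    """Return the query-plan detail strings SQLite reports for `sql`."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Exact match: the planner can seek directly into the title index.
exact_plan = plan("SELECT id FROM posts WHERE title = 'something'")

# Substring match with a leading wildcard: every row has to be examined.
substring_plan = plan("SELECT id FROM posts WHERE title LIKE '%something%'")

print("exact:    ", exact_plan)
print("substring:", substring_plan)
```

The exact match is reported as a `SEARCH` (index seek) and the substring match as a `SCAN` (every row visited). With millions of rows and several joined fields, that difference adds up quickly, whatever the server size.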


Thanks to both of you for the feedback.

Maybe I'm approaching this wrong. All I'm storing is time stamps. There's a hierarchical organization of pages and, within these, a need to store time stamps. I considered JSON at a lower sub-level of pages, but even then the JSON would get too large for the MySQL character limit of the text field. The JSON would also become time-consuming to process and work with.
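To put a number on the size limit mentioned above, here is a small sketch that estimates how many Unix time stamps fit into one field as compact JSON. It assumes the field maps to a MySQL `TEXT` column (65,535 bytes) and that each time stamp is a 10-digit integer; the helper names are hypothetical.

```python
import json

MYSQL_TEXT_MAX_BYTES = 65_535  # maximum size of a MySQL TEXT column


def json_size(timestamps):
    """Size in bytes of the list encoded as compact JSON (no spaces)."""
    return len(json.dumps(timestamps, separators=(",", ":")).encode("utf-8"))


def max_timestamps_in_text(sample_ts=1_700_000_000):
    """How many time stamps of this width fit into one TEXT column."""
    one = json_size([sample_ts])
    two = json_size([sample_ts, sample_ts])
    per_item = two - one       # bytes added by each extra time stamp (digits + comma)
    overhead = one - per_item  # fixed cost of the surrounding JSON
    return (MYSQL_TEXT_MAX_BYTES - overhead) // per_item


print(max_timestamps_in_text())
```

Under these assumptions a single `TEXT` field holds just under 6,000 time stamps, so spreading 300,000 new time stamps per day over fewer than 100 pages would overflow it within a couple of days.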

Is there a better way of storing millions of time stamps that I'm not thinking of?


4 hours ago, formulate said:

All I'm storing is time stamps. There's a hierarchical organization of pages and within these, a need to store time stamps.

I think custom database tables and SQL queries would be the way to go.

This is a good read on working with hierarchies in MySQL: http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/
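A minimal sketch of what the custom-table approach could look like: one table for the hierarchy (adjacency-list model, one of the approaches the linked article covers) and one narrow table holding a row per time stamp. It uses Python's built-in SQLite so it runs anywhere; the table names, column names, and sample rows are all hypothetical, and on MySQL the recursive query needs version 8.0 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES nodes(id),  -- NULL for the root
    name      TEXT NOT NULL
);
CREATE TABLE events (
    node_id INTEGER NOT NULL REFERENCES nodes(id),
    ts      INTEGER NOT NULL                 -- Unix time stamp
);
-- Composite index so range queries per node avoid a full table scan.
CREATE INDEX idx_events_node_ts ON events (node_id, ts);
""")

conn.executemany(
    "INSERT INTO nodes (id, parent_id, name) VALUES (?, ?, ?)",
    [(1, None, "root"), (2, 1, "group-a"), (3, 2, "item-1"), (4, 2, "item-2")],
)
conn.executemany(
    "INSERT INTO events (node_id, ts) VALUES (?, ?)",
    [(3, 1_700_000_000), (3, 1_700_000_060), (4, 1_700_000_030), (4, 1_700_003_600)],
)


def count_in_subtree(root_id, start_ts, end_ts):
    """Count time stamps in [start_ts, end_ts) for a node and all its descendants."""
    sql = """
    WITH RECURSIVE subtree(id) AS (
        SELECT id FROM nodes WHERE id = ?
        UNION ALL
        SELECT n.id FROM nodes n JOIN subtree s ON n.parent_id = s.id
    )
    SELECT COUNT(*) FROM events
    WHERE node_id IN (SELECT id FROM subtree) AND ts >= ? AND ts < ?
    """
    return conn.execute(sql, (root_id, start_ts, end_ts)).fetchone()[0]


print(count_in_subtree(2, 1_700_000_000, 1_700_000_100))
```

Each time stamp costs one small indexed row instead of a full page with its field tables, which is why this layout tends to scale better for this kind of data.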


21 hours ago, elabx said:

I'm running 2+ million pages and I'm struggling a lot with queries.

Not sure if that would help, but I'm curious: have you ever tried RockFinder2?

+1 for custom db tables and custom SQL. RockFinder2 makes combining custom SQL + PW magic (like access control, hidden/published pages, pagination etc) really easy!


1 hour ago, bernhard said:

+1 for custom db tables and custom SQL. RockFinder2 makes combining custom SQL + PW magic (like access control, hidden/published pages, pagination etc) really easy!

Gonna try this ASAP. Thanks!

