
ProcessWire for a larger site with community-driven content - where are the limits?


Christof Kehr

Hello everyone,

This is my first post on the forum, and before I start with my questions I want to say a big thanks to everyone working on this wonderful framework!

I'm a happy ProcessWire user - I've been running a smaller site for two years, and another one just recently launched.
Now I'm planning a larger project and I'm wondering whether ProcessWire is the right framework here, or where its limits are, especially regarding the database.

The project will be a community site with typical features (profiles, events, forums) supporting local arts communities.
The communities create their own content, so there will also be frequent write requests.

For the MVP there will be just two pilot cities, but in the long run it should support cities worldwide.
I'm pretty sure that ProcessWire will have no issues with the MVP, but will it also hold up when the project scales?

Some tests I did so far with PW 3.0.98, apache 2.4.29, php 7.2.2, mysql 5.7.19 on my desktop (Windows10_64, i3-6300/16GB/NVMe):
- create a ProcessWire setup with the skyscrapers profile
-- browse pages and log mysql queries
-- run apache-bench against it
- create another ProcessWire setup with the blank profile
-- browse pages and log mysql queries
-- run apache-bench against it

While the apache-bench results were actually good, I'm a bit scared about the number of mysql queries I've seen for a single page request.
On the skyscrapers /about/ page I got 76 queries, on the blank profile's home page still 23.
Repeating my requests, I see the same queries hitting mysql again, so there is nothing cached.

Questions:
- is the number of mysql queries usual?
- can a layer like memcached be implemented easily?
- is ProCache a solution also for frequently changing content?
- are there experiences with clustering?

Thank you very much for your time!
Christof


Hello,

ProcessWire is built with the idea that it can handle 1,000s of pages, and there have been sites in the past with 100,000s - that's where things like $pages->findMany() came in. People have run large websites without any issue. I can't say much about the number of requests; however, I don't think ProCache is right for you in this instance. For a dynamic site you are best off taking advantage of the core cache system (https://processwire.com/api/ref/cache/).

For example, you can use the core cache to cache all posts so each user only gets posts from the cache, and you can clear that cache only when someone posts a comment. ProcessWire's API is so powerful once you get into it that something like this will actually be really easy.
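Very roughly, a sketch of that idea could look like this (the 'post'/'comment' template names and the cache key are just assumptions for illustration, not anything specific to your project):

// In a template file: pull the rendered list from WireCache, rebuilding it
// via the callback at most every 10 minutes.
$out = $cache->get('posts-recent', 600, function() use($pages) {
    $items = $pages->find('template=post, sort=-created, limit=20');
    return $items->each('<li><a href="{url}">{title}</a></li>');
});
echo "<ul>$out</ul>";

// In /site/ready.php: clear that cache entry whenever a post or comment is
// saved, so nobody ever sees stale content.
$wire->addHookAfter('Pages::saved(template=post|comment)', function($event) {
    $event->wire('cache')->delete('posts-recent');
});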

I can honestly say you are making the right choice with ProcessWire for this project, and with a few tweaks you can have it scale very nicely. 


PW potentially runs slower when debug mode is on, I guess. Did you disable it in your tests?
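For reference, that switch lives in /site/config.php - debug mode logs every query and adds overhead, so benchmarks are best run with it off:

// /site/config.php
$config->debug = false;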

There are quite a few big sites and web-apps (even multi-site environments) built with PW. 

No matter what framework you choose, you'll probably need to tweak and optimize at some point anyway.

I'm sure it will only be a matter of time until @bernhard chimes in here and tells you about the wonderful RockFinder SQL module he built.

Some "native PW" queries were dramatically faster when using that module. Just one of many examples of what you can do if you want even more speed.

re: ProCache

So far, I haven't had the need for this, but I only keep hearing good things about it. FWIW, here's a related forum thread 

And finally, there's also some "best practice" stuff worth knowing, like (chosen randomly):

 


Oh, the author simply moved the new version to another, newer forum post, that's all (AFAIK).

Here's a recent PW community site (with a nice retro touch), by the way - maybe you can get some ideas or inspiration there.

 


Hi @Christof Kehr and welcome to the forum.

1 hour ago, Christof Kehr said:

While searching the forum for sql performance some days ago I stumbled upon the RockSqlFinder post, but as it was marked "outdated", I unfortunately did not look into it more closely. Thanks for pointing me there, @dragan!

Thx for the hint, I renamed the topic and placed a more obvious comment in the first post.

2 hours ago, Christof Kehr said:

- is ProCache a solution also for frequently changing content?

Why should it not be a solution? It's an absolutely awesome tool, bypassing sql and php completely once the content has been cached, so it would for sure reduce your server load drastically. For example, if you had 2,000 requests and the content changed 2 times during those 2,000 requests, this would mean 2 requests involving php+mysql and 1,998 requests serving only the static HTML file.

1 hour ago, dragan said:

I'm sure it will only be a matter of time until @bernhard chimes in here and tells you about the wonderful RockFinder SQL module he built.

Thx for the kudos, but I don't think he will need my module for the features he has mentioned so far. RockFinder is great for listings of pages. It is especially built for the upcoming RockGrid module, where you can list thousands of pages in a grid that the user can manipulate on the client side (filtering, instant aggregations etc.). It can also be handy for things like CSV exports or RSS feeds or the like. Other than that I don't think it is that useful, and the core utilities are perfectly fine (just use pagination and proper limits).

ProcessWire is already built with scalability in mind and others have already reported sites handling millions of pages (see here for example https://processwire.com/talk/topic/9491-site-with-millions-of-„pages”/ ). It seems that in your case you are more concerned about the scalability of the infrastructure (requests) than the amount of data handled by the system (pages)?! ProCache will for sure be great:

Quote

Using ApacheBench with the homepage of the Skyscrapers site profile, we completed 500 requests (10 concurrent) to the homepage. The amount of time occupied to complete each of these was as follows:

  • 29 seconds: no cache enabled
  • 6 seconds: built-in cache enabled
  • 0.017 seconds: ProCache enabled

These are typical results. As you can see, the performance benefits are huge. ProCache gives you the ability to drastically reduce server resources and exponentially increase the amount of traffic your server could handle. This is especially useful for traffic spikes.

https://processwire.com/api/modules/procache/

2 hours ago, Christof Kehr said:

Repeating my requests, I see the same queries hitting mysql again, so there is nothing cached.

Not sure about this one, because the find queries usually are cached, but I don't know exactly how that works, because usually you don't have to care about such things as it just works.

1 hour ago, Tom. said:

I can't say much about the number of requests; however, I don't think ProCache is right for you in this instance. For a dynamic site you are best off taking advantage of the core cache system (https://processwire.com/api/ref/cache/).

Maybe I'm misunderstanding you or @Christof Kehr, but why do you think ProCache would not be a good idea here? Or why should the core cache be the better choice? Of course, ProCache is not the solution to all our problems, but this statement definitely needs a more detailed explanation.


11 hours ago, bernhard said:

Not sure about this one, because the find queries usually are cached, but I don't know exactly how that works, because usually you don't have to care about such things as it just works.

Find queries are cached for the lifetime of a request; caching them more permanently needs additional code. I think there was an example of using MarkupCache to store JSONified query results in the forum a while ago, but I might be remembering wrongly.
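Something along those lines with WireCache instead of MarkupCache might look like this (cache name, selector and expiry are just assumptions for illustration):

// Cache the JSON-encoded IDs from an expensive find() so subsequent requests
// can skip the query until the entry expires (here: 5 minutes).
$json = $cache->get('newest-post-ids', 300, function() use($pages) {
    return json_encode($pages->findIDs('template=post, sort=-created, limit=10'));
});
$posts = $pages->getById(json_decode($json, true));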

With highly dynamic sites, everything you know about caching best practices is at first no longer valid - until you start factoring out the dynamic parts, that is:
- Decouple searches from the database by using a dedicated in-memory search engine (Lucene, OpenSearchServer etc.) to avoid high loads and table-level locks.
- Cache IDs and values retrieved from frequent searches in memory (like the newest n posts/comments), and when you scale up, delegate the job of updating those to cronjobs to eliminate delays caused by rebuilding expired memory caches.
- Retrieve the dynamic parts of the page through JS in JSON format and render them with a frontend framework, so the static parts and templates can be served from the filesystem (that's where ProCache ties in) - see the sketch below.
- Design your pages to use placeholders for dynamic content so your users don't have to wait for client-side rendering before they can start using the page.
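A rough sketch of such a JSON endpoint as a plain PW template file (template and field names are assumptions; a real endpoint would add pagination, caching and access checks, and this assumes no automatically appended template file):

// /site/templates/api-comments.php: return the newest comments as JSON so a
// ProCached static page can pull them in via JS after load.
header('Content-Type: application/json');
$items = $pages->find('template=comment, sort=-created, limit=10');
$data = [];
foreach($items as $item) {
    $data[] = [
        'id'    => $item->id,
        'title' => $item->title,
        'url'   => $item->url,
    ];
}
echo json_encode($data);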


Note: I believe WireCache is the successor of MarkupCache (which is unofficially deprecated?).

From this blog post (April 2015):
https://processwire.com/blog/posts/processwire-core-updates-2.5.28/

Quote

If you have used MarkupCache in the past, you will likely want to consider using WireCache instead, as it can do quite a lot more.

 


12 hours ago, bernhard said:

Maybe I'm misunderstanding you or @Christof Kehr, but why do you think ProCache would not be a good idea here? Or why should the core cache be the better choice? Of course, ProCache is not the solution to all our problems, but this statement definitely needs a more detailed explanation.

I have not used ProCache yet, so I was just asking if it's the right tool to deal with frequently changing content. I never said it would not be a solution - how could I?

But as @BitPoet said, there are general concerns with caching highly dynamic content.

There is already a lot for me to get into, to read and try out.
First I need to get a better understanding of features like autojoin and template caching.
Then I'll employ WireCache to reduce the number of mysql requests and bench again.
Finally I'll try ProCache - with the impressive metrics you posted it sounds like a must-have.
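For anyone following along later: autojoin can be toggled per field in its Advanced settings, or via the API. A quick sketch, assuming a hypothetical field named 'summary':

// Load this field's value together with the page in the same query,
// instead of lazily on first access.
$f = $fields->get('summary');
$f->addFlag(Field::flagAutojoin);
$fields->save($f);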

I'll post my findings here afterwards.


2 hours ago, Christof Kehr said:

I have not used ProCache yet, so I was just asking if it's the right tool to deal with frequently changing content. I never said it would not be a solution - how could I?

Sorry, I was unclear here. I meant that maybe I'm misunderstanding him (and why he thinks ProCache is no solution), or maybe I'm misunderstanding what exactly you are trying to build and therefore don't understand Tom's statement.

 

3 hours ago, BitPoet said:

With highly dynamic sites, everything you know about caching best practices is at first no longer valid.

Is there any standard definition of "highly dynamic site"? Because I think this should be very clear before we talk about possible solutions... Hooks populating cache fields can also save you from hungry db requests, for example...
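As a rough sketch of that last point (template and field names are pure assumptions): a hook that writes the comment count into a plain integer field on the parent post whenever a comment is saved, so listings can show or sort by it without extra queries:

// In /site/ready.php: keep a 'comment_count' integer field on the parent
// post up to date whenever a comment underneath it is saved.
$wire->addHookAfter('Pages::saved(template=comment)', function($event) {
    $comment = $event->arguments(0);
    $post = $comment->parent;
    $post->of(false);
    $post->setAndSave('comment_count', $post->numChildren('template=comment'));
});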


56 minutes ago, bernhard said:

Is there any standard definition of "highly dynamic site"? Because I think this should be very clear before we talk about possible solutions... Hooks populating cache fields can also save you from hungry db requests, for example...

In my personal opinion the definition is rather fluid and depends on too many factors to give a generic rule, but I consider a site highly dynamic when multiple components of often visited templates (especially home and other landing pages) have user or group specific content and that content changes more often than on a daily basis. Whenever I see "social" or "communities" in a site's description, that's likely the case.


If @BitPoet's description fits the intended project, I'd say there might be more fitting tools out there than ProcessWire (also depending on your skillset, though). It's a great CMS and you can make it work in a lot of situations, but it does have its issues with bigger-scale projects: it's quite prone to n+1 queries if you're not cautious, and there are pitfalls like getting into trouble with a great diversity of templates/fields, because those are all loaded on each request. Caching can help in places, but dynamic content is hard to cache, and personally I'd always look for options where you can get away without caching first, before depending on it working out elsewhere.


Well, in the long run we expect highly dynamic content as @BitPoet has described it.
I say expect, because with communities you really never know.
That's why we are planning an MVP and two pilot cities to start with.

My skillset is limited - I'm actually a Scrum PO and not a software developer in the first place.
So after the MVP there will be a decision: either stay with ProcessWire and get help from experienced freelance developers, or employ a development team and build a custom system according to our needs.
Even for building the MVP some freelancer help will be needed, I think.

The reason why I asked long-term questions here is:
If we know upfront that ProcessWire will not be the long-term solution, we would set up the MVP in a way that the backend can be replaced while keeping the frontend (e.g. let ProcessWire deliver content via REST and build an independent Vue.js client as the frontend).

That would be more effort and an anti-pattern for an MVP, where you always want to take the shortest path, but we are also considering this approach.


4 hours ago, Christof Kehr said:

That would be more effort and an anti-pattern for an MVP, where you always want to take the shortest path, but we are also considering this approach.

IMHO building your prototype with PW gives you a solid base from which you can go in every direction. You can build just a part in Vue and let the Vue app grow over time. There's a little bit of effort involved to tie PW's pages, fields and templates and Vue templates together smoothly without a lot of overhead, but an experienced Vue dev should be able to wire that up quickly and in a reusable way (Vue would be my choice there as well). Exposing PW pages and limited write actions through ajax isn't hard to do either. It's also a walk in the park to migrate PW content into less normalized "external" tables with a small bootstrap script if performance really requires it, so unlike other CMSes where you have all kinds of different things (posts, pages, parts, whatever) in different structures, you don't really lock yourself in with PW.
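To illustrate the "limited write actions through ajax" part, here's a minimal sketch of a POST handler in a PW template file (template name, parent path and field names are assumptions; a real endpoint also needs authentication, CSRF and validation on top):

if($input->requestMethod('POST')) {
    // Create a new comment page from the sanitized POST data.
    $p = new Page();
    $p->template = 'comment';
    $p->parent = $pages->get('/comments/');
    $p->title = $sanitizer->text($input->post('title'));
    $p->body  = $sanitizer->textarea($input->post('body'));
    $p->save();
    // Respond with JSON so a Vue (or plain JS) client can update the UI.
    header('Content-Type: application/json');
    echo json_encode(['id' => $p->id, 'url' => $p->url]);
    return;
}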

I agree with @LostKobrakai insofar as there are in fact a few areas where a custom tool will get you further, like if you need a full-blown discussion forum. The choices there are rather limited if you want a good integration. Invision (the software this forum runs on) seems to be rather straightforward in that regard. I have tried my hand at a phpBB3 integration module, and while the basics mostly work, it's really a PITA in some parts (like outdated examples and misleading documentation) and I've run out of time to get it into a really usable shape. There are probably a few Laravel forum components I've never heard of that can get you most of the way as a halfway option. I think that will be the hardest part to decide: where will PW (or any other CMS or your own development, for that matter) only ever be the second-best choice compared to a single-purpose tool you can integrate?

