About ProcessWire & Performance


Mobiletrooper

Hey Ryan, hey friends,

we, Mobile Trooper, a digital agency based in Germany, use ProcessWire for an enterprise-grade intranet publishing portal that has been under heavy development for over 3 years now. Over the years, not only has the user base grown, but so has the platform in general. We introduced lots and lots of features thanks to ProcessWire's absurd flexibility. We came across many CMSs (or CMFs, for that matter) that don't even come close to ProcessWire. The closest we found were Locomotive (Rails-based) and Pimcore (PHP-based).

So this is not your typical ProcessWire installation in terms of size.

Currently we count:

140 Templates (Some have 1 page, some have >6000 pages)

313 Fields

~ 15k Users (For an intranet portal? That's heavy.)

~ 195,431 Pages (at least that's the current AUTO_INCREMENT)

 

I think we've reached a point where ProcessWire isn't as scalable for us as it used to be. Our latest research measured over 20 seconds of load time (the time PHP spent assembling the HTML). That's unacceptable, unfortunately. We've implemented common performance strategies like:

- Running latest PHP

- Enabling OPcache

- Implemented WireCache (file and database caching)

- Installed ProDrafts (but it doesn't work for our use case)

- Added autojoin to many fields (not all of them, just the ones that are used the most)

- Added MySQL indexes and tweaked our config a bit

- Removed long-running tasks like loops, background jobs, etc.

We're running on fat machines: the DB server has 32 GB of RAM, and the production web server has 32 GB as well. Both are running on quad-core Xeons hosted on Azure.

We have load balancing in place, but still, a single server needs up to 20 seconds to respond to a single request, averaging around 12 seconds.

In our research we came across pages that sent over 1,000 SQL queries with lots of JOINs. This is obviously a consequence of PW's architecture (one table per field), but does this slow MySQL down much? For the start page we need to fetch somewhere around 60-80 pages, and each page needs to be queried for ~12 fields to be displayed correctly; is this too much? There are many different fields involved, like multiple Page fields which hold tags, categories, etc.

We installed Profiler Pro, but it does not seem to show us the real bottleneck; it just says that everything is kinda slow and sums up to the grand total we mentioned above.

ProCache does not help us because every user sees something different, so we can cache some fragments, but those usually measure at around 10ms. We can't spend time optimising if we can't expect a worthwhile benefit. Therefore we opted against ProCache and wrote our own module, which generates these cache fragments lazily.
That speeds up the whole page rendering to ~7 seconds, which is acceptable compared to 20 seconds but still ridiculously long.
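For context, the lazy pattern we use is essentially what the core WireCache API offers out of the box. A minimal sketch (the cache key scheme and renderNewsWidget() are hypothetical placeholders, not our actual code):

```php
// Build a per-user cache key so each user gets their own fragment
$key = "widget-news-{$user->id}-{$user->language->name}"; // hypothetical key scheme

// $cache->get() with a callable renders the fragment only on a cache miss,
// then stores the result for 300 seconds (core WireCache API)
$fragment = $cache->get($key, 300, function() use($page) {
    return renderNewsWidget($page); // expensive render, hypothetical function
});

echo $fragment;
```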

Our page consists mainly of dynamic parts that change every 2-5 minutes. It differs across users based on their location, language and other preferences.

We also have about 120 people working in the ProcessWire backend concurrently all day.

 

What do you guys think?

Here are my questions; hopefully we can collect these in a wiki or something, because I'm sure more and more people will hit that wall sooner than they hoped they would:

 

- Should we opt for optimising the database? Since >2k queries per request is a lot even for a MySQL server, while the web server CPU is basically idling at that time.

- Do you think at this point it makes sense to use ProcessWire as a simple REST API?

- In your experience, which fieldtypes are expensive? Page? RepeaterMatrix?

- Ryan, what do you consider the primary bottleneck of ProcessWire?

- Is the number of fields too high? Would it be better if we tried to reuse fields as much as possible?

- Is there an option to hook into ProcessWire's SQL builder, so we can write custom SQL for some selectors?

 

Thanks and best wishes,

Pascal from Mobile Trooper


Hi Pascal,

sounds like a very interesting project! This might be an interesting read for you (I have to admit that I didn't quite understand what they meant by "community" and "dynamic"):

 

1 hour ago, Mobiletrooper said:

We also have about 120 people working in the ProcessWire backend concurrently all day.

I'm just curious: how do you manage concurrent page edits here? Do you use some locking technique?


Hi Pascal,

the question might be redundant, but have you ruled out network latency between the web and database server (e.g. by testing with a local copy)? Are you using InnoDB? If not, that might be a chance for a noticeable improvement.

Otherwise, you're at a size where you can't just follow a simple optimization guide, and your approach of lazily generating cache fragments is certainly going to be one piece of the puzzle. Our corporate intranet is a little under half the size of your site, going by the figures above, but we have far fewer users (roughly 1,000). We're using a mix of relatively straightforward markup caching with invalidation on page save and regeneration on access every n minutes/hours, plus a few cron jobs that assemble more complex dynamic content and stuff it into memcache, so our approach is not that much different from yours.

It might make sense to hook into page save and store as much relevant page data as possible in a single (hidden) field as JSON, and to only load data you truly have to access through the page's accessors (multi-language text fields, page references, etc.). That reduces the number of queries per view, moving the delay to the save operation instead of every load.
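A minimal sketch of that idea (the template check and the cache_json textarea field are hypothetical; adapt it to your own fields):

```php
// site/ready.php -- rebuild the JSON snapshot whenever a page is saved
$wire->addHookAfter('Pages::saved', function(HookEvent $event) {
    $page = $event->arguments(0);
    if($page->template != 'article') return; // only snapshot relevant templates (hypothetical)

    // Walk the page's accessors once at save time instead of on every view
    $data = [
        'title' => (string) $page->title,
        'tags'  => $page->tags->explode('name'), // Page reference field, hypothetical
    ];

    $json = json_encode($data);
    if($json === $page->cache_json) return; // nothing changed, skip the extra save

    // Save quietly and without hooks so this hook doesn't re-trigger itself
    $page->setAndSave('cache_json', $json, ['quiet' => true, 'noHooks' => true]);
});
```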

2 hours ago, Mobiletrooper said:

Is there an option to hook into ProcessWire's SQL builder, so we can write custom SQL for some selectors?

What @bernhard forgot to mention is that his RockFinder module might be worth a look in that regard.


1 minute ago, BitPoet said:

What @bernhard forgot to mention is that his RockFinder module might be worth a look in that regard.

Actually, I didn't forget to mention it, but I think/thought that it might not be helpful in this case. The performance boost comes from not loading pages into memory, and this is especially helpful when you want to show data as a grid. The query itself is a regular PW $pages->find() query with lots of joins, so I'm not sure if that would make a difference in his scenario. As far as I understood, his problem is the other way round: they do not need to display lots of pages in one grid, they need to execute lots of different selectors in lots of different places (custom widgets, boxes or whatever you may call them, with custom data).

But yeah, you are right. It might be worth a look anyhow.


When thousands of queries (updating content) are bringing your system to its knees, InnoDB with transactions will bring a big quality-of-life improvement. I was just discovering this through this issue. Reading the issue comments, you can see Ryan is looking into making more use of transactions where they are available.

You can already use them in your own templates and modules: https://processwire.com/blog/posts/using-innodb-with-processwire/
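To illustrate, a minimal sketch of wrapping a batch of page saves in a single transaction, along the lines of that blog post (the template, parent and $items data are hypothetical):

```php
$database = $wire->database;

// Only start a transaction if the storage engine supports it (i.e. InnoDB)
if($database->supportsTransaction()) $database->beginTransaction();

try {
    foreach($items as $item) { // $items: your batch of import data, hypothetical
        $p = new Page();
        $p->template = 'article';                     // hypothetical template
        $p->parent = $wire->pages->get('/articles/'); // hypothetical parent
        $p->title = $item['title'];
        $p->save();
    }
    if($database->inTransaction()) $database->commit();
} catch(\Exception $e) {
    // Roll everything back on failure so no half-finished batch is left behind
    if($database->inTransaction()) $database->rollBack();
    throw $e;
}
```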


What version of PW are you on? The newer, the better.

Also, recently one thing that was driving me nuts was a bug related to ListerPro pages taking forever to load (you might not be able to view the link since it's in the private ListerPro forum):

The temporary workaround for that was to remove all the filters in my ListerPro. For some reason, my ListerPro was loading all 2,000 pages on my site on each page load, causing a slowdown.

Is the site still slow when there's only a handful of people using it, as opposed to 100+? If so, that may indicate something.

Have you enabled debug mode in PW? Does anything stand out?
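For reference, debug mode is a single switch in the site config; best flipped on a staging copy rather than in production:

```php
// site/config.php -- enables PW's debug tools, including timers and
// a log of the SQL queries issued for the current request
$config->debug = true;
```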


Hey friends,

I will answer each of your questions below:

@bernhard thanks for the link. Some good insights on $page->rootPage->find and "*=" selectors. Your summary sums it up perfectly: we need to load lots of different pages and fire hundreds/thousands of different complex selectors, each customized per user. But a big chunk of the page is actually displaying content in a grid, so your module will come in handy, although memory on our web server is not the issue currently. Regarding concurrent edits: to be honest, I caused some misconception here. People work on different parts of the site concurrently but never on a single page together, so MySQL basically queues the inserts and performs them in sequence.

 

@BitPoet we ruled that out already; latency averages at 0.500ms. We will implement better cache strategies later. The hidden JSON field is a nice idea; it would at least reduce some joins and balance work between the DB and web servers.

 

@Beluga We're giving InnoDB a shot now, thanks for the reminder. I will get back if that was the issue.

 

@Jonathan Lahijani we're running the latest PW master (= 3.0.98). We also had issues with ListerPro being unusable when querying specific fields.
The site is slow in general, even without many users. We do not have debug mode enabled on prod.

 

I think InnoDB is worth a shot at this point, and some cache strategies sound quite interesting. I will get back with more info.

 

Edit: InnoDB is working smoothly, but performance is still mediocre. I'm checking whether or not joins are the reason for the slowdowns we are seeing, and will try to fetch data from a single field.

 

Thanks so far. 


I've had similar experiences on a project of mine, which was quite a bit smaller, but also quite a bit less spec'ed in terms of server resources. What I noticed was the big hit of loading all the templates/fields upfront on each request, which added quite a hefty millisecond count for something that hardly ever changed, and it has to happen before any actual work on the request can start. About a year ago we decided to go for a rewrite, though, but that might be a place to look into for optimization.

Also, as with any other project using SQL, you want to look out for n+1 queries, which are actually quite easy to create with ProcessWire, as fields are loaded magically on demand (see the sketch below). You can use autojoin manually as an option for $pages->find() if needed to mitigate those. I'd also care more about the number of SQL queries and less about how many joins they use. Joins can be a performance hit, and sometimes 2 queries might be quicker than one with a join, but 1,000 queries for a single page request sounds unnecessarily many.
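A minimal sketch of how the n+1 pattern sneaks in (the template and field names are hypothetical; whether the second access costs a query depends on the field's autojoin setting):

```php
// 1 query: find the pages themselves; only autojoined fields come along
$articles = $pages->find("template=article, limit=60"); // hypothetical selector

foreach($articles as $article) {
    echo $article->title;   // title is autojoined by default: no extra query
    echo $article->summary; // lazy-loaded: +1 query per page, unless autojoin
                            // is enabled for this field in its setup
}
```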
