Mobiletrooper

About ProcessWire & Performance


Hey Ryan, hey friends,

we, Mobile Trooper, a digital agency based in Germany, use ProcessWire for an enterprise-grade intranet publishing portal that has been under heavy development for over 3 years now. Over the years, not only the user base has grown but also the platform in general. We introduced lots and lots of features thanks to ProcessWire's absurd flexibility. We have come across many CMSs (or CMFs, for that matter) that don't even come close to ProcessWire. The closest we found were Locomotive (Rails-based) and Pimcore (PHP-based).

So this is not your typical ProcessWire installation in terms of size.

Currently we count:

140 Templates (some have 1 page, some have >6,000 pages)

313 Fields

~15k Users (for an intranet portal? That's heavy.)

~195,431 Pages (at least that's the current AUTO_INCREMENT value)

 

I think we've reached a point where ProcessWire isn't as scalable for us as it used to be. Our latest measurements show over 20 seconds of load time (the time PHP spends scrambling the HTML together). That's unacceptable, unfortunately. We've implemented common performance strategies like:


- Running the latest PHP

- Enabling OPcache

- Implementing WireCache (file and database caching)

- Installing ProDrafts (but it doesn't work for our use case)

- Adding autojoin to many fields (not all of them, just the ones that are used the most)

- Adding MySQL indexes and tweaking our config a bit

- Removing long-running tasks like loops, background jobs etc.

 

 

 

We're running on fat machines: the DB server has 32 GB of RAM, and the production web server has 32 GB as well. Both run on quad-core Xeons hosted on Azure.

We have load balancing in place, but still, a single server needs up to 20 seconds to respond to a single request, averaging around 12 seconds.

In our research we came across pages that fired over 1,000 SQL queries with lots of JOINs. This is obviously a consequence of PW's architecture (one table per field), but does it slow MySQL down much? For the start page we need to fetch somewhere around 60-80 pages, and each page needs to be queried for ~12 fields to be displayed correctly. Is this too much? There are many different fields involved, like multiple Page fields that hold tags, categories etc.

We installed Profiler Pro, but it doesn't seem to show us the real bottleneck; it just says that everything is kind of slow and sums up to the grand total mentioned above.

ProCache does not help us because every user sees something different, so we can only cache some fragments, and those usually render in around 10 ms anyway. We can't spend time optimizing without a measurable benefit, so we opted against ProCache and built our own module, which generates these cache fragments lazily.
That speeds up the whole page render to ~7 seconds, which is acceptable compared to 20 seconds but still ridiculously long.
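For reference, this kind of lazy fragment generation can also be expressed directly with the core WireCache API. A minimal sketch, where the widget name, the 300-second lifetime and the user "tags" field are all assumptions:

// Render a per-user, per-language widget fragment lazily: on a cache miss
// the callback runs the expensive selector; on a hit, find() is skipped.
$key = "widget-news-{$user->id}-{$user->language->name}";
$html = $cache->get($key, 300, function() use($pages, $user) {
    $items = $pages->find("template=news-item, tags=$user->tags, limit=10, sort=-date");
    $out = '';
    foreach($items as $item) $out .= "<li>{$item->title}</li>";
    return "<ul>{$out}</ul>";
});
echo $html;

A TTL of a few minutes fits the 2-5 minute volatility described below; WireCache also supports save-based expiration if stale fragments become a concern.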

Our page consists mainly of dynamic parts that change every 2-5 minutes and differ between users based on their location, language and other preferences.

We also have about 120 people working concurrently in the ProcessWire backend all day.

 

What do you guys think?

Here are my questions. Hopefully we can collect these in a wiki or something, because I'm sure more and more people will hit this wall sooner than they hoped they would:

 

- Should we focus on optimizing the database? >2k queries per request is a lot even for a MySQL server, and the web server CPU is basically idling during that time.

- Do you think it makes sense at this point to use ProcessWire as a simple REST API?

- In your experience, which fieldtypes are expensive? Page? RepeaterMatrix?

- Ryan, what do you consider the primary bottleneck of ProcessWire?

- Is the number of fields too high? Would it be better to reuse fields as much as possible?

- Is there an option to hook into ProcessWire's SQL builder, so we can write custom SQL for some selectors?

 

Thanks and lots of wishes,

Pascal from Mobile Trooper

 

 


Hi Pascal,

sounds like a very interesting project! This might be an interesting read for you (I have to admit that I didn't quite understand what they meant by "community" and "dynamic"):

 

1 hour ago, Mobiletrooper said:

We also have about 120 people working on the processwire backend the whole day concurrently.

I'm just curious: how do you manage concurrent page edits here? Do you use some locking technique?

 

 


Hi Pascal,

the question might be redundant, but have you ruled out network latency between the web and database servers (e.g. by testing with a local copy)? Are you using InnoDB? If not, that might be a chance for a noticeable improvement.

Otherwise, you're at a size where you can't follow a simple optimization guide, and your approach of lazily generating cache parts is certainly going to be one piece of the puzzle. Our corporate intranet is a little under half the size of your site, going by the figures above, but we have far fewer users (read: roughly 1,000). We're using a mix of relatively straightforward markup caching with invalidation on page save and regeneration on access every n minutes/hours, plus a few cron jobs that assemble more complex dynamic content and stuff it into memcache, so our approach is not that different from yours.

It might make sense to hook into page save and store as much relevant page data as possible as JSON in a single (hidden) field, and only lazily load the data you have to access through the page's accessors (multi-language text fields, page references etc.). That reduces the number of queries per load by moving the work into the save operation instead.
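A minimal sketch of that idea, assuming a hidden textarea field named "render_cache" and the listing fields used on the start page; the hook would live in site/ready.php:

// Denormalize the data needed for listings into one hidden field on save.
// Pages::saveReady fires just before the save, so setting the field here
// needs no second save() call (and avoids hook recursion).
$wire->addHookAfter('Pages::saveReady', function(HookEvent $event) {
    $page = $event->arguments(0);
    if(!$page->template->hasField('render_cache')) return;
    $data = [
        'title'      => (string) $page->title,
        'summary'    => (string) $page->summary,           // assumed field
        'tags'       => $page->tags->explode('title'),     // Page reference field
        'categories' => $page->categories->explode('title'),
    ];
    $page->set('render_cache', json_encode($data));
});

The front end can then json_decode($page->render_cache, true) once per page instead of lazy-loading a dozen field tables.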

2 hours ago, Mobiletrooper said:

Is there an option to hook onto ProcessWires SQL builder? So we can write custom SQL for some selectors?

What @bernhard forgot to mention is that his RockFinder module might be worth a look in that regard.

1 minute ago, BitPoet said:

What @bernhard forgot to mention is that his RockFinder module might be worth a look in that regard.

Actually, I didn't forget to mention it, but I think/thought that it might not be helpful in this case. The performance boost comes from not loading pages into memory, which is especially helpful when you want to show data as a grid. The query itself is a regular PW pages->find() query with lots of joins, so I'm not sure that would make a difference in this scenario. As far as I understood, his problem is the other way round: they don't need to display lots of pages in one grid; they need to execute lots of different selectors in lots of different places (custom widgets, boxes or whatever you may call them, with custom data).

But yeah, you are right. It might be worth a look anyhow 🙂 


When thousands of queries (updating content) are bringing your system to its knees, InnoDB with transactions will bring a big quality-of-life improvement. I was just discovering this through this issue. Reading the issue comments, you can see Ryan is looking into making more use of transactions where they are available.

You can already use them in your own templates and modules: https://processwire.com/blog/posts/using-innodb-with-processwire/
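Following that blog post, the pattern looks roughly like this; the import loop, template and parent are made up for illustration:

// Batch many writes into one InnoDB transaction so MySQL commits once
// instead of once per query; falls back to plain saves on MyISAM tables.
$useTx = $database->supportsTransaction();
if($useTx) $database->beginTransaction();
try {
    foreach($items as $item) {               // $items: hypothetical data source
        $p = new Page();
        $p->template = 'news-item';          // assumed template
        $p->parent = $pages->get('/news/');  // assumed parent page
        $p->title = $item['title'];
        $p->save();
    }
    if($useTx) $database->commit();
} catch(\Exception $e) {
    if($useTx) $database->rollBack();
    throw $e;
}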


What version of PW are you on? The newer, the better.

Also, one thing that was driving me nuts recently was a bug related to ListerPro pages taking forever to load (you might not be able to view the link since it's in the private ListerPro forum):

The temporary workaround was to remove all the filters in my ListerPro. For some reason, my ListerPro was loading all 2,000 pages on my site on each page load, causing a slowdown.

Is the site still slow when there are only a handful of people using it, as opposed to 100+? If so, that may indicate something.

Have you enabled debug mode in PW? Does anything stand out?


Hey friends,

I will answer each of your questions below:

@bernhard thanks for the link. Some good insights on $page->rootPage->find and "*=" selectors. Your summary sums it up perfectly: we need to load lots of different pages and fire hundreds or thousands of different complex selectors, each customized per user. But a big chunk of the page actually displays content in a grid, so your module will come in handy, although memory on our web server isn't the issue currently. Regarding concurrent edits, to be honest, I caused some misconception here: people work on different parts of the site concurrently, but never on a single page together, so MySQL basically queues the inserts and performs them in sequence.

 

@BitPoet we ruled that out already; latency averages at 0.500 ms. We will implement better cache strategies later. The hidden JSON field is a nice idea; it would at least reduce some joins and balance work between the DB and web servers.

 

@Beluga we're giving InnoDB a shot now, thanks for the reminder. I'll report back if that was the issue.

 

@Jonathan Lahijani we're running the latest PW master (= 3.0.98). We also had issues with ListerPro becoming unusable when querying specific fields.
The site is slow in general, even without many users. We do not have debug mode enabled on prod.

 

I think InnoDB is worth a shot at this point, and some of the cache strategies sound quite interesting. I will get back with more info.

 

Edit: InnoDB is running smoothly, but performance is still mediocre. I'm checking whether the joins are the reason for the slowdowns we're seeing and will try fetching the data from a single field.

 

Thanks so far. 

2 hours ago, Mobiletrooper said:

We do not have debug mode enabled on prod.

You should be able to do this for superusers only, if you need to :-)

Read from here onward.
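For what it's worth, the usual pattern is a one-liner; a sketch for site/ready.php (site/config.php won't work for this, since $user isn't available there yet):

// Enable debug mode only for logged-in superusers, so regular production
// traffic never sees debug output or pays for the extra logging.
if($user->isSuperuser()) $config->debug = true;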

 

 


I've had similar experiences on a project of mine, which was quite a bit smaller, but also quite a bit less spec'ed in terms of server resources. What I noticed was the big hit of loading all the templates/fields upfront on each request, which added quite a hefty millisecond count for something that hardly ever changed, and it has to happen before any actual work on the request can start. About a year ago we decided to go for a rewrite, but that might be a place to look into for optimization.

Also, as with any other project using SQL, you want to look out for n+1 queries, which are actually quite easy to create with ProcessWire since fields are loaded magically on demand. You can manually pass autojoin as an option to $pages->find() to mitigate those (sketched below). I'd also care more about the number of SQL requests and less about how many joins they use. Joins can be a performance hit, and sometimes two queries might be quicker than one with a join, but 1,000 queries for a single page request sounds unnecessarily many.
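A minimal sketch of that manual autojoin option; the template and field names are assumptions:

// Join the listed fields into the find query itself, instead of letting
// each $a->summary access below trigger its own lazy load (the n+1 pattern).
$articles = $pages->find('template=article, sort=-date, limit=50', [
    'loadOptions' => ['joinFields' => ['title', 'summary', 'date']]
]);
foreach($articles as $a) {
    echo "<h2>{$a->title}</h2><p>{$a->summary}</p>";
}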

