Recommended Posts

Posted

Hello,

We've recently been researching how to use ProcessWire in a horizontal scaling environment (multiple server instances behind a load balancer, with read replica databases), and ran an experiment using AWS Elastic Beanstalk.

Getting read replica databases up and running was easy - it's built in to the core: https://processwire.com/blog/posts/pw-3.0.175/#how-to-use-it-in-processwire
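
For reference, the read replica side needs only a few lines in /site/config.php. Below is a minimal sketch with placeholder hostnames and credentials; I'm assuming the keys accepted by $config->dbReader mirror the usual $config->db* settings, with anything omitted falling back to the primary values - see the blog post above for the exact options.

<?php namespace ProcessWire;

// /site/config.php (sketch only - hostnames and credentials are placeholders)

// Primary (read/write) connection, configured as usual:
$config->dbHost = 'primary.db.example.com';
$config->dbName = 'pw';
$config->dbUser = 'pw';
$config->dbPass = 'secret';

// Read-only replica used for read queries (support added in 3.0.175).
// Assumption: keys mirror the $config->db* settings above; check the
// linked blog post for the exact options.
$config->dbReader = [
    'dbHost' => 'replica.db.example.com',
];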

Using multiple server instances throws up one big problem: how to keep the filesystem on multiple instances in sync, given that ProcessWire doesn't currently support using an external service (like S3 or EFS) as the filesystem.

The solution that we came up with is to use various Cloudflare services (R2, Stream, Images) to serve file assets, and we've built a module to facilitate this:

We're not using this in production yet, but our tests on EB were successful, and we're confident this will solve the main part of the problem. However, the Cloudflare Images service is still quite new and there are still features to be rolled out (e.g. WebP for flexible variants), so it can't be considered a complete solution yet.

Additionally, we use ProCache, and this presents another multi-instance problem - if the cache is cleared on one instance, how can we clear it on all of them? Our solution is to log clears in the database and use this log to sync up clearing across instances. We built another module:

Again, this worked well in our tests but isn't yet being used in production.
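
To make the idea concrete, here is a hypothetical sketch (not the module itself): each instance records its clears in a shared table on the primary database, and every instance periodically compares that table against a marker file on its own local disk, clearing its own copy whenever another instance has cleared more recently. The table name, marker path and the ProCache clearAll() call are assumptions for illustration.

<?php namespace ProcessWire;

// Hypothetical sketch of the "log clears in the database" approach.
// Assumes a shared table created once on the primary database:
//   CREATE TABLE procache_clears (
//       id INT AUTO_INCREMENT PRIMARY KEY,
//       cleared_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
//   );

// 1. The instance that clears ProCache also records the clear centrally:
$database->exec("INSERT INTO procache_clears () VALUES ()");

// 2. Every instance checks periodically (e.g. from a LazyCron hook),
//    comparing the shared log against a marker file on its local disk:
$marker = $config->paths->cache . 'procache-last-clear-seen.txt';
$lastSeen = is_file($marker) ? (int) file_get_contents($marker) : 0;
$latest = (int) $database->query(
    "SELECT UNIX_TIMESTAMP(MAX(cleared_at)) FROM procache_clears"
)->fetchColumn();

if($latest > $lastSeen) {
    $procache = $modules->get('ProCache');
    $procache->clearAll(); // assumes ProCache's clearAll() method
    file_put_contents($marker, (string) $latest);
}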

The main purpose of this thread, aside from sharing these potential solutions, is to ask for and discuss other experiences of hosting ProcessWire in a horizontal scaling environment. What solutions did you come up with (if you want to share them) and are there other potential issues we maybe haven't thought about?

Cheers,

Chris

Posted

Wow... is all I can say for the moment.
What amount of traffic or hits/second are you expecting for that kind of setup?

I built and ran pretty cheap and simple setups that handled up to about 30-50k hits*/day without noticeable issues - ok, those sites were ProCached and running behind the Cloudflare CDN (free tier), yet... it worked out. They probably could have handled even more.

None of my projects here scale horizontally, vertically or in any other direction compared to your setup.

It's not in the same league as your setup by any measure - but here is how I built something back in the day that scaled very well:

  • JS files came from sub[1-3].domain.tld
    • super necessary parts were inlined
    • custom JS from external sources was pulled in via file_get_contents
  • CSS files came from sub[1-3].domain.tld
    • almost all (critical) CSS was inlined
    • custom CSS from external sources was pulled in via file_get_contents
  • IMGs came from assets[1-3].domain.tld
  • Cloudflare took care of gzip compression and caching of the output (not sure about Brotli)
  • ProCache took care of the heavy load before anything else, as 95% of the whole site/s were cached with a very long lifetime (pre-cached by running a site scraper after each release - see the sketch after this list)
  • Asset and file handling were kind of static and strict, without many options for custom solutions (which wasn't really necessary for those sites), as the overall page setups were minimal and simple (blog style, with minimal differences)
  • Files like JS, CSS and IMGs came from other services rather than my host - actually, everything on a subdomain came from other services, as the hosting was too cheap to handle lots of requests. I used GitHub, ZEIT (which is Vercel now, I guess) and some other services I can't remember for that
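
For what it's worth, the "pre-cache with a site scraper after each release" step can be as simple as requesting every URL in the sitemap so the static cache is rebuilt before real visitors arrive. A rough sketch (the sitemap URL is a placeholder, and it assumes allow_url_fopen is enabled):

<?php

// Rough pre-warm sketch: fetch the sitemap, then request every URL in it
// so ProCache regenerates its static copies. The domain is a placeholder.
$xml = file_get_contents('https://www.example.com/sitemap.xml');
preg_match_all('#<loc>(.+?)</loc>#', $xml, $matches);

foreach($matches[1] as $url) {
    $start = microtime(true);
    file_get_contents($url); // a plain GET is enough to populate the cache
    printf("warmed %s in %.2fs\n", $url, microtime(true) - $start);
}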

It was a bl**dy hell to make that work back then (BUT I had to save money I didn't have) - those were also some of my very first real projects with ProcessWire (among my first 10 public projects ever, and most of them were my own). Nowadays that setup would probably still be annoying in some parts, yet more feasible and easier to handle, with far better results.

My issues back then were limited database and webserver connections at my hosting companies (HostN*n, Dream***, Host***, Blue***, A2***, and such - super cheap), which went over the limit pretty fast, so I split all assets out to other services and served them via subdomains.

In the very early days I paid something around 0.99 USD/month for those sites, later on 2.99 USD, and even later 8.99 USD.
It only became faster and faster. About a year before selling/shutting down those projects I paid about 60 USD/month/project. STEEP!

Still, almost the same setup could easily handle double or triple the hits*/day nowadays, with far better pagespeed results than ever before.

To this day I'm happy with this kind of setup for my projects.
The moment a project reaches 50k+ hits*/day, I'll return to that approach, but with today's methods and services.

What I use nowadays (for whatever reason - you will find out):

  • webgo
  • IONOS
  • Hetzner
  • Plusline Server
  • Netlify
  • Vercel
  • Cloudflare Pages
  • Cloudflare CDN
  • Cloudinary
  • Planetscale
  • Runway
  • Supabase

 

* real hits/users/sessions - no fake requests
** paid plans for super high traffic sites, otherwise free tiers

  • 2 years later...
Posted

@nbcommunication you've probably resolved this since 2023, but in my setup I use AWS with an auto scaling group (ASG) of EC2 instances. Certain folders - caches, ProCache, assets/files and logs - live on an EFS share symlinked in, while all other CMS files/templates are local to each EC2 instance.

The biggest throttle for us historically was trying to serve EVERYTHING from EFS: there's a penalty even when reading the main ProcessWire files from EFS that results in sluggishness, and in extreme load situations the EFS bandwidth can get saturated unless you pay a lot for higher throughput, which nobody really wants to do when the load is only occasionally spiky.

The solution is to serve everything LOCALLY on the EC2 instances in the ASG and put only certain things - like caches/logs that need to be shared across all instances - on the EFS mount, since those files are negligible in terms of speed/performance; basically I found it was about the same as having it all on the local EC2 EBS volume. ProCache on EFS also seems fine, since it's just static files (though I'm typing this having just seen your other module for keeping ProCache in sync across this sort of setup).
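
A rough sketch of the kind of symlinking this amounts to (the mount point and folder list are placeholders - in practice it could just as easily be a few lines of shell in the instance user data):

<?php

// Sketch only: link the shared, low-traffic folders from the EFS mount
// into an otherwise local ProcessWire install. Paths are placeholders;
// the ProCache cache directory could be linked the same way.
$efs  = '/mnt/efs/example.com/site';   // shared across all instances
$site = '/var/www/example.com/site';   // local to this EC2 instance

foreach(['assets/files', 'assets/logs', 'assets/cache'] as $dir) {
    $target = "$efs/$dir";
    $link   = "$site/$dir";
    if(!is_dir($target)) mkdir($target, 0755, true);                     // ensure the shared folder exists
    if(is_dir($link) && !is_link($link)) rename($link, "$link.local");   // move any local copy aside
    if(!is_link($link)) symlink($target, $link);                         // point the local path at EFS
}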

I've still not finalised the best sync method, but I'm very nearly at the point where the ASG "worker" nodes will automatically update when I run a simple script on the "master" node, and new nodes that come online will automatically check a .build file containing a simple timestamp to decide whether each website needs syncing. This is because you don't want to deploy a new ASG template for every minor update each day, but each node still obviously needs to be on the same modules/templates/core. When I crack the very last piece of this puzzle everything will always be in sync, and I can just create new ASG templates as needed.
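
Something along these lines, as a sketch of the .build check each worker could run per site (the paths and the rsync source are assumptions, not the actual setup):

<?php

// Per-site check a worker node could run at boot or on a schedule.
// Paths are placeholders.
$shared = '/mnt/efs/deploy/example.com/.build';  // timestamp written by the "master" node on deploy
$local  = '/var/www/example.com/.build';         // last build this worker synced to

$sharedTs = is_file($shared) ? (int) trim(file_get_contents($shared)) : 0;
$localTs  = is_file($local)  ? (int) trim(file_get_contents($local))  : 0;

if($sharedTs > $localTs) {
    // Pull core/modules/templates from the shared copy, then record the build we're now on.
    exec('rsync -a /mnt/efs/deploy/example.com/code/ /var/www/example.com/');
    file_put_contents($local, (string) $sharedTs);
}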

 

Posted

Sorry - what I'm doing is a bit different; I just realised I'd misread your post mentioning Elastic Beanstalk, so your setup isn't quite the same, but my findings may still be worth mentioning.
