Karl_T

ProcessWire with AWS S3 and s3fs


Short question: is it possible for ProcessWire to use AWS S3 with s3fs as a remote file system (mounted to the assets folder)? Please advise on anything that has to be taken care of.

 

Background:

I am currently trying to get ProcessWire running inside AWS Elastic Beanstalk, as I want to take advantage of the auto scaling that my client wants.

I have found a discussion here: 

 

By reading this and the links inside, I realized that to use auto scaling I need to configure my web server to be stateless. So I was looking for a method that can serve that purpose, and then I found s3fs. To quote their GitHub page, s3fs is a FUSE filesystem that allows you to mount an Amazon S3 bucket as a local filesystem. I guess that mounting the S3 bucket to the assets folder using s3fs should be the right thing to do.

My site needs a lot of image uploads, as it is an e-commerce site built with Padloper, and the admin always works with the local file system for images. I have thought of using modules like AmazonS3Cloudfront or FieldtypeFileS3, but it seems those modules do not support my use case; s3fs suits better and is simpler. One of my concerns is that I am not sure whether the URLs of the images can still be generated correctly by the default API, like $image->url.
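To be concrete, this is the kind of template code I would want to keep working unchanged ("images" is just an example field name):

$image = $page->images->first();
echo $image->url;                 // should still resolve to /site/assets/files/...
echo $image->size(600, 400)->url; // and resized variations should still be created on the mount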

Before implementing this, I would like to ask for advice from anyone who has implemented this with ProcessWire, as I am new to AWS. Is it possible? Are there any better alternatives?

Even without auto scaling, I think separating the assets out is useful in some cases, for example to reduce requests hitting the web server. Thank you for reading.


I've never done it myself, but I know it's possible to decouple the image processing and keep the goodies of an API for manipulation, similar to PW's.

You can take a look at Cloudinary's feature to upload an image by its URL. There's also IMGIX.
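For example, something roughly like this with Cloudinary's PHP SDK (written from memory, so treat the names as approximate and check their docs; the credentials and field name are placeholders):

require 'vendor/autoload.php';

\Cloudinary::config(array(
    'cloud_name' => 'your-cloud',
    'api_key'    => 'your-key',
    'api_secret' => 'your-secret',
));

// Cloudinary can fetch the image straight from a public URL, e.g. a PW image URL
$result = \Cloudinary\Uploader::upload($page->images->first()->httpUrl);
echo $result['secure_url'];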


For documents such as PDFs, you can use the S3 options you already talked about to store them there, because you'll only need the file URL after all.

Remember to configure ProcessWire to store sessions in the database. :)
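ProcessWire ships with a core SessionHandlerDB module for exactly that; as a rough sketch, it can be installed from the API (or simply from the admin under Modules):

// one-off, e.g. in a bootstrap script – installs the core DB session handler
$modules->install('SessionHandlerDB');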


Thanks for the information. I want to keep everything managed inside the PW admin though, as what my client needs is simple, native drag-and-drop image uploads.

9 hours ago, Sérgio said:

Remember to configure ProcessWire to store sessions in the database. :)

Yes, thanks for the reminder. I am planning to store sessions in Redis for better performance. :)
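Something along these lines in site/config.php is what I have in mind (untested; it assumes the phpredis extension is installed and a reachable Redis endpoint such as an ElastiCache node):

// hand PHP's session handling over to Redis before ProcessWire starts the session
ini_set('session.save_handler', 'redis');
ini_set('session.save_path', 'tcp://your-redis-host:6379');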


ProcessWire's image sizer engines can only work with files via the filesystem, so you'd need to mount S3 into it. Just keep in mind that this would slow down any image processing (download upfront and upload afterwards), and it means that all files need to go through your mountpoint, since files and images live in the same folders.
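To illustrate what that means in practice (the field name is just an example): every resize writes a new variation file next to the original, so each of these reads and writes goes through the mount:

$thumb = $page->images->first()->size(600, 400);
echo $thumb->filename; // absolute path inside the mounted assets folder, e.g. .../files/1234/photo.600x400.jpg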

Edit: You should also really evaluate to what degree your site actually needs image processing. If you can avoid it, you could just extend FieldtypeFile, which is much easier to work with than when you need the image sizer to work as well.
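Roughly something like this as a starting point (just a hypothetical skeleton; the actual S3 handling – uploading on save, rewriting URLs – is where the real work would go):

<?php namespace ProcessWire;

class FieldtypeFileRemote extends FieldtypeFile {
    public static function getModuleInfo() {
        return array(
            'title' => 'Files (remote storage)',
            'version' => 1,
            'summary' => 'Like FieldtypeFile, but serves files from remote storage',
        );
    }
    // override/hook the save and URL logic here
}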


After some time, I have tried mounting the S3 bucket to the assets folder with s3fs and also with goofys, a replacement that claims to be faster. ProcessWire can work with s3fs and fortunately has no problem with the image sizer, but the initial loading time increases from 100-200 ms to 300-400 ms, which is not really acceptable. I guess the main reason is that the cache files are located inside the assets folder. Then I tried goofys, but pages take several seconds to load and I don't know why. If anybody has experience with s3fs and goofys, please share. Hopefully I can share my own experience later in this topic.

Edit: With debug mode off, goofys greatly improves the loading speed, from 1-5 s down to 300-400 ms.

1 hour ago, Karl_T said:

I guess the main reason is that the cache files are located inside the assets folder

Have you tried mounting s3fs at assets/files instead of the whole assets folder, just to see what difference that makes?

Also, I could imagine that it's not only the cache folder that may slow things down at some point, but also sessions and logs, both of which could benefit from something with fewer round trips (and that handles concurrent writes better) than s3fs. Sessions could (and should) thus probably be switched to SessionHandlerDB.

I think goofys is at a disadvantage since it doesn't have local caching. The scenarios where it overtakes s3fs (e.g. creating many files at once) fall short when s3fs can use its cache (that's why the benchmarks on the goofys page only compare uncached operations).


Thanks for your suggestion, it is really helpful. I now have debug mode enabled and only mount the assets/files folder. The loading speed is 250-350 ms. I am happy now!

I guess the file compiler cache in the cache folder is the issue. Mounting the cache folder also breaks the whole ProcessWire system sometimes, showing syntax errors from broken files, which is a disaster.
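If the file compiler really is the culprit, I may also try turning it off in site/config.php (if I have the setting names right, and only if all templates and modules already use the ProcessWire namespace):

// disable on-the-fly compilation of templates and modules
$config->templateCompile = false;
$config->moduleCompile = false;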

However, it is still a huge (100%) slowdown compared to not using such a mounted file system, which may result in a much lower cap on concurrent requests. I want to try Elastic File System, AWS's native shared file system, but it is only available in a limited number of regions right now.


I want to try it, but unfortunately AWS does not provide this service in my region. I think it would be the best option regardless of the cost. :P

I have not tried CloudFront yet, but I would like to use it if it can speed the site up a lot. The first response time is really not good now, around 300 ms, while network latency is around 140 ms.


Yeah, from what I understand, any number of different EC2 servers could all run the ProcessWire files and serve users with Apache or Nginx, but connect to one shared file system for assets and one database. (Aurora and EFS are both fully managed and fault tolerant, so the EC2 instances would be stateless.)

100 GB of EFS is about $30 per month with unlimited requests and bandwidth, so there are no additional charges.



  • Similar Content

    • By Lex Sanchez
      Hi everyone:
      I do not know if anyone has used ProcessWire with AWS CloudFront before. Currently I have problems with the login: it does not work for any reason, and when I check the logs generated by ProcessWire, they only say "This request was aborted because it appears to be forged." (in /wire/core/SessionCSRF.php line 190).
      I have configured CloudFront to forward all headers and cookies and to allow all methods (GET, POST, PUT).
      When I perform the same process directly against the server's IP or against the load balancer, it works.
    • By modifiedcontent
      Has anyone successfully installed ProcessWire on an Amazon EC2 instance/virtual server?
      Which configuration works; Amazon Linux or one of the other flavors?
      How do you get file permissions and the database working?
      Which lines in .htaccess cause problems on Amazon AWS? 
      What are the pitfalls to watch out for?
      Why can't I get it working...?
      The first problem I run into is an error message that the installer doesn't have write access and that I should manually rename the 'site-myprofile' folder to 'site'. Attempts to chmod all the files and folders to 777 don't seem to have any effect on that, and some files do get written fine.
      But I keep ending up with inaccessible pages and fatal server errors. I am not asking you to solve my problem. I am curious what other people's experiences are with this. Can it be done or am I wasting my time?
       
    • By fbg13
      FieldtypeFileS3
      https://github.com/f-b-g-m/FieldtypeFileS3
      The module extends the default FieldtypeFile and InputfieldFile modules and adds a few extra methods.
      For the most part it behaves just like the default file modules; the biggest difference is how you get the file's URL. Instead of $page->fieldname->eq(0)->url you use $page->fieldname->eq(0)->s3url(). Files are not stored locally: they are deleted when the page is saved, and if saving is skipped the file remains on the local server until the page is saved. Another difference is the file size; the default module gets the size directly from the local file, while here it is stored in the database.
      There is an option to store the files locally; it is intended for the case where one wants to stop using S3 and switch back to local storage. It changes the s3url() method to serve files from the local server instead of S3, disables uploading to S3, and disables local file deletion on page save. It does not transfer files from S3 to the local server; that can be done with the aws-cli sync command. Files stored on S3 keep the same structure they would have on the local server.
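      A quick usage sketch based on the above ("documents" is just an example field name):

      $file = $page->documents->eq(0);
      echo $file->s3url(); // URL of the file on S3, instead of the local ->url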
      --------------------------------------------------------
      I've been struggling with this for quite a while, but I think I finally managed to make it work/behave the way I wanted.

      All feedback is welcome!
    • By artaylor
      Hi all,
      I am trying to use Amazon S3 to store video files for a client. I am having trouble getting the SDK to work. I am sure it is a stupid error on my part but my head is sore from banging it against my desk and I thought I would finally ask for some help. 
      I am running PW 3.0.11 on NGINX.
      1. Amazon recommends using Composer to install the SDK. I was not sure where in the path to install the SDK, so I put Composer in the /site folder and installed the SDK there (putting vendor at the same level as modules), then I put the require and use statements in _init.php. I always got an error saying it could not load the aws or s3 classes from the library.
      2. So, then I tried to use aws.phar. I put that in the /site directory but once again, no matter what I do, it will not load with the following error:
      require(): Failed opening required '/site/aws.phar'
      The file is there with proper permissions and the code for loading is:
      // --- amazon S3 stuff
      require $config->urls->site . 'aws.phar';
      $s3 = new Aws\S3\S3Client([
          'version' => 'latest',
          'region'  => 'us-standard',
      ]);
      So, here are my questions:
      1. In general, where is the correct place to put a php library? It is not a PW module so I assume it should not go in the modules folder.
      2. Should I use Composer to install the SDK? If so, where do I put the files? Should I add the AWS SDK to the main composer.json file or put it somewhere else?
      3. If I don't use Composer, where do I put the aws.phar file so that PW can load it?
      4. Should I not put the 'require' in the _init.php file and move it to another file (_func.php)?
      I am sure there is a massive face-palm in my future when this gets sorted out.
      Thanks