
Current major server outage - advice please!


AAD Web Team

Hello, Brodie from the Australian Antarctic Division here. We have a suspected cache issue causing runaway disk usage. It is currently so bad that our websites are down (unless logged in) and I cannot SSH into the servers. Restarting both the database server and the hosting server has not helped either.

Pages load fine when logged in (since the cache is skipped) or when disabling the cache.

Below are some of the error messages we've started seeing, which I suspect are due to running out of disk space:

	Unable to write lock file: /site/assets/cache/LazyCronLock.cache
Error: Exception: Unable to copy: /site/assets/files/26153/keon-anzac.jpg => /site/assets/files/26151/keon-anzac-7.jpg (in wire/core/Pagefile.php line 236)
unlink: Unable to unlink file: /site/assets/cache/Page/45511/4df366e0700b7c24883b744b6cb250ee+https.cache

Has anyone had a similar issue before and knows something we could try to resolve it?


There are a couple of things you could try in the PW admin area.

The first option would be to find a template that has the 'Clear cache for entire site' setting (or enable it). Find a page that uses it, and Save it.


The next option, if that doesn't work, is to find a page (or pages) with large files added to them. Download a copy of the original files, and then delete them from the page. (You can re-add them later once the issue is resolved).

This might give you enough free space to gain access via SSH to clear the cache, or to install the ProcessCacheControl module which lets you clear it from within PW.

Whatever you do, though, try to be quick, to avoid it filling back up again before you have time to intervene.
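Once SSH access is back, clearing the page cache by hand could look roughly like the sketch below. This is only an illustration: `clear_cache_dir` is a made-up helper name, and the web root in the usage line is a placeholder, since the thread only shows the path from `/site/assets/cache/` onwards. It also assumes a `find` that supports `-mindepth` and `-delete` (GNU and BSD `find` both do).

```shell
# Hypothetical helper: empty a cache directory but keep the directory itself.
clear_cache_dir() {
    cache_dir="$1"
    # Refuse to run against a path that is not an existing directory.
    [ -d "$cache_dir" ] || { echo "not a directory: $cache_dir" >&2; return 1; }
    # Report how much space the cache currently occupies.
    du -sh "$cache_dir"
    # Delete everything below the directory, including dotfiles.
    find "$cache_dir" -mindepth 1 -delete
}
```

Usage would be something like `clear_cache_dir /path/to/webroot/site/assets/cache/Page`, with `/path/to/webroot` replaced by wherever the site actually lives. Checking the `du` output first gives an idea of how much of the disk the cache was responsible for.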


Hi, this looks like a server issue, but for now we can't see the root cause of the disk filling up.

To get SSH access again, which is the most important thing for now, you could log in to the admin as a user, delete all the log files there, and then SSH in as soon as possible. Once on the server, check /var/log and remove some old *.gz files, or the biggest ones, to free up more space, then investigate.

Try to make a backup or an image of the server, if you can, before running commands as root.
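The checks above could be sketched in shell roughly as follows. The function names are hypothetical helpers, not existing tools, and the default thresholds (10 entries, 30 days) are arbitrary assumptions. Nothing here deletes anything; it only lists candidates so they can be reviewed first.

```shell
# Show how full each mounted filesystem is.
show_disk_usage() {
    df -h
}

# Hypothetical helper: list the largest entries directly under a directory,
# biggest first, with sizes in KiB.
largest_entries() {
    dir="$1"
    count="${2:-10}"
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

# Hypothetical helper: list compressed logs last modified more than N days ago.
old_compressed_logs() {
    dir="$1"
    days="${2:-30}"
    find "$dir" -type f -name '*.gz' -mtime +"$days" -print
}
```

For example, `largest_entries /var/log 5` shows the five biggest items under /var/log, and `old_compressed_logs /var/log 30` lists rotated logs older than about a month. The same `largest_entries` call pointed at the site's assets directory may help narrow down where the space is actually going.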
 

 


Thanks @Craig and @flydev, we have now resolved the issue. Clearing the cache was good for temporarily freeing up blocks of space to get through.

Turns out the issue was a recent code change where the Pageimages class was used to hold a group of Pageimage objects from across many pages (where we'd previously used an array). We had not realised at the time that this leads to each image getting instantiated again (and therefore duplicated)!
