djr Posted July 16, 2014

Hello! I've written a little module that backs up your ProcessWire site to Amazon S3 (I might add support for other storage providers later, hence the generic name). Pete's ScheduleBackups was used as a starting point, but it has been overhauled somewhat. It's still far from perfect, but I guess you might find it useful.

Essentially, you set up a cron job to load a page every day, and the script creates a .tar.gz containing all of the site files and a dump of the database, then uploads it to an S3 bucket. Currently, only Linux-based hosts are supported (hopefully that covers most of you).

The module is available on GitHub: https://github.com/DavidJRobertson/ProcessWire-ScheduleCloudBackups
Zip download: https://github.com/DavidJRobertson/ProcessWire-ScheduleCloudBackups/archive/master.zip

Let me know what you think!

EDIT: now available in the module directory @ http://modules.processwire.com/modules/schedule-cloud-backups
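For readers curious how such a backup works end to end, here is a minimal sketch of the flow described above (illustrative only, not the module's actual code; it assumes exec() is available and the AWS SDK for PHP v3 is installed, and every value shown is a placeholder):

```php
<?php
// Minimal sketch of the backup flow: dump the DB, tar the site, upload to S3.
// Illustrative only; assumes exec() works and AWS credentials are available
// via the SDK's default provider chain (env vars, etc.).
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$root    = '/var/www/site';            // placeholder site root
$dbUser  = 'dbuser';                   // placeholder credentials
$dbPass  = 'dbpass';
$dbName  = 'dbname';
$bucket  = 'my-backup-bucket';         // placeholder bucket name
$dump    = $root . '/data.sql';
$archive = '/tmp/backup-' . date('Y-m-d_H-i-s') . '.tar.gz';

// 1. Dump the database with mysqldump.
exec(sprintf('mysqldump --user=%s --password=%s %s > %s',
    escapeshellarg($dbUser), escapeshellarg($dbPass),
    escapeshellarg($dbName), escapeshellarg($dump)), $out, $rc);
if ($rc !== 0) throw new Exception('Failed to create database dump.');

// 2. Archive the site files together with the dump.
exec(sprintf('tar -czf %s -C %s .',
    escapeshellarg($archive), escapeshellarg($root)), $out, $rc);
if ($rc !== 0) throw new Exception('Failed to create archive.');

// 3. Upload the archive to S3, then clean up the local copies.
$s3 = new S3Client(['version' => 'latest', 'region' => 'us-east-1']);
$s3->putObject([
    'Bucket'     => $bucket,
    'Key'        => basename($archive),
    'SourceFile' => $archive,
]);
unlink($dump);
unlink($archive);
```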
ryan Posted July 16, 2014

This looks fantastic djr! I've had a need for something exactly like this and will definitely look forward to using it. I took a quick look through the code and think it looks very well put together. I do have a few minor suggestions:

1. Rather than backing up the DB to the root path (where it is temporarily web accessible), I'd recommend backing it up to a non web accessible directory, like /site/assets/cache/. Likewise for the tar/gz file.

2. Beyond just looking for "runbackup" in the request URI, I recommend designating a page that it will only run on. For instance, if you wanted it to only run on the homepage:

```php
$shouldBackup = $page->path === '/' &&
    (strpos($_SERVER['REQUEST_URI'], self::RUN_BACKUP_PATH) !== FALSE) &&
    $this->wire('input')->get->token &&
    $this->wire('input')->get->token === $this->token;
```

3. This might be a good module to experiment with conditional autoloads. In your getModuleInfo, you can do this:

```php
'autoload' => function() {
    return (strpos($_SERVER['REQUEST_URI'], self::RUN_BACKUP_PATH) !== FALSE);
}
```

In truth, conditional autoloads are more reliable in PW 2.5 (a few minor issues have been fixed), so this may be a v2 kind of thing as well. In PW 2.5, you can also isolate the entire getModuleInfo() to a separate ModuleName.info.php file.

4. Beyond just the token, it might be worthwhile to have an IP address limiter, since there's a good chance one's CRON job is always going to be coming from the same IP. Though not sure it's totally necessary.

5. In your docs file, I would mention if possible what command you recommend for the CRON job, for instance:

```
wget --quiet --no-cache -O - http://www.your-site.com/runbackup?token=abc123 > /dev/null
```

6. Lastly, might be good to mention that it requires exec/system access for executing commands (many hosts have these disabled by default, but there's usually a way to enable them).

Please add to the modules directory when ready! Thanks for putting this together!
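To illustrate that last point about the separate info file: a hedged sketch of what a ScheduleCloudBackups.info.php might look like, assuming PW 2.5's convention of defining an $info array in a sibling .info.php file (the metadata values here are placeholders, not the module's actual info):

```php
<?php
// ScheduleCloudBackups.info.php — sketch of the PW 2.5+ .info.php convention.
// Values are placeholders; class constants aren't available in this file,
// so the backup path is written out literally.
$info = array(
    'title'    => 'Schedule Cloud Backups',
    'version'  => 1,
    'summary'  => 'Backs up the site and uploads it to Amazon S3 on a schedule.',
    'autoload' => function() {
        // Only autoload when the backup URL is being requested.
        return strpos($_SERVER['REQUEST_URI'], '/runbackup') !== false;
    },
);
```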
djr Posted July 17, 2014 (Author)

Thanks Ryan. Re your suggestions (I've made some changes to the code):

1. The data.sql file is already protected from web access by the default PW .htaccess file, and I've added a .htaccess file in the module directory to prevent access to the backup tarball.
2. I've changed the shouldBackup check to be more specific (it behaves the same as your suggestion, but with simpler logic).
3. I don't know what the issues around conditional autoloading in PW 2.4 are, so I'll leave that for now (?).
4. I'll put IP whitelisting on the todo list, but I don't think it's essential right now, since it's unlikely anybody would be able to guess the secret token in the URL.
5. The wget command for the cron job is displayed on the module config page (prefilled with the URL). Would it be better to have the cron job run a PHP file directly rather than going through the web server? Not sure.
6. I've added a little mention of the requirements in the readme. I've also adjusted the install method to check that it can run tar and mysqldump (see the sketch below).

I'll submit it to the module directory shortly.
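For reference, a hedged sketch of what such an install-time check might look like (the module's actual ___install() may well differ):

```php
// Sketch only; the module's real ___install() may differ.
public function ___install() {
    foreach (array('tar', 'mysqldump') as $cmd) {
        // `command -v` exits non-zero when the command is not on the PATH.
        exec('command -v ' . escapeshellarg($cmd), $output, $returnCode);
        if ($returnCode !== 0) {
            throw new WireException("Required command '$cmd' is not available on this server.");
        }
    }
}
```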
jacmaes Posted July 17, 2014

Quick question: does your module enable backup to Amazon Glacier, which is cheaper than S3? http://aws.amazon.com/glacier/
ryan Posted July 17, 2014

> The data.sql file is already protected from web access by the default PW .htaccess file, and I've added a .htaccess file in the module directory to prevent access to the backup tarball.

You are right; I'd forgotten we had that in the htaccess.

> I don't know what the issues around conditional autoloading in PW 2.4 are, so I'll leave that for now (?).

Yes, I'd leave it for now. I just wanted to point them out because I think they will be beneficial for this module once 2.5 is stable.

> I'll put IP whitelisting on the todo list, but I don't think it's essential right now, since it's unlikely anybody would be able to guess the secret token in the URL.

I agree, you don't need it for now. My default is always to double up on security, but thinking through it more, it's probably not necessary here. I mention it as a possible future addition just because the URLs hitting a website aren't always confidential. The token is only as private as the logs. For most of us, that's a non-issue. For some it's a potential DDoS entry point, but only if the token gets into the wrong hands. I think what you've got is just right for the majority, and if someone needs something more, like an IP limiter, it's probably better to leave that to them to add rather than making everyone else fuss with it.

> The wget command for the cron job is displayed on the module config page (prefilled with the URL). Would it be better to have the cron job run a PHP file directly rather than going through the web server? Not sure.

Sorry, I missed that wget was already there. There may be some benefits to having the cron job run the PHP file directly, but it would be more difficult for the user to set up (creating executable PHP shell scripts and such). Also, triggering the job through an accessible URL makes it easier for people to use external CRON services. As a result, I think sticking with the method you are using is better.

Thanks for adding it to the modules directory!
djr Posted July 17, 2014 (Author)

> Quick question: does your module enable backup to Amazon Glacier, which is cheaper than S3? http://aws.amazon.com/glacier/

At the moment it doesn't have any knowledge of Glacier, but you should be able to use S3's Object Lifecycle Management system to automatically transfer backups from S3 to Glacier.
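For anyone who wants to script that lifecycle rule rather than set it in the S3 console, here is a hedged sketch using the AWS SDK for PHP v3 (the bucket name, region, and rule values are placeholders, and this is not part of the module itself):

```php
<?php
// Illustrative only: transition all objects in a bucket to Glacier after
// 30 days, using the AWS SDK for PHP v3.
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3 = new S3Client([
    'version' => 'latest',
    'region'  => 'us-east-1', // adjust to your bucket's region
]);

$s3->putBucketLifecycleConfiguration([
    'Bucket' => 'my-backup-bucket', // placeholder bucket name
    'LifecycleConfiguration' => [
        'Rules' => [[
            'ID'          => 'archive-backups-to-glacier',
            'Filter'      => ['Prefix' => ''], // apply to every object
            'Status'      => 'Enabled',
            'Transitions' => [[
                'Days'         => 30,
                'StorageClass' => 'GLACIER',
            ]],
        ]],
    ],
]);
```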
djr Posted July 17, 2014 (Author)

Now available in the module directory: http://modules.processwire.com/modules/schedule-cloud-backups
Tyssen Posted July 21, 2014

When trying to create a backup from within the CP, I get:

```
Error: Exception: Failed to create database dump. (in /site/modules/ProcessWire-ScheduleCloudBackups/ScheduleCloudBackups.module line 167)
```

Is this to do with "tar and mysqldump must be present on your PATH"? Because I'm not sure I have the ability to do anything about that on the host I'm testing it out on.
djr Posted July 23, 2014 (Author)

@tyssen, @jacmaes: Most likely the server doesn't have the mysqldump utility available. It's possible to add a pure-PHP fallback (Pete's ScheduleBackups had one), but it will probably be considerably slower than the real mysqldump. I'll see about adding it soon, but I'm a bit busy today.
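For the curious, a hedged sketch of how such a pure-PHP fallback typically works (illustrative; not the module's eventual implementation, and noticeably slower than real mysqldump on large databases):

```php
<?php
// Sketch of a pure-PHP mysqldump fallback: recreate each table's structure,
// then dump its rows as INSERT statements.
function phpMysqldump(PDO $db, $outFile) {
    $fh = fopen($outFile, 'w');
    foreach ($db->query('SHOW TABLES')->fetchAll(PDO::FETCH_COLUMN) as $table) {
        // Recreate the table structure.
        $create = $db->query("SHOW CREATE TABLE `$table`")->fetch(PDO::FETCH_NUM);
        fwrite($fh, "DROP TABLE IF EXISTS `$table`;\n{$create[1]};\n");
        // Dump the rows.
        foreach ($db->query("SELECT * FROM `$table`", PDO::FETCH_NUM) as $row) {
            $values = implode(', ', array_map(function ($v) use ($db) {
                return $v === null ? 'NULL' : $db->quote($v);
            }, $row));
            fwrite($fh, "INSERT INTO `$table` VALUES ($values);\n");
        }
    }
    fclose($fh);
}
```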
djr Posted July 24, 2014 (Author)

@tyssen, @jacmaes: released 0.0.2, which has a pure-PHP fallback for mysqldump and tar. Give it a go!
jacmaes Posted July 24, 2014

Thanks for the update, djr, but now I'm getting this error:

```
Error: Exception: Failed to create database dump. (in /var/www/.../site/modules/ScheduleCloudBackups/ScheduleCloudBackups.module line 81)
```
djr Posted July 24, 2014 (Author)

Oh. That tells me it's using the native mysqldump (not the PHP implementation), but it's still failing. Perhaps the file permissions don't allow creating a new file (data.sql) in the root of your site? I should probably add a check for that.
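Something along these lines, perhaps (a hedged sketch of the check described, not committed code):

```php
// Hedged sketch of the permissions check mentioned above.
$root = $this->wire('config')->paths->root;
if (!is_writable($root)) {
    throw new WireException("Cannot write database dump: $root is not writable.");
}
```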
jacmaes Posted July 28, 2014

The root folder of my site has permissions of 755, if that's what you're referring to.
adrian Posted August 20, 2014

@djr - in case you haven't seen it yet, Ryan has added a WireDatabaseBackup class to the core: https://github.com/ryancramerdesign/ProcessWire/commit/52c09f5ef1980b7cad3876e8332254f3792e795d

Maybe you can make use of this directly in your module once PW 2.5 is released.
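Roughly like this, if I'm reading the core class right (a hedged sketch; check the linked commit for the authoritative API):

```php
// Hedged sketch of using the core WireDatabaseBackup class (PW 2.5+).
$backup = new WireDatabaseBackup($this->wire('config')->paths->assets . 'backups/');
$backup->setDatabase($this->wire('database'));
$backup->setDatabaseConfig($this->wire('config'));
$file = $backup->backup(); // path to the .sql dump, or false on failure
```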
RDC Posted June 29, 2016

Hi djr

Great plugin, and thanks; I think backups are really important. I was trying to set up your module and I'm getting an error in the admin section. It installed okay, and while I was filling out the Amazon information the admin page worked fine (even when the information was wrong), but once I got the Amazon information right, the admin page started failing with this error:

```
Fatal error: Call to a member function format() on boolean in /var/www/vhosts/62/500562/webspace/httpdocs/site/modules/ScheduleCloudBackups/ScheduleCloudBackups.module on line 418
```

The relevant lines in the module are:

```php
foreach ($objects as $object) {
    $ts   = date_create_from_format(self::TIMESTAMP_FORMAT, basename($object['Key'], '.tar.gz'));
    $date = $ts->format('Y-m-d H:i:s');
```

and it's the last line that is line 418. Any ideas?

Thanks very much
Rob
RDC Posted June 29, 2016

Hi djr

I solved this myself. I had put a folder in the bucket named the same as the website (i.e. website.ie). When I checked the count of $objects in that section of code, there was one object, and basename($object['Key'], '.tar.gz') returned "website.ie", for which date_create_from_format() returns false (i.e. the boolean), so line 418 couldn't work. I don't know if this is a huge coincidence, but a check for $ts === false followed by some error handling would solve this for any future users.

Thanks for the great plugin.
Rob
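Something like this would do it (a hedged sketch of Rob's suggested guard, not the module's actual code):

```php
foreach ($objects as $object) {
    $ts = date_create_from_format(self::TIMESTAMP_FORMAT, basename($object['Key'], '.tar.gz'));
    if ($ts === false) continue; // skip keys that aren't backup archives (e.g. folders)
    $date = $ts->format('Y-m-d H:i:s');
    // ...
}
```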
RDC Posted June 29, 2016

Hi David

Another problem: when I try to run the backup I get a 404 (page not found). The URL I am using from the admin page is similar to:

```
http://www.websitename.ie/runbackup?token=XXXX29924700591dd18a9633d17c8ea34c0b2
```

(changed to protect the highly secret information of the client's website!) I'm using PW version 2.7.2. I've tried to completely reinstall the plugin, but it didn't make any difference.

Thanks for the help
Rob
Tyssen Posted February 1, 2017

I'm getting the same thing. There's an issue on GitHub with the same problem from June 2015, so it seems this project is now dead. Is that the case? And if so, are there any other alternatives?
szabesz Posted February 1, 2017

> 9 hours ago, Tyssen said: And if so, are there any other alternatives?

Something like this? Soon to be released, I hope.
Frank Vèssia Posted March 24, 2018

Any update on this great module? I can't install it on PW 3; I get this error:

```
Cannot declare class ComposerAutoloaderInit700022e1c519b28dbab39fa2456e3e43, because the name is already in use (line 5 of /home/nginx/domains/public/site/assets/cache/FileCompiler/vendor/composer/autoload.php)
```