Module: ScheduleBackups


Pete

Hey all

Since the topic of backups comes up every so often, I decided to write a module to encapsulate some code I use for backups on Linux installs.

It's not quite ready yet: I want a fallback option for Windows, as well as an option for Linux that will work on shared hosting where you usually can't run system commands. It is about 80% complete for stage 1, though, and Linux backups work nicely.

The general idea is that you set the number of days to store backups. I standardised it to 1 day, 3 days, 1 week, 2 weeks, 1 month, 3 months, 6 months and 1 year rather than having it as an integer field, because I think these fit the most common scenarios and I wanted to have a dropdown in my module config too ;) It defaults to 1 week, but depending on the size of the site and how much space you have, you might want to increase or decrease the retention period.
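A minimal sketch of how a retention dropdown like this could drive clean-up of old backups. This is not the module's actual code: the constant and function names are hypothetical, and the day counts chosen for the month and year options are assumptions.

```php
<?php
// Hypothetical map of the dropdown labels to a number of days.
// The labels come from the post; 30/90/180/365 are assumed approximations.
const RETENTION_DAYS = [
    '1 day'    => 1,
    '3 days'   => 3,
    '1 week'   => 7,
    '2 weeks'  => 14,
    '1 month'  => 30,
    '3 months' => 90,
    '6 months' => 180,
    '1 year'   => 365,
];

/**
 * Return the files in $dir whose modification time is older than the
 * chosen retention period. $now is injectable so the function can be
 * exercised without depending on the real clock.
 */
function expiredBackups(string $dir, string $retention, ?int $now = null): array
{
    if (!isset(RETENTION_DAYS[$retention])) {
        throw new InvalidArgumentException("Unknown retention period: $retention");
    }
    $now = $now ?? time();
    $cutoff = $now - RETENTION_DAYS[$retention] * 86400;

    $expired = [];
    $pattern = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . '*';
    foreach (glob($pattern) ?: [] as $file) {
        if (is_file($file) && filemtime($file) < $cutoff) {
            $expired[] = $file;
        }
    }
    sort($expired);
    return $expired;
}
```

The caller would then decide what to do with the returned paths, for example passing each one to unlink().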

The idea is that you are given a URL with a unique hash (generated at install) which you then pass off to a cron job or Windows Scheduler and this generates the backups.
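As a rough illustration of the unique-hash idea, here is one way a token could be generated at install time and checked on each request. The function names are hypothetical and this is not taken from the module; it assumes PHP 7 or later for random_bytes().

```php
<?php
// Generate a random token once, at install time (hypothetical helper).
function generateBackupToken(): string
{
    return bin2hex(random_bytes(16)); // 16 random bytes as 32 hex characters
}

// Compare the token supplied in the request with the stored one.
// hash_equals() compares in constant time, which avoids leaking how many
// leading characters matched. An empty stored token never validates.
function isValidBackupRequest(string $storedToken, string $requestToken): bool
{
    return $storedToken !== '' && hash_equals($storedToken, $requestToken);
}
```

The stored token would live in the module's config, and the cron job or scheduled task would request the backup URL with the same token attached.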

It will be expanded on once I've got the backups working across different environments but the initial plan is to release a version that simply backs up the site and database, then a version that has a page where you can download backups from as well as an option to FTP/sync them to another server.

I don't want to tackle restores though as this would be difficult - you are logged into the admin whilst running the restore so I think the first thing it would do when it has restored the database is log you out, plus I don't want to make assumptions about replacing a user's /site/ folder so I think restores require some manual intervention to be honest. An alternative would be to do this anyway but rename your /site/ folder and take another database copy before restoring, but I'm getting into the realms of trying to be too clever then and anticipate what people are trying to do and ProcessWire is all about not making assumptions ;)

Fortunately I have access to a site with several gigs of uploaded files as well as a reasonably large database, so I should be able to monitor how well it handles that on Linux and Windows. Smaller sites shouldn't take more than a minute to back up, often a matter of seconds.

I shall keep you posted.

Link to comment
Share on other sites

I like the sound of this.

One of the most important aspects is to make the interface as non-tech as possible; with smaller sites/companies, they will quite probably be responsible for their own backups, so this should be kept very simple. Go to a page, hit the backup button.

Yes, I think restores are a potential issue - depending on why the site has to be restored, the archive may need to be scanned for malware or other problems first. (I have twice now had a site hacked and found the recent backup had issues - just bad timing, normally).

I think on a manual back up version (or even a cron-driven one) an email to all admins saying that the site has been backed up and go and grab the download is always a good idea. And where a manual back up has been triggered, being re-directed to a download page once it has finished (assuming you are still on the backup page) would also be good. Maybe a little modal window appearing on the admin, whichever page you are on.

The best version is always going to be to back up to a remote location - but some small users may simply not have that remote location to back up to, so making the local backup with lots of nags to download it is nice.

Good one, Pete!

Link to comment
Share on other sites

Keep it coming with the suggestions - there's a way to go yet before I'll add some of them but they're all welcome and will be considered once the basics are in place.

Step 1 is definitely to see if I can get backups working in multiple environments, so attached is version 0.0.1. Consider it Alpha and use it at your own risk, but I can't see why it would harm your installation. Just giving you the obligatory "you have been warned" speech :)

To install:

  1. Unzip the file, stick the folder in your /modules directory, install the module, set the retention period and read the instructions below the retention drop-down in the module config to manually run a backup for now
  2. Backups are stored in /site/modules/ScheduleBackups/backups in /site and /db folders respectively so you can monitor those
  3. There is currently no "backup succeeded" message or anything like that printed when it's done - just wait for the browser to stop loading the page for now, or for the zip/tar file to stop growing ;)

Some things to note:

  1. It does an OS check. This is because the system() commands it uses are Linux shell commands that won't work on Windows, so if it finds you're running Windows it uses the ZipArchive class built into PHP to back up the site and simple mysqli queries to back up the database
  2. If it doesn't detect you're on Windows and can run system() commands then it does that which, from my past experience, is far quicker (plus it makes for nifty one-liner backup commands :)). It does some detection to see if safe mode is on and whether it can run system commands before backing up, so if safe mode is on or it can't run those commands then it falls back to using the functions mentioned in point 1.
  3. During installation, a unique hash is created and saved - this is because when we set it to run via a cron job/scheduled task we need a way of creating a backup without a logged-in superuser in attendance. The backup uses a URL that I doubt you would have as a page on your site (/runbackup) as well as this hash so it is extremely unlikely anyone will try and bring your server down by spamming that URL. Further checks will be added in later versions so it can't run more than once a day anyway or something like that.
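The detection described in points 1 and 2 could be sketched roughly as follows. The function names are hypothetical and this is not the module's code. Note that safe mode was removed in PHP 5.4, so on current PHP versions the check that matters is whether system() has been disabled.

```php
<?php
// True when PHP is running on Windows (PHP_OS is e.g. "WINNT" there).
function isWindows(): bool
{
    return stripos(PHP_OS, 'WIN') === 0;
}

// True when the named function exists and is not listed in the
// disable_functions ini setting. $disabled can be passed in for testing;
// by default the live ini value is used.
function canCall(string $function, ?string $disabled = null): bool
{
    $disabled = $disabled ?? (string) ini_get('disable_functions');
    $blocked = array_map('trim', explode(',', strtolower($disabled)));
    return function_exists($function)
        && !in_array(strtolower($function), $blocked, true);
}

// Pick a strategy: 'shell' for system() commands, otherwise 'php' for the
// ZipArchive and mysqli fallback mentioned in point 1.
function chooseBackupStrategy(bool $isWindows, bool $canRunSystem): string
{
    return (!$isWindows && $canRunSystem) ? 'shell' : 'php';
}
```

A caller would combine them as chooseBackupStrategy(isWindows(), canCall('system')).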

Also, if anyone who is a better programmer than me wants to contribute then please feel free. Some of what I've written is likely amateurish in places.

Kudos to David Walsh for his excellent MySQL backup script and this excellent function on Stack Overflow about recursively zipping folders that saved me from re-inventing the wheel in those two areas.

ScheduleBackups.zip

Link to comment
Share on other sites

Just as a thought, is it worth, even at this very early stage, having a field for putting in an optional path for the backup?

Doesn't feel right putting the backup in the very directory that you might want to rescue if something goes wrong....

So, nice to put it above the web root, where possible.

Link to comment
Share on other sites

The problem there is that paths are inconsistent across hosting environments, but certainly if it's user-definable then it's people's own fault if it then doesn't work :D

I put it under the module folder itself for now because it's not web accessible either as the .htaccess already forbids accessing the modules dir directly but I do see your point so I can easily add that as an option.

Link to comment
Share on other sites

Yeah, I was going the "if you shove the wrong path in this field don't blame us if you suddenly make your entire private website available as a download to some idiot hacker with a Guy Fawkes mask" route.

PS: Make sure the directory actually exists.
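A small sketch of the checks discussed here for a user-defined backup path: that the directory exists, that it is writable, and that it does not sit inside the web root. The function name and the error messages are made up for illustration.

```php
<?php
/**
 * Validate a user-supplied backup directory.
 * Returns null when the directory looks usable, otherwise a short message.
 * $webRoot is the publicly served directory (an assumption of this sketch).
 */
function checkBackupDir(string $dir, string $webRoot): ?string
{
    $real = realpath($dir);
    if ($real === false || !is_dir($real)) {
        return 'Directory does not exist';
    }
    if (!is_writable($real)) {
        return 'Directory is not writable';
    }
    $root = realpath($webRoot);
    if ($root !== false) {
        // Compare with trailing separators so "/var/www2" is not treated
        // as being inside "/var/www".
        $rootPrefix = rtrim($root, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
        if (strpos($real . DIRECTORY_SEPARATOR, $rootPrefix) === 0) {
            return 'Directory is inside the web root';
        }
    }
    return null;
}
```

Whether a path inside the web root should be rejected outright or only trigger a warning is a design choice; the sketch simply reports it.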

Link to comment
Share on other sites

@Pete: I have tested it on Windows with the zip extension, and in the main it works very smoothly!  ^-^

Here is what I have found:

  • A typo in line 271: to make the backup folder exclude itself you check  strpos($file, 'ScheduleBackups\backups'
    but in the site->modules folder there is no 'ScheduleBackups', only 'backups' for now.

  • When starting a backup with a zip name that already exists, the existing file is reused and updated. This is the designed behaviour of the zip extension (this was the first time I have used it). It could be useful or not; I just wanted to note it.

  • The (memory) bottleneck is here: $zip->addFromString(basename($source), file_get_contents($source));
    Reading a whole file into memory and then passing it to the zip function, where it gets compressed, can lead to memory consumption of 3-4 times the file size!
    When testing with 64MB of memory available to PHP, I was not able to add a file bigger than 14MB. It crashes Apache for a second!
    Maybe the zip library has functions for passing files to the archive in chunks. I haven't checked at php.net.
    Maybe one could try to increase the memory limit with ini_set().

  • I also do not know how it behaves with memory usage when using system calls on Unix. There may well be a limit there too. You can compare the PHP function memory_get_usage() against ini_get('memory_limit') to keep track of the available memory  :)   I use a little helper class for that.

  • For now there is no output to the screen when doing a manual backup. If you want to provide simple text output for every file (or only every directory) passed to the archive, you can disable output caching with these directives:

        if(function_exists('apache_setenv')) @apache_setenv('no-gzip', '1');
        @ini_set('zlib.output_compression', 'Off');
        @ini_set('output_buffering', '0');
        @ini_set('implicit_flush', '1');
        @ob_implicit_flush(true);
        @ob_end_flush();

        echo 'some info'; @ob_flush();

     
  • These ones may be useful too:
        set_time_limit( 0 );
        ignore_user_abort( true );
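On the memory bottleneck raised above: ZipArchive also has an addFile() method, which takes a path instead of a string of file contents, so the PHP script does not have to hold the whole file in a variable. The sketch below is one possible way to use it and is not the module's code. The function name is hypothetical, it assumes the zip extension is loaded, and the archive should be written outside the directory being zipped. Opening with ZipArchive::OVERWRITE also starts a fresh archive rather than updating an existing one.

```php
<?php
/**
 * Zip a directory by registering file paths with ZipArchive::addFile()
 * instead of loading contents with file_get_contents().
 * Returns the number of files added.
 */
function zipDirectory(string $sourceDir, string $zipPath): int
{
    $root = realpath($sourceDir);
    if ($root === false || !is_dir($root)) {
        throw new InvalidArgumentException("Not a directory: $sourceDir");
    }
    $root = rtrim($root, DIRECTORY_SEPARATOR);

    $zip = new ZipArchive();
    // CREATE | OVERWRITE: always start a new archive at $zipPath.
    if ($zip->open($zipPath, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        throw new RuntimeException("Cannot open $zipPath");
    }

    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    $count = 0;
    foreach ($files as $file) {
        if (!$file->isFile()) {
            continue;
        }
        // Entry name relative to the source dir, always with forward slashes.
        $relative = substr($file->getPathname(), strlen($root) + 1);
        $zip->addFile($file->getPathname(), str_replace('\\', '/', $relative));
        $count++;
    }
    $zip->close(); // the files are read and compressed at this point
    return $count;
}
```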

     

Everything else is perfect for me. As I said above, it runs very smoothly! This is a must-have!
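The memory_get_usage() versus ini_get('memory_limit') comparison mentioned in the list could look something like this. It is only a sketch with hypothetical function names, not the helper class referred to in the post.

```php
<?php
// Convert a php.ini shorthand size such as "64M" or "1G" to bytes.
// "-1" (no limit) is returned unchanged as -1.
function iniSizeToBytes(string $value): int
{
    $value = trim($value);
    if ($value === '') {
        return 0;
    }
    if ($value === '-1') {
        return -1;
    }
    $number = (int) $value; // the cast keeps the leading digits only
    switch (strtoupper(substr($value, -1))) {
        case 'G': return $number * 1024 ** 3;
        case 'M': return $number * 1024 ** 2;
        case 'K': return $number * 1024;
        default:  return $number;
    }
}

// Bytes still available before memory_limit is reached, or null when
// there is no limit. Both arguments are optional and exist for testing.
function memoryHeadroom(?string $limit = null, ?int $used = null): ?int
{
    $limitBytes = iniSizeToBytes($limit ?? (string) ini_get('memory_limit'));
    if ($limitBytes < 0) {
        return null;
    }
    return max(0, $limitBytes - ($used ?? memory_get_usage(true)));
}
```

A backup loop could call memoryHeadroom() before handling each file and skip or defer files that are too large for the remaining headroom.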

Link to comment
Share on other sites

I think there are a lot of things for me to consider with memory usage, and I know on one website of mine it might struggle as there are a few uploads that are over 100 MB.

In these cases, though, there should at least be enough memory to have uploaded that file in the first place. So when it gets to backing up the /assets/files dir, I might be able to have it check the max size of any file fields in PW first to get an idea of how big files might be, then iterate through them X files at a time depending on that and what the PHP environment will allow, flushing anything in memory as it goes. The problem with something like that is it makes the process slower, but on the bright side it is a good opportunity to feed data back to the browser to show some sort of progress (processing 1-10 of 256 pages or something like that).

Some folders I actually need to make it skip are the /assets/sessions and /assets/logs as either of those could have numerous/large files that aren't necessary to backup.
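Skipping those folders could be handled by a small path filter along these lines. The function name is hypothetical, and the default paths assume the folders live under site/assets relative to the site root. Backslashes are normalised first so the same check works for Windows paths.

```php
<?php
/**
 * Decide whether a path (relative to the site root) should be left out
 * of the backup. Matches the excluded folder itself and anything below it.
 */
function isExcluded(
    string $relativePath,
    array $excluded = ['site/assets/sessions', 'site/assets/logs']
): bool {
    // Normalise separators and strip leading/trailing slashes.
    $path = trim(str_replace('\\', '/', $relativePath), '/');
    foreach ($excluded as $prefix) {
        $prefix = trim(str_replace('\\', '/', $prefix), '/');
        if ($path === $prefix || strpos($path, $prefix . '/') === 0) {
            return true;
        }
    }
    return false;
}
```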

I get the feeling the system command for Linux actually won't have a memory problem simply because it's like running it at the command line in a Shell window (sorry Linux folk, I'm sure my terminology is all over the place ;)). The obvious problem there is that the actual page could well time out, but the command will keep running in the background so you would have a hard job knowing if it's ready if it was run manually.

I think I can assume that, aside from the site/assets/files directory, everything else can be backed up in a matter of seconds in terms of files. Even with 100 modules installed, they're all small files. So I could have it give feedback once it has backed up the /wire directory as a whole (that should be a standard size, more or less), then the /site directories one at a time, and we can work it like that. It will actually give me a headache as I need to run more commands for Linux, but I know you can get it to pipe the successful results to a script even then. So I think for both Linux and Windows, if it sends progress back to a database table specifically for backups, we can easily poll that table every few seconds using AJAX and show which backups are in progress and which are complete, as you get when running a backup via cPanel.

Tables are another one where I will have to think about number of rows as well I guess to make sure it's not trying to do too much at once, so maybe iterating through each table to do one at a time, checking the number of rows and then splitting them if required would be the way to go there.
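The row-splitting idea can be reduced to working out offset and limit pairs for each table. A minimal sketch, with a hypothetical function name and a batch size that would need tuning to the environment:

```php
<?php
/**
 * Split a table of $rowCount rows into [offset, limit] pairs of at most
 * $batchSize rows each. An empty table yields no batches.
 */
function rowBatches(int $rowCount, int $batchSize): array
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1');
    }
    $batches = [];
    for ($offset = 0; $offset < $rowCount; $offset += $batchSize) {
        $batches[] = [$offset, min($batchSize, $rowCount - $offset)];
    }
    return $batches;
}
```

Each pair could then feed a query with LIMIT and OFFSET, ideally ordered by the primary key so the batches do not overlap or miss rows.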

It's all getting rather more complicated than I had originally intended the more I think about it, but I can hopefully make it better as a result. What I do know is that examples of code from the internet are really helping prevent me from re-inventing the wheel - hurrah for the Open Source community :)

Link to comment
Share on other sites

Pete: why try to keep the page alive when you could run it in the background? Just show a message on page refresh, such as when the process was started, if it's not finished yet. Maybe add actions to restart the backup if it seems unable to finish.

Link to comment
Share on other sites

I'm on my way out the door, so I'll make this quick:

It would be nice to have the option to rename the backup.zip, or is it given an automatic timestamp?

Is this configurable to zip from the root up, or only user files and modules?

[I'll make sure to come back and read the entire post later today just in case this was already covered.]

Link to comment
Share on other sites

Great idea and module Pete! 

Regarding the backup directory, ProcessWire only prevents direct access to .php, .inc and .module files. As a result, it's still possible for some files to be accessible. But this is easy to get around. Just make your backups directory start with a period, like "/site/assets/.backups/" and then the directory will be completely blocked from web access. 

Link to comment
Share on other sites

  • 7 months later...

I've installed this module on sites before and not had any trouble, but just went to add it to a site on a server running PHP 5.2.17 and it brought the whole site down, so I quickly removed it again and the site came back up.

Link to comment
Share on other sites

  • 10 months later...

Given the latest update on the dev branch (database backup functionality) I don't think it will be long before someone comes up with a nice interface for backups, but it won't be by me.

I'm sure something will appear soon though.

Link to comment
Share on other sites

  • 2 months later...

Hmm... you know you should probably have your memory tested when you don't remember writing a module like this one (was about to do it again) despite having posted in this topic a few short months ago.

Definitely losing the plot!

Link to comment
Share on other sites

  • 1 year later...

I tried to install this on a 2.7 site and got this:

Error: Uncaught Error: Access to undeclared static property: ScheduleBackups::$fM in /home/user/www/site/modules/ScheduleBackups/ScheduleBackups.module:105
Stack trace:
#0 /home/user/www/wire/core/Modules.php(2691): ScheduleBackups::getModuleConfigInputfields(Array)
#1 /home/user/www/wire/core/Wire.php(398): Modules->___getModuleConfigInputfields('ScheduleBackups')
#2 /home/user/www/wire/core/Wire.php(333): Wire->runHooks('getModuleConfig...', Array)
#3 /home/user/www/wire/modules/Process/ProcessModule/ProcessModule.module(1144): Wire->__call('getModuleConfig...', Array)
#4 /home/user/www/wire/modules/Process/ProcessModule/ProcessModule.module(1074): ProcessModule->renderEdit('ScheduleBackups', Array)
#5 /home/user/www/wire/core/Wire.php(398): ProcessModule->___executeEdit()
#6 /home/user/www/wire/core/Wire.php(333): Wire->runHooks('executeEdit', Array)
#7 /home/user/www/wire/core/ProcessController.php(236): Wire->__call('executeEdit', Array)
#8 /home/user/www/wire/core/Wire.php(398): P (line 105 of /home/user/www/site/modules/ScheduleBackups/ScheduleBackups.module)

It seems no other backup solution offers the feature to zip up the files. Well, I guess I could use the solution from Stack Overflow.

Link to comment
Share on other sites
