djr

Module: ScheduleCloudBackups (back up site to S3)


Hello

I've written a little module that backs up your ProcessWire site to Amazon S3 (might add support for other storage providers later, hence the generic name).

Pete's ScheduleBackups was used as a starting point but has been overhauled somewhat. Still, it's far from perfect at the moment, but I guess you might find it useful.

Essentially, you set up a cron job to load a page every day, and then the script creates a .tar.gz containing all of the site files and a dump of the database, then uploads it to an S3 bucket.
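
As a rough illustration of the dump-and-archive steps described above, here is a minimal sketch that only builds the two shell commands. The function names and paths are made up for this example and are not taken from the module's source; the archive is assumed to be written outside the site root, and the S3 upload is left out because it depends on the AWS SDK.

```php
<?php
// Hypothetical helpers for illustration only; not the module's actual code.

function buildDumpCommand(string $host, string $user, string $database, string $sqlFile): string
{
    // The password is deliberately not passed here; mysqldump can read it
    // from an option file instead of the command line.
    return sprintf(
        'mysqldump --host=%s --user=%s --result-file=%s %s',
        escapeshellarg($host),
        escapeshellarg($user),
        escapeshellarg($sqlFile),
        escapeshellarg($database)
    );
}

function buildTarCommand(string $siteRoot, string $archiveFile): string
{
    // -c create, -z gzip, -f output file, -C change into the site root first.
    return sprintf(
        'tar -czf %s -C %s .',
        escapeshellarg($archiveFile),
        escapeshellarg($siteRoot)
    );
}
```

Each string could then be handed to exec() on a host that allows it. Every argument goes through escapeshellarg() so that unusual characters in paths or names cannot break the command.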

Currently, only Linux-based hosts are supported (hopefully, that covers most of you).

The module is available on GitHub: https://github.com/DavidJRobertson/ProcessWire-ScheduleCloudBackups

Zip download: https://github.com/DavidJRobertson/ProcessWire-ScheduleCloudBackups/archive/master.zip

Let me know what you think

EDIT: now available on the module directory @ http://modules.processwire.com/modules/schedule-cloud-backups


This looks fantastic djr! I've had a need for something exactly like this and will definitely look forward to using it. I took a quick look through the code and think it looks very well put together. I do have a few minor suggestions:

Rather than backing up the DB to the root path (where it is temporarily web accessible) I'd recommend backing it up to a non web accessible directory, like /site/assets/cache/. Likewise for the tar/gz file. 

Beyond just looking for "runbackup" in the request URI, I recommend designating a page that it will only run on. For instance, if you wanted it to only run on the homepage:

$shouldBackup = $page->path === '/' && 
  (strpos($_SERVER['REQUEST_URI'], self::RUN_BACKUP_PATH) !== FALSE) &&
  $this->wire('input')->get->token &&
  $this->wire('input')->get->token === $this->token;

This might be a good module to experiment with conditional autoloads. In your getModuleInfo(), you can do this:

'autoload' => function() {
  return (strpos($_SERVER['REQUEST_URI'], self::RUN_BACKUP_PATH) !== FALSE); 
}

In truth, conditional autoloads are more reliable in PW 2.5 (a few minor issues have been fixed) so this may be a v2 kind of thing as well. In PW 2.5, you can also isolate the entire getModuleInfo() to a separate ModuleName.info.php file. 

Beyond just the token, it might be worthwhile to have an IP address limiter since there's a good chance one's CRON job is always going to be coming from the same IP. Though not sure it's totally necessary. 
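
To make the IP limiter idea concrete, a minimal sketch might look like the following. The function name isAllowedIp() and the allow-list array are assumptions made for this example, not part of the module; in a real request the address would typically come from $_SERVER['REMOTE_ADDR'].

```php
<?php
// Hypothetical helper, not part of ScheduleCloudBackups.
function isAllowedIp(string $remoteAddr, array $allowedIps): bool
{
    // Reject anything that is not a syntactically valid IPv4/IPv6 address.
    if (filter_var($remoteAddr, FILTER_VALIDATE_IP) === false) {
        return false;
    }
    // Assumption: an empty allow-list means the limiter is switched off.
    if ($allowedIps === []) {
        return true;
    }
    // Strict comparison against the configured addresses.
    return in_array($remoteAddr, $allowedIps, true);
}
```

Note that behind a reverse proxy or load balancer, REMOTE_ADDR is usually the proxy's address, so a check like this would need adapting there.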

In your docs file, I would mention if possible what command you recommend for the CRON job, for instance:

wget --quiet --no-cache -O - http://www.your-site.com/runbackup?token=abc123 > /dev/null

Lastly, might be good to mention that it requires exec/system access for executing commands (many hosts have these disabled by default, but there's usually a way to enable them). 

Please add to the modules directory when ready! Thanks for putting this together! 


Thanks Ryan.

Re your suggestions: (I've made some changes to the code)

  • The data.sql file is already protected from web access by the default PW .htaccess file, and I've added a .htaccess file in the module directory to prevent access to the backup tarball.
  • I've changed the shouldBackup check to be more specific (behaves the same as your suggestion, but simpler logic).
  • I don't know what the issues around conditional autoloading in PW 2.4 are, so I'll leave that for now (?).
  • I'll put IP whitelisting on the todo list, but I don't think it's essential right now, since it's unlikely anybody would be able to guess the secret token in the URL.
  • The `wget` command for the cron job is displayed on the module config page (prefilled with URL). Would it be better to have the cron job run a PHP file directly rather than going through the web server? Not sure.
  • I've added a little mention of the requirements in the readme. I've also adjusted the install method to check it can run tar and mysqldump.
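
For readers wondering what an install-time check for exec access and for tar/mysqldump might involve, here is a rough sketch. It is not the module's actual code: the function names are hypothetical, and the PATH lookup assumes a POSIX shell where `command -v` is available.

```php
<?php
// Hypothetical helpers for illustration; not copied from the module.

function isDisabled(string $function, string $disableFunctionsIni): bool
{
    // disable_functions is a comma-separated list, possibly with spaces.
    $disabled = array_filter(array_map('trim', explode(',', $disableFunctionsIni)));
    return in_array($function, $disabled, true);
}

function canRunShellCommands(): bool
{
    // Check both, since hosts differ in how a disabled exec() shows up.
    return function_exists('exec')
        && !isDisabled('exec', (string) ini_get('disable_functions'));
}

function commandOnPath(string $command): bool
{
    if (!canRunShellCommands()) {
        return false;
    }
    // `command -v` exits with a non-zero status when the command is not found.
    exec('command -v ' . escapeshellarg($command) . ' 2>/dev/null', $output, $exitCode);
    return $exitCode === 0;
}
```

An installer could call commandOnPath('tar') and commandOnPath('mysqldump') and refuse to install, or switch to a fallback, when either returns false.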

I'll submit it to the module directory shortly :)

The data.sql file is already protected from web access by the default PW .htaccess file, and I've added a .htaccess file in the module directory to prevent access to the backup tarball.

You are right, I'd forgotten we had that in the .htaccess. 

I don't know what the issues around conditional autoloading in PW 2.4 are, so I'll leave that for now (?).

Yes, I'd leave it for now. I just wanted to point them out because I think these will be beneficial for this module once 2.5 is stable. 

I'll put IP whitelisting on the todo list, but I don't think it's essential right now, since it's unlikely anybody would be able to guess the secret token in the URL.

I agree, you don't need it for now. My default is always to double up on security, but thinking it through more, it's probably not necessary here. I mention it as a possible future addition only because the URLs hitting a website aren't always confidential: the token is only as private as the logs. For most of us, that's a non-issue. For some it's a potential DDoS entry point, but only if the token gets into the wrong hands. I think what you've got is just right for the majority, and if someone needs something more, like an IP limiter, it's probably better to leave it to them to add rather than making everyone else fuss with it. 

The `wget` command for the cron job is displayed on the module config page (prefilled with URL). Would it be better to have the cron job run a PHP file directly rather than going through the web server? Not sure.

Sorry, I missed that wget was already there. There may be some benefits to having the cron job run the PHP file directly, but it would be more difficult for the user to set up (creating executable PHP shell scripts and such). Also, keeping the job's trigger accessible via a URL makes it easier for people to use external cron services. As a result, I think sticking to the method you are using is better. 

Thanks for adding to the modules directory! 


When trying to create a backup from within the CP, I get:

Error: Exception: Failed to create database dump. (in /site/modules/ProcessWire-ScheduleCloudBackups/ScheduleCloudBackups.module line 167)

Is this to do with the requirement that "tar and mysqldump must be present on your PATH"? I ask because I'm not sure I have the ability to do anything about that on the host I'm testing it out on.


I'm having the exact same issue as Tyssen. 


@tyssen, @jacmaes:

Most likely the server doesn't have the mysqldump utility available.

It's possible to add a pure-PHP fallback (Pete's ScheduleBackups did) but it will probably be considerably slower than real mysqldump. I'll see about adding it soon, but I'm a bit busy today.


@tyssen, @jacmaes: released 0.0.2 which has a pure-php fallback for mysqldump and tar. Give it a go :)


Thanks for the update, djr, but now I'm getting this error  :( :

Error: Exception: Failed to create database dump. (in /var/www/.../site/modules/ScheduleCloudBackups/ScheduleCloudBackups.module line 81)


Oh. That tells me it's using the native mysqldump (not the php implementation), but it's still failing.

Perhaps the file permissions don't allow creating a new file (data.sql) in the root of your site? I should probably add a check for that. 
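
A check of that kind could be as small as the sketch below. The function name is hypothetical, and it only tests whether PHP's own user can write to the directory, which is an assumption about where the failure comes from.

```php
<?php
// Hypothetical pre-flight check; not part of the module.
function canCreateFileIn(string $directory): bool
{
    // The directory must exist and be writable by the user PHP runs as.
    return is_dir($directory) && is_writable($directory);
}
```

Keep in mind that 755 permissions only grant write access to the directory's owner, so the result depends on which user the web server runs PHP as.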


The root folder of my site has permissions of 755, if that's what you're referring to. 


Hi djr

Great plugin and thanks as I think backups are really important.

I was trying to set up your module and I'm getting an error in the admin section. It installed okay and I was filling out the Amazon information (the admin page worked fine when the information was wrong) and once I got the Amazon information right the admin page started failing with the error:

Fatal error: Call to a member function format() on boolean in /var/www/vhosts/62/500562/webspace/httpdocs/site/modules/ScheduleCloudBackups/ScheduleCloudBackups.module on line 418

the relevant lines in the module are:

                foreach ($objects as $object) {
                    $ts = date_create_from_format(self::TIMESTAMP_FORMAT, basename($object['Key'], '.tar.gz'));
                    $date = $ts->format('Y-m-d H:i:s');


and it's the last line that is line 418.

Any ideas?

 

Thanks very much

Rob


Hi djr

I solved this myself. I had put a folder in the bucket with the same name as the website (i.e. website.ie). When I checked $objects in that section of code, it contained one object, and basename($object['Key'], '.tar.gz') returned "website.ie". That made date_create_from_format() return false (i.e. $ts was the boolean), so line 418 couldn't have worked.

I don't know if this is a huge coincidence, but checking whether $ts === false (for example with is_bool()) and adding some error handling would solve this for any future users.
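
A guard along those lines might look like this sketch. It is not the module's code: the TIMESTAMP_FORMAT value and the function name are assumptions for illustration (the module's real format is not shown in this thread), and it simply skips keys that do not parse instead of raising an error.

```php
<?php
// Assumed format for illustration only; the module's real constant may differ.
const TIMESTAMP_FORMAT = 'Y-m-d-H-i-s';

// Hypothetical helper: maps each parsable object key to a readable date.
function listBackupDates(array $objects): array
{
    $dates = [];
    foreach ($objects as $object) {
        $name = basename($object['Key'], '.tar.gz');
        $ts = date_create_from_format(TIMESTAMP_FORMAT, $name);
        if ($ts === false) {
            // Not a backup created by the module (e.g. a folder); skip it.
            continue;
        }
        $dates[$object['Key']] = $ts->format('Y-m-d H:i:s');
    }
    return $dates;
}
```

With this in place, a stray folder such as "website.ie/" in the bucket is ignored rather than causing a fatal error on the admin page.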

 

Thanks for the great plugin.

Rob


Hi David

 

Another problem: when I try to run the backup I get a 404 (page not found). The URL I am using from the admin page is similar to:

http://www.websitename.ie/runbackup?token=XXXX29924700591dd18a9633d17c8ea34c0b2

(changed to protect the highly secret information of the client's website!!!)

I'm using PW version 2.7.2

[I've tried to completely reinstall the plugin but it didn't make any difference]

Thanks for the help

Rob

 

 


I'm getting the same thing. There's an issue on GitHub with the same problem from June 2015.

So it seems this project is now dead. Is that the case? And if so, are there any other alternatives?

9 hours ago, Tyssen said:

And if so, are there any other alternatives?

Something like this?

Soon to be released, I hope :) 


Any update on this great module? Can't install on PW > 3  :(
I get this error: Cannot declare class ComposerAutoloaderInit700022e1c519b28dbab39fa2456e3e43, because the name is already in use (line 5 of /home/nginx/domains/public/site/assets/cache/FileCompiler/vendor/composer/autoload.php)
 
