djr

Module: ScheduleCloudBackups (back up site to S3)

Hello

I've written a little module that backs up your ProcessWire site to Amazon S3 (might add support for other storage providers later, hence the generic name).

Pete's ScheduleBackups was used as a starting point but has been overhauled somewhat. Still, it's far from perfect at the moment, but I guess you might find it useful.

Essentially, you set up a cron job to load a page every day, and then the script creates a .tar.gz containing all of the site files and a dump of the database, then uploads it to an S3 bucket.

Currently, only Linux-based hosts are supported (hopefully, most of you).

The module is available on GitHub: https://github.com/DavidJRobertson/ProcessWire-ScheduleCloudBackups

Zip download: https://github.com/DavidJRobertson/ProcessWire-ScheduleCloudBackups/archive/master.zip

Let me know what you think

EDIT: now available on the module directory @ http://modules.processwire.com/modules/schedule-cloud-backups


This looks fantastic djr! I've had a need for something exactly like this and will definitely look forward to using it. I took a quick look through the code and think it looks very well put together. I do have a few minor suggestions:

Rather than backing up the DB to the root path (where it is temporarily web accessible) I'd recommend backing it up to a non web accessible directory, like /site/assets/cache/. Likewise for the tar/gz file. 

Beyond just looking for "runbackup" in the request URI, I recommend designating a page that it will only run on. For instance, if you wanted it to only run on the homepage:

// Only run on the homepage, when the URL contains the backup path and the token matches
$shouldBackup = $page->path === '/' &&
  (strpos($_SERVER['REQUEST_URI'], self::RUN_BACKUP_PATH) !== FALSE) &&
  $this->wire('input')->get->token &&
  $this->wire('input')->get->token === $this->token;

This might be a good module to experiment with conditional autoloads. In your getModuleInfo(), you can do this:

// Autoload the module only for requests whose URL contains the backup path
'autoload' => function() {
  return (strpos($_SERVER['REQUEST_URI'], self::RUN_BACKUP_PATH) !== FALSE);
}

In truth, conditional autoloads are more reliable in PW 2.5 (a few minor issues have been fixed) so this may be a v2 kind of thing as well. In PW 2.5, you can also isolate the entire getModuleInfo() to a separate ModuleName.info.php file. 

Beyond just the token, it might be worthwhile to have an IP address limiter since there's a good chance one's CRON job is always going to be coming from the same IP. Though not sure it's totally necessary. 
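A minimal sketch of what such a limiter could look like, combined with the token check. The function name `isBackupRequestAllowed` and the `$allowedIps` list are hypothetical and not part of the module; using `hash_equals()` for the token comparison is an extra precaution added here, not something the module is known to do.

```php
<?php
// Hypothetical helper, not part of ScheduleCloudBackups.
// Returns true only when the token matches and the client IP is on the allowlist.
function isBackupRequestAllowed($token, $expectedToken, $remoteIp, array $allowedIps)
{
    // Reject missing, empty or non-string tokens outright.
    if (!is_string($token) || $token === '') {
        return false;
    }
    // hash_equals() compares the two strings in constant time (PHP 5.6+).
    if (!hash_equals($expectedToken, $token)) {
        return false;
    }
    // In this sketch an empty allowlist means "no IP restriction".
    if (count($allowedIps) === 0) {
        return true;
    }
    // Strict comparison against the configured addresses.
    return in_array($remoteIp, $allowedIps, true);
}
```

In a module this would be called with the token from the query string, the stored token, `$_SERVER['REMOTE_ADDR']`, and whatever list of addresses the site owner configured. Note that `REMOTE_ADDR` will be the proxy's address if the site sits behind a reverse proxy.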

In your docs file, I would mention if possible what command you recommend for the CRON job, for instance:

wget --quiet --no-cache -O - http://www.your-site.com/runbackup?token=abc123 > /dev/null

Lastly, might be good to mention that it requires exec/system access for executing commands (many hosts have these disabled by default, but there's usually a way to enable them). 

Please add to the modules directory when ready! Thanks for putting this together! 


Thanks Ryan.

Re your suggestions: (I've made some changes to the code)

  • The data.sql file is already protected from web access by the default PW .htaccess file, and I've added a .htaccess file in the module directory to prevent access to the backup tarball.
  • I've changed the shouldBackup check to be more specific (behaves the same as your suggestion, but simpler logic).
  • I don't know what the issues around conditional autoloading in PW 2.4 are, so I'll leave that for now (?).
  • I'll put IP whitelisting on the todo list, but I don't think it's essential right now, since it's unlikely anybody would be able to guess the secret token in the URL.
  • The `wget` command for the cron job is displayed on the module config page (prefilled with URL). Would it be better to have the cron job run a PHP file directly rather than going through the web server? Not sure.
  • I've added a little mention of the requirements in the readme. I've also adjusted the install method to check it can run tar and mysqldump.
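
For anyone curious, a requirements check of this kind can be sketched roughly as follows. This is an illustration only, not the module's actual install code: `isExecAvailable` and `canRunCommand` are hypothetical names, and it assumes a Linux host whose shell supports `command -v`.

```php
<?php
// Hypothetical helpers, not the module's actual install check.

// True when exec() exists and is not listed in the disable_functions ini setting.
function isExecAvailable()
{
    if (!function_exists('exec')) {
        return false;
    }
    $disabled = array_map('trim', explode(',', (string) ini_get('disable_functions')));
    return !in_array('exec', $disabled, true);
}

// True when $command can be found on the PATH (assumes a POSIX shell).
function canRunCommand($command)
{
    if (!isExecAvailable()) {
        return false;
    }
    $output = array();
    $status = 1;
    // "command -v" exits with status 0 only if the command can be resolved.
    exec('command -v ' . escapeshellarg($command) . ' 2>/dev/null', $output, $status);
    return $status === 0;
}

// Report on the two tools the module relies on.
foreach (array('tar', 'mysqldump') as $tool) {
    echo $tool . ': ' . (canRunCommand($tool) ? 'found' : 'missing') . "\n";
}
```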

I'll submit it to the module directory shortly :)

djr said:

The data.sql file is already protected from web access by the default PW .htaccess file, and I've added a .htaccess file in the module directory to prevent access to the backup tarball.

You are right, I'd forgotten we had that in the .htaccess.

djr said:

I don't know what the issues around conditional autoloading in PW 2.4 are, so I'll leave that for now (?).

Yes, I'd leave it for now. I just wanted to point them out because I think these will be beneficial for this module once 2.5 is stable. 

djr said:

I'll put IP whitelisting on the todo list, but I don't think it's essential right now, since it's unlikely anybody would be able to guess the secret token in the URL.

I agree, you don't need it for now. My default is always to double up on security, but thinking through it more, it's probably not necessary here. I mention it as a possible future addition only because the URLs hitting a website aren't always confidential: the token is only as private as the logs. For most of us, that's a non-issue. For some it's a potential DDoS entry point, but only if the token gets into the wrong hands. I think what you've got is just right for the majority, and if someone needs something more, like an IP limiter, it's probably better to leave it to them to add rather than making everyone else fuss with it.

djr said:

The `wget` command for the cron job is displayed on the module config page (prefilled with URL). Would it be better to have the cron job run a PHP file directly rather than going through the web server? Not sure.

Sorry, I missed that wget was already there. There may be some benefits to having the cron job run the PHP file directly, but it would be more difficult for the user to set up (creating executable PHP shell scripts and such). Also, keeping the job's trigger URL accessible makes it easier for people to use external CRON services. As a result, I think sticking to the method you are using is better.

Thanks for adding to the modules directory! 


When trying to create a backup from within the CP, I get:

Error: Exception: Failed to create database dump. (in /site/modules/ProcessWire-ScheduleCloudBackups/ScheduleCloudBackups.module line 167)

Is this to do with the requirement that "tar and mysqldump must be present on your PATH"? I'm not sure I have the ability to do anything about that on the host I'm testing it out on.


I'm having the exact same issue as Tyssen. 


@tyssen, @jacmaes:

Most likely the server doesn't have the mysqldump utility available.

It's possible to add a pure-PHP fallback (Pete's ScheduleBackups did) but it will probably be considerably slower than real mysqldump. I'll see about adding it soon, but I'm a bit busy today.


@tyssen, @jacmaes: released 0.0.2 which has a pure-php fallback for mysqldump and tar. Give it a go :)


Thanks for the update, djr, but now I'm getting this error  :( :

Error: Exception: Failed to create database dump. (in /var/www/.../site/modules/ScheduleCloudBackups/ScheduleCloudBackups.module line 81)


Oh. That tells me it's using the native mysqldump (not the php implementation), but it's still failing.

Perhaps the file permissions don't allow creating a new file (data.sql) in the root of your site? I should probably add a check for that. 
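
Roughly what I have in mind for that check, as a sketch only. The function name and the error message are illustrative, and the module may end up doing this differently.

```php
<?php
// Hypothetical pre-flight check; name and message are illustrative.
// Returns the full path of the dump file, or throws if it cannot be written.
function assertDumpTargetWritable($dir, $filename = 'data.sql')
{
    $target = rtrim($dir, '/') . '/' . $filename;
    // An existing file must itself be writable; otherwise the directory must be.
    if (file_exists($target)) {
        $ok = is_writable($target);
    } else {
        $ok = is_dir($dir) && is_writable($dir);
    }
    if (!$ok) {
        throw new Exception("Cannot write database dump to $target (check permissions and ownership).");
    }
    return $target;
}
```

Keep in mind that `is_writable()` is evaluated for the user PHP runs as, which may not be the account that owns the site files.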


The root folder of my site has permissions of 755, if that's what you're referring to. 


Hi djr

Great plugin and thanks as I think backups are really important.

I was trying to set up your module and I'm getting an error in the admin section. It installed okay and I was filling out the Amazon information (the admin page worked fine when the information was wrong) and once I got the Amazon information right the admin page started failing with the error:

Fatal error: Call to a member function format() on boolean in /var/www/vhosts/62/500562/webspace/httpdocs/site/modules/ScheduleCloudBackups/ScheduleCloudBackups.module on line 418

the relevant lines in the module are:

                foreach ($objects as $object) {
                    $ts = date_create_from_format(self::TIMESTAMP_FORMAT, basename($object['Key'], '.tar.gz'));
                    $date = $ts->format('Y-m-d H:i:s');


and it's the last line that is line 418.

Any ideas?

 

Thanks very much

Rob


Hi djr

I solved this myself. I had put a folder in the bucket with the same name as the website (i.e. website.ie). When I checked $objects in that section of code, it contained a single entry, and basename($object['Key'], '.tar.gz') returned "website.ie". That string doesn't match the timestamp format, so date_create_from_format() returned false (the boolean in the error message) and the format() call on line 418 failed.

This may just be an unlucky coincidence, but checking whether $ts === false, followed by some error handling, would solve this for any future users.
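
Something along these lines, as a rough sketch. The `TIMESTAMP_FORMAT` value below is a placeholder I made up (I haven't checked which format the module actually uses), and `listBackupDates` is a hypothetical function name.

```php
<?php
// Placeholder format; the module's real TIMESTAMP_FORMAT may differ.
const TIMESTAMP_FORMAT = 'Ymd-His';

// Hypothetical helper: returns formatted dates for keys that parse as
// backup timestamps and silently skips everything else in the bucket.
function listBackupDates(array $objects)
{
    $dates = array();
    foreach ($objects as $object) {
        $name = basename($object['Key'], '.tar.gz');
        $ts = date_create_from_format(TIMESTAMP_FORMAT, $name);
        if ($ts === false) {
            // Not a backup created by the module (e.g. a folder such as "website.ie").
            continue;
        }
        $dates[$object['Key']] = $ts->format('Y-m-d H:i:s');
    }
    return $dates;
}

// Example bucket listing: one stray folder and one backup archive.
$objects = array(
    array('Key' => 'website.ie/'),
    array('Key' => '20160301-043000.tar.gz'),
);
print_r(listBackupDates($objects));
```

Skipping unknown keys keeps the admin page usable; logging a warning instead of silently continuing would be another option.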

 

Thanks for the great plugin.

Rob


Hi David

Another problem: when I try to run the backup I get a 404 (page not found). The URL I am using from the admin page is similar to:

http://www.websitename.ie/runbackup?token=XXXX29924700591dd18a9633d17c8ea34c0b2

(changed to protect the highly secret information of the client's website!!!)

I'm using PW version 2.7.2

[I've tried completely reinstalling the plugin but it didn't make any difference]

Thanks for the help

Rob

 

 


I'm getting the same thing. There's an issue on GitHub reporting the same problem from June 2015.

So it seems this project is now dead. Is that the case? And if so, are there any other alternatives?

9 hours ago, Tyssen said:

And if so, are there any other alternatives?

Something like this?

Soon to be released, I hope :) 


Any update on this great module? Can't install on PW > 3  :(
I get this error: Cannot declare class ComposerAutoloaderInit700022e1c519b28dbab39fa2456e3e43, because the name is already in use (line 5 of /home/nginx/domains/public/site/assets/cache/FileCompiler/vendor/composer/autoload.php)
 
