djr

Module: ScheduleCloudBackups (back up site to S3)


Hello

I've written a little module that backs up your ProcessWire site to Amazon S3 (might add support for other storage providers later, hence the generic name).

Pete's ScheduleBackups was used as a starting point but has been overhauled somewhat. Still, it's far from perfect at the moment, but I guess you might find it useful.

Essentially, you set up a cron job to load a page every day, and then the script creates a .tar.gz containing all of the site files and a dump of the database, then uploads it to an S3 bucket.
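
The dump-and-archive step described above could be sketched roughly as follows. This is only an illustration, not the module's actual code: the helper names, paths and credentials are hypothetical, the archive is assumed to be written outside the site root so that tar does not pack its own output, and the S3 upload is left out because it depends on the AWS client library in use.

```php
<?php
// Hypothetical helpers for illustration; not taken from the module.

/** Build a mysqldump command that writes the dump to $outFile. */
function buildDumpCommand(string $host, string $user, string $pass, string $db, string $outFile): string
{
    // Note: a password on the command line is visible in the process list;
    // a real setup may prefer a MySQL option file instead.
    return sprintf(
        'mysqldump --host=%s --user=%s --password=%s --result-file=%s %s',
        escapeshellarg($host),
        escapeshellarg($user),
        escapeshellarg($pass),
        escapeshellarg($outFile),
        escapeshellarg($db)
    );
}

/** Build a tar command that packs $siteRoot into a gzip-compressed archive. */
function buildTarCommand(string $archive, string $siteRoot): string
{
    // -C changes into the site root so paths inside the archive are relative.
    return sprintf('tar -czf %s -C %s .', escapeshellarg($archive), escapeshellarg($siteRoot));
}
```

Each command string would then be passed to exec() and its exit status checked before moving on to the upload.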

Currently, only Linux-based hosts are supported (hopefully, that covers most of you).

The module is available on github: https://github.com/DavidJRobertson/ProcessWire-ScheduleCloudBackups

Zip download: https://github.com/DavidJRobertson/ProcessWire-ScheduleCloudBackups/archive/master.zip

Let me know what you think

EDIT: now available on the module directory @ http://modules.processwire.com/modules/schedule-cloud-backups


This looks fantastic djr! I've had a need for something exactly like this and will definitely look forward to using it. I took a quick look through the code and think it looks very well put together. I do have a few minor suggestions:

Rather than backing up the DB to the root path (where it is temporarily web accessible) I'd recommend backing it up to a non web accessible directory, like /site/assets/cache/. Likewise for the tar/gz file. 

Beyond just looking for "runbackup" in the request URI, I recommend designating a page that it will only run on. For instance, if you wanted it to only run on the homepage:

$shouldBackup = $page->path === '/' &&
  (strpos($_SERVER['REQUEST_URI'], self::RUN_BACKUP_PATH) !== FALSE) &&
  $this->wire('input')->get->token &&
  $this->wire('input')->get->token === $this->token;

This might be a good module to experiment with conditional autoloads. In your getModuleInfo, you can do this:

'autoload' => function() {
  return (strpos($_SERVER['REQUEST_URI'], self::RUN_BACKUP_PATH) !== FALSE);
}

In truth, conditional autoloads are more reliable in PW 2.5 (a few minor issues have been fixed) so this may be a v2 kind of thing as well. In PW 2.5, you can also isolate the entire getModuleInfo() to a separate ModuleName.info.php file.

Beyond just the token, it might be worthwhile to have an IP address limiter since there's a good chance one's CRON job is always going to be coming from the same IP. Though not sure it's totally necessary. 
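
A minimal sketch of what such an IP limiter could look like next to the token check. The function names are hypothetical and this is not code from the module; it assumes the cron client's address arrives unchanged in REMOTE_ADDR, which may not hold behind a proxy or load balancer.

```php
<?php
/**
 * Return true when $remoteAddr exactly matches one of the allowed addresses.
 * Hypothetical helper; not part of the module.
 */
function isAllowedIp(string $remoteAddr, array $allowedIps): bool
{
    return in_array($remoteAddr, $allowedIps, true);
}

/** Combine the token check with the IP check. */
function mayRunBackup(string $givenToken, string $expectedToken, string $remoteAddr, array $allowedIps): bool
{
    // hash_equals() compares in constant time, so the comparison
    // does not leak how much of the token was correct.
    return hash_equals($expectedToken, $givenToken) && isAllowedIp($remoteAddr, $allowedIps);
}
```

In a request handler this might be called as mayRunBackup($tokenFromQuery, $storedToken, $_SERVER['REMOTE_ADDR'], $allowedIps), where the last three values come from wherever the site keeps its settings.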

In your docs file, I would mention if possible what command you recommend for the CRON job, for instance:

wget --quiet --no-cache -O - "http://www.your-site.com/runbackup?token=abc123" > /dev/null

Lastly, might be good to mention that it requires exec/system access for executing commands (many hosts have these disabled by default, but there's usually a way to enable them). 
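
One way to detect whether exec/system access is available is to look at PHP's disable_functions setting. The sketch below is an assumption about how such a check could be written, not the module's code; the helper name is hypothetical.

```php
<?php
/**
 * Return true if $function exists and is not listed in a disable_functions value.
 * $disabled defaults to the live ini setting; passing it in makes the check testable.
 */
function isFunctionUsable(string $function, ?string $disabled = null): bool
{
    $disabled = $disabled ?? (string) ini_get('disable_functions');
    // The ini value is a comma-separated list, possibly with spaces.
    $list = array_filter(array_map('trim', explode(',', $disabled)));
    return function_exists($function) && !in_array($function, $list, true);
}
```

Calling isFunctionUsable('exec') before attempting a backup would allow a clear error message instead of a failed dump.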

Please add to the modules directory when ready! Thanks for putting this together! 


Thanks Ryan.

Re your suggestions: (I've made some changes to the code)

  • The data.sql file is already protected from web access by the default PW .htaccess file, and I've added a .htaccess file in the module directory to prevent access to the backup tarball.
  • I've changed the shouldBackup check to be more specific (behaves the same as your suggestion, but simpler logic).
  • I don't know what the issues around conditional autoloading in PW 2.4 are, so I'll leave that for now (?).
  • I'll put IP whitelisting on the todo list, but I don't think it's essential right now, since it's unlikely anybody would be able to guess the secret token in the URL.
  • The `wget` command for the cron job is displayed on the module config page (prefilled with URL). Would it be better to have the cron job run a PHP file directly rather than going through the web server? Not sure.
  • I've added a little mention of the requirements in the readme. I've also adjusted the install method to check it can run tar and mysqldump.
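
For readers wondering how an install-time check for tar and mysqldump might work, here is a rough sketch. It is an assumption, not the module's actual implementation: it relies on exec() being enabled and on a POSIX shell, and the helper name is hypothetical.

```php
<?php
/** Return true if $command can be found on the PATH of the shell PHP uses. */
function commandExists(string $command): bool
{
    $output = [];
    $status = 1;
    // `command -v` is the POSIX way to ask the shell where a command lives;
    // it exits with a non-zero status when the command is not found.
    exec('command -v ' . escapeshellarg($command) . ' 2>/dev/null', $output, $status);
    return $status === 0;
}
```

An install routine could then refuse to continue, or switch to a slower fallback, when commandExists('tar') or commandExists('mysqldump') returns false.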

I'll submit it to the module directory shortly :)

"The data.sql file is already protected from web access by the default PW .htaccess file, and I've added a .htaccess file in the module directory to prevent access to the backup tarball."

You are right–I'd forgotten we had that in the htaccess. 

"I don't know what the issues around conditional autoloading in PW 2.4 are, so I'll leave that for now (?)."

Yes, I'd leave it for now. I just wanted to point them out because I think these will be beneficial for this module once 2.5 is stable. 

"I'll put IP whitelisting on the todo list, but I don't think it's essential right now, since it's unlikely anybody would be able to guess the secret token in the URL."

I agree, you don't need it for now. My default is always to double up on security, but thinking it through more, it's probably not necessary here. I mention it as a possible future addition only because the URLs hitting a website aren't always confidential: the token is only as private as the logs. For most of us, that's a non-issue. For some it's a potential DDoS entry point, but only if the token gets into the wrong hands. I think what you've got is just right for the majority, and if someone needs something more, like an IP limiter, then it's probably better to leave it to them to add it rather than making everyone else fuss with it.

"The `wget` command for the cron job is displayed on the module config page (prefilled with URL). Would it be better to have the cron job run a PHP file directly rather than going through the web server? Not sure."

Sorry, I missed that wget was already there. There may be some benefits to having the cron job run the PHP file directly, but it would be more difficult for the user to set up (creating executable PHP shell scripts and such). Also, triggering the job through a URL makes it easier for people to use external cron services. As a result, I think sticking to the method you are using is better.

Thanks for adding to the modules directory! 


When trying to create a backup from within the CP, I get:

Error: Exception: Failed to create database dump. (in /site/modules/ProcessWire-ScheduleCloudBackups/ScheduleCloudBackups.module line 167)

Is this to do with "tar and mysqldump must be present on your PATH"? I'm not sure I have the ability to do anything about that on the host I'm testing it out on.


I'm having the exact same issue as Tyssen. 


@tyssen, @jacmaes:

Most likely the server doesn't have the mysqldump utility available.

It's possible to add a pure-PHP fallback (Pete's ScheduleBackups did) but it will probably be considerably slower than real mysqldump. I'll see about adding it soon, but I'm a bit busy today.


@tyssen, @jacmaes: I've released 0.0.2, which has a pure-PHP fallback for mysqldump and tar. Give it a go :)


Thanks for the update, djr, but now I'm getting this error  :( :

Error: Exception: Failed to create database dump. (in /var/www/.../site/modules/ScheduleCloudBackups/ScheduleCloudBackups.module line 81)


Oh. That tells me it's using the native mysqldump (not the php implementation), but it's still failing.

Perhaps the file permissions don't allow creating a new file (data.sql) in the root of your site? I should probably add a check for that. 
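
A check along those lines could be as small as the sketch below. The helper name is hypothetical and this is not the module's code. Note that permissions of 755 only allow the directory's owner to write, so the check can still fail when PHP runs as a different user than the one that owns the site root.

```php
<?php
/** Return true if the current PHP process can create files in $directory. */
function canCreateFileIn(string $directory): bool
{
    return is_dir($directory) && is_writable($directory);
}
```

Running this against the site root before starting the dump would separate a permissions problem from a mysqldump problem.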


The root folder of my site has permissions of 755, if that's what you're referring to. 


Hi djr

Great plugin and thanks as I think backups are really important.

I was trying to set up your module and I'm getting an error in the admin section. It installed okay and I was filling out the Amazon information (the admin page worked fine when the information was wrong) and once I got the Amazon information right the admin page started failing with the error:

Fatal error: Call to a member function format() on boolean in /var/www/vhosts/62/500562/webspace/httpdocs/site/modules/ScheduleCloudBackups/ScheduleCloudBackups.module on line 418

the relevant lines in the module are:

                foreach ($objects as $object) {
                    $ts = date_create_from_format(self::TIMESTAMP_FORMAT, basename($object['Key'], '.tar.gz'));
                    $date = $ts->format('Y-m-d H:i:s');


and it's the last line that is line 418.

Any ideas?

 

Thanks very much

Rob


Hi djr

I solved this myself. I had put a folder in the bucket with the same name as the website (i.e. website.ie). When I checked $objects in that section of code, it contained one entry, and basename($object['Key'], '.tar.gz') returned "website.ie". That string doesn't match the timestamp format, so date_create_from_format() returned false and $ts was a boolean, which is why the format() call on line 418 failed.

I don't know if this is a huge coincidence, but checking whether $ts === false and handling that case (skipping the entry or reporting an error) would solve this for any future users.
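
The guard suggested here might look like the following sketch. The timestamp format is an assumption made purely for illustration (the module's real TIMESTAMP_FORMAT is not shown in this thread), and the function name is hypothetical.

```php
<?php
// Assumed format for illustration only; the module's actual constant may differ.
const TIMESTAMP_FORMAT = 'Y-m-d_H-i-s';

/** Return formatted dates for bucket objects whose names parse as backup timestamps. */
function backupDates(array $objects): array
{
    $dates = [];
    foreach ($objects as $object) {
        $name = basename($object['Key'], '.tar.gz');
        $ts = date_create_from_format(TIMESTAMP_FORMAT, $name);
        if ($ts === false) {
            // Not a backup archive (for example a folder placed in the bucket): skip it.
            continue;
        }
        $dates[] = $ts->format('Y-m-d H:i:s');
    }
    return $dates;
}
```

With this guard, an unrelated key such as "website.ie" is skipped instead of causing a fatal error.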

 

Thanks for the great plugin.

Rob


Hi David

 

Another problem: when I try to run the backup I get a 404 (page not found). The URL I am using from the admin page is similar to:

 

http://www.websitename.ie/runbackup?token=XXXX29924700591dd18a9633d17c8ea34c0b2

(changed to protect the highly secret information of the client's website!!!)

I'm using PW version 2.7.2

[I've tried to completely reinstall the plugin but it didn't make any difference]

Thanks for the help

Rob

 

 


I'm getting the same thing. There's an issue on Github with the same problem from June 2015.

So it seems this project is now dead. Is that the case? And if so, are there any other alternatives?

9 hours ago, Tyssen said:

And if so, are there any other alternatives?

Something like this?

Soon to be released, I hope :) 


Any update on this great module? Can't install on PW > 3  :(
I get this error: Cannot declare class ComposerAutoloaderInit700022e1c519b28dbab39fa2456e3e43, because the name is already in use (line 5 of /home/nginx/domains/public/site/assets/cache/FileCompiler/vendor/composer/autoload.php)
 

