rot

Pre-Release: Remote Backup


I spent way too much of my spare time trying to produce an overly complex site backup module. Anyway, here it is in a pre-release state. I somehow have to get rid of the monster.

Features:

Use Storage Providers

There are two base classes for Storage modules and three reference implementations:

  • Remote Storage Driver
    This is a base class for constructing plug-in modules that send data to a remote storage.
    You need to implement all abstract functions: connect, disconnect, upload and getConfigFieldset.

    Implemented Examples
    • Storage Mail
      Sends a backup as a mail attachment. If the file size exceeds a set limit, the file gets split. It uses the
      PHPMailer library because WireMail does not support attachments.

      @todo: For now this sends everything in a single SMTP session - maybe that's not so safe?
  • Remote Directory Driver
    This is a base class for constructing plug-in modules that send data to a remote storage and can also list and delete old files.
    You need to implement all abstract functions: connect, disconnect, upload, find, size, mdate, delete and getConfigFieldset.
    Implemented Examples
    • Storage FTP
      Connects to an FTP server to upload, list and delete files.
      Uses the standard PHP FTP functions.
    • Storage Google Drive
      Connects to Google Drive to upload, list and delete files. Uses the PHP Google API.
      You have to create a service account in the Google Developers Console and add the key file to the plugin directory (or another directory, if you specify a relative or absolute path to that file).
      See https://developers.google.com/identity/protocols/OAuth2ServiceAccount#creatinganaccount
      I don't use the OAuth token process because it is not more secure. Once there is a refresh token (which is necessary to avoid user interaction), it is as powerful and insecure as a key file. It is just more complex, as it needs a callback URL for registering.

      @todo? If you can prove otherwise, I will implement the callback registration.
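To make the driver contract concrete, here is a minimal self-contained sketch. The stub base class below only stands in for the module's real one, and the "local directory" driver is purely hypothetical - it just illustrates the four abstract functions named above:

```php
<?php
// Stub standing in for the module's real base class, so this sketch is
// self-contained. Method names follow the post: connect, disconnect,
// upload and getConfigFieldset.
abstract class RemoteStorageDriver {
    abstract public function connect();
    abstract public function disconnect();
    abstract public function upload($localFile);
    abstract public function getConfigFieldset();
}

// Hypothetical driver that "uploads" to a local directory.
class LocalDirStorage extends RemoteStorageDriver {
    private $targetDir;

    public function __construct($targetDir) {
        $this->targetDir = rtrim($targetDir, '/');
    }

    public function connect() {
        // make sure the target exists before the first upload
        if (!is_dir($this->targetDir)) mkdir($this->targetDir, 0777, true);
        return true;
    }

    public function disconnect() { /* nothing to tear down locally */ }

    public function upload($localFile) {
        return copy($localFile, $this->targetDir . '/' . basename($localFile));
    }

    public function getConfigFieldset() {
        // the real module would return an Inputfield fieldset here;
        // a plain array keeps the sketch independent of ProcessWire
        return array('targetDir' => $this->targetDir);
    }
}
```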

Run from the web or the command line

It's always better to have a regular cron job running. But sometimes you might need webcron.

  • Command Line
    You just need to call backup.php with the id of a backup job and it will be run.
  • Web Cron
    There is a token that starts the backup job from the web if passed as a URL parameter.
    You can specify whether you want logging to the HTTP stream or not.
    You can also specify whether you want a job to be repeated within a certain timespan. This makes it possible to use unreliable webcron services by hitting the backup multiple times.
    @todo Consider integration of cron.pw
    @todo I use the init function of an automatically loaded module as a hook. This seems a bit strange. Is there a better way to do that?
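As a sketch of how such a token check could work before a job is started (the parameter names 'job' and 'token' are assumptions for illustration, not necessarily the module's actual ones):

```php
<?php
// Sketch of a web-cron entry point validating its token before running
// a job. Parameter names are hypothetical.
function isValidCronRequest(array $query, $expectedToken) {
    if (empty($query['job']) || empty($query['token'])) return false;
    // hash_equals() compares in constant time, avoiding timing leaks
    return hash_equals($expectedToken, (string) $query['token']);
}

// e.g. a request to backup.php?job=3&token=SECRET
$ok = isValidCronRequest(array('job' => '3', 'token' => 'SECRET'), 'SECRET');
```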
     

Log to mail, file and admin

You can receive logs by mail (on success or failure), log to a file and view the log on an admin page:

[screenshot: log viewer]

Configure

I built an admin interface that - besides the log viewer - features a list of jobs:

[screenshot: job list]

 

and an editor for the job (which is too extensive to describe in detail):

[screenshot: job editor]

 

Discussion

I am not too sure on how to solve the issues indicated with @todo.

My main concerns are the hooking (currently the init of an autoload module) and locking (currently none, no singleton).

As for hooking, the only alternative I know of is using a page, where one would (afaik) have to use a special template, since the admin template is secured - or hook into the security functions (which would probably call for a singleton module).

Concerning the locking issue, I think it might be good if the admin class locked while it is updating something. For the moment this is the same class that runs the backups, so it would also lock the admin while a backup is running. And it would lock the whole site if it is on autoload (as I use the init hook).

Lastly, I should reconsider the logging and maybe try to integrate it better with ProcessWire's logging.

I would appreciate comments and suggestions on these issues.

I also appreciate your test results. Don't be too frustrated if something goes wrong - this is at an early stage, but afaik it should be running.

Please find the module at:
https://github.com/romanseidl/remote-backup


I use my own log file(s) via the FileLog class. Is that all there is? There are these $this->message and $this->error functions. Maybe one should implement them too. At the moment I only implement log() and it does not forward to the Wire base class logger.

I will look at LazyCron. Thanks.

For the moment I seem to lose the database connection on long running jobs, which is a problem because I want to save the results to the database:

Exception: SQLSTATE[HY000]: General error: 2006 MySQL server has gone away (in /home/.sites/306/site1268/web/wire/core/Modules.php line 2416)
This error message was shown because you are logged in as a Superuser. Error has been logged.

Any ideas on how to avoid that?


Hi @rot,

a real big monster! :)

Regarding lost connections, you should also think about querying max_execution_time at the start of a backup job.

And even if you have unlimited time, maybe it's better to run those jobs in batches?

Which results do you want to store to the database? Summary or each single action?


These message and error functions come from the Notice class and are not related to the logs, but you can set the flag Notice::log or Notice::logOnly so that they get logged to the messages or errors log. You could extend this to add your own notices, which would be logged to your log file and show up as a notice. FileLog.php is essentially what the API has to offer. I can't see what more you're expecting from it. It's about writing log messages to a file.

a real big monster! :)

Regarding lost connections, you should also think about querying max_execution_time at the start of a backup job.

And even if you have unlimited time, maybe it's better to run those jobs in batches?

Which results do you want to store to the database? Summary or each single action?

max_execution_time is set (by calling set_time_limit() - it probably only works if the PHP setup allows it; I'll have to check). You can set that in the admin :)

[screenshot: time limit setting]

The script runs until it is nearly finished. Then I want to save the result (the info that the job was successful, plus the log) to the database. So it's probably what you call a "summary".

So maybe this is a separate database timeout? Does ProcessWire open a MySQL connection for every request? Maybe that connection just dies after some minutes of doing nothing.

These message and error functions come from the Notice class and are not related to the logs, but you can set the flag Notice::log or Notice::logOnly so that they get logged to the messages or errors log. You could extend this to add your own notices, which would be logged to your log file and show up as a notice. FileLog.php is essentially what the API has to offer. I can't see what more you're expecting from it. It's about writing log messages to a file.

There can always be more :) Like e.g. class-based logging in a log4j style. But more is not always better.

At the moment I don't log to files by default (which might be a bad idea considering the database timeouts...) but I used to log to my own log file.


The problem was a database timeout.

I fixed the timeout problem by using the following to reconnect if needed:

    $data = $backup->run($web);
    // force reconnect if the database connection has been lost due to long job runtime
    try {
        $this->setJobData($id, $data);
    } catch (Exception $e) {
        $this->modules->wire('database')->closeConnection();
        $this->setJobData($id, $data);
    }

Not the most beautiful solution.

I would prefer to have a $db->ping() method, as there was with mysqli.

Maybe that would be a good extension to the DBO?
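In the meantime, a ping could be emulated on top of plain PDO by issuing a trivial query and reconnecting when it fails - a sketch under that assumption, not the actual ProcessWire API ($makeConnection is a hypothetical factory callable):

```php
<?php
// Sketch: emulate mysqli-style ping() for PDO. Returns a live PDO handle,
// reconnecting via $makeConnection() if the old one has gone away.
function pingOrReconnect($pdo, callable $makeConnection) {
    try {
        // a trivial query tells us whether the connection is still alive
        if ($pdo instanceof PDO && $pdo->query('SELECT 1') !== false) {
            return $pdo;
        }
    } catch (PDOException $e) {
        // "MySQL server has gone away" lands here in exception mode
    }
    return $makeConnection(); // dead or missing connection: make a new one
}
```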

I created a pull request:

https://github.com/ryancramerdesign/ProcessWire/pull/1097

It's not hugely important but I think it would be nice.


Concerning the locking issue: Is there even something like locking? Or am I just confused? As there is no middleware (which confuses me :)), there should be no problem with setting such a module to singular and still serving multiple requests in parallel. All the locking there is happens in the db. Or am I wrong?

I studied $modules->getModule() and it shows that:

  • $modules->get tries to find the module in the WireArray of $modules, which should contain all installed modules.
  • Those that are on autoload have been initialized.
  • If it is NOT singular it gets recreated. And if it has not been initialized (because it is not on autoload or not singular) it gets initialized.

This seems to imply that a module that is on autoload and not singular produces an autoload instance and then a new one for each $modules->get().
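For reference, the singular/autoload flags discussed here live in a module's getModuleInfo() array; a minimal sketch (the class name is hypothetical):

```php
<?php
// Hypothetical module info: with these flags ProcessWire keeps a single
// instance per request (singular) and loads + init()s it on every
// request (autoload), so $modules->get() returns that same instance.
class ExampleBackupModuleInfo {
    public static function getModuleInfo() {
        return array(
            'title'    => 'Example Backup',
            'version'  => 1,
            'singular' => true,  // reuse one instance instead of recreating
            'autoload' => true,  // load and init() on every request
        );
    }
}

$info = ExampleBackupModuleInfo::getModuleInfo();
```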

LazyCron is autoload but not singular and it hooks onto:

$this->addHookAfter('ProcessPageView::finished', $this, 'afterPageView');

Maybe it is not important that it is not singular, as LazyCron should not be called directly but hooked:

$this->addHook('LazyCron::every30Minutes', $this, 'myHook'); 

Also concerning the hook: I looked at "Jumplinks", which hooks into "Page not Found" - which seems pretty nice for something that should try not to slow down the regular page view process:

$this->addHookBefore('ProcessPageView::pageNotFound', $this, 'scanAndRedirect', array('priority' => 10));

Funnily enough, I could not find out what Jumplinks does in terms of lifecycle. Probably it is using the getModuleInfo defaults and thus is not singular (but is it autoload? I suppose it has to be).


I have also been considering to use a file for this kind of thing.

Probably it would also be the easiest way to provide some kind of "safe transactions", as in my case this is about long running processes that might die somewhere along the way. But I could also write that to the db, as I am using it anyway for the process config info.

So I would have to set a start flag (in the db or in a file) and consider the job dead if it doesn't report back within a certain amount of time.
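That start-flag idea could look roughly like this as a file-based sketch (file name and timeout are placeholders, and the real module might do this in the db instead):

```php
<?php
// Sketch of a file-based lock with a staleness timeout: a job takes the
// lock by writing its start time; a lock older than $maxAge seconds is
// treated as left over from a dead job and may be taken over.
function acquireLock($lockFile, $maxAge) {
    if (file_exists($lockFile)) {
        $startedAt = (int) file_get_contents($lockFile);
        if (time() - $startedAt < $maxAge) {
            return false; // another job is (presumably) still running
        }
        // stale lock: the previous job is considered dead, take over
    }
    file_put_contents($lockFile, (string) time());
    return true;
}

function releaseLock($lockFile) {
    if (file_exists($lockFile)) unlink($lockFile);
}
```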


EasyCron is a web cron service. A web cron repeatedly requests a user-provided URL at a given interval and processes the result of the request.

Edited by adrian
Removed link - I don't think that service is much help for this module.


@rot, thanks a lot for this module!

I've tested both the FTP and Google Drive options for a site of mine hosted on a DigitalOcean VPS, and they worked flawlessly. It took less than a minute to create the backup files and upload the 650 MB ZIP to Google Drive, which is quite impressive. I must say that setting up the Google Drive option was hard due to the lack of documentation, but since it's a pre-release, that's expected.


Hi @rot,

Thanks for this module. Are you still using this? I'm wondering because I couldn't find it in the module repository.

It works really great. I've got some minor fixes (i.e. errors on 3.x due to namespaces, and errors since my PW install is in a sub-folder) for which I will send a pull request.


Hi arjen!

Yes, I am still using the module, but with an old ProcessWire install. I just never published it to the repository.

Just send me the pull requests and I will recheck and finally publish the module. I didn't find the time to do so when I first built it, and then I just forgot.

It would be a waste if it works fine and others don't find it.

3 hours ago, rot said:

It would be a waste if it works fine and others don't find it.

Definitely. I was looking for something else in the forum and stumbled upon your module. It does exactly what another backup script (PHPBU) does, but now from within the admin panel.


Hi, I just installed your module on PW version 3.0.36.

Immediately after installation, on the Admin > Remote Backup page, I get this error:

Notice: Undefined variable: out in /srv/users/serverpilot/apps/missions-pw/public/site/assets/cache/FileCompiler/site/modules/ProcessRemoteBackup/ProcessRemoteBackup.module on line 158

Any ideas what may be happening?


Hi @rastographics, this module won't work out of the box with PW 3.x. You need to make some changes to get it working. I don't know which ones from memory, but I got a copy running locally at home. I might find some time tonight to update the code and send @rot the pull request I promised.


No problem. I want to install this on several sites we are running. Looks like it has to be done this weekend. Will get back to you.


So, I finally got some time to get back to setting up new backup services, and it turns out that versions > 3.0.42 (latest stable) fix the FileCompiler issues ;) I think Ryan made some changes regarding the file compiler a while ago.

For now I've changed:

  1. FTP with TLS/SSL support
  2. Dynamic loading of ProcessWire based on the current folder

I have sent a pull request. 
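For the TLS/SSL part, PHP's standard FTP functions already cover explicit FTPS; a rough sketch of what such an upload could look like (host and credentials are placeholders, and this is not necessarily how the pull request implements it):

```php
<?php
// Sketch: upload a file over FTP with explicit TLS using PHP's standard
// FTP functions. ftp_ssl_connect() negotiates TLS before login, so the
// credentials are not sent in the clear. All arguments are placeholders.
function uploadOverFtps($host, $user, $pass, $localFile, $remoteName) {
    $conn = ftp_ssl_connect($host);
    if ($conn === false) return false;
    if (!ftp_login($conn, $user, $pass)) { ftp_close($conn); return false; }
    ftp_pasv($conn, true); // passive mode usually plays nicer with firewalls
    $ok = ftp_put($conn, $remoteName, $localFile, FTP_BINARY);
    ftp_close($conn);
    return $ok;
}
```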

