Pre-Release: Remote Backup


rot

I spent way too much of my spare time trying to produce an overly complex site backup module. Anyway - here it is, in a pre-release state. I somehow have to get rid of the monster.

Features:

Use Storage Providers

There are two base classes for Storage modules and three reference implementations:

  • Remote Storage Driver
    This is a base class for constructing plug-in modules that send data to a remote storage (see the sketch after this list).
    You need to implement all abstract functions: connect, disconnect, upload and getConfigFieldset.

    Implemented Examples
    • Storage Mail
      Sends a backup as a mail attachment. If the file size exceeds a set limit, the file gets split. It uses the PHPMailer library, as WireMail does not support attachments.

      @todo: For now this mails everything in a single SMTP session - maybe that's not so safe?
  • Remote Directory Driver
    This is a base class for constructing plug-in modules that send data to a remote storage and can also list and delete old files.
    You need to implement all abstract functions: connect, disconnect, upload, find, size, mdate, delete and getConfigFieldset.
    Implemented Examples
    • Storage FTP
      Allows connecting to an FTP server and uploading, listing and deleting files.
      Uses the standard PHP FTP functions.
    • Storage Google Drive
      Allows connecting to Google Drive and uploading, listing and deleting files. Uses the PHP Google API.
      You have to create a service account in the Google Developers Console and add the key file to the plugin directory (or another directory, if you specify a relative or absolute path to that file).
      See https://developers.google.com/identity/protocols/OAuth2ServiceAccount#creatinganaccount
      I don't use the OAuth token process because it is not more secure: once there is a refresh token (which is necessary to avoid user interaction), it is as powerful and insecure as a key file. It is just more complex, as it needs a callback URL for registering.

      @todo? In case you can prove otherwise, I will implement the callback registration.
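
To give an idea of what such a plug-in looks like, below is a sketch of a purely hypothetical driver that "uploads" to a locally mounted directory. The base class name and method names follow the list above, but the signatures and the targetDir config field are illustrative, not the module's actual API:

    // Hypothetical example: a Remote Directory Driver that copies backups
    // to a locally mounted directory. Signatures are assumptions.
    class StorageMountedDir extends RemoteDirectoryDriver {

        protected $dir;

        public function connect() {
            $this->dir = rtrim($this->targetDir, '/'); // targetDir: assumed config field
            if (!is_dir($this->dir)) throw new WireException("Target directory not found");
        }

        public function disconnect() {
            // nothing to close for the local filesystem
        }

        public function upload($localFile, $remoteName) {
            return copy($localFile, "$this->dir/$remoteName");
        }

        public function find($pattern) {
            // list candidate backup files (used for deleting old backups)
            return array_map('basename', glob("$this->dir/$pattern"));
        }

        public function size($remoteName) {
            return filesize("$this->dir/$remoteName");
        }

        public function mdate($remoteName) {
            return filemtime("$this->dir/$remoteName");
        }

        public function delete($remoteName) {
            return unlink("$this->dir/$remoteName");
        }

        public function getConfigFieldset() {
            // a fieldset holding a single text field for the target directory
            $fieldset = $this->modules->get('InputfieldFieldset');
            $field = $this->modules->get('InputfieldText');
            $field->name = 'targetDir';
            $field->label = 'Target directory';
            $fieldset->add($field);
            return $fieldset;
        }
    }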

Run from the web or the command line

It's always better to have a regular cron job running, but sometimes you might need webcron.

  • Command Line
    You just need to call backup.php with the id of a backup job and it will be run.
  • Web Cron
    There is a token that starts the backup job from the web if passed as a URL parameter (see the sketch after this list).
    You can specify whether you want logging to the HTTP stream or not.
    You can also specify whether you want a job to be repeated within a certain timespan. This is for using unreliable webcron services by hitting the backup multiple times.
    @todo Consider integration of cron.pw
    @todo I use the init function of an automatically loaded module as a hook. This seems a bit strange. Are there better ways to do that?
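
For reference, the webcron entry point boils down to something like this - a sketch only; the parameter names and the jobToken()/runJob() helpers are made up for illustration:

    public function init() {
        // autoload hook: run a backup job when the secret token matches
        $id    = (int) $this->input->get->job;      // 'job' parameter name is made up
        $token = (string) $this->input->get->token; // 'token' likewise
        if (!$id || !$token) return;

        // compare in constant time so the token does not leak through timing
        if (hash_equals($this->jobToken($id), $token)) {
            $this->runJob($id);
        }
    }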
     

Log to mail, file and admin

You can receive logs by mail (on success or failure), log to a file and see the log on an admin page:

[Screenshot: the log view in the admin]

Configure

I built an admin interface that - besides the log viewer - features a list of jobs:

[Screenshot: the job list in the admin]

 

and an editor for the jobs (which is too extensive to describe in detail):

[Screenshot: the job editor]

 

Discussion

I am not too sure how to solve the issues indicated with @todo.

My main concerns are hooking (the init of an autoload module, for the moment) and locking (none for the moment - no singleton).

As for hooking, the only alternative I know of is using a page, where one would (afaik) have to use a special template (as the admin template is secured) or hook into the security functions (which would probably call for a singleton module).

Concerning the locking issue, I think it might be good if the admin class locked while it is updating something. For the moment this is the same class that runs the backups, so it would also lock the admin while a backup is running. And it would lock the whole site if it is on autoload (as I use the init hook).

Lastly, I should reconsider the logging and maybe try to integrate it better with ProcessWire logging.

I would appreciate comments and suggestions on these issues.

I also appreciate your test results. Don't be too frustrated if something goes wrong - this is at an early stage, but afaik it should be running.

Please find the module at:
https://github.com/romanseidl/remote-backup


I use my own log file(s) via the FileLog class. Is that all there is? There are also these $this->message and $this->error functions - maybe one should implement them too. At the moment I only implement log() and it does not forward to the Wire base class logger.

I will look at LazyCron, thanks.

For the moment I seem to lose the database connection on long-running jobs, which is a problem because I want to save the results to the database:

Exception: SQLSTATE[HY000]: General error: 2006 MySQL server has gone away (in /home/.sites/306/site1268/web/wire/core/Modules.php line 2416)
This error message was shown because you are logged in as a Superuser. Error has been logged.

Any ideas on how to avoid that?


Hi @rot,

a real big monster! :)

Regarding the lost connections, you should also think about querying max_execution_time at the start of a backup job.

And even if you have unlimited time, maybe it would be better to run those jobs in batches?

Which results do you want to store in the database? A summary or each single action?


These message and error functions come from the Notice class and are not related to the logs, but you can set the flag Notice::log or Notice::logOnly so they get logged to the messages or errors log. You could extend this to add your own notices, which would be logged to your log file and show up as notices. FileLog.php is essentially what the API has to offer; I can't see what more you're expecting from it. It's about writing log messages to a file.
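
For example, from within any Wire-derived class:

    // shown as a notice in the admin and also written to the messages log
    $this->message("Backup finished", Notice::log);

    // written to the errors log only, without being displayed
    $this->error("Upload failed", Notice::logOnly);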


a real big monster! :)

Regarding the lost connections, you should also think about querying max_execution_time at the start of a backup job.

And even if you have unlimited time, maybe it would be better to run those jobs in batches?

Which results do you want to store in the database? A summary or each single action?

max_execution_time is set (by calling set_time_limit() - that probably only works if the PHP setup allows it; I'll have to check). You can set that in the admin :)

[Screenshot: the time limit setting in the admin]
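
Checking the return value of set_time_limit() should tell me whether the setup allows it - a quick sketch, not actual module code:

    // set_time_limit() returns false if the limit could not be changed
    // (e.g. forbidden by the hosting setup); $seconds comes from the job config
    if (!set_time_limit($seconds)) {
        $this->error("Could not change max_execution_time - the job may get cut off");
    }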

The script runs until it is nearly finished. Then I want to save the result (the info that the job was successful, plus the log) to the database. So it's probably what you call a "summary".

So maybe this is a separate database timeout? Does ProcessWire open a MySQL connection for every request? Maybe that connection just dies after some minutes of doing nothing.

These message and error functions come from the Notice class and are not related to the logs, but you can set the flag Notice::log or Notice::logOnly so they get logged to the messages or errors log. You could extend this to add your own notices, which would be logged to your log file and show up as notices. FileLog.php is essentially what the API has to offer; I can't see what more you're expecting from it. It's about writing log messages to a file.

There can always be more :) - like class-based logging in a log4j style. But more is not always better.

At the moment I don't log to files by default (which might be a bad idea considering the database timeouts...), but I have been using my own log file.


The problem was a database timeout.

I fixed the timeout problem by using the following to reconnect if needed:

    $data = $backup->run($web);
    // force reconnect if the database connection has been lost due to long job runtime
    try {
        $this->setJobData($id, $data);
    } catch (Exception $e) {
        $this->modules->wire('database')->closeConnection();
        $this->setJobData($id, $data);
    }

Not the most beautiful solution.

I would prefer to have a $db->ping() method, as there was with mysqli.

Maybe that would be a good extension to the DBO?
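
Something along these lines is what I have in mind - just a sketch of the idea, not the code from the pull request:

    // sketch: a ping() on the database wrapper that issues a cheap query
    // and reports whether the connection is still alive
    public function ping() {
        try {
            $this->query('SELECT 1');
        } catch (PDOException $e) {
            return false; // gone away - the caller can close and reconnect
        }
        return true;
    }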

I created a pull request:

https://github.com/ryancramerdesign/ProcessWire/pull/1097

It's not vitally important, but I think it would be nice.


Concerning the locking issue: is there even something like locking, or am I just confused? As there is no middleware (which confuses me :)), there should be no problem with setting such a module to singular and still serving multiple requests in parallel. All the locking there is happens in the DB. Or am I wrong?

I studied $modules->getModule() and it shows that :

  • $modules->get tries to find the module in the WireArray of $modules, which should contain all installed modules.
  • Those that are on autoload have already been initialized.
  • If the module is NOT singular, it gets recreated. And if it has not been initialized yet (because it is not autoload or not singular), it gets initialized.

This seems to imply that a module that is autoload but not singular produces an autoload instance and then a new one for each $modules->get().
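
So which behaviour you get comes down to the two flags in getModuleInfo(), e.g.:

    public static function getModuleInfo() {
        return array(
            'title'    => 'Remote Backup',
            'version'  => 1,
            'autoload' => true,  // instantiated and init()ed on every request
            'singular' => true,  // one shared instance instead of a new one per get()
        );
    }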

LazyCron is autoload but not singular, and it hooks onto:

$this->addHookAfter('ProcessPageView::finished', $this, 'afterPageView');

Maybe it is not important that it is not singular, as LazyCron should not be called directly but hooked:

$this->addHook('LazyCron::every30Minutes', $this, 'myHook'); 

Also, concerning the hook, I looked at Jumplinks, which hooks into "page not found" - that seems pretty nice for something that should try not to slow down the regular page view process:

$this->addHookBefore('ProcessPageView::pageNotFound', $this, 'scanAndRedirect', array('priority' => 10));

Funnily enough, I could not find out what Jumplinks does in terms of lifecycle. Probably it is using the getModuleInfo defaults and thus is not singular (but is it autoload? I suppose it has to be).


I have also been considering using a file for this kind of thing.

It would probably also be the easiest way to provide some kind of "safe transactions", as in my case this is about long-running processes that might die somewhere along the way. But I could also write that to the DB, as I am using it anyway for the process config info.

So I would have to set a start flag (in the DB or in a file) and consider the job dead if it doesn't reply within a certain amount of time.
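
A minimal sketch of such a start flag using a file (path, timeout and wording are placeholders):

    // file-based start flag with a staleness timeout
    $lockFile = $this->config->paths->cache . 'remote-backup.lock';
    $timeout  = 3600; // treat the job as dead after an hour without a heartbeat

    if (file_exists($lockFile) && time() - filemtime($lockFile) < $timeout) {
        throw new WireException("A backup job seems to be running already");
    }
    touch($lockFile); // set the start flag
    register_shutdown_function(function() use ($lockFile) {
        @unlink($lockFile); // clear the flag even if the job dies
    });

    // ... run the job, calling touch($lockFile) from time to time as a heartbeat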


EasyCron is a Web Cron service. A Web Cron service repeatedly requests a user-provided URL at a given interval and processes the result of the request.


@rot, thanks a lot for this module!

I've tested both the FTP and Google Drive options for a site of mine hosted on a DigitalOcean VPS, and it worked flawlessly. It took less than a minute to create the backup files and upload the 650 MB ZIP to Google Drive, which is quite impressive. I must say that setting up the Google Drive option was hard due to the lack of documentation, but since it's a pre-release, that's expected.


  • 1 year later...

Hi @rot,

Thanks for this module. Are you still using it? I'm wondering because I couldn't find it in the module repository.

It works really great. I've got some minor fixes (e.g. errors on 3.x due to namespaces, and errors because my PW install is in a sub-folder) for which I will send a pull request.


Hi arjen!

Yes, I am still using the module, but with an old ProcessWire install. I just never published it to the repository.

Just send me the pull requests and I will recheck and finally publish the module. I didn't find the time to do so when I first wrote it, and then I just forgot.

It would be a waste if it works fine and others don't find it.


3 hours ago, rot said:

It would be a waste if it works fine and others don't find it.

Definitely. I was looking for something else in the forum and stumbled upon your module. It does exactly what another backup script (PHPBU) does, but now from within the admin panel.


  • 4 weeks later...

Hi, I just installed your module on PW version 3.0.36.

Immediately after installation, on the Admin > Remote Backup page, I get this error:

Notice: Undefined variable: out in /srv/users/serverpilot/apps/missions-pw/public/site/assets/cache/FileCompiler/site/modules/ProcessRemoteBackup/ProcessRemoteBackup.module on line 158

Any ideas what may be happening?


  • 1 month later...

So, I finally got some time to get back to setting up new backup services, and it turns out that versions from 3.0.42 (the latest stable) onwards fix the FileCompiler issues ;) I think Ryan made some changes to the file compiler a while ago.

For now I've changed:

  1. FTP with TLS/SSL support
  2. Dynamic loading of ProcessWire based on the current folder

I have sent a pull request. 
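
For reference, the TLS part essentially means using PHP's ftp_ssl_connect() in place of ftp_connect() - a minimal sketch with placeholder connection details, not the module's actual code:

    // explicit FTPS: same API as ftp_connect(), but the session is TLS-encrypted
    $conn = ftp_ssl_connect($host, 21, 30); // host, port, timeout - placeholders
    if ($conn && ftp_login($conn, $user, $pass)) {
        ftp_pasv($conn, true); // passive mode is usually needed behind NAT/firewalls
        ftp_put($conn, 'backup.zip', '/local/path/backup.zip', FTP_BINARY);
        ftp_close($conn);
    }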

