Pre-Release: Remote Backup

By rot

I spent way too much of my spare time trying to produce an overly complex site backup module. Anyway, here it is in a pre-release state. I somehow have to get rid of the monster.

Features:

Use Storage Providers

There are two base classes for Storage modules and three reference implementations:

  • Remote Storage Driver
    This is a base class for constructing plug-in modules that send data to a remote storage (see the sketch after this list).
    You need to implement all abstract functions: connect, disconnect, upload and getConfigFieldset.

    Implemented Examples
    • Storage Mail
      Sends a backup as a mail attachment. If the file size exceeds a set limit, the file is split. It uses the PHPMailer library, as
      WireMail does not support attachments.

      @todo: For now this sends everything in a single SMTP session - maybe that's not so safe?
  • Remote Directory Driver
    This is a base class for constructing plug-in modules that send data to a remote storage and can also list and delete old files.
    You need to implement all abstract functions: connect, disconnect, upload, find, size, mdate, delete and getConfigFieldset.
    Implemented Examples
    • Storage FTP
      Connects to an FTP server and can upload, list and delete files.
      Uses the standard PHP FTP functions.
    • Storage Google Drive
      Connects to a Google Drive account and can upload, list and delete files. Uses the PHP Google API client.
      You have to create a service account in the Google Developers Console and add the key file to the plugin directory (or another directory, if you specify a relative or absolute path to that file).
      See https://developers.google.com/identity/protocols/OAuth2ServiceAccount#creatinganaccount
      I don't use the OAuth token flow because it is not any more secure: once there is a refresh token (which is necessary to avoid user interaction), it is as powerful and as insecure as a key file. It is just more complex, as it needs a callback URL for registration.

      @todo: If you can prove otherwise, I will implement the callback registration.
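
To give an idea of what such a plug-in looks like, here is a minimal sketch of a storage driver. The class name and method signatures are only assumed from the description above; check the reference implementations in the repository for the real ones.

    // hypothetical example - the actual base class and signatures in the module may differ
    class StorageExample extends RemoteStorageDriver {

        public function connect() {
            // open the connection / authenticate against the remote service
        }

        public function disconnect() {
            // close the connection and free resources
        }

        public function upload($localFile, $remoteName) {
            // push the backup archive to the remote storage
        }

        public function getConfigFieldset() {
            // return the provider-specific configuration fields
        }
    }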

Run from the web or the command line

It's always better to have a regular cron job running, but sometimes you might need a web cron.

  • Command Line
    You just need to call backup.php with the ID of a backup job and it will be run.
  • Web Cron
    There is a token that starts the backup job from the web if it is passed as a URL parameter (see the sketch after this list).
    You can specify whether you want logging to the HTTP stream or not.
    You can also specify whether a job may be repeated within a certain timespan. This helps with unreliable web cron services that hit the backup URL multiple times.
    @todo Consider integration of cron.pw
    @todo I use the init function of an automatically loaded module as a hook. This seems a bit strange. Is there a better way to do that?
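
A rough sketch of the web cron idea described above - checking a token in the init() of an autoload module. The property and method names are made up for illustration and are not the module's actual API.

    public function init() {
        // hypothetical: compare a configured secret against a GET parameter
        $token = $this->wire('sanitizer')->text($this->wire('input')->get('backup_token'));
        if ($token && $token === $this->webcronToken) {
            // a matching token runs the configured backup job instead of the normal page render
            $this->runJob($this->jobId); // runJob() and jobId are placeholders
        }
    }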
     

Log to mail, file and admin

You can receive logs by mail (on success or failure), log to a file and view the log on an admin page:

[screenshot: log viewer in the admin]

Configure

I built an admin interface that, besides the log viewer, features a list of jobs:

[screenshot: list of jobs]

 

and an editor for the job (which is too extensive to be described in detail):

[screenshot: job editor]

 

Discussion

I am not too sure how to solve the issues indicated with @todo.

My main concerns are the hooking (init of an autoload module for the moment) and locking (none, and no singleton, for the moment).

As for the hooking, the only alternative I know of is using a page, where one would (afaik) either have to use a special template, since the admin template is secured, or hook into the security functions (which would probably call for a singleton module).

Concerning the locking issue, I think it might be good if the admin class locked while it is updating something. For the moment this is the same class that runs the backups, so it would also lock the admin while a backup is running. And it would lock the whole site if it is autoloaded (as I use the init hook).

Lastly, I should reconsider the logging and maybe try to integrate it better with ProcessWire's logging.

I would appreciate comments and suggestions on these issues.

I also appreciate your test results. Don't be too frustrated if something goes wrong; this is at an early stage, but afaik it should be working.

Please find the module at:
https://github.com/romanseidl/remote-backup


I use my own log file(s) via the FileLog class. Is that all there is? There are these $this->message and $this->error functions; maybe one should implement them too. At the moment I only implement log(), and it does not forward to the Wire base class logger.

I will look at LazyCron. Thanks.

For the moment I seem to lose the database connection on long-running jobs, which is a problem because I want to save the results to the database:

Exception: SQLSTATE[HY000]: General error: 2006 MySQL server has gone away (in /home/.sites/306/site1268/web/wire/core/Modules.php line 2416)
This error message was shown because you are logged in as a Superuser. Error has been logged.

Any ideas on how to avoid that?


Hi @rot,

a real big monster! :)

Regarding the lost connections, you should also think about querying max_execution_time at the start of a backup job.

And even if you have unlimited time, maybe it's better to run those jobs in batches?

Which results do you want to store in the database? A summary or each single action?


These message and error functions come from the Notice class and are not related to the logs, but you can set the flag Notice::log or Notice::logOnly so they get logged to the messages or errors log. You could extend this to add your own notices, which will be logged to your log file and show up as a notice. FileLog.php is essentially what the API has to offer; I can't see what more you're expecting from it. It's about writing log messages to a file.
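
For example (a minimal sketch, assuming this runs inside a Wire-derived class such as a module, on a ProcessWire version where the $log API variable is available):

    $this->message("Backup job finished", Notice::log);   // show as a notice and write it to the messages log
    $this->error("Upload failed", Notice::logOnly);        // write to the errors log without displaying it
    $this->wire('log')->save('remote-backup', 'Uploaded archive'); // append a line to a custom log file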

Quote:

a real big monster! :)

Regarding the lost connections, you should also think about querying max_execution_time at the start of a backup job.

And even if you have unlimited time, maybe it's better to run those jobs in batches?

Which results do you want to store in the database? A summary or each single action?

max_execution_time is set (by calling set_time_limit(); that probably only works if the PHP setup allows it, I'll have to check). You can set that in the admin. :)

[screenshot]

The script runs until it is nearly finished. Then I want to save the result (the information that the job was successful, plus the log) to the database. So it's probably what you call a "summary".

So maybe this is a separate database timeout? Does ProcessWire open a MySQL connection for every request? Maybe that connection just dies after some minutes of doing nothing.

Quote:

These message and error functions come from the Notice class and are not related to the logs, but you can set the flag Notice::log or Notice::logOnly so they get logged to the messages or errors log. You could extend this to add your own notices, which will be logged to your log file and show up as a notice. FileLog.php is essentially what the API has to offer; I can't see what more you're expecting from it. It's about writing log messages to a file.

There can always be more :) Like, e.g., class-based logging in a log4j style. But more is not always better.

At the moment I don't log to files by default (which might be a bad idea considering the database timeouts...), but I used to log to my own log file.


The problem was a database timeout.

I fixed the timeout problem by using the following to reconnect if needed:

    $data = $backup->run($web);
    // force a reconnect if the database connection has been lost due to the long job runtime
    try {
        $this->setJobData($id, $data);
    } catch (Exception $e) {
        $this->modules->wire('database')->closeConnection();
        $this->setJobData($id, $data);
    }

Not the most beautiful solution.

I would prefer to have a $db->ping() method as there was with mysqli.

Maybe that would be a good extension to the DBO?
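
A ping could be as small as issuing a trivial query and forcing a reconnect when it fails. A rough sketch only; like the workaround above, it assumes closeConnection() makes the next query reopen the connection:

    function pingDatabase() {
        $database = wire('database');
        try {
            $database->query("SELECT 1"); // trivial query just to test the connection
            return true;
        } catch (\PDOException $e) {
            $database->closeConnection(); // drop the dead handle so the next query reconnects
            return false;
        }
    }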

I created a pull request:

https://github.com/ryancramerdesign/ProcessWire/pull/1097

It's not terribly important, but I think it would be nice.


Concerning the locking issue: is there even something like locking? Or am I just confused? As there is no middleware (which confuses me :)), there should be no problem with setting such a module to singular and still serving multiple requests in parallel. All the locking there is happens in the DB. Or am I wrong?

I studied $modules->getModule() and it shows that:

  • $modules->get tries to find the module in the WireArray of $modules, which should contain all installed modules.
  • Those that are autoloaded have already been initialized.
  • If the module is NOT singular it gets recreated. And if it has not been initialized (because it is not autoloaded or not singular) it gets initialized.

This seems to imply that a module that is autoloaded and not singular produces an autoload instance and then a new one for each $modules->get().

LazyCron is autoload but not singular, and it hooks onto:

$this->addHookAfter('ProcessPageView::finished', $this, 'afterPageView');

Maybe it is not important that it is not singular, as LazyCron should not be called directly but hooked:

$this->addHook('LazyCron::every30Minutes', $this, 'myHook'); 

Also, concerning the hook, I looked at "Jumplinks", which hooks into "page not found"; that seems pretty nice for something that should try not to slow down the regular page view process:

$this->addHookBefore('ProcessPageView::pageNotFound', $this, 'scanAndRedirect', array('priority' => 10));

Funnily enough, I could not find out what Jumplinks is doing in terms of lifecycle. Probably it is using the getModuleInfo() defaults and thus it is not singular (but is it autoload? I suppose it has to be).
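
For reference, those lifecycle flags are declared in a module's getModuleInfo(); a minimal sketch of the two flags discussed here (the values are only for illustration):

    public static function getModuleInfo() {
        return array(
            'title'    => 'Remote Backup',
            'version'  => 1,
            'autoload' => true,  // instantiated and init() called on every request
            'singular' => true,  // only one instance per request, reused by $modules->get()
        );
    }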


I have also been considering using a file for this kind of thing.

It would probably also be the easiest way to provide some kind of "safe transactions", since in my case this is about long-running processes that might die somewhere along the way. But I could also write that to the DB, as I am using it anyway for the job configuration.

So I would have to set a start flag (in the DB or in a file) and consider the job dead if it doesn't report back within a certain amount of time.
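
A very small sketch of that start-flag idea, here with a lock file (the file name and the timeout are arbitrary):

    $lockFile = wire('config')->paths->cache . 'remote-backup.lock';
    $timeout  = 3600; // seconds after which a stale flag is considered dead

    if (is_file($lockFile) && (time() - filemtime($lockFile)) < $timeout) {
        throw new WireException("A backup job seems to be running already");
    }
    touch($lockFile);  // set the start flag
    // ... run the backup job, calling touch($lockFile) now and then to show it is still alive ...
    unlink($lockFile); // clear the flag once the job has finished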


EasyCron is a web cron service. A web cron service repeatedly requests a user-provided URL at a given interval and processes the result of the request.

Edited by adrian
Removed link - I don't think that service is much help for this module.


@rot, thanks a lot for this module!

I've tested both the FTP and Google Drive options for a site of mine hosted on a DigitalOcean VPS, and it worked flawlessly. It took less than a minute to create the backup files and upload the 650 MB ZIP to Google Drive, which is quite impressive. I must say that setting up the Google Drive option was hard due to the lack of documentation, but since it's a pre-release, that's expected.


Hi @rot,

Thanks for this module. Are you still using this? I'm wondering because I couldn't find it in the module repository.

It works really well. I've got some minor fixes (e.g. errors on 3.x due to namespaces, and errors because my PW install is in a sub-folder) for which I will send a pull request.


Hi arjen!

Yes, I am still using the module, but with an old ProcessWire install. I just never published it to the repository.

Just send me the pull requests and I will recheck and finally publish the module. I didn't find the time to do so when I first wrote it, and then I just forgot.

It would be a waste if it works fine and others don't find it.

3 hours ago, rot said:

It would be a waste if it works fine and others don't find it.

Definitely. I was looking for something else in the forum and stumbled upon your module. It does exactly what another backup script (PHPBU) does, but now from within the admin panel.


Hi, I just installed your module on PW version 3.0.36.

Immediately after installation, on the Admin > Remote Backup page, I get this error:

Notice: Undefined variable: out in /srv/users/serverpilot/apps/missions-pw/public/site/assets/cache/FileCompiler/site/modules/ProcessRemoteBackup/ProcessRemoteBackup.module on line 158

Any ideas what may be happening?


Hi @rastographics, this module won't work out of the box with PW 3.x. You need to make some changes to get it working. I don't know which ones from memory, but I have a copy running locally at home. I might find some time tonight to update the code and send @rot the pull request I promised.


No problem. I want to install this on several sites we are running. Looks like it has to be done this weekend. Will get back to you.


So, I finally got some time to get back to setting up new backup services, and it turns out that PW > 3.0.42 (latest stable) fixes the FileCompiler issues ;) I think Ryan made some changes to the file compiler a while ago.

For now I've changed:

  1. FTP with TLS/SSL support
  2. Dynamic loading of ProcessWire based on the current folder

I have sent a pull request. 
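
For anyone curious what the TLS part boils down to in plain PHP, it is essentially a different connect call. A sketch with placeholder credentials; the actual implementation in the pull request may differ:

    $conn = ftp_ssl_connect('ftp.example.com'); // instead of ftp_connect() for TLS/SSL
    if ($conn && ftp_login($conn, 'user', 'password')) {
        ftp_pasv($conn, true); // passive mode is usually needed behind NAT/firewalls
        ftp_put($conn, 'backup.zip', '/local/path/backup.zip', FTP_BINARY);
        ftp_close($conn);
    }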

