
Ok, so I have a new project that has PW as the selected framework of choice. Here are some of the constraints I'm working with:

1) The corporate database will replicate all the other branches' data.

2) Each branch will have its own web and database servers, and thus run its own instance of ProcessWire; all data will be saved within the intranet.

3) Each branch will be able to access the other branches' installs via VPN. The web servers/code base will be synced using Git.

4) Each branch's database server will have a SymmetricDS node installed. Via VPN, each node will push/pull to the corporate node, keeping the databases synced. SymmetricDS allows for table- and column-specific updates, so the whole DB won't be replicated on each pass.

5) The database sync needs to be bidirectional, as in multi-master replication. The reason being, the company plans on having certain departments add pages directly to the corporate instance that will affect the other branches. For instance, the Human Resources department will handle adding/removing users and maintaining records on them. Each user created by HR will need to be able to access any branch's data when on the VPN, so user pages would need to be pushed to the branch DBs. Here's what it would look like:

// Inside Branch 1's Building:

Client 1------via intranet---------------Web Server 1-----DB Server 1

// Connecting to remote Branch 2 from Branch 1

Client 1------via internet (VPN)------Web Server 2-----DB Server 2

// Database syncing

DB 1-------------via VPN-------Corporate DB 1       and vice versa (bidirectional)

DB 2-------------via VPN-------Corporate DB 2

The schema between all databases will be exactly the same. So, initially, my game plan was to change MySQL's auto_increment_increment and auto_increment_offset on each DB instance, keeping the different branches on separate indexes when pages and fields are stored. Hopefully, this would create a mostly conflict-free environment.
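For clarity, that offset scheme would look something like this in each instance's MySQL config (the stride and branch numbers here are hypothetical, just to illustrate):

```ini
# Branch 1's my.cnf (values are hypothetical):
# every master skips by the same stride; each starts at a unique offset
[mysqld]
auto_increment_increment = 10   # room for up to 10 masters
auto_increment_offset    = 1    # branch 1 writes IDs 1, 11, 21, ...

# Branch 2 would set auto_increment_offset = 2 (IDs 2, 12, 22, ...), etc.
```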

This also limits how far the approach can scale. Once 9 offices are established, I'm going to have to start getting creative with the increment patterns. I suggested a cloud-based approach but they insist on using their current infrastructure. My other idea was to write transaction and conflict rules into SymmetricDS and let it rewrite page IDs, etc., along with any references to them in the DB. The real danger is a mass creation of pages while the VPN is unavailable, like during an internet outage. Once the internet is back online, there could be a lot of pages that need to be replicated, with the likelihood that many end up with the exact same page IDs.

Anyway, does anyone here have any advice as to what they would do? I'm open to hearing opinions from others who have dealt with this same sort of scenario before I choose my path.


First a few questions.

How probable is it that the VPN is down?

When the VPN is down, how long is it usually down?

Approximately how many writes are there during the day in the branch-offices?

How probable is it that there will be more than 9 offices?

Also I'm curious, why have you selected SymmetricDS over Tungsten Replicator?

The real danger is a mass creation of pages while the VPN is unavailable, like during an internet outage. Once the internet is back online, there could be a lot of pages that need to be replicated, with the likelihood that many end up with the exact same page IDs.

I don't see how this would be a problem when you are using auto_increment_increment and auto_increment_offset.

You might have read this but it's a good article nonetheless: http://scale-out-blog.blogspot.fi/2012/04/if-you-must-deploy-multi-master.html



@sforsman

The VPN shouldn't be down often. The only time it realistically should be offline is when there are problems with the ISP or internal networking issues, or maybe even for the split moment when new certs/keys are issued and updated. So the anticipation is that we maintain an online average of 99% or better throughout the year. Even so, that is about 87 hours of anticipated downtime.

The corporate node will be scheduled to pull every 10 minutes or sooner.

It's very probable that the company will have more than 9 offices in the next two years; they already have six.

I don't see how this would be a problem when you are using auto_increment_increment and auto_increment_offset.

I'm sorry, I meant to put that in the paragraph below it where I said I would write custom definitions to handle the page indexes.

SymmetricDS was selected because the company already uses it for its existing databases and I have familiarity with it (though not in an all-master topology). I'm not biased, and I'm open to using different software as long as the learning curve isn't mountain steep; time is of the essence. I have never used Tungsten, though I have read a number of articles that referred to it, and it seems much along the lines of how SymmetricDS functions. Any insight into the advantages of Tungsten vs. SymmetricDS would be appreciated.

Thanks for the link. I've never read it before but it is quite informative and a great refresher. I like #5, "make the triggers harmless...". That should be fun to work through when I begin testing. I'll be keeping the original .sql file on hand for immediate importing. 


@MindFull

A few more questions.

The items that are synced, are they only page- and field-related? Or also templates? Are all pages being synced or just a subset of them (i.e. pages with a specific template or parent)?

If new fields or templates are added at the branch-office, would they be synced back to the corporate DB?

How about modules and their configuration (i.e. the modules-table)? Are they synced from the corporate DB to the branch DBs?

Is something else being synced as well, apart from the data/code pulled from Git and the PW-related tables/rows?

I want to understand the complete scenario before making any suggestions.

The VPN shouldn't be down often.

In that case, would it be possible to consider locking/disabling branch-writes during this? I will come to my point later.

The corporate node will be scheduled to pull every 10 minutes or sooner.

Sorry for being unclear; I meant that if you consider the average day at a branch office, how many write operations are there on the tables being synced (e.g. UPDATE/INSERT)? This is obviously related to the previous question.

I'm sorry, I meant to put that in the paragraph below it where I said I would write custom definitions to handle the page indexes.

I thought that might be the case, now it makes perfect sense :)

SymmetricDS was selected because the company already uses it for its existing databases and I have familiarity with it

This is a good enough reason!

Any advantages of using Tungsten vs Symmetricds would be appreciated. 

Too early to say yet.

Thanks for the link. I've never read it before but it is quite informative and a great refresher. I like #5, "make the triggers harmless...". That should be fun to work through when I begin testing. I'll be keeping the original .sql file on hand for immediate importing. 

I'm glad you liked it! It's a good thing that at least the PW-core doesn't use triggers.


@sforsman

Templates and modules will be included when a separate set of sync triggers is turned on in the SymmetricDS config. Typically, they won't be part of everyday syncing, since fields, templates, and modules won't change or be added very often, and only the corporate office will have permission to add/edit/remove them. Yes, the changes will be pushed from corporate to all the branches.

There might be a separate table to handle some potentially unbounded row data, for which we'll be writing a separate class to handle querying its contents. All other pages will be batched and channeled according to their relation to one another. Some columns will be exempt from updates after the initial insert, like the page's created timestamp and some other template-defined ones. But, for the most part, most of the front-end-submitted data will be subject to sync.

On average, there will be about 300 writes daily in each branch office. There are 3 full time shifts taking up all 24 hrs of the day.

As far as preventing branch writes, that just can't be allowed. The employees collect client data on a frequent basis and need to be able to store it immediately, or they risk forgetting the details as the day goes on. That's the company's initial reasoning for having the local DB: if the VPN goes down, they can keep going with the bulk of their work, which is collecting client data locally.

Tomorrow I begin planning the data flow and transformation charts and I'm not particularly thrilled about that part of this project - lol, I hate diagramming in UML!!!!

BTW: Thanks for taking the time sforsman!


@MindFull:

Thanks for the updated info, it was enlightening. 

Is the VPN down more often than their Internet connection? The reason behind these questions is that if you need 10+ masters, I would suggest a technique that Flickr, for example, uses. You can check their post about it here: http://code.flickr.net/2010/02/08/ticket-servers-distributed-unique-primary-keys-on-the-cheap/ - it's a really simple method :) If only the VPN downtime is a problem, you could consider running the ticket servers on the Internet.

If things can't rely on the Internet connection either, then in my opinion the simplest solution would be to use the same method described in the article above, just running it locally.
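For reference, the core of that Flickr post is a one-row MySQL table per ticket server: REPLACE INTO keeps advancing the auto-increment, and two servers on odd/even sequences provide redundancy. Roughly as the article shows:

```sql
-- Ticket table as described in the Flickr post (64-bit variant)
CREATE TABLE `Tickets64` (
  `id` bigint(20) unsigned NOT NULL auto_increment,
  `stub` char(1) NOT NULL default '',
  PRIMARY KEY (`id`),
  UNIQUE KEY `stub` (`stub`)
) ENGINE=MyISAM;

-- Fetching the next globally unique ID:
REPLACE INTO Tickets64 (stub) VALUES ('a');
SELECT LAST_INSERT_ID();

-- Redundancy: run two ticket servers on odd/even sequences
-- TicketServer1: auto-increment-increment = 2, auto-increment-offset = 1
-- TicketServer2: auto-increment-increment = 2, auto-increment-offset = 2
```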

Either way, such an implementation would require hooking into the saving of new pages and prefixing all IDs with a prefix assigned to each of the PW instances (read, for example, from the site's wire('config')). A three-digit prefix would probably be enough to cover all of the customer's expansion needs (forever). Just an example:

Corporate instance 1: 100 + <local ticket ID>

Corporate instance 2: 101 + <local ticket ID>

Branch instance 1: 200 + <local ticket ID>

Branch instance 2: 201 + <local ticket ID>

...and so on. This would work without any hacks to PW, because you are allowed to set the ID for new pages.
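A minimal sketch of what that hook could look like (the hook point, the $config->branchPrefix property, and the nextTicketId() helper are all illustrative assumptions, not a tested implementation):

```php
<?php namespace ProcessWire;

// site/ready.php - rough sketch only.
// Assumes $config->branchPrefix (e.g. 200) is set in site/config.php and
// that nextTicketId() returns the next ID from the local ticket table.

$wire->addHookAfter('Pages::saveReady', function(HookEvent $event) {
    $page = $event->arguments(0);
    if($page->id) return; // only brand-new, unsaved pages need a generated ID

    $prefix = $event->wire('config')->branchPrefix; // e.g. 200 for this branch
    $ticket = nextTicketId();                       // e.g. 4731 (hypothetical helper)

    // 200 concatenated with 4731 gives 2004731; as noted above,
    // PW allows assigning the ID of a new page before it is saved.
    $page->id = (int) ($prefix . $ticket);
});
```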

UML, urgh, I feel you there!


Either way, such an implementation would require hooking into the saving of new pages and prefixing all IDs with a prefix assigned to each of the PW instances (read, for example, from the site's wire('config')). A three-digit prefix would probably be enough to cover all of the customer's expansion needs (forever).

My goodness, I was so caught up in the DB side of things that I completely forgot to think about approaching this through the API!! This alleviates having to keep track of the auto-increments in MySQL - awesome! I love this forum and this API!

Thanks for your input, sforsman! I feel much more confident with this approach than with my previous ideas. Very simple and elegant; no wonder I overlooked it! I have a tendency to do things the hard way :lol: . I'm going to write the hooks for this tomorrow when I'm back at work and see how things pan out.

BTW, have you had the opportunity to try both Tungsten and SymmetricDS? What's your opinion on Tungsten in comparison to SymmetricDS?

