
"Continuous integration" of Field and Template changes


mindplay.dk

EDIT: This project has been put on ice - I don't work with ProcessWire in my day job anymore, so this project is looking for a new maintainer. Knowing that, you can decide whether it's worthwhile reading through 7 pages of posts :-)

EDIT: The source code has been dumped on GitHub - feel free to fork and have at it.

There's one thing about ProcessWire that pains me, and I've brought this up before - it's the same problem I have with e.g. Drupal...

Because the meta-data (Configuration, Fields and Templates) is stored inside the database, once you have a live site and a development site, moving changes from the development site to the live site is, well, not really possible.

Repeating all the changes by hand on the live site is simply not an option I'm willing to consider.

Telling the client to back off the site and give me a day or two to make the changes, and then moving the whole database after making and testing the changes on a development site, is really a pretty poor solution, too.

I had heard some talk about a module in development, which would make it possible to import/export Fields and Templates? It sounds like that would mostly solve the problem.

Ideally though, I'd really like a solution that records changes to Fields and Templates, and allows me to continuously integrate changes from one server to another.

So I started hacking out a module, but I'm not sure if it's going to work at all, if it's even a good idea, or if it's worth the effort. I'm looking for feedback on the idea as such, more than the code I wrote, which isn't real pretty right now. Anyway, have a look:

https://gist.github.com/b7269bb7bd814ecf54fb

If you install this, create a "data" folder under the module's folder - migration files will be written there.

Basic overview of the idea and code:

  • The module hooks into Fields::load() and takes a "snapshot" of the current Field properties and settings on start-up.
  • It also hooks into ProcessField::fieldSave() and when a field is saved, it compares its properties and settings to the snapshot it took at startup - if changes were made, it writes the previous name and updated properties into a file.
  • The same thing is not implemented for Templates yet, but would be.
  • The migration-files are named like "YYYY-mm-dd-HH-mm-ss.json", so that they can be sorted and applied in order.
  • Each file contains a JSON representation of a method-call - currently only updateField(), which would repeat a previous set of changes and apply them to another installation of a site. (not implemented)

So basically, the module would record changes to Fields and Templates, and would be able to repeat them.
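To make the recording step a bit more concrete, here is a rough sketch of the hook wiring - the hook names are the ones listed above, but the snapshot/diff details are simplified and hypothetical, not the actual code from the gist:

protected $snapshot = array(); // field ID => properties captured at start-up

public function init() {
    $this->addHookAfter('Fields::load', $this, 'snapshotFields');
    $this->addHookAfter('ProcessField::fieldSave', $this, 'recordFieldChange');
}

public function snapshotFields(HookEvent $event) {
    foreach (wire('fields') as $field) {
        $this->snapshot[$field->id] = $field->getArray(); // properties and settings
    }
}

public function recordFieldChange(HookEvent $event) {
    $field = $event->arguments(0); // assumes the saved Field is the first argument
    $before = isset($this->snapshot[$field->id]) ? $this->snapshot[$field->id] : array();
    // good enough for scalar settings; nested settings would need a recursive diff
    $changes = array_diff_assoc($field->getArray(), $before);
    if (!count($changes)) return;
    $migration = array(
        'call' => 'updateField',
        'name' => isset($before['name']) ? $before['name'] : $field->name, // previous name
        'changes' => $changes,
    );
    $file = date('Y-m-d-H-i-s') . '.json'; // sorts and applies in chronological order
    file_put_contents(__DIR__ . '/data/' . $file, json_encode($migration));
}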

How those files get from one system to another is less of a concern - would be cool if servers could exchange migrations semi-automatically, using some kind of private key system, but initially, simply copying the files by hand would suffice.

I'm more concerned about the whole idea of capturing changes and repeating them this way.

Any thoughts? Is this approach wrong for some reason? Would this even work?


Great thinking mindplay! I agree that this is one of the major pain points in systems that alter the database schema on-the-fly. I think your solution is pretty similar to what Laravel has (although there it won't record those migration files, since it is a pure framework; you would just write them). But the system would run them in the right order based on the filename date.


I'd love to hear from @ryan on this one, before investing a whole lot more time in this - potential pitfalls, situations where this might or might not work?

For one, since this is an event-driven architecture and some modules are going to tie into the actual controllers to do certain things, if I repeat a change at the API level - say, updating a Field - that won't trigger the same controller-level hooks, and potentially the result might not be the same... (I guess maybe I should hook into Field and Template save-operations at the lowest possible level, to make sure that the results of any side-effects caused by other hooks are captured?)


I also wonder if this is even the right approach, or if this is more of a back-fix for something that can't really be handled directly by the framework, as-is...

What I mean by that is, perhaps I'm trying to solve an architectural problem that should actually be solved at the architectural level, rather than by a module?

For instance, if meta-data (Fields, Templates, configuration) in general was kept in flat files, that would solve part of the problem. If meta-data was shielded from direct modifications, and could only be modified using command-objects of some sort, whenever those command-objects are run/applied to the meta-model, you could serialize and store those objects, and later on you could repeat those operations simply by unserializing those command-objects and running them.

I wonder if it would make more sense for me to attempt to build that into the architecture on a low level, rather than building a module that tries to work around the absence of a pattern that would make these (and possibly other) operations much simpler?


This has always been one of my biggest issues with using a CMS - the whole deployment process and integrating updates from development - it always seems so dirty. If you can solve this problem it would be a massive help to myself and many other developers.

I am usually only interested in migrating template field changes so I've always thought that flat file configurations would be the simplest way to manage this. But I understand that any sort of database model changes will always be difficult to manage. I am guessing the final solution will probably have to involve some sort of backup in case something goes wrong - maybe you could use something like capistrano (http://capistranorb.com/). I've used capifony previously and it's a nice solution to just type one command line to deploy updates.


I don't know if this is possible or not, but I wonder if MySQL has any events/triggers you can set on ALTER commands. If there are then perhaps you can install something in the DB itself that can log schema changes via ALTER or CREATE commands to a 'structure_change_log' table (or whatever).


I don't know if this is possible or not, but I wonder if MySQL has any events/triggers you can set on ALTER commands. If there are then perhaps you can install something in the DB itself that can log schema changes via ALTER or CREATE commands to a 'structure_change_log' table (or whatever).

That was actually one of my early ideas, but it suffers from the same problem as one of my later ideas: capturing the actual form-submissions, and then later simply repeating the changes by programmatically re-submitting the same post-data. Theoretically, doing so would reproduce the exact same result, with the same side-effects from hooks being run etc.

It sounds great and very simple at a glance - just repeating your SQL statement. But it won't work, because both the forms and the UPDATE statements are keyed by numeric IDs. If I create a new Field or Template on the other system, the next ID will increase, which means you'll have conflicting IDs when you deploy changes from another system.

So the updates must be keyed by name, not by ID - or you will have the exact same problems as before, needing to freeze changes completely on one system before starting a round of changes on another system... and if so, you might as well deploy by just taking a snapshot of the database...
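In practice, applying one of those recorded changes would then look something like this hypothetical updateField() helper (not the actual gist code) - the field is looked up by its previous name, so differing auto-increment IDs between installations don't matter:

public function updateField($previousName, array $changes) {
    $field = wire('fields')->get($previousName); // look up by name, never by ID
    if (!$field) throw new WireException("Unknown field: $previousName");
    foreach ($changes as $property => $value) {
        $field->set($property, $value); // a rename arrives as a changed 'name' property
    }
    wire('fields')->save($field);
}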


I am guessing the final solution will probably have to involve some sort of backup in case something goes wrong

Ideally, the system should perform a "dry run", validating the changes prior to physically applying the changes.
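Something along these lines, for instance - a purely hypothetical shape, validating every pending migration against the current state without saving anything:

public function dryRun(array $migrations) {
    $errors = array();
    foreach ($migrations as $file => $migration) {
        if (!wire('fields')->get($migration['name'])) {
            $errors[$file] = "field '{$migration['name']}' does not exist on this system";
        }
        // further checks could compare fieldtypes, templates, required settings, etc.
    }
    return $errors; // an empty array means the batch should apply cleanly
}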

I'd prefer to be able to connect up two systems by exchanging a private key of some sort - and being able to deploy via an admin screen, rather than having to drop to a command-line, which isn't always practical in a situation where somebody needs to make a quick correction to a live site...


I'd love to hear from @ryan on this one, before investing a whole lot more time in this - potential pitfalls, situations where this might or might not work?

Nice job. So far I can't think of any major pitfalls other than with Page reference fields. That is one fieldtype that has a parent_id property (or at least it can). But I don't see this as a major problem, as it would just require the person going in and double checking that it's pointed to the right place after the field is updated. Either that, or the module could be configured to disregard that particular property. We have this same issue with Form Builder export/import from site-to-site when a form contains Page reference fields. It's not been a problem--so long as you know there may be some loss of translation there, it's easy to correct. So overall I think the approach you are taking here seems like a good one.
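Disregarding a property like that could be as simple as filtering it out of the recorded diff before writing the migration - a hypothetical example, with the ignore list being configurable:

// Strip properties that don't translate between installations (illustrative list).
$ignore = array('parent_id'); // e.g. Page reference parents; double check these after import
$changes = array_diff_key($changes, array_flip($ignore));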


  • 2 weeks later...
Thanks, Ryan. Any thoughts on adopting a more robust (command-pattern) approach to the meta-model in general?

Can you clarify? I understand what you mean by command pattern, but not sure I totally follow the question in this context.


I brought this up a long time ago, in this thread - in a nutshell, the thing that I've been working on (on and off) is a generic model of a model, hence the term "meta-model" - though I'm not even sure whether that's the real or correct term, or whether it's even a common thing.

Meta-models have lots of potential applications, but probably its primary appeal is code-generation. There is of course no reason you can't consume a meta-model at run-time, which is what PW does. In PW, your meta-model is basically the Templates and Fields.

The problem is, you make changes to the meta-model directly, which means you do not have a useful history of how the meta-model evolved.

To use a practical example, in my generic meta-model, I have types like "class", "property" and "annotation" - just very generic constructs that are applicable to (or can be made useful in) just about any programming language.

Let's say I have an object that represents a "property" in the model, for example - let's say that $property->name is currently "first_name".

In my meta-model, the $name property is read-only - you can't change it directly. No values can be changed directly. Instead, if you want to change the name, you have to create and submit a command, so for example:

$metamodel->add(new PropertyNameChange($property, 'last_name'));

PropertyNameChange is a command object capable of actually changing the name of a property. To reiterate, there is no other way to change the name of a property in the meta-model.

When that command-object is added to the meta-model, it is automatically executed, so the change actually gets applied - but the command-object itself also gets serialized and appended to the model's history. This enables me to repeat any change made to the meta-model, in sequence.

This is very similar to how database schema migrations work - except you're not limited to migrating changes to the database schema, which really is an implementation artifact. You can now migrate changes to the entire model, and the entire history of the model becomes repeatable.

Another important difference is that the meta-model (and command history) gets written to flat files, rather than to the database. This enables me to make changes to a model, and check those changes into source-control, then repeat the commands on another system to continuously integrate the changes.
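A minimal sketch of that pattern in PHP - class and method names are illustrative, not the actual implementation, and here the command stores the property's old name rather than the object itself, so the serialized history replays cleanly on another installation:

interface Command {
    public function execute(MetaModel $model); // applies the change it describes
}

class PropertyNameChange implements Command {
    protected $oldName;
    protected $newName;
    public function __construct($oldName, $newName) {
        $this->oldName = $oldName;
        $this->newName = $newName;
    }
    public function execute(MetaModel $model) {
        $model->renameProperty($this->oldName, $this->newName);
    }
}

class MetaModel {
    protected $properties = array(); // name => property definition (kept trivial here)
    protected $history = array();    // every command ever applied, in order
    public function add(Command $command) {
        $command->execute($this);     // the change is applied immediately...
        $this->history[] = $command;  // ...and appended to the history
    }
    // only meant to be called by commands; public here because PHP lacks friend classes
    public function renameProperty($oldName, $newName) {
        if (!isset($this->properties[$oldName])) throw new Exception("Unknown property: $oldName");
        $this->properties[$newName] = $this->properties[$oldName];
        unset($this->properties[$oldName]);
    }
    public function saveHistory($path) {
        file_put_contents($path, serialize($this->history)); // flat file, goes into source control
    }
    public function replayHistory($path) {
        foreach (unserialize(file_get_contents($path)) as $command) {
            $command->execute($this); // repeat the same changes, in order, on another system
        }
    }
}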

Applying the same idea to ProcessWire, that would mean blocking Templates and Fields from direct modifications, writing serialized copies of Templates and Fields, and serialized command history of changes made to both, into flat files. This would be event-driven, so that you could hook into commands being added or executed...

It has always been my philosophy that writing data and meta-data into the same back-end (e.g. database) isn't right... I think the problem with most CMS is that nobody bothers to make the distinction between data and meta-data - we tend to think, "oh, it's all just data", and then shove it all into the database. But meta-data is actually closer in nature to code, than it is to data - because it drives decision-making in the code.

Things like "title" and "body" do not directly drive decisions, but things like "field type" and "template filename" directly drives decision-making, and therefore does not belong in databases, anymore than your source-code does.

Some CMS take this misunderstanding to an extreme, and actually write PHP code into the database, iiiirk!... ;)


Thanks for the explanation. I get what you are saying now. I'd be supportive of anything that achieves what you are talking about so long as it doesn't change the nature of the API interactions. For instance, I wouldn't be enthusiastic about having to literally type out command pattern syntax from the API code (or asking our users to), but would be fine with that sort of pattern happening behind the scenes.

It seems like you've got a good start on this with what you were already doing. If there are any hooks, methods or states I can add that would facilitate it, I'll be glad to. For instance, after a field/template has been populated at load time, it could switch from a population state to a command state. It's already doing that in a sense with the setTrackChanges(), trackChange(), and getChanges() methods, but they are obviously for a different purpose.
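For reference, those existing methods look roughly like this in use (simplified - getChanges() returns the names of the changed properties, and a 'body' field is just assumed to exist here):

$field = wire('fields')->get('body');
$field->setTrackChanges(true);   // switch change tracking on
$field->label = 'Body copy';     // recorded as a change to 'label'
print_r($field->getChanges());   // e.g. array('label')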

It has always been my philosophy that writing data and meta-data into the same back-end (e.g. database) isn't right... I think the problem with most CMS is that nobody bothers to make the distinction between data and meta-data - we tend to think, "oh, it's all just data", and then shove it all into the database.

We have a pretty clear separation of data from meta data in ProcessWire, but it is by table. I totally agree with separating data from meta data, but think the distinction between flat file vs. table (or any other storage mechanism) hardly matters here. Either can be equally accessible for any purpose. Seems like storage medium shouldn't even be part of the equation. These things are all flat files behind the scenes. I'm fine with duplicating it into some other flat files if that adds a convenience factor for a particular purpose.

Things like "title" and "body" do not directly drive decisions, but things like "field type" and "template filename" directly drives decision-making, and therefore does not belong in databases, anymore than your source-code does.

Logical separation obviously makes sense. But isolation of these things from one storage system to another (database vs. file) seems to me like it increases the chances of corruption. If I need to restore a database or a public_html directory, I would be distraught to find the data and the schema on different file systems. For the same reason, it would be disheartening to find SQL dumps that have nothing but INSERT statements and no schema.

Some CMS take this misunderstanding to an extreme, and actually write PHP code into the database, iiiirk!...

Looking at it from the outside, it all looks bad. Nobody likes eval(). And there is no good place to store code that will be run through eval(). At least in the cases where I've seen PHP code stored in a database, it was a matter of pursuing the lesser evil as it relates to the context of a web site (or perhaps the more secure one), not a misunderstanding or a blind decision. The eval() function really should be renamed evil(). If you are presented with a situation where the requirements call for input of PHP code through some form, nothing you do will look good to people looking from outside. I have been asked a few times to introduce some ability to enter PHP code in text page fields (like you can do in Drupal for instance). I understand the convenience it may bring, but have completely avoided it thus far.


I totally agree with separating data from meta data, but think the distinction between flat file vs. table (or any other storage mechanism) hardly matters here.

The problem is source control - if it's data in a database, there is no file you can check in, compare version history or review changes made by other team members. Automated deployment becomes harder. You could dump SQL statements and check those in, but auto-increment keys make it extremely difficult to integrate, say, two different newly added fields or templates from different developers, since they will have the same primary keys. Just one example of many.

Because metadata in nature is closer to code than it is to data, the metadata usually needs to be kept strictly in sync with the code - you usually have code that depends on the metadata, because the metadata "implements" (or at least strictly defines) the model. As opposed to data, which is usually less vulnerable to asynchronous deployment.

I don't know if source-control and automated deployment are very important to most people - perhaps not a big concern for most smaller websites?


The problem is source control - if it's data in a database, there is no file you can check in, compare version history or review changes made by other team members. Automated deployment becomes harder.

Integration with other source control tools -- that makes sense, I understand now. For the reasons mentioned before, I think there are some drawbacks to keeping the templates/fields on separate file systems from the tables that represent them. But the benefits you've pointed out make sense too. Seems like the best route is to choose both, rather than one or the other. I like the idea of the system maintaining a running file that keeps track of the updates in a manner that can be played back on another instance. So perhaps that's what we should aim to do. And I think this is what you were originally thinking with the module you've put together, but we could take it a lot further by making the capability part of the core. So let's keep the discussion going and see what we can do.

I don't know if source-control and automated deployment are very important to most people - perhaps not a big concern for most smaller websites?

I think it just depends on the management strategy of the website, more than the scale of the website. For the sites that I manage (a few of which are quite large), I keep local development copies that clone daily from the live server. But I keep my /site/templates/ version controlled with Git. For every 5 hours of code that I might be writing, I spend 5 minutes in the PW admin adding fields, templates or modules… though it's rare that even that is necessary. So this part has never represented a bottleneck. I simply re-create those fields/templates on the live server before pushing my template changes. So on this end, this particular part of development has never been a problem to solve because it doesn't actually consume any more time than any alternatives might. But I recognize my projects are not the same as other people's projects and what's not an issue for one may be for another. The ability to version control and deploy things like templates and fields makes good sense to me even if it's not as applicable to my own work. Not to mention it sounds fun, and I look forward to collaboration here.


That actually sounds like a great approach for read-only sites. I suppose, if you used an external (Javascript) service for e.g. comments on a site, that approach would even still work - as long as no new data is being added to the public site.

Continuous integration has become the default for me, I never even consider anything else - but I'll have to keep this approach in mind for projects that meet those requirements. Some clients would probably see it as an advantage - that they can tweak and adjust the site, in cooperation with us, and decide precisely when they think it's ready for the public.


Drupal uses Features. It exports configuration settings from the database into a module on disk.

Again, I can only see this as being a work-around that doesn't address the root of the problem.

I would strongly prefer working with core concepts that do not present those problems to begin with.

To underline my point: look at the Features module for Drupal. 100K sites are using it, which means it should have been a standard feature, and the core architecture should have been designed with these requirements in mind - it should not have been an afterthought. Look at the number of open issues and bug-reports against this module. Look at the complexity - thousands of lines of code and special handling for lots of different modules: blocks, fields, images, menus, taxonomies, users, etc...

Introducing so many new concepts and potential points of failure, just to work around this one problem - to me, this is an indication of core design failure. This is why good software must break backwards compatibility with major revisions - design issues have to be addressed when they're discovered, and with better design, not just with more code.

Just my opinion :)


That actually sounds like a great approach for read-only sites. I suppose, if you used an external (Javascript) service for e.g. comments on a site, that approach would even still work - as long as no new data is being added to the public site.

In our case they aren't read-only sites in terms of content. I'm mirroring their content to a staging server so that I'm working from a reasonably new copy, but I don't ever push that content back to the site. They have staff making edits to the sites all day long and they decide when/where content goes. But the staff focuses just on content, and they don't even see the Setup menu. I run the staging server for everything but the actual content. I'm pushing mostly changes to template files and modules, and [less often, when needed] templates and fields. When another developer is involved, they stage to my server too, but I'm the one that does the migration to live (good to have a gatekeeper I figure). There are occasionally times when we do need to push content additions, like adding a few hundred pages at once or something. But I use PW API code to do this, and it becomes a script that gets played first on staging, then on live. Things like database IDs aren't a factor since it is scripted rather than imported data. The same approach could be taken with those schema changes (templates, fields) but in my case they don't represent much of the changes, so it's usually quicker to just issue those changes manually. But my case may not be the norm. And if the core (or a module) did keep track of this for me, I'd probably use it.
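For context, such a script is nothing more than a loop over the PW API - something like this, where the template name, parent path and $titlesToImport are hypothetical:

// Played first on staging, then on live; no database IDs involved.
foreach ($titlesToImport as $title) {
    $p = new Page();
    $p->template = 'press-release';
    $p->parent = wire('pages')->get('/press/');
    $p->title = $title;
    $p->save();
}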


  • 1 month later...

Can anyone shed any light on the current status of this code/idea/module?

The idea of recording changes into a "log" of sorts so they can be mirrored between stage and live versions of a CMS is something I've seen work very well in the past. An ex-colleague wrote a plugin for another CMS that did this exact thing, and it worked just fine.

I certainly consider this to be a very useful feature and the kind of functionality that I know many, many other web developers often need. Most places I have worked, including big publishers, design agencies and whatnot, all had development->stage->live workflows, and it really is a pain without this kind of thing.

It would certainly be a MAJOR selling point for new users (developers and publishers/editors etc) coming to PW.


Not a problem, I totally understand, most devs are in the same position!

I'll give it some thought myself.  I'm sure that there must be some way to smooth the migration process, even if it isn't a fully-automated tool.  It might be possible to at least figure out some guidelines to help people know how to manually export SQL at one end and then import the right bits at the other end.

