Module proposal - page version control


Rob
I'm sure this must have come up before in the forums but I couldn't find anything.

It would be nice to have a history of page versions with the ability to roll back and roll forward. If a user edits a page and wants to preview their changes before going "live", unless I am mistaken this isn't currently possible. You could mimic this behaviour but it wouldn't be a clean workflow.

I was thinking about the repeater functionality and this essentially seems to re-utilise existing page behaviour. Perhaps it would be possible to utilise a similar approach by creating "mirror" copies of pages under some /admin/versions location in the page tree and use this as a storage for page versions. Theoretically it would then be possible to preview versions, rollback etc and still use core PW functionality that is already in place rather than having to write a whole mass of new code.

I am happy to have a crack at this myself but I thought if I put the idea out there the community will likely have much more advanced module coders than me who may be interested in the idea.

To recap:

If we could have a save->preview->publish workflow by saving copies of pages under a "hidden" location in the page tree then it would create the ability for users to safely make edits, check them etc before going live, and also revert to old version if necessary.

Any takers?!


Actually it's not going to be in 2.3, but it is a long-term work in progress. Versioning that is respectful of resources is rather complicated, especially when it gets into things like automatically created versions with file-based assets, repeaters, and the like. The system I'm working on keeps the version data outside of the pages system and page tree. But I'm a long way off from figuring out all the details. :)


I think it would be great to have some basic versioning; even just for text fields it would be very beneficial. I know it gets much more complicated with files and repeaters, but most (or at least a big part) of the benefit comes from just text versions. The page history module is also the first item on the roadmap, so many people might have high hopes for it: http://processwire.com/about/roadmap/ (if it's not coming in 2.3, then maybe strike it from that list and move it to the 2.4+ part).

Also - would it make any sense to have field history instead of page history? Would it be any simpler to implement that way?


The page history module may yet make it into 2.3, but not as a full blown versioning system. However, it may be able to do some of what you mentioned with text fields. But the point of the page history module is more about seeing what changed, when, and by whom. Though tracking this info crosses over into the territory of versioning in many respects. I guess I think of full blown versioning as including all page data (files, repeaters, etc.), but it's a difficult prospect without actually making clones of the pages themselves (like Rob mentioned). Maybe I should stop trying to think about these things and focus on the stuff that can be versioned efficiently. Your idea of a field history is compelling, I need to give some more thought to it.


I keep meaning to mention something I've seen in a CMS I used about 8 years ago (feel old now) that was pretty bloated.

They had a cache that basically cached the parsed field contents for a page into a text file. Versions were simply stored as that page name with a date stuck on the end as the filename, and the system fed through the latest version and handled it that way, keeping X versions.

It was simple but it worked and didn't put unnecessary amounts of data in the database that way.

Probably not the route we want to go, but I'm wondering if we might be trying to over-complicate versioning - there's certainly a worry that it could add some serious overhead to the database even for the slightest tweak (probably why Apeisa's idea of versioning fields that have changes rather than the whole page sounds appealing).


That's not far off from what I'd been working on with versioning, which was to store page versions in flat JSON encoded strings in a DB table (though could just as easily be stored as text files). Any ProcessWire page can be reduced to a PHP array of basic types (strings, integers, floats, arrays), which means it can be reduced to a JSON string really easily. This lends itself well to version storage, except for assets where only references are encoded (like files and page references).
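As a rough illustration of the "page as a flat JSON string" idea, here is a minimal sketch in plain PHP. The `$pageData` array, the `encodeVersion()` / `decodeVersion()` helpers and the timestamp are all made up for the example; this is not ProcessWire API and not the actual work-in-progress code.

```php
<?php
// Hypothetical stand-in for a page that has already been reduced to basic PHP types.
$pageData = [
    'id'     => 1042,
    'title'  => 'About us',
    'body'   => '<p>First draft.</p>',
    'images' => ['team.jpg'], // only a reference to the asset, not the file itself
];

/** Wrap a snapshot with its creation time and encode it as one JSON string. */
function encodeVersion(array $data, int $timestamp): string
{
    return json_encode(['created' => $timestamp, 'data' => $data]);
}

/** Decode a stored snapshot back into the original array of basic types. */
function decodeVersion(string $json): array
{
    return json_decode($json, true)['data'];
}

$stored   = encodeVersion($pageData, 1342400000); // arbitrary example timestamp
$restored = decodeVersion($stored);
```

The resulting string could go into a single text column or a file; note that the image entry survives only as a filename, which is exactly the limitation with assets mentioned above.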

But I have to admit that the more I think about Antti's proposal to store this at the field level, the more I like it. I would create another interface that could optionally be added to a Fieldtype's implements section in the class definition (FieldtypeWithVersions or something like that), and give the individual fieldtypes responsibility for if, how and where they store their versions. That would enable us to get up and running with versions quickly (for text fields at least). It would also seem to provide a nicer level of control to the user, as you don't have to keep track of what is changing in multiple fields at once. And it would let you enable versions on the fields where you want them and omit them on fields where you don't, making things more efficient. Lastly, it would leave the page structure (where the page lives in the tree) out of versions completely, which struck me as a potentially dangerous problem with versions. It could work something roughly like this:

[Attached image: post-2-0-14768300-1342446030_thumb.png]

You hover over the revisions label and get a summary of recent revisions. You click on one, it could pop up a modal showing you the revision (maybe with a compare tool) and a button to revert to it or cancel. It just seems like a really easy-to-use solution for the user.
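To make the interface idea a little more concrete, here is a hedged sketch. Only the name `FieldtypeWithVersions` comes from the post; the method names, signatures and the in-memory `TextVersionsDemo` class are assumptions invented for illustration, not part of ProcessWire.

```php
<?php
// Hypothetical interface: the name is from the post, the methods are assumptions.
interface FieldtypeWithVersions
{
    /** Store one version of a field value for a page. */
    public function saveVersion(int $pageId, string $fieldName, $value, int $userId, int $timestamp): void;

    /** Return stored versions for a page field, newest first. */
    public function getVersions(int $pageId, string $fieldName): array;
}

// Toy in-memory implementation; a real fieldtype would decide where to persist.
class TextVersionsDemo implements FieldtypeWithVersions
{
    private array $versions = [];

    public function saveVersion(int $pageId, string $fieldName, $value, int $userId, int $timestamp): void
    {
        $this->versions[$pageId][$fieldName][] = [
            'value'   => $value,
            'user_id' => $userId,
            'created' => $timestamp,
        ];
    }

    public function getVersions(int $pageId, string $fieldName): array
    {
        // Versions are appended in order, so reversing gives newest first.
        return array_reverse($this->versions[$pageId][$fieldName] ?? []);
    }
}

$demo = new TextVersionsDemo();
$demo->saveVersion(1042, 'body', 'First draft', 41, 1000);
$demo->saveVersion(1042, 'body', 'Second draft', 41, 2000);
$recent = $demo->getVersions(1042, 'body');
```

Because each fieldtype owns its own storage, a field without the interface simply has no versions, which matches the "enable it only where you want it" point.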


I really like the sound of that.

I would still suggest putting in an option to revert the whole page to a specific date as well, just because that's what people are used to with version control, but I like this method a lot as there's no other system where you could really do it this way without it being really difficult.

I'm not saying it would be easy this way, but since all fields are tables already it sure makes it a bit simpler than it could be.


In fact, wouldn't a very basic way of doing it be to simply add version, author and datestamp fields to every field table? Or am I over-simplifying things? That could perhaps handle versioning of file/image fields (and others) simply too.

Now that I think about it, though, leaving versions in the same tables would add some overhead on larger sites. Oh well, I'm sure you'll come up with a good solution.


@Pete, I'm afraid that you're absolutely right with that point about larger sites, so I wouldn't suggest using the same table for version data either :)

Clearly Ryan has a different path in mind already, but I just wanted to throw in this article about MySQL versioning I read some time ago and found quite interesting: http://www.jasny.net...ing-mysql-data/. The idea there is to utilize MySQL triggers and that way move (most of) the versioning logic to the database layer, which IMHO is a great idea. Naturally there are some drawbacks with this method too:

"There are some situations where this solution as a bit to basic. A record might span across multiple table, like an invoice with invoice lines. In that case, we don’t want to revision each individual invoice line, but the invoice as a whole." (etc.)

Anyway, just throwing in some food for thought :)


  • 6 months later...

I also need a feature like that to convince a customer to make the switch.

I got used to it with Drupal 7 (the "diff" module) and did not find that it slowed things down or was über-complex.

http://drupal.org/project/diff

Cool stuff. It wasn't too complicated, and as far as I know it is limited to node text and title fields. You could easily navigate the versions, see all (text) differences highlighted, and roll back (which means that a new version is created with the content of the selected older version, so you could even revert that change later).
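The "rollback creates a new version" behaviour can be sketched in a few lines of plain PHP. The function names and the array layout here are invented for the example and are not taken from the Drupal module.

```php
<?php
/** Append a new revision and return the updated history (oldest first). */
function addRevision(array $history, string $content): array
{
    $history[] = ['rev' => count($history) + 1, 'content' => $content];
    return $history;
}

/** Roll back by copying an older revision's content into a NEW revision. */
function revertTo(array $history, int $rev): array
{
    foreach ($history as $entry) {
        if ($entry['rev'] === $rev) {
            return addRevision($history, $entry['content']);
        }
    }
    throw new InvalidArgumentException("Unknown revision $rev");
}

$history = [];
$history = addRevision($history, 'Hello');
$history = addRevision($history, 'Hello world');
$history = revertTo($history, 1); // revision 3 now repeats revision 1
```

Since nothing is ever overwritten, reverting the revert is just another call to `revertTo()` with revision 2.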

Does anyone know this module? Maybe this is a route a ProcessWire module could take as well.


Originally inspired by a comment by Pete in the Process Changelog thread, I've been playing with (and just pushed to GitHub) an experimental version control module for text-based fields, which does some of the things discussed here. Somewhat coincidentally, UI-wise it also bears quite a bit of resemblance to the mockup Ryan posted above. :)

[Attached image: post-175-0-19153300-1361028534_thumb.png]

It's not production ready, only supports storing content in the database (though adding another mechanism for storing the bulk of the content wouldn't require much work and will most likely be added soon), and currently only supports Text and Textarea fields. I'll look into this subject more closely once I find some free time for it, but in the meantime anyone interested can check it out, use it, fork it etc., as long as you're aware that it's more of a proof of concept than anything else and that it's far from a perfect solution in more than one way.

Regarding the Drupal 7 diff module posted above by @ceberlin, this module doesn't provide those kinds of features at the moment, though it does store all the necessary data (and a bit more) to enable them at some later point. I was originally planning to store only diff data, but since PHP doesn't have a native method for doing that I ended up storing the full content on each edit. Not very efficient, but for small-scale use (or with proper limits, to be added later..) it should be good enough (for now). I've also omitted many other features, such as those mentioned by Ryan above (modals etc.), for the sake of simplicity and feasibility.
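For anyone curious what a hand-rolled diff might involve, below is a minimal line-based sketch built on a longest-common-subsequence table. It is not what the module does (the module stores full content), and the `lineDiff()` function is purely illustrative.

```php
<?php
/**
 * Minimal line-based diff using a longest-common-subsequence (LCS) table.
 * Returns a list of [op, line] pairs where op is ' ' (same), '-' or '+'.
 */
function lineDiff(string $old, string $new): array
{
    $a = explode("\n", $old);
    $b = explode("\n", $new);
    $n = count($a);
    $m = count($b);

    // $lcs[$i][$j] holds the LCS length of $a[$i..] and $b[$j..].
    $lcs = array_fill(0, $n + 1, array_fill(0, $m + 1, 0));
    for ($i = $n - 1; $i >= 0; $i--) {
        for ($j = $m - 1; $j >= 0; $j--) {
            $lcs[$i][$j] = ($a[$i] === $b[$j])
                ? $lcs[$i + 1][$j + 1] + 1
                : max($lcs[$i + 1][$j], $lcs[$i][$j + 1]);
        }
    }

    // Walk the table from the top-left corner, emitting operations.
    $out = [];
    $i = 0;
    $j = 0;
    while ($i < $n && $j < $m) {
        if ($a[$i] === $b[$j]) {
            $out[] = [' ', $a[$i]];
            $i++;
            $j++;
        } elseif ($lcs[$i + 1][$j] >= $lcs[$i][$j + 1]) {
            $out[] = ['-', $a[$i]];
            $i++;
        } else {
            $out[] = ['+', $b[$j]];
            $j++;
        }
    }
    while ($i < $n) { $out[] = ['-', $a[$i++]]; }
    while ($j < $m) { $out[] = ['+', $b[$j++]]; }
    return $out;
}

$diff = lineDiff("one\ntwo\nthree", "one\n2\nthree");
```

The table needs memory proportional to the product of the two line counts, so this approach only suits short text fields; storing full content, as the module does, avoids that cost at the expense of disk space.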

Anyway, if anyone is interested to try it out I'd be very happy to hear any comments on this one :)


Teppo, this is really a fantastic proof of concept! Very well put together and seems very much fully-functional to me. Worked great in my testing here. It actually seems like much more than just proof of concept--it's quite stable! 

Thanks for your great work here. I look forward to seeing this evolve. Let me know anything I can do to help. This is a great addition and perhaps something that should find its way into the core. 

A couple of minor optimizations to mention, at line 263 of the main module: 

// $page = $this->pages->get((int) $this->input->get->id);
$page = $this->page->process->getPage(); 
// if (!$page || !in_array($page->template->id, $this->enabled_templates)) return;
if(!$page || !$page->id || !in_array($page->template->id, $this->enabled_templates)) return;

Thanks Ryan! Just pushed those optimizations to GitHub.

Regarding development of this module in general, to be honest I don't have a very good plan right now. Other than making minor improvements here and there, I've planned adding some basic features such as proper cleanup of old entries, option for saving actual content to files on disk and possibly a JavaScript-based diff feature etc. Anyway, if you have any ideas what should be included or to what direction this module could move in order to benefit more users, I'd be more than happy to hear your opinions.

For example, I wasn't originally planning to support anything other than basic text fields, but support for images/files would definitely be a nice addition at some point. The problem is that it would also add quite a bit of complexity to the module (perhaps that should be another module entirely?) and in the worst case enabling a feature like that could end up consuming a lot of disk space without the user even realizing it. That's one idea I'd love to take further, but it will clearly require proper planning first.. :)


I can't seem to get it working on my local linux test box. Pretty much default installation. Templates and field selections are empty. "Enable for these fields" field first allowed for selections, but after revisiting the module page those selections were gone and nothing could be selected anymore. "Enable for these templates" was always empty. 

Fieldtype selection has these options available and nothing selected:

FieldtypeFieldsetClose

FieldtypeFieldsetOpen

FieldtypeFieldsetTabOpen

Both database tables remain empty.

[Attached image: post-18-0-38603900-1361129583_thumb.png]


Thanks for reporting this, Antti. I managed to reproduce this by downgrading my test ProcessWire installation from 2.2.13 (though I've upgraded some files independently, so this might not be exactly the right number) to 2.2.9. It was a selector issue in the config method.

Could you check whether updating the module to the latest version available on GitHub (0.0.2) fixes the problem for you?


Regarding development of this module in general, to be honest I don't have a very good plan right now. Other than making minor improvements here and there, I've planned adding some basic features such as proper cleanup of old entries, option for saving actual content to files on disk and possibly a JavaScript-based diff feature etc. Anyway, if you have any ideas what should be included or to what direction this module could move in order to benefit more users, I'd be more than happy to hear your opinions.

I think that all the things you mention sound good. A few questions and comments, though:

  • What would be the benefit of saving content to disk (vs database?). I'm not sure that it really matters to the user where it is stored, so wanted to inquire more about your thoughts here. 
  • What would define old entries? I'm guessing in some cases, people would like to just let it go forever (disk space is cheap).
  • Javascript diff feature sounds awesome. Though also have to admit, just being able to toggle between the different versions and see the immediate change (the way you have it working now) is kind of a nice "diff" effect too. :) You don't necessarily need anything else for a version 1.0. 
  • How does it scale? Meaning, what happens when you've got 100 versions. I haven't tried it yet... and maybe you've already figured this out. But I was thinking maybe it shows the most recent 10 edits when you hover the icon, and ajax/paginates them somehow after that? (or opens a modal to a dedicated Process when you click more?)
  • How does one handle deleting versions? I was thinking it doesn't need to be manual or interactive, but just a global time or quantity setting, i.e. "only keep last 50 versions" or "only keep versions for [n] days" or something like that. But having the option to keep them forever is also good… perhaps the behavior when the "[n] days" is left blank. 
For example, I wasn't originally planning to support anything other than basic text fields, but support for images/files would definitely be a nice addition at some point. The problem is that it would also add quite a bit of complexity to the module (perhaps that should be another module entirely?) and in the worst case enabling a feature like that could end up consuming a lot of disk space without the user even realizing it. That's one idea I'd love to take further, but it will clearly require proper planning first..

Just supporting text fields for a 1.0 version seems ideal. This probably covers the vast majority of needs. Versioning of files/images sounds fun, but you are right that it's an entirely different task on the development side, since it has to manage files. And these fields aren't just files, but sort order, description, tags… and more in the future. Probably too much work for too little value here. So if it were me, I would just focus on those text fields. I think that the vast majority of versioning needs for files/images could probably be covered just by a file "trash" (whether global or page specific) where one could retrieve old files if they ever needed to… but that would be a different module. 

To summarize my thoughts: you've already got something great here that is already hugely useful. I'm not sure what more you need to take it beyond proof-of-concept (seems quite functional as-is), but the only thing I would consider is just making sure it can scale time and quantity. And then get version 1.0 out when ready. I think a lot of us can't wait to start using this. :) If there is anything that I can do to help (code, testing, etc.), I'm at your disposal. 


What would define old entries? I'm guessing in some cases, people would like to just let it go forever (disk space is cheap).

Just a thought about legalities: on certain sites there may be either a legal requirement or a necessity to keep old versions forever.

For instance, records about users may be required to be kept so that changes in user details are recorded, and news sites may want to keep old versions in case of legal disputes over the accuracy of published information.


Teppo: settings screen works now, but when editing page I get this error: 

Error    Call to a member function getPage() on a non-object (line 269 of /home/apeisa/Apache/Roskis/site/modules/VersionControlForTextFields/VersionControlForTextFields.module)


Teppo, this is really a fantastic proof of concept! Very well put together and seems very much fully-functional to me. Worked great in my testing here. It actually seems like much more than just proof of concept--it's quite stable! 

Thanks for your great work here. I look forward to seeing this evolve. Let me know anything I can do to help. This is a great addition and perhaps something that should find its way into the core. 

A couple of minor optimizations to mention, at line 263 of the main module: 

// $page = $this->pages->get((int) $this->input->get->id);
$page = $this->page->process->getPage(); 
// if (!$page || !in_array($page->template->id, $this->enabled_templates)) return;
if(!$page || !$page->id || !in_array($page->template->id, $this->enabled_templates)) return;

Shouldn't this be?

$page = $this->process->getPage(); 

Ryan: thanks for your feedback! As soon as I get some free time for this, I'm going to start making some improvements. Judging by your comments and my own thoughts, they're going to be mostly about scalability (in terms of both UI and data).
 

What would be the benefit of saving content to disk (vs database?). I'm not sure that it really matters to the user where it is stored, so wanted to inquire more about your thoughts here.

The main reason for this would be a somewhat more manageable (and scalable) data structure. Currently metadata is saved to one table and content to another, which with heavier use could result in a very large (and thus pretty slow) content table. Using individual files instead of this table might result in better scalability.. though a flat file hierarchy probably wouldn't be such a good idea either, since it could result in disk-related bottlenecks in the long run.

Another idea would be to just split that content table into smaller chunks. One table for each field, or perhaps each fieldtype?

I'll have to take a bit closer look at this one. I'm not even sure if it's really such a huge problem or if I'm just overcomplicating things. What do you think? :)

What would define old entries? I'm guessing in some cases, people would like to just let it go forever (disk space is cheap).

Probably something similar to what I'm doing with (both) history modules would make sense here too: let the user define this via module settings. Like Joss pointed out, some sites might want to hold on to their data "forever", though I'd suggest that backing things up properly might make more sense in those cases. On the other hand, on many occasions data older than, say, 6 months to a year wouldn't be of much use to anyone. (Another CMS I've been using stores similar data for approximately 6 months in an "easy to get" format, which has been more than enough in 99% of cases.)

Javascript diff feature sounds awesome. Though also have to admit, just being able to toggle between the different versions and see the immediate change (the way you have it working now) is kind of a nice "diff" effect too. :) You don't necessarily need anything else for a version 1.0.

I guess you're right. It's just that I saw somewhere a nice demo of this and thought it might be really fun to have. Let's see :)

How does it scale? Meaning, what happens when you've got 100 versions. I haven't tried it yet... and maybe you've already figured this out. But I was thinking maybe it shows the most recent 10 edits when you hover the icon, and ajax/paginates them somehow after that? (or opens a modal to a dedicated Process when you click more?)

This is a very good point and something I'm going to focus on. Short answer is that it scales "relatively well" at the moment; UI wise this is handled by max-height + overflow-y: auto, but all revisions are loaded and processed when page edit is opened. It could definitely use some extra measures for large amounts of stored revisions, probably something similar to what you've described here.

How does one handle deleting versions? I was thinking it doesn't need to be manual or interactive, but just a global time or quantity setting, i.e. "only keep last 50 versions" or "only keep versions for [n] days" or something like that. But having the option to keep them forever is also good… perhaps the behavior when the "[n] days" is left blank.

This is very much related to some of the things I mentioned earlier in this post. Removing individual rows might make sense in some cases, but that's not very high on my list here. I'm definitely going to add a configurable time limit, but would also like a quantity limit (like you're describing here). If I recall correctly the latter was also suggested by Nik, and now I've got double the incentive to make it happen ;)
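A combined time and quantity limit could look roughly like the sketch below. The `pruneVersions()` helper, its parameters and the array layout are assumptions made up for illustration; `null` stands for a setting left blank, meaning "keep forever".

```php
<?php
/**
 * Apply retention limits to a list of versions.
 * Each version is ['id' => int, 'created' => unix timestamp].
 * A null $maxCount or $maxAgeDays means "no limit" for that setting.
 * Returns the versions to keep, newest first.
 */
function pruneVersions(array $versions, ?int $maxCount, ?int $maxAgeDays, int $now): array
{
    // Newest first, so the quantity limit keeps the most recent entries.
    usort($versions, fn($x, $y) => $y['created'] <=> $x['created']);

    if ($maxAgeDays !== null) {
        $cutoff = $now - $maxAgeDays * 86400;
        $versions = array_values(array_filter(
            $versions,
            fn($v) => $v['created'] >= $cutoff
        ));
    }
    if ($maxCount !== null) {
        $versions = array_slice($versions, 0, $maxCount);
    }
    return $versions;
}

$now = 1000 * 86400; // arbitrary "current time" for the example
$versions = [
    ['id' => 1, 'created' => $now - 40 * 86400],
    ['id' => 2, 'created' => $now - 10 * 86400],
    ['id' => 3, 'created' => $now - 1 * 86400],
];
$keptByAge   = pruneVersions($versions, null, 30, $now); // "keep 30 days"
$keptByCount = pruneVersions($versions, 1, null, $now);  // "keep last 1"
$keptForever = pruneVersions($versions, null, null, $now);
```

In a database-backed module the same rules would more likely be expressed as a delete query run during maintenance; the array version only shows how the two limits combine.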


Teppo, thanks, now it works. It doesn't work on master though:

  • If using the master branch, I do get the icon but don't see the hover. Instead I get a JS error: Uncaught TypeError: Object [object Object] has no method 'on'
  • The above is probably because of the jQuery version. I think it is just fine to create modules like this with just the latest and greatest in mind, so focusing on 2.3 should be fine. So you can forget I even mentioned this.

Now that I got it working... just WOW! Absolutely brilliant solution and implementation. Congratulations on this, Teppo, and big thanks for your work on this. Very solid and very nice work! I just spent 10 minutes of my life clicking and watching that nice flashing animation with a big smile on my face.

I think this should definitely be in core. Not installed by default, but definitely shipped with core. And maybe packed in with the upcoming 2.3 (I think it can easily wait a week or two to get this beauty in).

And now that we have this "proof of concept" (I also think this is much more than that), I really like field-level version control. It makes much more sense than page-level version control. Thanks to Ryan for that UI mockup btw, it was spot on!


The main reason for this would be a somewhat more manageable (and scalable) data structure. Currently metadata is saved to one table and content to another, which with heavier use could result in a very large (and thus pretty slow) content table. Using individual files instead of this table might result in better scalability.. though a flat file hierarchy probably wouldn't be such a good idea either, since it could result in disk-related bottlenecks in the long run.

I see what you mean. Row quantity really isn't a problem so long as the indexes are good. So I suspect even a single table would scale just fine. But if you still prefer a file-based storage option, using the existing $page->filesManager->path might be a good place to do it, as ProcessWire will maintain one directory per page (i.e. /site/assets/files/[id]/). You wouldn't want the version files to be publicly accessible, so you'd probably want to put them in filenames that PW prevents direct access to (like .php). 
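A file-based variant might look something like the following sketch. A temporary directory stands in for `$page->filesManager->path` so the example runs on its own, and the file naming scheme, the `GUARD` line and both helper functions are assumptions for illustration only.

```php
<?php
// Stand-in for $page->filesManager->path; a temp directory keeps this sketch self-contained.
$pagePath = sys_get_temp_dir() . '/pw-version-demo-' . uniqid() . '/';
mkdir($pagePath, 0700, true);

// Assumption: an extra guard line, in case the file were ever executed directly.
const GUARD = "<?php exit; ?>\n";

/** Write one version as JSON into a .php-named file and return its path. */
function writeVersionFile(string $dir, string $fieldName, int $timestamp, array $data): string
{
    $file = $dir . "version-$fieldName-$timestamp.php";
    file_put_contents($file, GUARD . json_encode($data));
    return $file;
}

/** Read a version file back, skipping the guard line. */
function readVersionFile(string $file): array
{
    $raw = file_get_contents($file);
    return json_decode(substr($raw, strlen(GUARD)), true);
}

$file   = writeVersionFile($pagePath, 'body', 1361028534, ['data' => 'Old body text']);
$loaded = readVersionFile($file);
```

One directory per page keeps any single folder small, which sidesteps the flat-hierarchy bottleneck raised earlier.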

Another idea would be to just split that content table into smaller chunks. One table for each field, or perhaps each fieldtype?

Either of these seem like good options too. One table per field would be consistent with the way PW stores its fields... though that's for DB-related reasons that wouldn't apply to the case we are talking about. I guess the main benefit of splitting things into separate tables per field or fieldtype would just be to allow the possibility of versioning different data types. But your table with the single mediumtext column still allows for it, because all PW field data stored in the DB can be reduced to an array or JSON string:

$data = $field->type->sleepValue($page, $field, $value); // convert to array 
$json = json_encode($data); // convert to JSON string

Once it is stored as a string in the DB, then it's no longer "findable" except by possibly a fulltext index. But it seems like these versions probably would not need to have that search accessibility for storage anyway. 

I'll have to take a bit closer look at this one. I'm not even sure if it's really such a huge problem or if I'm just overcomplicating things. What do you think?

I think you've got a lot of good ideas here. Though I'm not sure that you necessarily need anything more than what you've already got, with regards to data storage. From what I can tell, the data doesn't need to be selectable by anything in the data itself... just indexed attributes of the data: page_id, field_id, user_id, date. That combination would always be unique regardless of row quantity. It doesn't seem like you'd ever have to worry about potential for "full table scan" in MySQL. But there may be things I'm not thinking of. I haven't had my morning coffee yet. :)

I do agree that splitting tables by fieldtype or field sounds better… but I don't know if it would ultimately matter. It would be interesting to take a table with 1m rows and a field_id index, and compare that to 10 tables with 100k rows and no field_id index. Assuming just straight selection from the table by page_id (no joins) and date-sorted results, would it make any difference? I have a feeling it would not, but haven't tried. But having it in one table you could select all field versions for a page in 1 query and your date-based maintenance could do its thing in 1 query.

This is a very good point and something I'm going to focus on. Short answer is that it scales "relatively well" at the moment; UI wise this is handled by max-height + overflow-y: auto, but all revisions are loaded and processed when page edit is opened. It could definitely use some extra measures for large amounts of stored revisions, probably something similar to what you've described here.

From a scalability standpoint, probably the only thing it would need there is just some limit on the number of versions it would load at once [25, 50, 100]? I'm assuming it doesn't load the actual version "text" here (it looked like you were doing that with ajax), but just the attributes. So maybe it would just need some kind of pagination or "older versions" link that loads the next block of [25, 50, or 100].  
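Loading version attributes in blocks could be sketched like this. The `versionsPage()` helper and the array layout are hypothetical; in practice the slicing would more likely happen in the database query rather than in PHP.

```php
<?php
/**
 * Return one block of version metadata, newest first, plus a flag telling
 * the UI whether an "older versions" link is needed.
 */
function versionsPage(array $versions, int $page, int $perPage = 25): array
{
    usort($versions, fn($x, $y) => $y['date'] <=> $x['date']);
    $offset = ($page - 1) * $perPage;
    return [
        'items'    => array_slice($versions, $offset, $perPage),
        'hasOlder' => count($versions) > $offset + $perPage,
    ];
}

// 60 fake versions with increasing dates (attributes only, no content).
$all = [];
for ($i = 1; $i <= 60; $i++) {
    $all[] = ['id' => $i, 'date' => 1361000000 + $i * 60];
}
$first = versionsPage($all, 1); // newest 25
$last  = versionsPage($all, 3); // remaining 10
```

Only the attributes are paged here; the version text itself would still be fetched on demand, as described above.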

Either that, or you could just have version 1.0 say "we will store a maximum of 100 versions and no more", which would probably be fine for 99.9% of folks. No pagination worries. I think this is what the 37signals guys would say to do, at least for version 1.0. :)
