Strategy for user-created data storage

bmacnaughton · June 26, 2016

The situation I'm trying to arrive at a good solution for is how to store user-created data in a production environment. We're pre-production but the issue I'm running into is that storing user-data as pages creates problems when it comes time to update the production site with a new version (of our site, not PW).

Because I'm adding data as pages the data will be deleted if I create a backup and restore it to the production environment. I've spent a fair amount of time writing some DB compare logic to find differences but that will require manual updating for each difference. Because pages reference other page numbers via IDs and the production environment will be creating users and meta-information pages it isn't possible to save/restore those particular pages around a backup (from development) and restore (to production) of the PW database.

It seems like the most straightforward approach is to put the user-created data in separate tables so that PW information can be updated independently. That is pretty easy using $db or $database.

Are there any best practices for this kind of thing? Am I thinking about it wrong? How is this problem generally solved?

The user-data is 1) user accounts and 2) meta-information about uploaded files (stored in the file system).

LostKobrakai · June 26, 2016

The problem – as you're describing it – is trying to separate user-content from otherwise bootstrapped content. That's just not how it works. It's all content in pages and processwire does not care where it came from. You might set templates up to be only single use or other things, but in the end it's all pages. So you probably should not try to work/hack around that. Also db backups and dumps are almost never a sane way to update a production site as soon as any content is independently created on it, be it user or only client/editor changes. It's hardly a processwire specific problem.

If you need things to not be manual updates you've probably not many more options than these.

The common way for other web frameworks out there to have is to use migration files, which are files handling any "bootstrap" changes in the db and as they're files they simply are tracked by any used VCS and just run on each deployment. I've created a module to use this strategy with processwire:

There's also a migrator module from adrian, which tries to solve the "migrating pages / templates / fields" issue, but I'm not sure if it can handle e.g. changes of module settings or installing new modules and such things:

You could also look in non pw-specific migration tools like https://phinx.org/, but hand writing pw changes in mysql queries might not be fun. There's also this tool, which claims to track db changes, but I'm not sure how well it does work: http://dbv.vizuina.com/.

bmacnaughton · June 26, 2016

Thanks. Let me ask a couple more questions. I have seen many approaches that create change-files of databases but all of those presume that production-time content is separated from framework content.

Quote

The common way for other web frameworks out there to have is to use migration files, which are files handling any "bootstrap" changes in the db and as they're files they simply are tracked by any used VCS and just run on each deployment. I've created a module to use this strategy with processwire:

It would be great if module/template/page adds, removes, and configurations were all captured. Would your module work if we have user-content in a production system? Does it apply changes to the target system? I've read the thread but it's not exactly clear to me what exactly it does. Is there an example to look at?

7 hours ago, LostKobrakai said:

There's also a migrator module from adrian, which tries to solve the "migrating pages / templates / fields" issue, but I'm not sure if it can handle e.g. changes of module settings or installing new modules and such things:

Thanks. I think your observations on Migrator are correct. Nor does it keep page IDs (references) the same across DBs - the README indicates that the strategy is to rewrite them in the templates/*.php files but that seems a risky proposition (better not to embed them in the php files to begin with, but...)

7 hours ago, LostKobrakai said:

Also db backups and dumps are almost never a sane way to update a production site as soon as any content is independently created on it, be it user or only client/editor changes. It's hardly a processwire specific problem.

Agree and understand. Two points that (I think) mitigate this problem: 1) there are no client/editor changes for this particular site - we use it only ourselves because we don't have clients/editors and 2) the dump file only drops tables that PW "knows" about, so data in separate tables would not conflict in any way. It seems analogous to the way in which PW has `site/` and `wire/` directories. It can replace `wire/` without concern for issues with the site itself. I know we would have to copy current versions of templates and modules in addition to the DB but that seems to be relatively easy to script.

I would appreciate your thoughts on the last quoted point in particular on whether my thinking makes sense and to what extent your module would address the issues.

LostKobrakai · June 26, 2016

My module does not really do anything on it's own. It's more a managing helper for handling/running those migration files. Any actual changes you'd write on your own using the processwire api (in those files). It's like you would have done these changes in the backend, but it's done by code. There are some helpers included to reduce the amount of verbose code, but that's about it. With the processwire api at hand you'll probably not even need to think about the db, which is probably the biggest difference between something like phinx or other frameworks' migration files to the ones my module does handle. You'll not manually add any new tables or columns. You'll just add a field to a template or create a new page or something.

If you've handled all local changes using those migration files you can just put those on the production server and run them there as well. If everything went well all your changes should be applied as they were locally. Just keep in mind the module is not making any backups of sorts. This is the users responsibility.

Your points about db dumps are certainly valid, but it just is the case, that almost everything in pw is a page, even users. There's no point in working against this. E.g. how should pw know which users where created by you locally and which one online in the production system? Same for normal pages. How should pw devide between a list of countries you added as pages for a select field and some other pages, which represent dummy user content you work with locally, but are not to be migrated. Especially in a dynamic structure like the pagetree where things can be moved around super easily and templates can also be changed as fast there's no point in even trying to separate out anything.

bmacnaughton · June 26, 2016

Thank you again.

2 hours ago, LostKobrakai said:

Your points about db dumps are certainly valid, but it just is the case, that almost everything in pw is a page, even users. There's no point in working against this. E.g. how should pw know which users where created by you locally and which one online in the production system? Same for normal pages.

It is possible to distinguish each - there is an end-user field I've added to the user template and all the production pages share specific templates. But that still results in IDs in production (if I add a page) that are different than the IDs that would be created in a development environment.

My thought is to keep the production-users and production-pages in a separate table; that allows us to copy PW tables into a new DB without any conflicts. And it prevents us from forgetting changes done in the back-end admin tool. (Maybe I'm misunderstanding - it seems that migration files need to make the changes via API so we'd have to keep a good log of all changes made in order to create the migration files.)

What do you recommend?

LostKobrakai · June 26, 2016

In the ideal case you're not keeping log, but rather you make the local changes only via migration files in the first place. The actual number of those does not really matter as long as it's in logical steps. It might seem like a lot of extra work, but it really is – after creating a few of those – more a case of copy&pasting and adjusting a few key points. And the time saved when actually deploying changes is a lot. It makes the whole update process a lot more transparent and less error prone.

I'd strongly suggest you to not hardcode/rely on any page IDs in your app (besides maybe ID 1 and 2). They'll get out of sync faster then you can keep track of it.

The idea of separating tables might seems like a nice solution, but it would probably need a lot lot of core hacking to make that happen. Also it's not just the 'pages' table, but you'd need to split any field's table as well. It might work somehow, but I imagine it to be not worth the hassle and the inflexibility.

I'm not really recommending anythings. Migrations work great for me, but every use-case is different.

Ivan Gretsky · June 26, 2016

1 hour ago, LostKobrakai said:

I'd strongly suggest you to not hardcode/rely on any page IDs in your app (besides maybe ID 1 and 2). They'll get out of sync faster then you can keep track of it.

What are you suggestions about it? Only use selectors based on path?

bmacnaughton · June 26, 2016

Do you have some example code that uses the API for templates and modules? I already have used the API to create users and pages (they both are created by my code in our site already). Or is there a place where creating templates is documented? It's not clear which properties must be set.

For example, to create a page I do:

$p = new Page();
$p->of(false);
$p->template = 'template-name';
// the parent is unpublished, so "include=all"
$p->parent = wire('pages')->get('path=/page-parent/, include=all');
// set fields
$p->field_1 = $data['field_1'];
...
$p->addStatus(Page::statusUnpublished);
// make a unique name for the page - no one ever sees it
$p->name = make_name();
$p->save()

But I haven't seen the same thing done with templates. I also don't know what properties must be set, the syntax for creating field groups, etc. I'm hoping your migration scripts (or documentation) helps illuminate that.

Ditto for modules. They are probably less of an issue - it's easy to get a list of each one and there are fewer changes there on average.

LostKobrakai · June 26, 2016

It depends. I'm mostly using templates and tree/branch relationships between pages to select (other) pages. Paths are also good if you're really sure they stay consistent. I'm using ids really only if I have to (e.g. alternative user parent). In the end not using ids is also more descriptive – it's like it's with magic numbers in programming. And it's a lot more resilient. If someone does accidentally delete a page the id is gone, whereas any other selector will just work as soon as you recreate that page.

Edit: And the point, which is more in context to the topic. As soon as production and local development are running in parallel you can either not rely on ids or you have to (probably manually) edit every new id reference in your code on deployments, which has the not so nice side-effect of bringing your codebase out of sync.

bmacnaughton · June 26, 2016

I don't use IDs for the reasons you mention. I don't think I made my concern very clear.

Are there examples somewhere of using something like 'new Template()' and how to fill in fields and fieldgroups in order to create a new template via API? Ditto for modules. It's only by doing things via API (or admin tool) that I don't have to worry about the IDs. E.g., If I use DB primitives to copy specific pages or delete and add a field or page it will have different IDs in development and production. Having different IDs is OK if all the changes are made via PW's API. But I don't see the API for creating a template or configuring a module defined anywhere.

It's the internal book-keeping I'm worried about, not our own code. I already subscribe to not using hardcoded IDs.

LostKobrakai · June 26, 2016

There are probably various examples of the plain pw api usage around in the forums. I've also started this one, to allow for a user-contributed migration "snippets" directory: https://github.com/LostKobrakai/MigrationSnippets

Examples on how to use the specific helper migration classes of the module can be found here: https://github.com/LostKobrakai/Migrations/tree/master/migrations

bmacnaughton · June 26, 2016

Thanks for these.

Sign In

Strategy for user-created data storage

Recommended Posts

bmacnaughton

LostKobrakai

bmacnaughton

LostKobrakai

bmacnaughton

LostKobrakai

Ivan Gretsky

bmacnaughton

LostKobrakai

bmacnaughton

LostKobrakai

bmacnaughton

Create an account or sign in to comment

Create an account

Sign in

Recently Browsing 0 members

Similar Content

solved Path settings appear in the wrong fieldset tab

Create fields and pages remotely

[Solved] Warning, the database time differs from PHP time

mdx Processwire and MDX export/import

Module: ImportShorthand

Browse

Activity

My Activity Streams

Support

Store

My Details