Oh no, not another migration module!


MarkE

Well, yes and no.

Two migration modules already exist in ProcessWire, but neither suited my needs:

  • “Migrations” by @LostKobrakai seems effective but quite onerous to use and has been deprecated in favour of “RockMigrations”
  •  RockMigrations by @bernhard is simpler and has a nice declarative method: migrate(). However, it is ideally suited to “headless” development, where the API is used in preference to the Admin UI. This is great for professional PW developers, but for occasional developers like me, it is much easier to use the UI rather than just the API.

In addition, there is @adrian's ProcessMigrator, which is designed for migrating page trees.

Concept

I wanted something to achieve the following:

  1. To allow development to take place (in a separate environment on a copy of the live database, or on a test database with the same structure) using the Admin UI. When finished, to permit a declarative approach to defining the migration and implementing it (again in the UI).
  2. To allow testing of the migration in a test environment on a copy of the live database.
  3. To allow roll-back of a migration if installation causes problems (ideally while testing rather than after implementation!).
  4. To provide a record of changes applied.

Although not originally intended, the module I developed also allows the selective reversion of parts of the database by exporting migration data from a backup copy. Also, if changes are made directly on the live system (presumably simple, low-risk mods, although not best practice), it allows reverse migration to the development system in a similar fashion.

I should emphasise that what I have built is more of a 'proof of concept' than a fully-fledged module. The code is pretty hacky and uses some stuff outside of the module itself. Lots of validation is missing. However, I have used it successfully in a number of small tests and a medium-sized live migration. If there is sufficient interest, I will tidy the code and make it available, but it would still need input from better coders and PW-savants than me to make it into something more widely usable.

EDIT: Please note that the module has moved on a bit from this original post - the design has changed somewhat to make it more robust and flexible and additional features have been added. Please see the help file for full details. I still consider it to be at alpha stage, however, so use with care - test before making migrations and always take backups first.

Design

The module has the following principal components:

  • A PW module “ProcessMigrateData”, bundled with a bootstrap migration in the same ProcessMigrateData folder, to be placed in the site/modules folder;
  • A Page Class “MigrationPage” to be placed in the site/classes folder;
  • Php files migrationActions.php and migrationControl.php to be placed in the site/templates/RuntimeMarkup folder (and migrationActions.js to be placed in site/templates/RuntimeMarkup/scripts).
  • There are also some methods which need to be put in class DefaultPage and some functions which need to be put in the init.php file.

The module requires the FieldtypeRuntimeMarkup module.

Migration definitions are held in .json files in the ProcessMigrateData/migrations/{migration name} folder (I might move this). This folder contains up to 2 sub-folders, “new” and “old”. Each of these contains a migration.json file, which defines the scope of the migration in terms of fields, templates and pages, and also one or more of fields.json, templates.json, pages.json and remove.json. The first 3 of these files contain the field, template and page definitions within the migration scope, and the remove.json file simply lists fields, templates and pages to be removed.

These migration files are mirrored by pages of template “Migration” under a parent /migrations/ of template “Migrations”. The mirroring happens in two ways:

  1. If a new migration is created in the module (from the Setup -> DB Migration menu), then initially no json files exist. The json files are created, after the scope of the migration is defined on the page, by running “Export Data” from the eponymous button.
  2. If json files exist, but there is no matching migration page, then the latter is created by the module on accessing the DB Migration admin page. In this case, we are in the “target” database so there is no “Export Data” button, but instead “Install” and/or “Uninstall” buttons.

Migrations therefore either view the current environment as a “source” (type 1) or a “target” (type 2).

Installation

The module creates templates called Migration and Migrations and a page below the root named ‘migrations’.

Open the admin page “Setup -> DB Migration” to create a migration. One migration (“bootstrap”) is already installed and cannot be modified.

The pic below illustrates the DB Migrations page in the source environment.

[Image: Setup page - source]

The status of a migration (as a source page) can be ‘pending’ or ‘exported’. ‘Pending’ means either that the migration data have not yet been exported or that the current export files differ from the source database.

On opening this page in the target environment, the individual Migration pages (type 2) are created from the definitions in their respective /new/migration.json file.

The pic below illustrates the DB Migrations page in the target environment.

[Image: Setup page - target]

In a target environment, a migration status can usually be ‘indeterminate’, ‘installed’ or ‘uninstalled’. ‘Indeterminate’ means either that the migration has not yet been installed (so no ‘old’ files containing the uninstall definition exist yet) or that the current state matches neither the ‘new’ nor the ‘old’ state. ‘Installed’ means that the current state matches the ‘new’ definition and ‘uninstalled’ means that it matches the ‘old’ definition (i.e. it has been actively uninstalled rather than not yet installed).
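The status rule for a target environment can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name migrationStatus() and the use of plain arrays for the 'new', 'old' and current definitions are hypothetical and are not taken from the module's code.

```php
<?php
/**
 * Hypothetical sketch: derive a target-side migration status by comparing
 * the current state with the 'new' and 'old' definitions.
 * $old is null when no 'old' files exist yet (migration never installed).
 */
function migrationStatus(array $current, array $new, ?array $old): string {
    // Current state matches the 'new' definition
    if ($current == $new) return 'installed';
    // Current state matches the 'old' definition (actively uninstalled)
    if ($old !== null && $current == $old) return 'uninstalled';
    // Not yet installed, or matching neither definition
    return 'indeterminate';
}

// Illustrative data only (assumed structure, not the module's json format)
$new = ['fields' => ['summary' => ['type' => 'FieldtypeTextarea']]];
$old = ['fields' => []];

echo migrationStatus($new, $new, $old), "\n"; // current matches 'new'
echo migrationStatus($old, $new, $old), "\n"; // current matches 'old'
echo migrationStatus($old, $new, null), "\n"; // no 'old' files yet
```

The three calls print installed, uninstalled and indeterminate respectively.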

When carrying out development work, you keep a note of what fields, templates and pages you have added, changed or removed. The module does not track this – it is a declarative approach, not a macro recorder. Also, it does not handle other components such as Hanna codes and Formbuilder forms. These come equipped with their own export/import functions.

You can update a migration page as you go along, rather than keep a separate note of changed components. The migration page also allows you to document the migration and add any number of “snippets”. These snippets do not do anything, but can be a convenient place to store (for example) Hanna code exports for pasting into the target environment and help to make the page a comprehensive record of the migration.

See example below:

[Image: Migration page - source (full)]

Note that migration pages just define the scope of the migration. It is entirely feasible for other parts of the dev database to be changed which are outside this scope and which will therefore not be migrated.

After sync'ing code files to the target environment, the new migration will be listed on the setup page.

On the migration page, in the target environment, there are “preview” buttons to see what changes will be implemented. The migration can then be 'installed'. See example of the migration page in ‘installation’ mode below:

[Image: Migration page - target]

That's the gist of it, but inevitably there are complications. Happy to discuss and share further if there is interest in this.

Edited by MarkE
Update

A question: Do new and changed fields and/or pages get processed in the order they're specified? 

I'm thinking of the scenario where a new page field is added that depends on a template and page that may also need to be added. That's one of the scenarios where using the built-in field import/export doesn't work too well if there are dependencies. It will advise you of them, but then you have to go back and import the dependencies and then re-do the import, whereas if it's possible to specify order so that dependencies are met before an object is processed, it would make migrations more robust. This isn't an issue for declarative development via the API rather than the admin UI, as the developer can order their code so that dependencies are processed in order, but if changes are made via the admin UI this is a potential problem.


@MarkE - this looks really very impressive - looking forward to testing it out!

@Kiwi Chris - that's one of the things that my ancient Migrator module handles - it takes several loops to correctly install required fieldtype modules, create fields (including any required page reference page trees, as well as any new fields they might need), templates and pages such that all dependencies are satisfied in the correct order. This stuff is all pretty painful. Definitely curious to see how @MarkE has handled this.


Wow, looks like you have put a lot of work into that. I think every step towards better PW migrations is important and very welcome.

4 hours ago, Kiwi Chris said:

I'm thinking of the scenario where a new page field is added that depends on a template and page that may also need to be added. That's one of the scenarios where using the built-in field import/export doesn't work too well if there are dependencies.

https://github.com/BernhardBaumrock/RockMigrations/blob/bb43552f55ef7ff57533083f4d886c3aa00a8e41/RockMigrations.module.php#L1956-L2000

$rm->migrate([
  'fields' => [...],
  'templates' => [...],
  'pages' => [...],
]);

Maybe I'm missing something, but I thought I had that problem too when developing the migrate() method and it turned out to be quite easy: I create fields, then templates, then I setup the fields and templates again (now that fields and templates exist in the system the references can properly be set) and finally I create all pages. In my scenarios this has worked perfectly for several months (years?) now.

8 hours ago, MarkE said:

This is great for professional PW developers, but for occasional developers like me, it is much easier to use the UI rather than just the API.

I have to think about that sentence. Maybe you could elaborate a little more on that? I'm trying to understand your workflows better. I always thought that if somebody does not want to learn how to use code-based migrations he/she could simply use PW's import/export tools?! Do you think you could create a quickstart-screencast using some free tool like https://screencast-o-matic.com/home to show the workflow when using your migrations?


5 hours ago, Kiwi Chris said:

Do new and changed fields and/or pages get processed in the order they're specified? 

I'm thinking of the scenario where a new page field is added that depends on a template and page that may also need to be added

The sequence is: remove (pages > templates > fields); add/change (fields > templates > pages). Within each type (e.g. pages), the sequence is in the order listed in the entry.
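A minimal sketch of that fixed ordering, assuming each item is an array with hypothetical 'type', 'action' and 'name' keys (these names are illustrative only and are not the module's actual data format):

```php
<?php
/**
 * Hypothetical sketch of the fixed processing order:
 * removals run pages > templates > fields, then additions/changes run
 * fields > templates > pages. Items of the same type keep their listed order.
 */
function orderItems(array $items): array {
    $removeRank = ['pages' => 0, 'templates' => 1, 'fields' => 2];
    $addRank    = ['fields' => 0, 'templates' => 1, 'pages' => 2];
    $keyed = [];
    foreach ($items as $i => $item) {
        $isRemove = $item['action'] === 'remove';
        $rank = $isRemove ? $removeRank[$item['type']] : $addRank[$item['type']];
        // Sort key: phase (removals first), type rank, then original position
        $keyed[] = [[$isRemove ? 0 : 1, $rank, $i], $item];
    }
    // Equal-length arrays are compared element by element
    usort($keyed, function ($a, $b) { return $a[0] <=> $b[0]; });
    return array_column($keyed, 1);
}

// Made-up example items
$items = [
    ['type' => 'pages',     'action' => 'add',    'name' => '/shop/'],
    ['type' => 'fields',    'action' => 'remove', 'name' => 'old_price'],
    ['type' => 'fields',    'action' => 'add',    'name' => 'price'],
    ['type' => 'templates', 'action' => 'change', 'name' => 'product'],
    ['type' => 'pages',     'action' => 'remove', 'name' => '/legacy/'],
];
foreach (orderItems($items) as $item) {
    echo "{$item['action']} {$item['type']} {$item['name']}\n";
}
```

With these items the removals (/legacy/, then old_price) come first, followed by price, product and finally /shop/.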

I think this caters for most situations - e.g. a new page select field followed by a changed template, followed by pages in the right dependency order - but it may not cover everything (particularly making the uninstall happen correctly too). In mitigation, unsuccessful migrations can be reviewed via the previews (see example below) and more complex migrations can be broken into smaller migrations to better handle dependencies (I had thought of providing dependency linkage between migrations, but at the moment I think that adds too much complexity). Pic below shows an example preview with one 're-purposed' field and one new one - this can be reviewed before implementation and also afterwards if the install was not complete.

[Image: Preview differences - new and changed fields]

5 hours ago, adrian said:

that's one of the things that my ancient Migrator module handles

I will take a closer look at that to see if it helps any.

1 hour ago, bernhard said:

I think every step towards better PW migrations is important and very welcome

Quite! IMHO it is the main area where PW is lacking, particularly given that it is such a good tool for heavy-duty apps as well as just websites - a point that has been well made by @Kiwi Chris. I was surprised and a bit disappointed by how little attention this has had in terms of core modules.

1 hour ago, bernhard said:

I create fields, then templates, then I setup the fields and templates again (now that fields and templates exist in the system the references can properly be set) and finally I create all pages.

Not sure if I need to loop round like this (and also as @adrian indicates). Further testing may indicate that it is necessary.

 

1 hour ago, bernhard said:

I always thought that if somebody does not want to learn how to use code-based migrations he/she could simply use PW's import/export tools?!

Well, they could but

  1. the tools are not complete (e.g. fields does not handle select options and pages does not exist except 'in development');
  2. for anything but the simplest migration, you have to do several manual steps in the right order (coupled with working out what didn't implement correctly) - this can be a real pain if you are testing several times then needing to implement on the live version and can run a real risk of making mistakes when doing it for the n'th time;
  3. there is no documentation of what you have done;
  4. there is no ability to uninstall other than to repeat the manual steps correctly in reverse order.

For very simple migrations (e.g. one changed template), then it is a feasible approach. I wanted something that installs fast and consistently every time - particularly when it is interdependent with changed code and I want to minimise downtime.

Also, it is not just a case of 'not wanting to learn'. Code-based migrations need familiarity through frequent use and work best in the context of code-based app development where the same comment applies.

1 hour ago, bernhard said:

Do you think you could create a quickstart-screencast

I'll do a bit more testing and try and cover some of the points made above, then do this.

Meanwhile, the main task is to refactor and restructure the code, which is a truly horrible lot of spaghetti at the moment. If this is to go any further, I will then need some help, particularly with turning it into a more 'professional' module, as my PHP (particularly OOP) skills are limited and I have never previously done anything beyond simple Process modules.


6 minutes ago, MarkE said:

The sequence is: remove (pages > templates > fields); add/change (fields > templates > pages). Within each type (e.g. pages), the sequence is in the order listed in the entry.

I think this caters for most situations - e.g. a new page select field followed by a changed template, followed by pages in the right dependency order - but it may not cover everything (particularly making the uninstall happen correctly too).

Where this may not work is where you have a new page reference field that needs to access pages with a given template when that template does not yet exist, or a given parent page when that parent page does not yet exist. In that case you're going to need to add a page or template (or both) before you add the field.

@adrian mentioned having to loop through several times to ensure all dependencies are met, and I don't think it's possible to avoid this.

Some field types don't have dependencies, so it makes sense to process them first, but page references will have dependencies that may or may not have been met, and if not, you'll need to install any templates and pages then loop back and check whether the page reference fields have their dependencies met.

Where it gets really messy is if you have a template that depends on one page reference field, which happens to be the template used by another page reference field.

e.g. customer->billing-contact (page reference to contact template), invoice->customer (page reference to customer template), so the invoice template can't be made till the customer page field exists, which in turn depends on a template that has a page field that references the contact template. In this case adding fields > templates > pages in that order won't work.
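To illustrate, here is a rough sketch of dependency-driven ordering for that example, using a depth-first topological sort. The function name dependencyOrder(), the 'type:name' labels and the dependency map are all assumptions made up for illustration; nothing here is taken from an existing module.

```php
<?php
/**
 * Hypothetical sketch: order migration items so that every item comes after
 * the items it depends on (depth-first topological sort).
 * $deps maps an item to the list of items that must exist before it.
 */
function dependencyOrder(array $deps): array {
    $ordered = [];
    $state = []; // item => 'visiting' | 'done'
    $visit = function (string $item) use (&$visit, &$ordered, &$state, $deps) {
        if (($state[$item] ?? '') === 'done') return;
        if (($state[$item] ?? '') === 'visiting') {
            // A genuine cycle cannot be ordered and needs manual intervention
            throw new RuntimeException("Circular dependency at $item");
        }
        $state[$item] = 'visiting';
        foreach ($deps[$item] ?? [] as $dep) $visit($dep);
        $state[$item] = 'done';
        $ordered[] = $item;
    };
    foreach (array_keys($deps) as $item) $visit($item);
    return $ordered;
}

// The customer / contact / invoice example, with assumed item names
$deps = [
    'template:invoice'      => ['field:customer'],
    'field:customer'        => ['template:customer'],
    'template:customer'     => ['field:billing_contact'],
    'field:billing_contact' => ['template:contact'],
    'template:contact'      => [],
];
print_r(dependencyOrder($deps));
```

The resulting order alternates between templates and fields (contact template, billing_contact field, customer template, customer field, invoice template), which is why a single fields > templates > pages pass cannot satisfy it.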

FWIW, it's hard to make dependency tracking work, and even the big guys have issues. I recently had a .Net Xamarin Forms app project using Microsoft Visual Studio, and circular references among the dependencies of third-party libraries were blocking me from updating what I needed to get it working. In the end I had to delete all the dependencies and then add them back in with the updated versions to get it to work!

I think what I'd be happy to settle for as a developer with ProcessWire, would be to be able to set up a single list of objects in order, specifying whether they're a page, field, template, or module, etc with the ability to reorder them if necessary, similar to in the template editor you can re-order fields. Although this means I'd need to manually figure out what order things should be in, if I'm creating the code, even if it's via ProcessWire admin rather than the API, I should know what depends on what.

It might simplify the layout of your UI a bit, as you'd simply have object type, object name, whether to add/update or remove it, although it would require some ajax callback if you want a lookup on the list of objects to make sure the object with the name you've specified actually exists as the type you've specified, although maybe not absolutely essential as currently you've just got a text field to add your object names.


1 hour ago, Kiwi Chris said:

although maybe not absolutely essential as currently you've just got a text field to add your object names

Originally I was using select fields and lookups, but as the objects are not necessarily in the current database (and it depends on whether you are in a source or target environment), this made things troublesome and inconsistent, so I reverted to plain text.

Your general suggestion of item by item ordering is interesting and I will consider that, thanks.


I've reworked this slightly, based on the helpful suggestion of @Kiwi Chris.

The individual migration items are now entered in a repeater field and so the sequence can take into account any dependencies. Each item can be either 'new', 'changed' or 'removed'. When a migration is uninstalled, the sequence is automatically reversed and 'new' items are changed to 'removed' and vice versa.
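The reversal rule can be sketched like this. It is only an illustration: the function name uninstallSequence() and the 'type', 'name' and 'action' keys are hypothetical and do not reflect how the repeater items are actually stored.

```php
<?php
/**
 * Hypothetical sketch: derive the uninstall sequence from the install
 * sequence by reversing it and swapping 'new' and 'removed'.
 * 'changed' items stay 'changed' (they revert to the old definition).
 */
function uninstallSequence(array $items): array {
    $swap = ['new' => 'removed', 'removed' => 'new', 'changed' => 'changed'];
    $reversed = [];
    foreach (array_reverse($items) as $item) {
        $item['action'] = $swap[$item['action']];
        $reversed[] = $item;
    }
    return $reversed;
}

// Made-up install sequence, listed in dependency order
$install = [
    ['type' => 'template', 'name' => 'contact',         'action' => 'new'],
    ['type' => 'field',    'name' => 'billing_contact', 'action' => 'new'],
    ['type' => 'template', 'name' => 'customer',        'action' => 'changed'],
];
foreach (uninstallSequence($install) as $item) {
    echo "{$item['action']} {$item['type']} {$item['name']}\n";
}
```

Reversing the order means that an item is removed only after everything that depended on it has already been dealt with.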

Example below shows a migration testing a page reference dependency (it works!). This is the appearance in the source database (pre-export):

[Image: Page ref test - migration source]

If you click 'Preview' you see what changes are proposed to export (see below). This feature also operates to review (in the target database) what changes will happen on install or uninstall - or, if the install has failed, what changes remain (either by modifying the migration or applying a manual fix if all else fails).

[Image: Page ref test - preview export]

Export is shown as 'no object' in the above, because the migration has not yet been exported. I think this all seems to work as designed, but I am grateful for any further thoughts on the design. I will now work on tidying the code a bit and doing a bit of documentation.

There may still be a few bugs. There are some at the moment that I can't pin down but they are cosmetic rather than fundamental - I think I may need some help from those who understand the inner workings of PW better (@adrian?). Promises of help will encourage me to get on and release some code.

Meanwhile, I am happy to answer any further questions.


  • 2 weeks later...

Code is now posted at https://github.com/MetaTunes/ProcessDbMigrate 

Many thanks once more to those who gave me ideas and help in doing my first 'proper' module - in particular @Kiwi Chris, @bernhard and @adrian.

Please test and feed back. But be gentle in your criticism of the code. Most importantly - this is a proof of concept only at this stage and it makes changes to your files and databases, so please do not use on production sites and do back up everything beforehand - use at your own risk!

To install: Place the ProcessDbMigrate folder in your site/modules directory. Make sure your environment meets the requirements - you need @Robin S's FieldtypeRuntimeOnly to be installed first. The earliest PW version I have tested it with is 3.0.148, but it might work on earlier 3.0.xxx versions. Please let me know if it works with earlier versions.

Having satisfied the dependencies, install the module.

You will see that there is an extensive help.md file - please read it, particularly if you get stuck.

@bernhard asked for a screencast - I will do that next - hopefully it will make things clearer.


For those that like a screencast, I hope this helps (I've broken it down into logical steps):

Install the module in your development environment (making sure you have FieldtypeRuntimeOnly installed first). Then open the "Database Migrations" setup page and refresh it.
You will see that it has automatically installed a 'bootstrap' migration.

 


Create a new migration page to hold the scope definition of your new migration - just enter the basic details and save it at this stage.

Make the changes you want in the development environment (of course, you may have already done this).
We will add a new page and a couple of children.

Then a couple of fields (one a page ref with the new page as parent) and a template.

Add a page using the new template.

Now go back to the migration page and define the affected elements. Use the preview to see the effect, then "export" the migration if you are happy.

The next step is to install the new migration in the target environment. Sync the code files (including the .json files created by the migration and any new images/files in assets/), install ProcessDbMigrate in the target if necessary and go to the setup page. You can preview the migration before installing it.

If necessary, you can uninstall the migration (and the module), but the code files will remain.

End of show!

 


Wow. This looks so cool. The creation of fields and templates via the admin might not be for everyone, but I think you can generate the migration file also by hand, right?

A feature that has been requested multiple times is that all changes that you do in the admin should be tracked and added to a migration.

I like the basic idea behind it, and think of a hook, that gets triggered after creating a field/template, or making modifications to a field, which automatically or after confirmation modifies or adds a migration file.

As you are also running a diff, you might create a migration automatically, to see what changes have been made to templates/fields/contents since the last migration. Then you would just choose which templates/fields/contents should be included in the migration. For example you added a new field, added it to two templates and created a new page with one of the templates. 
Now your module could run a "Get changes" command, that fetches all differences since the migration and asks which of them you want to integrate.
With this behaviour you would not have to remember and "define the affected elements" as in your video.

What do you think about this approach?

I am also happy to have a look at your code and try it out, because I think migrations are a major issue with ProcessWire right now. I am using @bernhard's RockMigrations at the moment, but also like your approach. Migrations should be an important part of the core IMHO.


2 hours ago, dotnetic said:

The creation of fields and templates via the admin might not be for everyone

You can create/modify the fields/templates/pages however you like in the development environment. Then create the migration page by just defining what has been added/changed/deleted.

2 hours ago, dotnetic said:

I think you can generate the migration file also by hand, right?

Sure, but why bother (see the above)? Minor tweaks by hand are OK if the json file isn't quite what you want, but better to let the code generate it.

2 hours ago, dotnetic said:

A feature that has been requested multiple times is that all changes that you do in the admin should be tracked and added to a migration.

My module does not track changes - that can get very messy. You just define the scope of changes (in the right dependency order) and it picks up the current state - not how it got there.

2 hours ago, dotnetic said:

As you are also running a diff, you might create a migration automatically, to see what changes have been made to templates/fields/contents since the last migration.

Perhaps this might work better than the real-time hooking. A suitable compromise that might be workable is a separate component that just logs what has changed since last time (without knowing how). That could then build a draft migration page, but the sequence may need to be hand-sorted as getting the system to work out the dependencies could be tricky. In theory, you could use json files to snapshot the whole database and then take diffs from that to create the migration page, but that could be pretty resource-intensive - at the very least you would want to restrict the page tree to exclude user pages which are not maintained in the dev.

2 hours ago, dotnetic said:

With this behaviour you would not have to remember and "define the affected elements" as in your video.

What do you think about this approach?

It definitely has that advantage, provided you took a snapshot before you started work on the changes. On the other hand (a) it is a good idea to document what you are doing and (b) if you are working on 2 or more sets of (disjoint) changes, your approach would bundle them as one. So while it may be a good idea (and maybe achievable as per the above comments), you would definitely want it to be optional - e.g. have a button "Create migration page from snapshot".

2 hours ago, dotnetic said:

Migrations should be an important part of the core IMHO.

I couldn't agree more, and would appreciate @ryan's take on this. It is the only thing about PW that irritates me. I'm not sure he would agree - the whole import/export stuff seemed to have been left unfinished years ago - for example this.

And by all means look at my code, but you might wish to wear gloves.


I have made a few minor amendments to the code at https://github.com/MetaTunes/ProcessDbMigrate, so anyone who has downloaded an earlier version might wish to update their copy.

I found a few bugs with the image files which I've hopefully fixed - but there is a residual issue: because the target database might have different page ids from the source database, the module uses page paths, not ids, for referencing. However, images and files are stored in folders named by page id. In order to migrate a page with files/images, it is necessary to upload the files in their related folders. The module will then put them in the right folder for the target system, but problems could arise if there is a page in the target system with images/files and its id is the same as the source page id. I'll scratch my head a bit over that one!

I just used the module to migrate a site from my first prototype (see the OP) to this version and it worked fine.


Thanks for the tip, @adrian. In my tests to date, this hadn't proved to be an issue, because I was leaving the uploaded image in place. In other words, the image field on the page pointed to the folder for that (target) page id, but the RTE field pointed to the original (source) folder which I had left in place, so I didn't spot the problem.

However as I said, I realise that there is a potential issue with leaving the original image in its folder. I was going to just delete it after the migration installation, but this yields several problems:

  • RTE images, as you point out
  • Uninstalling and re-installing won't work unless you reinstate the original
  • If you use the development environment for testing by restoring different database versions then the original file is lost.

So now I am trying to find a way of handling all these. BTW, I realise that the last one is not good practice as /files/id/ conflicts could arise anyway - but I have a personal problem in that, having moved to PHP7.0, my test environment needs an upgrade before I can use it again, so I was making do with using the dev environment as a test environment too.

Currently my thinking is to use a special-purpose files directory within the migration package for the images to be uploaded, but I've yet to work that through.


1 minute ago, MarkE said:

Currently my thinking is to use a special-purpose files directory within the migration package for the images to be uploaded, but I've yet to work that through.

If you look at the code in Migrator you'll see that I store images in the migration zip package under a path that matches the path of the page they are associated with, so for example, an image associated with a blog post might look something like this:

Page path: /blog/my-latest-blog/
Images for that post: zippackage/blog/my-latest-blog/image-1.jpg, zippackage/blog/my-latest-blog/image-2.jpg, etc.

Then when the migration package is installed on the other site and the new page ID for blog/my-latest-blog/ is determined, the images are then added to that page and installed into the /assets/files/xxxx that matches that new ID.
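As a rough illustration of that two-step mapping (page path in the package, then page id on the target), here is a small sketch. The helper names packagePath() and installedPath(), the files root and the page id 1234 are made up for the example and are not Migrator's actual code.

```php
<?php
/**
 * Hypothetical sketch: where a file lives inside the migration package,
 * keyed by the path of the page it belongs to (ids are not used here).
 */
function packagePath(string $pagePath, string $file): string {
    return 'zippackage/' . trim($pagePath, '/') . '/' . $file;
}

/**
 * Hypothetical sketch: where the same file ends up on the target site once
 * the new page id is known.
 */
function installedPath(string $filesRoot, int $newPageId, string $file): string {
    return rtrim($filesRoot, '/') . '/' . $newPageId . '/' . $file;
}

// Assumed values: 1234 stands for whatever id the target site assigns
echo packagePath('/blog/my-latest-blog/', 'image-1.jpg'), "\n";
echo installedPath('/site/assets/files/', 1234, 'image-1.jpg'), "\n";
```

Because the package is keyed by page path, the source page id never needs to match anything on the target site.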

There is so much to consider with all this stuff, but if you run through a full migration of a tree of pages using Migrator you'll see how it stores all this stuff in the zip and how it modifies the paths in RTE fields to match the path of the page and then back to the /assets/files path again.


1 minute ago, adrian said:

If you look at the code in Migrator you'll see that I store images in the migration zip package under a path that matches the path of the page they are associated with

Thanks @adrian. That's roughly what I was thinking of - except that in my case there is no zip. However, I was thinking it might be simpler to use the original page id, not the path. I'll work through your code and see how well it fits. Hope it's OK to use it (with credits!) if it works in my situation.


Just now, MarkE said:

Thanks @adrian. That's roughly what I was thinking of - except that in my case there is no zip. However, I was thinking it might be simpler to use the original page id, not the path. I'll work through your code and see how well it fits. Hope it's OK to use it (with credits!) if it works in my situation.

Oh sorry, I honestly haven't tried your module yet - probably should before I provide poor advice. In my experience there isn't a way to do this by relying on the same page IDs but maybe it's OK with the way you have things set up. I'll shut up for now.


@adrian, your suggestions have been invaluable! I think I have it working OK using ids - basically the 'new' pages all store a meta value for the related old page id so that mapping is possible (of course all pages with the source images must be included in the migration). That means that I only have to do one 'translation' - in the target system, replacing the old id's with the new ones. I used the code in your nameImagePathId() method for this - amended as required:

protected function replaceImgSrc($page, $field, $idMapArray) {
    $files = $this->wire()->config->urls->files; // base URL of the site's files directory
    $html = $page->$field;
    if (strpos($html, '<img') === false) return $html; // return early if no images are embedded in html
    $dom = new DOMDocument();
    // @ suppresses warnings about imperfect markup; the conversion preserves UTF-8 characters
    @$dom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));
    foreach ($dom->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        bd($src, 'Image src for ' . $page); // Tracy Debugger dump
        $origId = basename(dirname($src)); // the directory name is the source page id
        $destId = (isset($idMapArray[$origId])) ? $idMapArray[$origId] : $origId;
        $img->setAttribute('src', $files . $destId . '/' . basename($src));
        bd($img->getAttribute('src'), 'reset img src');
    }
    // saveHTML() wraps the fragment in a doctype, <html> and <body>, so strip them again
    return preg_replace('/^<!DOCTYPE.+?>/', '', str_replace(array('<html>', '</html>', '<body>', '</body>'), array('', '', '', ''), $dom->saveHTML()));
}

$idMapArray is just an array of oldId => newId pairs. The only slight problem is that this introduces line breaks ( \n ) at the start and end of the html and I can't see why.
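The stray line breaks most likely come from saveHTML() itself: when it serialises a whole document it writes a newline after the doctype and another after the closing </html>, and stripping the wrapper tags leaves those newlines behind. One possible way around this, sketched below as a standalone function, is to serialise only the children of <body>. The function name, the default files URL and the meta-tag encoding hint are assumptions for illustration, not part of the module.

```php
<?php
// Sketch only: a standalone variant of the idea in replaceImgSrc(), without
// ProcessWire or Tracy Debugger. $filesUrl is an assumed default.
function replaceImgSrcDom(string $html, array $idMap, string $filesUrl = '/site/assets/files/'): string {
    if (strpos($html, '<img') === false) return $html; // nothing to rewrite

    $dom = new DOMDocument();
    // The meta prefix tells libxml that the fragment is UTF-8.
    @$dom->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);

    foreach ($dom->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        $origId = basename(dirname($src));
        // Only touch images whose directory is a known source page id.
        if (!isset($idMap[$origId])) continue;
        $img->setAttribute('src', $filesUrl . $idMap[$origId] . '/' . basename($src));
    }

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) return $html;

    // Serialise each child of <body> rather than the whole document, so no
    // doctype, <html> or <body> wrappers ever reach the output.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return trim($out);
}

echo replaceImgSrcDom('<p>Hi <img src="/site/assets/files/1234/a.jpg"></p>', [1234 => 5678]), PHP_EOL;
```

DOMDocument may still normalise the markup in other ways (for example how void tags are closed), so the output is not guaranteed to match the input byte for byte outside the rewritten src values.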


Looking at the recent discussion in this thread I wonder if the distinction between your module and mine is not about having a UI or not but is more about where and how you want to use it.

I have never ever had the need for "migrating" content of an RTE field to another site for example. And I have never ever had the problem of changing ids in that process. Why? Because my module relates to everything BUT content. Content should be part of the site and not part of the migration. That means I can develop a system that I can set up multiple instances of (for example a local dev and a live production system, or as another example this could also be one setup for sports clubs that is used by multiple sports clubs running the same system and getting the same updates - but keeping their content).

Your module seems to be targeted to another world. Migrating content (and maybe also necessary config fields/templates) from one site to another?

Am I right or did I get a wrong impression here?


31 minutes ago, bernhard said:

I have never ever had the need for "migrating" content of an RTE field to another site for example.

TBH, neither have I*. However, I have had to migrate pages, sometimes with images. For example, pages that are used to hold site settings. Also, I intended that the module might be used in 'rescue' mode as explained in the original post, which might involve migrating 'content'. Since the module does allow migrating pages, prompted by @adrian, I thought I would try and include RTE fields if I could.

*Correction - I meant RTE fields with images. Even the migration pages themselves have an RTE field, but I hadn't expected to put images in it, although that is possible.

Edited by MarkE
Correction

19 hours ago, MarkE said:

The only slight problem is that this introduces line breaks ( \n ) at the start and end of the html and I can't see why.

There were other issues too, like '/>' vs '>' as the img tag end. Eventually I decided to ditch the DOMDocument and just use a simple preg_replace:


protected function replaceImgSrc($html, $idMapArray) {
    if (strpos($html, '<img') === false) return $html; // return early if no images are embedded in html
    foreach ($idMapArray as $origId => $destId) {
        bd([$origId, $destId], 'Id pair');
        // [^>]* keeps each match inside a single <img> tag, so several images on one line are all handled
        $re = '/(<img[^>]*\/files\/)' . $origId . '(\/[^>]*>)/';
        $html = preg_replace($re, '${1}' . $destId . '$2', $html);
    }
    return $html;
}

Any reason @adrian why you went the DOMDocument route? I'll post an updated script to GitHub shortly, then maybe someone will find some holes in it!
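One thing to watch with looping over the map is that an id can be rewritten twice when a new id on the target happens to equal a different old id from the source (for example 10 => 20 and 20 => 30). A single-pass variant using preg_replace_callback() avoids that. The sketch below is a hypothetical alternative, not the code in the module, and it only rewrites the first /files/<id>/ segment in each img tag.

```php
<?php
// Sketch only: one pass over the html, looking each captured id up in the map.
function replaceImgSrcOnePass(string $html, array $idMap): string {
    if (strpos($html, '<img') === false) return $html; // nothing to rewrite

    // Group 1: the tag up to and including "/files/"; group 2: the numeric id.
    $re = '~(<img\b[^>]*?/files/)(\d+)(/)~';

    return preg_replace_callback($re, function (array $m) use ($idMap) {
        // Keep the original id when the page is not in the map.
        $id = $idMap[$m[2]] ?? $m[2];
        return $m[1] . $id . $m[3];
    }, $html);
}

$html = '<img src="/site/assets/files/10/a.jpg"><img src="/site/assets/files/20/b.jpg">';
echo replaceImgSrcOnePass($html, [10 => 20, 20 => 30]), PHP_EOL;
// prints <img src="/site/assets/files/20/a.jpg"><img src="/site/assets/files/30/b.jpg">
```

Because each match is replaced exactly once, the order of the entries in the map no longer matters.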


1 minute ago, MarkE said:

Any reason @adrian why you went the DOMDocument route? I'll post an updated script to GitHub shortly, then maybe someone will find some holes in it!

Because it's the only reliable way to parse HTML properly and it is much easier to query and replace things - I just wish it didn't mess with the html when saving. Some people say to use saveXML() but that has other problems.

The new line issue is easily fixed with a trim(). As for the self closing tags and the slash - I guess that doesn't bother me too much.

This shouldn't be so hard.


1 minute ago, adrian said:

The new line issue is easily fixed with a trim()

Yeah, I did that and then came across the /> issue. Trouble is my diff method will report any diffs in the source unless I exempt them. I don't like it if you can't predict what will be returned. I think I'll stick with preg_replace for now since the parsing is very limited and see if it works out OK.


Version 0.0.2 now on GitHub https://github.com/MetaTunes/ProcessDbMigrate

This version more fully allows for different page ids in source and target systems. A meta value (idMap) maintains the mapping. This allows the replacement of links in RTE fields provided the relevant pages are all in the migration. Also, all existing image variants are migrated.
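As a rough illustration of the mapping idea, the sketch below builds an oldId => newId array from a list of installed pages. Plain arrays stand in for ProcessWire pages, and the `sourceId` key is a hypothetical name for the stored old-id value; the module's actual data layout may differ.

```php
<?php
// Sketch only: each entry represents a page on the target site. 'sourceId'
// (hypothetical name) holds the id the page had on the source site.
function buildIdMap(array $installedPages): array {
    $map = [];
    foreach ($installedPages as $p) {
        // Pages that were not created by a migration carry no source id.
        if (!isset($p['sourceId'])) continue;
        $map[(int) $p['sourceId']] = (int) $p['id'];
    }
    return $map;
}

$idMap = buildIdMap([
    ['id' => 5678, 'sourceId' => 1234],
    ['id' => 5679, 'sourceId' => 1240],
    ['id' => 1001], // pre-existing page, not part of the migration
]);
print_r($idMap);
```

A map like this is the shape of array that the replaceImgSrc() functions earlier in the thread expect as $idMapArray.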

EDIT: Version 0.0.3 fixes an install problem and adds upgrading via Modules > Refresh.

