
Case Study: SignificatoJournal.com: Migrating from MODX Evolution to ProcessWire


Peter Falkenberg Brown


Contents:

    * Useful MODX Fields

    * Custom Template Files

    * Template Chunks

    * Field Chunks

    * Snippets

    * The Writer Table

    * Template Structure

    * The Migration Script

    * URL Replacement

    * Image Migration

    * TinyMCE Code Breaks

    * Post Migration Data Checks

    * Link to Script Content


I just finished migrating the magazine site that my wife and I run (http://significatojournal.com) from MODX Evolution to ProcessWire. How I did it may be of interest to other MODX users who wish to migrate their sites to ProcessWire.

I liked MODX because it was so flexible. In my experience, PW is even more flexible than MODX, and it breaks through the 5,000-page barrier that was the great bugaboo of MODX Evo. I tried MODX Revolution several times but was very unhappy with the slowness of its editorial interface. I also left MODX for PW for other reasons, which other writers have already covered.


After using PW’s API to build a large web app in 2013 (which will be a different case study), and now, after having migrated my magazine site to PW, I’m absolutely thrilled with ProcessWire. I could go on and on... :-)

There were many things I liked about MODX Evo that provided functionality I wanted to keep using in ProcessWire. They included:

Chunks, snippets, and a combination of built-in MODX fields and custom template var “fields” that I had created for my former website, including:


Useful MODX Fields

  MODX Fields:

longtitle (headline)

pub_date

introtext (summary)

template (id of custom template)

menuindex

menutitle

hidemenu (show in menu)


* In PW, I use the three menu fields to build the site menu with Soma’s excellent MarkupSimpleNavigation module, passing an option like this: 'selector' => 'show_in_menu=1'

 

[Screenshot]

 

* I use the publish_date field to block display of pages with future publish_dates, as well as to show the actual date of publication set by the editor (not just the date of the saved record).
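For readers who haven't used MarkupSimpleNavigation, here is a plain-PHP sketch of the selection logic these two fields drive. It is not the module's code: plain arrays stand in for PW pages, show_in_menu and publish_date are the field names from this post, and menu_title and menu_index are assumed names for the migrated menutitle and menuindex fields.

```php
<?php
// Plain arrays stand in for PW pages. show_in_menu and publish_date are the
// post's field names; menu_title and menu_index are ASSUMED names.

// Return menu labels for pages with show_in_menu = 1 and a publish date
// that is not in the future, ordered by menu_index.
function menu_items(array $pages, int $now): array
{
    $visible = array_filter($pages, function (array $p) use ($now): bool {
        return ($p['show_in_menu'] ?? 0) == 1
            && !empty($p['publish_date'])
            && $p['publish_date'] <= $now;
    });

    // menu_index plays the role of MODX's menuindex.
    usort($visible, fn(array $a, array $b): int => $a['menu_index'] <=> $b['menu_index']);

    // Use the menu title when set, otherwise fall back to the page title.
    return array_map(
        fn(array $p): string => ($p['menu_title'] ?? '') !== '' ? $p['menu_title'] : $p['title'],
        $visible
    );
}

$now = 1388000000; // a fixed "current time" so the example is repeatable
$pages = [
    ['title' => 'Columns', 'menu_title' => '', 'menu_index' => 2, 'show_in_menu' => 1, 'publish_date' => 1300000000],
    ['title' => 'About Significato Journal', 'menu_title' => 'About', 'menu_index' => 1, 'show_in_menu' => 1, 'publish_date' => 1200000000],
    ['title' => 'Draft Section', 'menu_title' => '', 'menu_index' => 3, 'show_in_menu' => 1, 'publish_date' => 1500000000],
    ['title' => 'Contact Us', 'menu_title' => '', 'menu_index' => 0, 'show_in_menu' => 0, 'publish_date' => 1200000000],
];

print_r(menu_items($pages, $now)); // About, Columns
```

The future-dated "Draft Section" and the hidden "Contact Us" page are both left out of the menu.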

 

Custom Template Var Fields:

subtitle

writer_id

static_author

static_author_attribution

newsletter_volume_number

headline_thumbnail

article_type

sitemap_exclude

code_block1-7


Custom Template Files


MODX Evo allows one to assign a custom template file to each article, which I found very useful. Unlike PW, MODX Evo uses a primary static data field set for each article, with custom fields added on as “template vars.”

The advantage of custom template files is that they let you use different display templates for different types of articles or section pages. I generally use a four-level hierarchy:


- home page

- multi-section page

- section page

- article page


Given PW’s method of creating virtual data tables, aka template “field sets”, with a display template file assigned to each template field set, I had to work out a way to get the same flexibility of assigning a display template file to each page.


For example, an article page might be a regular article page, with writer information, or an “article_plain” page, like a Contact Us page. A section page could be a multi-section page with a list of sections, or a paginated section page that lists articles. I also needed custom section pages to display articles based on “content_tags”.

My solution was to create generic PW template data field sets that all run a single PHP template file called “custom_template_controller.php.” The two main PW template field sets are:

* article_page_structure_permanent

* list_page_structure_permanent

Using this method, when a new page is created, I select the PW data template first:

[Screenshot: selecting the PW template for a new page]


and then, once I’m in the main set of fields, I select the custom template file for that page:
 

[Screenshot: selecting the custom template file for the page]
 

The custom_template_controller.php file is very simple: it pulls up the custom template file assigned to the page and runs it:

<?php
###################################################################################################
# custom_template_controller.php
###################################################################################################

include("./inc_vars.php");
include("./inc_functions.php");

#....................................................................
# block future publish dates; don't block home page (id 1)

if ( ( $page->id == '1'  ) || ( ! empty( $publish_date ) and $publish_date <= $now ) )
      {
      # page can be displayed
      }
else
      {
      wire(session)->redirect("/http404", false);
      }

#....................................................................

include("./$custom_template_file");

###################################################################################################
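To test the controller's display rule outside ProcessWire, it can be pulled into a pure function. This is a sketch: the function name page_is_displayable() is hypothetical, and publish dates are assumed to be Unix timestamps, with null meaning "not set".

```php
<?php
// Sketch: the controller's display rule as a pure function (hypothetical name).
// Publish dates are assumed to be Unix timestamps; null means "not set".
function page_is_displayable(int $page_id, ?int $publish_date, int $now): bool
{
    // The home page (id 1) is never blocked.
    if ($page_id === 1) {
        return true;
    }
    // Any other page needs a publish date that is set and not in the future.
    return !empty($publish_date) && $publish_date <= $now;
}

$now = time();
var_dump(page_is_displayable(1, null, $now));           // bool(true)
var_dump(page_is_displayable(1042, $now + 3600, $now)); // bool(false): scheduled for later
var_dump(page_is_displayable(1042, $now - 3600, $now)); // bool(true)
```

In a PW template, a false result could then be turned into a real 404 with throw new Wire404Exception(), as suggested in the replies in this thread.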

The "./inc_vars.php" file gets the value of the field:

$custom_template_file = $page->custom_template_file->select_value;

and also initializes a variety of variables and template “chunks”.

Template Chunks

MODX Evo allows one to define “chunks” of text that can be inserted into the templates or data fields, using {{tags}} that are replaced at run time. In PW, I divided the chunks into a set of template chunks and a smaller set of field chunks.

Because of the way PW uses PHP as its “templating” language (which I REALLY like), I decided to simply place the template “chunks” in a file called "./inc_vars.php", and define them as normal PHP variables. Since that file is loaded before every page, the variables are available to use in the custom template files.
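A minimal sketch of what such an inc_vars.php might contain; the variable names and markup are hypothetical, not the site's own. A heredoc lets one chunk pull in other variables.

```php
<?php
// inc_vars.php (sketch): template "chunks" defined as ordinary PHP variables.
// All names and markup here are hypothetical examples.

$site_name = 'Significato Journal';
$year      = date('Y');

// A heredoc interpolates the variables defined above.
$chunk_footer = <<<HTML
<footer>
  <p>&copy; {$year} {$site_name}</p>
</footer>
HTML;

// In a custom template file that runs after inc_vars.php is included:
echo $chunk_footer;
```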


Field Chunks

For field chunks, I created a PHP function that loops through a set of “chunk” data pages and looks for corresponding tags in the body field, and then replaces them. I placed the “field chunks” branch of data pages under a hidden master page called “elements”, which I also used for custom selects like the custom_template_files.

[Screenshot: the field chunk pages under the hidden “elements” page]


The field chunks use the MODX delimiters of double curly brackets, {{chunk_name}}, and each tag is replaced with the contents of its chunk. For example, {{email.pfb}} is replaced with an image of the email address, which serves as the title of a clickable, JavaScript-encoded mailto: link.

In MODX, the field chunk system also let one replace tags in text fields coming from template vars (custom fields). My primary need for that was code that TinyMCE didn’t like, such as data entry forms or special JavaScript, so I created seven “code_block” fields, “code_block1” through “code_block7”. Seven is a bit much, but when I created the fields in MODX, I was using many Amazon affiliate tags for books and CDs in various articles.

Snippets

MODX Evo also has handy-dandy snippet tags that get replaced at run time. For my purposes, I only need to replace snippets in the code block fields, before replacing the code block tags in the body text.

For example, I have a form that needs to display a dynamically generated captcha image that gets created by a Perl script. So, in the code_block1 field of the article, which contains the form, I place a snippet tag:

[!s_form_get_captcha!]

which then gets replaced by the same function that parses the chunk tags.

I used the [!...!] syntax from MODX Evo mainly for convenience. Unlike in MODX, the exclamation marks here don’t affect caching of the snippet.

To work with the snippet tags, I created an array in the chunk parsing function that maps each snippet tag to a PHP include file:

$snippet_array = array(
                       '[!s_form_get_captcha!]' => '/home/sigj/s_form_get_captcha.php',
                      );

In this case, the PHP file “s_form_get_captcha.php” contains a backtick call to a Perl script, which returns the dynamically generated captcha image. But the file could contain any PHP code that has to run at request time. Because the function uses the return value of include, the file should return its output rather than echo it. Here are the contents of the function that parses chunks and snippets:

###################################################################################################
function parse_field_chunks($page_id)
{

$body = wire(pages)->get("$page_id")->body;

$snippet_array = array(
                       '[!s_form_get_captcha!]' => '/home/sigj/s_form_get_captcha.php',
                      );

#..............................................................................

$field_chunk_id_array = wire(pages)->find("parent=1052, include=all");

foreach( $field_chunk_id_array as $chunk_id )
      {
      $chunk_name  = '{{' . wire(pages)->get("id=$chunk_id")->name . '}}';
      $chunk_value = wire(pages)->get("id=$chunk_id")->chunk;

      $body = str_replace($chunk_name, $chunk_value, $body);
      }

#..............................................................................
# replace code_block tags with field values

for ( $count=1; $count<=7; $count++ )
      {
      $code_block_field = 'code_block' . $count;
      $code_block_tag   = '{{' . $code_block_field . '}}';

      $code_block_value = wire(pages)->get("$page_id")->$code_block_field;

      # now parse code block value for snippet tags
      # [!snippet_name!]
      # [!s_form_get_captcha!]

      foreach ( $snippet_array as $snippet_tag => $snippet_include_file )
            {
            if ( strpos($code_block_value, $snippet_tag) !== false )
                  {
                  $snippet_value    = include("$snippet_include_file");
                  $code_block_value = str_replace($snippet_tag, $snippet_value, $code_block_value);
                  }
            }

      $body = str_replace($code_block_tag, $code_block_value, $body);
      }

return($body);

}
###################################################################################################
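Here is a standalone sketch of the same logic, with the page data passed in as plain arrays instead of fetched through wire('pages'), so it can be run and tested without ProcessWire. The function name and demo values are hypothetical; the tag formats follow the post. The temporary snippet file shows the return-value convention: include evaluates to whatever the included file returns.

```php
<?php
// Sketch of parse_field_chunks() with page data passed in as plain arrays.
// Function and variable names are hypothetical; tag formats follow the post.
function parse_field_chunks_sketch(string $body, array $chunks, array $code_blocks, array $snippets): string
{
    // {{chunk_name}} => chunk contents
    foreach ($chunks as $name => $value) {
        $body = str_replace('{{' . $name . '}}', $value, $body);
    }

    // Expand [!snippet!] tags inside each code block, then swap the
    // {{code_blockN}} tag in the body for the expanded block.
    foreach ($code_blocks as $field => $block) {
        foreach ($snippets as $tag => $file) {
            if (strpos($block, $tag) !== false) {
                $value = include $file; // the snippet file's return value
                $block = str_replace($tag, (string) $value, $block);
            }
        }
        $body = str_replace('{{' . $field . '}}', $block, $body);
    }

    return $body;
}

// Demo: a temporary snippet file that returns (not echoes) its output.
$snippet_file = tempnam(sys_get_temp_dir(), 'snippet');
file_put_contents($snippet_file, '<?php return \'<img src="/captcha.png">\';');

$out = parse_field_chunks_sketch(
    '<p>{{email.pfb}}</p>{{code_block1}}',
    ['email.pfb' => '<a href="#">email</a>'],
    ['code_block1' => '<form>[!s_form_get_captcha!]</form>'],
    ['[!s_form_get_captcha!]' => $snippet_file]
);
unlink($snippet_file);

echo $out;
```

Passing the body and code blocks in as arguments also avoids looking up the same page again on every loop iteration, as the original function does.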

The Writer Table

I use the writer_id field as a dropdown that pulls in an ID from a data table of long-term writers. When an article page is displayed, code grabs the ID and pulls in the writer info, including a photo, attribution and JavaScript-encoded email address. In MODX, I had to use a custom table. In PW, I simply created a template field set called ‘writer_page_structure_permanent’.

[Screenshot]
 

I use the “static_author” and “static_author_attribution” fields for those times when an author is a one-off writer. My code checks whether the writer dropdown is set to the ID of the ‘Non-Registered’ writer, and if the static fields contain something, that data is displayed.
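A sketch of that byline decision as a pure function. The ID of the ‘Non-Registered’ writer, the writer IDs and the array shapes are hypothetical; on the real site the writer data comes from writer_page_structure_permanent pages.

```php
<?php
// Sketch of the byline logic. All IDs and array shapes are hypothetical.
function writer_info(int $writer_id, array $writers, string $static_author,
                     string $static_attribution, int $non_registered_id): ?array
{
    if ($writer_id === $non_registered_id) {
        // One-off writer: use the static fields, if the editor filled them in.
        if (trim($static_author) === '') {
            return null;
        }
        return ['name' => $static_author, 'attribution' => $static_attribution];
    }
    // Registered writer: look up the writer page (here, a plain array).
    return $writers[$writer_id] ?? null;
}

$writers = [1071 => ['name' => 'Jane Writer', 'attribution' => 'Jane is a columnist.']];

print_r(writer_info(1071, $writers, '', '', 1099));
print_r(writer_info(1099, $writers, 'Guest Author', 'A one-off contributor.', 1099));
```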

Template Structure

Here are some screen shots of my PW template structure, which essentially replicated my previous MODX structure:

Templates:

[Screenshot]

List Page Structure Permanent:

[Screenshot]

Article Page Structure Permanent:

[Screenshot]
 

The Migration Script

 

One of the challenges I faced with my migration script was assigning the correct parent to each article. Luckily, I wanted to keep the exact structure of the section and article tree, and the URLs.

Since I didn’t have tens of thousands of articles, I decided to create an associative array of the sections and articles below the first level of the home page (i.e. starting at level 2), and then use that sorted list to create each ProcessWire parent before the sections or articles beneath it. I built the script incrementally, testing as I went, so I wouldn’t say that it’s fit for every MODX situation. It’s heavily tailored to my installation and misses a few things that I only noticed after I had finished it (which meant fixing some things by hand).
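A minimal sketch of the parent-before-child ordering, not the author's script. It assumes rows keyed by MODX ID with parent and alias columns (as in MODX's site_content table) and a hypothetical create_page() callback standing in for PW page creation. Sorting by depth guarantees that each parent has a PW ID before any of its children are created; top-level MODX documents (parent 0) are attached to the PW root page.

```php
<?php
// Depth of a MODX document: 0 for top-level documents (parent 0).
// Assumes every parent is present in $rows and there are no cycles.
function depth(int $id, array $rows): int
{
    $depth = 0;
    while (($parent = $rows[$id]['parent']) !== 0) {
        $depth++;
        $id = $parent;
    }
    return $depth;
}

// Create pages shallowest-first and return a MODX id => PW id map.
function migrate_tree(array $rows, callable $create_page, int $pw_root_id): array
{
    $ids = array_keys($rows);
    usort($ids, fn(int $a, int $b): int => depth($a, $rows) <=> depth($b, $rows));

    $map = [0 => $pw_root_id]; // MODX parent 0 maps to the PW root page
    foreach ($ids as $id) {
        $pw_parent = $map[$rows[$id]['parent']]; // already created
        $map[$id]  = $create_page($pw_parent, $rows[$id]['alias']);
    }
    unset($map[0]);
    return $map;
}

// Demo with a fake create_page() that just hands out new IDs.
$rows = [
    5  => ['parent' => 0, 'alias' => 'columns'],
    12 => ['parent' => 5, 'alias' => 'some_article_name'],
    7  => ['parent' => 0, 'alias' => 'about'],
];
$next = 1000;
$created = []; // new PW id => [PW parent id, name]
$create_page = function (int $parent_id, string $name) use (&$next, &$created): int {
    $created[$next] = [$parent_id, $name];
    return $next++;
};

$map = migrate_tree($rows, $create_page, 1);
print_r($map);
```

In ProcessWire, the callback would create a new Page, set its template, parent and name, save it, and return its ID. Recomputing the depth inside the comparator is wasteful, but fine for a site without tens of thousands of pages.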

URL Replacement

 

I had to parse the body, summary and subtitle fields of each article to make sure that any internally pointing MODX URLs were replaced with the full URL. MODX Evo uses the syntax [~ID~] in the “<a href...” tag to create the full page URL dynamically at run time, so I had to create a routine to replace the ID tags with page URLs, e.g. “/columns/some_article_name”.
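A sketch of such a routine using preg_replace_callback(). It assumes a hypothetical $url_map (MODX ID => new PW URL) built during the migration, and leaves unknown IDs untouched so they can be found and fixed by hand.

```php
<?php
// Replace MODX Evo link tags like [~123~] with PW URLs.
// $url_map (MODX id => PW URL) is a hypothetical lookup built during migration.
function replace_modx_links(string $html, array $url_map): string
{
    return preg_replace_callback(
        '/\[~(\d+)~\]/',
        function (array $m) use ($url_map): string {
            $id = (int) $m[1];
            // Unknown IDs stay as-is so they show up in later checks.
            return $url_map[$id] ?? $m[0];
        },
        $html
    );
}

$html = '<a href="[~42~]">Read</a> <a href="[~99~]">Missing</a>';
echo replace_modx_links($html, [42 => '/columns/some_article_name/']);
```

The same function can be run over the body, summary and subtitle fields alike.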

Image Migration

 

I first took the lazy way out, thinking that I could simply move the “/assets/” folder from the MODX installation over to the new account. However, when I opened a PW page in edit mode, the links to the /assets/... images were there, but the images weren’t attached to the page, so they didn’t show up in the edit box. I therefore added a routine to copy the images to each page.
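The first half of such a routine is finding which /assets/ images a body field references. A sketch with a regular expression (it assumes src attributes in single or double quotes, with or without a leading slash). Attaching each file is then a PW API step, typically $page->images->add($path) followed by $page->save() for an images field named images, which isn't shown here.

```php
<?php
// Collect the MODX /assets/ image paths referenced in a body field.
function find_asset_images(string $html): array
{
    // Matches src="/assets/..." or src='assets/...' (leading slash optional).
    preg_match_all('#src=["\']/?(assets/[^"\']+)["\']#i', $html, $m);
    return array_values(array_unique($m[1]));
}

$body = '<img src="/assets/images/cover.jpg"><img src="assets/images/cover.jpg">'
      . '<img src="/site/assets/files/1042/new.jpg"><img src="/assets/images/Photo One.png">';
print_r(find_asset_images($body));
```

Images already under /site/assets/files/ are skipped, and duplicates are reported once.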


TinyMCE Code Breaks

 

I found that TinyMCE kept mangling the CSS markup that came over from MODX. I tried adding various tags to the body field’s “valid_elements” setting under Input / TinyMCE, but finally just changed valid_elements to:


+*[*]

 

It’s a bit radical, I suppose, but my editors are fully trusted. After that, my migrated data was fine.


Post Migration Data Checks

 

Even after migrating the data for the umpteenth time to get it right, I still needed to do a variety of cleanup tasks. I found the Selector Test module by Niklas Lakanen very useful (Thanks, Niklas!), and used it to run checks like:

	body*=src\=\"{~root_url}assets

    (which looked for left over links to /assets/ which came from MODX)

 

I also queried the MODX and PW tables directly, using SQLYog, a Windows MySQL client.

 

I ran a script that compared the output of a find command (find . -type f -printf "%f\n") run in the MODX assets folder with the files under the PW site/assets/files folder. I found about 70 files that had not been copied, some because they were referenced in template files, which my script didn’t parse, and some because their filenames were problematic (spaces, etc.). The script took into account the PW code that renames uploaded files; to do that, I copied the PW “validate_filename” function into my script.
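A sketch of that comparison. normalize_filename() is a hypothetical stand-in for PW's renaming rules (here just lowercase and spaces to underscores); it does not reproduce PW's actual code, which is why the author copied PW's own function instead.

```php
<?php
// HYPOTHETICAL stand-in for PW's filename rules, not PW's real code.
function normalize_filename(string $name): string
{
    return strtolower(str_replace(' ', '_', $name));
}

// Return MODX asset filenames that have no counterpart among the PW files.
function missing_files(array $modx_files, array $pw_files): array
{
    $pw_set = array_flip($pw_files); // fast lookups by filename
    $missing = [];
    foreach ($modx_files as $name) {
        if (!isset($pw_set[normalize_filename($name)])) {
            $missing[] = $name;
        }
    }
    return $missing;
}

// In practice each list could come from saved find output, e.g.
// file('modx_files.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES).
$modx = ['Photo One.JPG', 'logo.png', 'cover.jpg'];
$pw   = ['photo_one.jpg', 'logo.png'];
print_r(missing_files($modx, $pw)); // only cover.jpg is missing
```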


Despite all my checking, I still forgot things, like parsing the subtitle and summary fields for hrefs, which I then had to fix by hand (there were very few records like that).

I also created a few redirect aliases by hand, instead of trying to handle them via the script.

 

All in all, this migration confirmed once again that website migrations are a Bear. Ugh. I’d rather not do them. :-)

Link to Script Content


Here’s the link to the script. Note that it didn’t catch everything and was heavily tailored to my design. Also note that I’ve removed some private data, like writers’ names.

sj_modx_pw_migrate_script.php

 

That’s my case study. I hope it may be useful to another “MODX Refugee”. :-)

 

Peter Falkenberg Brown


Peter, thanks for sharing this -- always interesting to hear how others are solving issues like migration from one system to another. I'll have to read this more carefully later, just had a chance to browse through and check out the code parts. Two quick comments on those, though:

  • This might be just a question of preference, but it looks like you could've easily used Hanna Code for those snippets. Personally I've got a tendency to go with the "as little custom code as possible" route, and so I might've preferred that.
  • You don't really need to redirect the user to the 404 page, especially not with a hard-coded path (though it's unlikely to change). You can get the current 404 page's ID from $config->http404PageID and use its real ("dynamic") URL, or you can simply throw a Wire404Exception().

Again, thanks for sharing this with us. Very much appreciated :)


Thank you very, very much for posting this. It reminds me why I am here. I was once hooked on MODX Evo, but that ended with the road MODX took. Since there are a lot of Evo (and Revo) refugees here, your post will be much appreciated, and for others, your comparison provides a lot of information too.

http://processwire.com/talk/topic/2850-processwire-for-designers/page-2#entry30349


Dear Craig, Teppo and PWired,

Thanks for your kind words! I appreciate it very much.

Teppo, I'll revisit the redirect. I agree, it's better to get the config value.

And yes, I could have used Hanna, except that I had a lot of legacy tags and fields, like code_block1, etc.

I just looked at the Hanna module again, and it looks great.

Ain't it great that PW is so great! Zowie.

Peter


Just wanted to mention that redirecting to the 404 page is wrong.

wire(session)->redirect("/http404", false);

Even if this were correct, on a side note: redirecting to "/http404" without a trailing slash would trigger another redirect to "/http404/" (with default settings). Furthermore, if the 404 page uses the same template as the requested page (and therefore runs the same redirect code), you'd end up in an endless redirect loop.

Back to why using this is wrong:

It won't give you a real 404 page with the correct header! It just redirects to a regular page with content, which means that page will get indexed in place of every page you do this for.

The way to do it is:

throw new Wire404Exception();

This renders the 404 page content with the correct header and keeps you at the URL you requested.


Another thing I noticed is that you use wire(pages) ... wire(session). This may work, but PHP will throw an "undefined constant" notice, because the argument is expected to be a string: wire("pages") or wire("session"). If you enabled debug mode, you'd end up with notices all over your site.
