
CMSCritic Development Case Study


ryan


  • 1 month later...

This is great stuff! 

I'd love to see some additional detail about how you managed images, topics, tags and assigning authors. I have a few big sites to migrate.

I'd also love to see your take on managing page references for related content, if applicable.

Thanks for sharing!

-Brent


  Quote
I'd love to see some additional detail about how you managed images, topics, tags and assigning authors. I have a few big sites to migrate.

I'll cover each of these separately. First I'll start with the images, and will come back to the others a little later when I've got more time.

WordPress really only uses images for placement in body copy, so I extracted the links to them right out of there and imported them that way. I did this after the pages had already been imported. To keep track of which images had already been imported (so that I could run the importer multiple times without getting duplicate images), I turned on ProcessWire's image "tags" option and stored the original filename there. Here's the function I used; I've used many variations of it over the years on different sites. You give it a $page you've already imported (but which is still linking to the old site's images) and it converts the images linked in the body copy from the old site to the new.

function importImages(Page $page) {
  if(!$page->id) return 'You need to save this page first';
  $out = '';
  $body = $page->body;
  // find all images referenced in the 'body' field
  $regex = '{ src="(http://www\.cmscritic\.com/wp-content/uploads/[^"]+)"}';
  if(!preg_match_all($regex, $body, $matches)) return $out;

  foreach($matches[0] as $key => $fullMatch) {
    $url = $matches[1][$key]; // image URL
    $tag = basename($url); // image filename
    $tag = wire('sanitizer')->name($tag); // sanitized filename
    $image = $page->images->getTag($tag); // do we already have it?
    if(!$image) {
      // we don't already have this image, import it
      try {
        $page->images->add($url);
      } catch(Exception $e) {
        $out .= "<div>ERROR importing: $url</div>";
        continue;
      }
      $image = $page->images->last(); // get image that was just added
      $status = "NEW";
    } else {
      $status = "Existing";
    }
    // store the original filename as the tag, so later runs can find this image
    $image->tags = $tag;
    // replace old image URL with new image URL
    $body = str_replace($url, $image->url, $body);
    // report what we did
    $out .= "<div>$status: $image->basename</div>";
  }

  // assign the updated $body back to the page
  $page->body = $body;

  // return a printable report of what was done
  return $out;
}
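
As a rough, standalone illustration of just the URL-matching step (plain PHP, no ProcessWire required), here is a minimal sketch. The helper name extractUploadUrls() and the sample HTML are made up for the example and are not part of the importer; the sketch only shows how the same style of regex pulls image URLs out of body copy and how duplicates can be dropped.

```php
<?php
// Hypothetical helper (not part of the importer above): collect the unique
// image URLs in $html that live under $uploadsBase.
function extractUploadUrls(string $html, string $uploadsBase): array {
    // preg_quote() escapes regex metacharacters (such as the dots) in the base URL
    $regex = '{ src="(' . preg_quote($uploadsBase) . '[^"]+)"}';
    if(!preg_match_all($regex, $html, $matches)) return [];
    // $matches[1] holds the captured URLs; drop duplicates and reindex
    return array_values(array_unique($matches[1]));
}

// made-up sample body copy: the same upload linked twice, plus an unrelated image
$html = '<p><img src="http://www.cmscritic.com/wp-content/uploads/2013/05/logo.png" alt="">'
      . '<img src="http://example.com/other.png" alt="">'
      . '<img src="http://www.cmscritic.com/wp-content/uploads/2013/05/logo.png" alt=""></p>';

print_r(extractUploadUrls($html, 'http://www.cmscritic.com/wp-content/uploads/'));
```

Passing the base URL in as an argument, rather than hard-coding it in the pattern, is only a convenience for reusing the idea on another site.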

Topics and tags: The first step was to create the parent pages and templates for these. For topics, there were only a few of them, so I created all the category pages ahead of time. On the other hand, with tags, there are 2000+ of those, so those are imported separately. Here are the manual steps that I performed in the PW admin before importing topics and tags: 

  1. Created template "topics" and page /topics/ that uses this template. 
  2. Created template "topic" and 6 topic pages that use it, like /topics/cms-reviews/ for example.
  3. Created Page reference field "topics" with asmSelect input, set to use parent /topics/ and template "topic". 
  4. Created template "tags" and page /tag/ that uses this template. Note that I used /tag/ as the URL rather than /tags/ for consistency with the old WordPress URLs. Otherwise I would prefer /tags/ as the URL for consistency with the template name. 
  5. Created template "tag".
  6. Created Page reference field "tags" with PageAutocomplete input, set to use parent /tag/ and template "tag". I also set this one to allow creating of new pages from the field, so the admin can add new tags on the fly. 
  7. Added the new "topics" and "tags" fields to the "post" template. 

With all the right templates, fields and pages set up, we're ready to import. WordPress stores the topics, tags and their relationships to posts in several tables, which you'll see referenced in the SQL query below. It took some experimenting with queries in phpMyAdmin before I figured it out. But once I got the query down, I put it in a function called importTopicsAndTags(). This function needs a connection to the WordPress database, which is passed into the function as $wpdb. For more details on $wpdb, see the first post in this thread.

/**
 * Import WordPress topics and tags to ProcessWire
 *
 * This function assumes you will do your own $page->save(); later.
 *
 * @param PDO $wpdb Connection to WordPress database
 * @param Page $page The ProcessWire "post" page you want to add topics and tags to.
 *   This page must have a populated "wpid" field.
 * @return string Report of what was done.
 *
 */
function importTopicsAndTags(PDO $wpdb, Page $page) {
  $out = '';
  $sql = <<< _SQL

SELECT wp_term_relationships.term_taxonomy_id, wp_term_taxonomy.taxonomy,
  wp_term_taxonomy.description, wp_terms.name, wp_terms.slug
FROM wp_term_relationships
LEFT JOIN wp_term_taxonomy
  ON wp_term_taxonomy.term_taxonomy_id=wp_term_relationships.term_taxonomy_id
LEFT JOIN wp_terms
  ON wp_terms.term_id=wp_term_taxonomy.term_id
WHERE wp_term_relationships.object_id=:wpid
ORDER BY wp_term_relationships.term_order

_SQL;

  $query = $wpdb->prepare($sql);
  // bind the WordPress post ID as a parameter rather than interpolating it into the SQL
  $query->execute(array(':wpid' => (int) $page->wpid));

  while($row = $query->fetch(PDO::FETCH_ASSOC)) {

    if($row['taxonomy'] == 'category') {
      // this is a topic: find the existing topic in PW
      $topic = wire('pages')->get("/topics/$row[slug]/");
      if($topic->id) {
        // if $page doesn't already have this topic, add it
        if(!$page->topics->has($topic)) $page->topics->add($topic);
        // report what we did
        $out .= "<div>Topic: $topic->title</div>";
      }

    } else if($row['taxonomy'] == 'post_tag') {
      // this is a tag: see if we already have it in PW
      $tag = wire('pages')->get("/tag/$row[slug]/");
      if(!$tag->id) {
        // we don't already have this tag, so create it
        $tag = new Page();
        $tag->template = 'tag';
        $tag->parent = '/tag/';
        $tag->name = $row['slug'];
        $tag->title = $row['name'];
        $tag->save();
      }
      // if $page doesn't already have this tag, add it
      if(!$page->tags->has($tag)) {
        $page->tags->add($tag);
        $out .= "<div>Tag: $tag->title</div>";
      }
    }
  }

  return $out;
}
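
To make the row handling easier to follow in isolation, here is a small plain-PHP sketch of the branching on the taxonomy column: rows with taxonomy "category" are treated as topics and rows with "post_tag" as tags. The helper name splitTerms() and the sample rows are hypothetical; the real function above works on rows fetched from the WordPress database and creates ProcessWire pages rather than returning arrays.

```php
<?php
// Hypothetical helper: split WordPress term rows into topic slugs and tag slugs.
// Each $row is assumed to carry 'taxonomy' and 'slug' keys, like the rows
// returned by the query in importTopicsAndTags().
function splitTerms(array $rows): array {
    $result = ['topics' => [], 'tags' => []];
    foreach($rows as $row) {
        if($row['taxonomy'] === 'category') {
            $result['topics'][] = $row['slug']; // WordPress category = topic
        } else if($row['taxonomy'] === 'post_tag') {
            $result['tags'][] = $row['slug'];   // WordPress post_tag = tag
        }
        // rows with any other taxonomy are ignored
    }
    return $result;
}

// made-up sample rows for illustration only
$rows = [
    ['taxonomy' => 'category', 'slug' => 'cms-reviews'],
    ['taxonomy' => 'post_tag', 'slug' => 'processwire'],
    ['taxonomy' => 'post_format', 'slug' => 'post-format-aside'],
];

print_r(splitTerms($rows));
```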

Greetings,

I have been following this discussion -- an excellent example of a case study with highlights of how to accomplish several key goals in ProcessWire.

This last post highlights a couple of interesting points for me:

1. Emphasizes the advantage of having everything exist as a page in ProcessWire (in this case, tags).

2. How easy it is to use the API to implement functions that take care of major actions in ProcessWire.

3. How to migrate an existing CMS to ProcessWire. Might be dangerous if more people in the WordPress community knew about it!

Regarding 3: I come from the Joomla world. It seems that WordPress databases are more logical than Joomla databases. I think the migrate script for Joomla would be much more involved because simple "page" data is very fragmented in that CMS. But the same principles would apply.

As always, a very illuminating discussion!

Thanks,

Matthew


For authors, there were only about 6 of them at import time, so I created the authors as users in PW manually. I also added the "wpid" field to the "user" template, and populated the value of that manually. That was easy to find in WordPress just by editing the author and noting the ID in the URL. The WordPress wp_posts table has a field in it called post_author, which is the ID of the author. So assuming we've got a user in ProcessWire with a "wpid" that matches up to that, it's easy for us to assign the right PW user to each post. You'll see how this takes place in the code below.

Wrapping it up

Here is the same "import" code as in the first post, but I added all the code accounting for authors, topics, tags, and images back into it. This all just goes in a ProcessWire template file, and viewing the page triggers the import. Because it's aware of stuff that is already imported, it can be run multiple times without causing duplication. 

<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>Import Posts</title>
</head>
<body>
<table border='1' width='100%'>
<thead>
<tr>
<th>New?</th>
<th>ID</th>
<th>Author</th>
<th>Date</th>
<th>Name</th>
<th>Title</th>
<th>Images</th>
<th>Topics</th>
<th>Changes</th>
</tr>
</thead>
<tbody>
<?php

// get access to WordPress wpautop() function
include("/path/to/wordpress/wp-includes/formatting.php");

$wpdb = new PDO("mysql:dbname=wp_cmscritic;host=localhost", "user", "pass",
  array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'UTF8'"));

$posts = wire('pages')->get('/posts/');

$sql = "
  SELECT * FROM wp_posts
  WHERE post_type='post'
  AND post_status='publish'
  ORDER BY post_date
  ";

$query = $wpdb->prepare($sql);
$query->execute();

while($row = $query->fetch(PDO::FETCH_ASSOC)) {
  $post = $posts->child("wpid=$row[ID]"); // do we already have this post?
  if(!$post->id) {
    // create a new post
    $post = new Page();
    $post->template = 'post';
    $post->parent = $posts;
    echo "Creating new post...\n";
  }
  $post->of(false);
  $post->name = wire('sanitizer')->pageName($row['post_name']);
  $post->title = $row['post_title'];
  $post->date = $row['post_date'];
  $post->summary = $row['post_excerpt'];
  $post->wpid = $row['ID'];

  // find the post author
  $author = wire('users')->get("wpid=$row[post_author]");
  // if we don't have this post author, assign one (Mike)
  if(!$author->id) $author = wire('users')->get("mike");
  // set the post author back to the page
  $post->createdUser = $author;
  // assign the bodycopy after adding <p> tags
  // the wpautop() function is from WordPress /wp-includes/formatting.php
  $post->body = wpautop($row['post_content']);
  // give detailed report about this post
  echo "<tr>" .
    "<td>" . ($post->id ? "No" : "Yes") . "</td>" .
    "<td>$row[ID]</td>" .
    "<td>$row[post_author]</td>" .
    "<td>$row[post_date]</td>" .
    "<td>$row[post_name]</td>" .
    "<td>$row[post_title]</td>" .
    "<td>" . importImages($post) . "</td>" .
    "<td>" . importTopicsAndTags($wpdb, $post) . "</td>" .
    "<td>" . implode('<br>', $post->getChanges()) . "</td>" .
    "</tr>";
  $post->save();

}

function importTopicsAndTags(PDO $wpdb, Page $page) {
  // see implementation in previous post
}

function importImages(Page $page) {
  // see implementation in previous post
}

?>
</tbody>
</table>
</body>
</html>

  • 2 weeks later...

This hasn't been asked, but I wanted to cover how the permissions and publish workflow work on the site. It has a very simple, though nice setup, where authors can submit new posts but can't edit already published posts, nor can they edit unpublished posts by other authors. It enables Mike to have full control over any content that gets published on the site, while still allowing easy submission and edits for the authors.

Post workflow

All of the authors have a role called "author" with page-edit permission.

On the "post" template, the boxes for "edit" and "create" are checked for this "author" role. 

This site also makes use of the page-publish permission, which is an optional one in ProcessWire that you can add just by creating a new permission and naming it "page-publish". Once present, it modifies the behavior of the usual page-edit permission, so that one must also have page-publish in order to publish pages or edit already published pages.

The "author" role does not have page-publish permission. As a result, authors on the site can submit posts but can't publish them. Nor can they edit already published posts. In this manner, Mike has final say on anything that gets posted to the site. 

Post ownership

The default behavior in ProcessWire is that the Role settings control all access... meaning all users with role "author" would be able to do the same things, on the same pages. In this case, we don't want one author to be able to edit an unpublished/pending post created by another author. This was easily accomplished by adding a hook to /site/templates/admin.php:

/**
 * Prevent users from being able to edit pages created by other users of the same role
 *
 * This basically enforces an 'owner' for pages
 *
 */
wire()->addHookAfter('Page::editable', function($event) {
  if(!$event->return) return; // already determined user has no access
  if(wire('user')->isSuperuser()) return; // superuser always allowed
  $page = $event->object; 
  // if user that created the page is not the current user, don't give them access
  if($page->createdUser->id != wire('user')->id) $event->return = false; 
}); 
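
To spell out how these rules combine, here is a plain-PHP model of the decision as described in this post. It is a sketch only, not ProcessWire's access-control code: the function name canEditPost(), its parameters, and the user IDs are all made up for illustration, and the booleans stand in for what the role settings and the hook above would determine.

```php
<?php
// Plain-PHP model of the edit rules described above. This is NOT ProcessWire's
// access-control code; the function and parameter names are hypothetical.
function canEditPost(
    bool $isSuperuser,
    bool $hasPageEdit,
    bool $hasPagePublish,
    bool $isPublished,
    int $createdUserId,
    int $currentUserId
): bool {
    if($isSuperuser) return true;                      // superuser always allowed
    if(!$hasPageEdit) return false;                    // role lacks page-edit
    if($isPublished && !$hasPagePublish) return false; // published posts need page-publish
    return $createdUserId === $currentUserId;          // ownership rule from the hook
}

// an author (user 41) editing their own unpublished draft
var_dump(canEditPost(false, true, false, false, 41, 41));
// the same author trying to edit a draft created by another author (user 42)
var_dump(canEditPost(false, true, false, false, 42, 41));
// the same author trying to edit their own post after it was published
var_dump(canEditPost(false, true, false, true, 41, 41));
```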

Planned workflow improvements

Currently an author has to let Mike know "hey my article is ready to be published, can you take a look?". This is done by email, I'm assuming. An addition I'd like to make is to add a Page reference field called "publish_status" where the author can select from: 

  • DRAFT: This is a work in progress (default)
  • PUBLISH: Ready for review and publishing
  • CHANGE: Changes requested - see editor notes
  • DELETE: Request deletion

Beyond that, there is also an "editor_notes" text field that only appears in the admin. It's a place where Mike and the author can communicate, if necessary, about the publish status. This editor_notes field doesn't appear on the front-end of the site. 

All this can be done in ProcessWire just by creating a new field and adding these as selectable page references. That's easy enough, but I want to make it so that it notifies both Mike (the reviewer) and the author by email, every time there is a change in publish status or to the editor_notes. This will be done via another hook in the /site/templates/admin.php: 

wire()->addHookAfter('Pages::saveReady', function($event) {
  // get the page about to be saved
  $page = $event->arguments(0);

  // if this isn't a post, don't continue
  if($page->template != 'post' || !$page->id) return;

  // if this post wasn't made by an "author" don't continue
  if(!$page->createdUser->hasRole('author')) return;

  $subject = '';
  $message = '';

  if($page->isChanged('publish_status') || $page->isChanged('editor_notes')) {
    // the publish status or editor notes have changed
    $subject = "CMSCritic post publish status";
    $notes = $page->isChanged('editor_notes') ? "Notes: $page->editor_notes" : "";
    $message = "
      Title: $page->title\n
      URL: $page->httpUrl\n
      Status: {$page->publish_status->title}\n
      $notes
      ";

  } else if($page->isChanged('status') && !$page->is(Page::statusUnpublished)) {
    // page was just published
    $subject = "CMSCritic post published";
    $message = "The post $page->httpUrl has been published!";
  }

  if($message) {
    $reviewer = wire('users')->get('mike'); 
    $author = $page->createdUser; 
    mail("$reviewer->email, $author->email", $subject, $message); 
    $this->message("Email sent: $subject"); 
  }

}); 
 

Mike, if you are reading this, does this sound useful to you? 


  • 1 month later...
  • 7 months later...

I'm having trouble with dates coming over properly. All the dates are getting set to the current time of import, which is obviously not useful. In the revised code, you have:
 

$post->date = $row['post_date'];

But I don't see the date method in the $page documentation. I tried:
 

$post->created = strtotime($row['post_date']);
$post->modified = strtotime($row['post_modified']);

But with the same results. Any suggestions?
 



Also, for others attempting this, I encountered this tidbit that may be useful to you:
The SQL to get the data starts:
  SELECT * FROM wp_posts 

But I had modified my WordPress table definition prefix for security reasons (like everyone should have), so I had to change wp_posts to wp_xxx_posts

That threw me for half an hour.
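
For anyone else with a custom prefix, one way to avoid hunting through the SQL is to build the table name from a single variable. This is a minimal sketch, assuming a hypothetical helper named buildPostsQuery(); since table names cannot be bound as PDO parameters, it checks the prefix against a simple character whitelist before putting it into the query string.

```php
<?php
// Hypothetical helper: build the posts query for a given WordPress table prefix.
function buildPostsQuery(string $prefix = 'wp_'): string {
    // table names cannot be bound as PDO parameters, so whitelist the characters
    if(!preg_match('/^[A-Za-z0-9_]+$/', $prefix)) {
        throw new InvalidArgumentException("Unexpected table prefix: $prefix");
    }
    return "SELECT * FROM {$prefix}posts "
         . "WHERE post_type='post' AND post_status='publish' "
         . "ORDER BY post_date";
}

echo buildPostsQuery('wp_xxx_'), "\n";
```

The same prefix variable would also need to be applied to the term tables used in importTopicsAndTags().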


Seems like one should be able to modify the created and modified properties (correct term?).

I created a field for date and then was able to move this field over to the posts so it would be easier to manage with new posts as well with:

UPDATE pages, field_date SET pages.created = field_date.data WHERE pages.id = field_date.pages_id

Since I'm only doing a one-time import, I can now delete the date field.


Quiet mode will help you with the created part:

https://processwire.com/talk/topic/5109-page-save-silently/?p=49275

but to set the modified you'll still need to use SQL because as soon as you save the page, modified gets updated again. At least that is my experience. Maybe there is another workaround I haven't thought of.

https://processwire.com/talk/topic/651-set-created-by-creating-a-page-via-api/?p=5293
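
Whichever route you take (quiet mode or SQL), the WordPress value first has to become a Unix timestamp. Below is a minimal plain-PHP sketch of that conversion with an explicit timezone, so the result does not depend on the server's default timezone the way a bare strtotime() call does. The helper name wpDateToTimestamp() and the sample date are made up; the 'UTC' default is an assumption that fits WordPress's post_date_gmt column, while post_date holds the site's local time and would need that timezone passed instead.

```php
<?php
// Hypothetical helper: convert a WordPress 'Y-m-d H:i:s' datetime string
// to a Unix timestamp. $tz is an assumption: pass the timezone that the
// WordPress column was stored in.
function wpDateToTimestamp(string $wpDate, string $tz = 'UTC'): int {
    $dt = DateTimeImmutable::createFromFormat('Y-m-d H:i:s', $wpDate, new DateTimeZone($tz));
    if($dt === false) {
        throw new InvalidArgumentException("Unparseable date: $wpDate");
    }
    return $dt->getTimestamp();
}

// made-up sample value in the format WordPress stores
$ts = wpDateToTimestamp('2013-05-01 12:30:00');
echo $ts, ' => ', gmdate('Y-m-d H:i:s', $ts), "\n";
```

The resulting integer is what would be assigned to the page's created property before saving in quiet mode, as described in the linked posts.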


  • 6 months later...
  • 3 months later...

Thank you for sharing this Ryan, great stuff!

I would really like to use PW for an upcoming project, but since it relies heavily on publishing news, I'm still tempted to use WP to take advantage of all its scheduling options.

So my question: since CMSCritic posts very frequently, did you use a scheduling mechanism/module to be able to post on certain days and hours, through cron (or lazycron)?

Since this is not built into PW, I searched for some tips on the forum, but besides the known module there's not much buzz around scheduling posts.

Thank you!
