
CMSCritic Development Case Study


ryan


  • 1 month later...

This is great stuff! 

I'd love to see some additional detail about how you managed images, topics, tags and assigning authors. I have a few big sites to migrate.

I'd also love to see your take on managing page references for related content, if applicable.

Thanks for sharing!

-Brent


I'd love to see some additional detail about how you managed images, topics, tags and assigning authors. I have a few big sites to migrate.

I'll cover these each separately. First I'll start with the images, and will come back to the others a little later when I've got more time.  

WordPress really only uses images for placement in body copy, so I extracted the links to them right out of there and imported them that way. I did this after the pages had already been imported. To keep track of which images had already been imported (so that I could run the importer multiple times without getting duplicate images), I turned on the "tags" option for the ProcessWire images field and stored the original filename in there. Here's the function I used; I've used many variations of it over the years with different sites. You just give it a $page you've already imported (but that still links to the old site's images), and it converts the images linked in the body copy from the old site to the new one. 

function importImages(Page $page) {
  
  if(!$page->id) return 'You need to save this page first';
  
  $out = '';
  $body = $page->body;
  
  // find all images referenced in the 'body' field (dots escaped so they match literally)
  $regex = '{ src="(http://www\.cmscritic\.com/wp-content/uploads/[^"]+)"}'; 
  if(!preg_match_all($regex, $body, $matches)) return $out;

  foreach($matches[0] as $key => $fullMatch) {
  
    $url = $matches[1][$key]; // image URL
    $tag = basename($url); // image filename
    $tag = wire('sanitizer')->name($tag); // sanitized filename
    $image = $page->images->getTag($tag); // do we already have it?
  
    if(!$image) {
      // we don't already have this image, import it
      try {
        $page->images->add($url);
      } catch(Exception $e) {
        $out .= "<div>ERROR importing: $url</div>";
        continue;
      }
      $image = $page->images->last(); // get image that was just added
      $status = "NEW";
    } else {
      $status = "Existing";
    }
  
    $image->tags = $tag; 
    // replace old image URL with new image URL
    $body = str_replace($url, $image->url, $body);
    // report what we did 
    $out .= "<div>$status: $image->basename</div>";
  }

  // assign the updated $body back to the page
  $page->body = $body;

  // return a printable report of what was done
  return $out;
}
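To see the "run it multiple times without duplicates" idea in isolation, here is a minimal standalone sketch of just the find-and-replace part of importImages(). It is hypothetical: rewriteImageUrls() is an illustrative name, a plain array (original filename => new URL) stands in for the images field's tag lookup, a callback stands in for $page->images->add(), and the $sanitizer->name() step is left out.

```php
<?php
// Hypothetical standalone sketch of the URL-rewriting part of importImages().
// $imported (original filename => new URL) stands in for the images field's
// tags lookup, and $fetch stands in for $page->images->add().
function rewriteImageUrls(string $body, array &$imported, callable $fetch): string {
    $regex = '{ src="(http://www\.cmscritic\.com/wp-content/uploads/[^"]+)"}';
    if (!preg_match_all($regex, $body, $matches)) return $body;

    foreach ($matches[1] as $url) {
        $tag = basename($url); // original filename is the "already imported?" key
        if (!isset($imported[$tag])) {
            $imported[$tag] = $fetch($url); // only fetch images not seen before
        }
        $body = str_replace($url, $imported[$tag], $body);
    }
    return $body;
}

// Usage: two passes over the same HTML fetch the image only once.
$imported = [];
$fetches = 0;
$fetch = function (string $url) use (&$fetches): string {
    $fetches++;
    return '/site/assets/files/1234/' . basename($url); // hypothetical new URL
};
$html = '<p><img src="http://www.cmscritic.com/wp-content/uploads/2013/05/logo.png" alt=""></p>';
$first = rewriteImageUrls($html, $imported, $fetch);
$second = rewriteImageUrls($html, $imported, $fetch);
echo $first, "\n"; // <p><img src="/site/assets/files/1234/logo.png" alt=""></p>
```

The second pass still finds the old URL in the unchanged HTML, but because its filename is already recorded, nothing is fetched again; that is the same reason the real function stores the filename in the image's tags.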

Topics and tags: The first step was to create the parent pages and templates for these. For topics, there were only a few of them, so I created all the category pages ahead of time. On the other hand, with tags, there are 2000+ of those, so those are imported separately. Here are the manual steps that I performed in the PW admin before importing topics and tags: 

  1. Created template "topics" and page /topics/ that uses this template. 
  2. Created template "topic" and 6 topic pages that use it, like /topics/cms-reviews/ for example.
  3. Created Page reference field "topics" with asmSelect input, set to use parent /topics/ and template "topic". 
  4. Created template "tags" and page /tag/ that uses this template. Note that I used /tag/ as the URL rather than /tags/ for consistency with the old WordPress URLs. Otherwise I would prefer /tags/ as the URL for consistency with the template name. 
  5. Created template "tag".
  6. Created Page reference field "tags" with PageAutocomplete input, set to use parent /tag/ and template "tag". I also set this one to allow creating new pages from the field, so the admin can add new tags on the fly. 
  7. Added the new "topics" and "tags" fields to the "post" template. 

With all the right templates, fields and pages set up, we're ready to import. WordPress stores the topics, tags and their relationships to posts in various tables, which you'll see referenced in the SQL query below. It took some experimenting with queries in phpMyAdmin before I figured it out, but once I got the query down, I put it in a function called importTopicsAndTags(). This function needs a connection to the WordPress database, which is passed into the function as $wpdb. For more details on $wpdb, see the first post in this thread. 

/**
 * Import WordPress topics and tags to ProcessWire
 *
 * This function assumes you will do your own $page->save(); later. 
 *
 * @param PDO $wpdb Connection to WordPress database
 * @param Page $page The ProcessWire "post" page you want to add topics and tags to. 
 *     This page must have a populated "wpid" field. 
 * @return string Report of what was done. 
 *
 */
function importTopicsAndTags(PDO $wpdb, Page $page) {
  $out = '';
  $sql = <<< _SQL

  SELECT wp_term_relationships.term_taxonomy_id, wp_term_taxonomy.taxonomy, 
  wp_term_taxonomy.description, wp_terms.name, wp_terms.slug
  FROM wp_term_relationships
  LEFT JOIN wp_term_taxonomy 
    ON wp_term_taxonomy.term_taxonomy_id=wp_term_relationships.term_taxonomy_id
  LEFT JOIN wp_terms 
    ON wp_terms.term_id=wp_term_taxonomy.term_id
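  WHERE wp_term_relationships.object_id=:wpid
  ORDER BY wp_term_relationships.term_order

_SQL;

  // bind the post's WordPress ID instead of interpolating it into the SQL
  $query = $wpdb->prepare($sql);
  $query->execute(array(':wpid' => (int) $page->wpid));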

  while($row = $query->fetch(PDO::FETCH_ASSOC)) {

    if($row['taxonomy'] == 'category') {
      // this is a topic: find the existing topic in PW
      $topic = wire('pages')->get("/topics/$row[slug]/");
      if($topic->id) {
        // if $page doesn't already have this topic, add it
        if(!$page->topics->has($topic)) $page->topics->add($topic);
        // report what we did
        $out .= "<div>Topic: $topic->title</div>";
      }

    } else if($row['taxonomy'] == 'post_tag') {
      // this is a tag: see if we already have it in PW
      $tag = wire('pages')->get("/tag/$row[slug]/");
      if(!$tag->id) {
        // we don't already have this tag, so create it 
        $tag = new Page();
        $tag->template = 'tag';
        $tag->parent = '/tag/';
        $tag->name = $row['slug'];
        $tag->title = $row['name'];
        $tag->save();
      }
      // if $page doesn't already have this tag, add it
      if(!$page->tags->has($tag)) {
        $page->tags->add($tag);
        $out .= "<div>Tag: $tag->title</div>";
      }
    }
  }

  return $out;
}
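If you want to try the taxonomy query before pointing it at a real WordPress database, here is a minimal sketch that runs the same three-table join against a tiny in-memory SQLite copy. It assumes the pdo_sqlite extension is available; the sample rows, the post ID 42, and the $prefix variable are all made up for illustration. The table prefix is a variable because WordPress lets each install choose its own ($table_prefix in wp-config.php, with wp_ as the default), and the post ID is bound as a :wpid parameter rather than spliced into the SQL text.

```php
<?php
// Hypothetical standalone sketch (assumes pdo_sqlite): the taxonomy join from
// importTopicsAndTags() against an in-memory copy of the WordPress tables.
$prefix = 'wp_'; // WordPress default; change it if your install uses another prefix

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$setup = [
    "CREATE TABLE {$prefix}terms (term_id INTEGER, name TEXT, slug TEXT)",
    "CREATE TABLE {$prefix}term_taxonomy (term_taxonomy_id INTEGER, term_id INTEGER, taxonomy TEXT)",
    "CREATE TABLE {$prefix}term_relationships (object_id INTEGER, term_taxonomy_id INTEGER, term_order INTEGER)",
    "INSERT INTO {$prefix}terms VALUES (1, 'CMS Reviews', 'cms-reviews')",
    "INSERT INTO {$prefix}terms VALUES (2, 'Open Source', 'open-source')",
    "INSERT INTO {$prefix}term_taxonomy VALUES (10, 1, 'category')",
    "INSERT INTO {$prefix}term_taxonomy VALUES (20, 2, 'post_tag')",
    "INSERT INTO {$prefix}term_relationships VALUES (42, 10, 0)",
    "INSERT INTO {$prefix}term_relationships VALUES (42, 20, 1)",
    "INSERT INTO {$prefix}term_relationships VALUES (99, 20, 0)",
];
foreach ($setup as $statement) $db->exec($statement);

$sql = "
  SELECT t.name, t.slug, tt.taxonomy
  FROM {$prefix}term_relationships tr
  LEFT JOIN {$prefix}term_taxonomy tt ON tt.term_taxonomy_id = tr.term_taxonomy_id
  LEFT JOIN {$prefix}terms t ON t.term_id = tt.term_id
  WHERE tr.object_id = :wpid
  ORDER BY tr.term_order";

$query = $db->prepare($sql);
$query->execute([':wpid' => 42]); // PDO handles quoting of the bound value

$topics = [];
$tags = [];
while ($row = $query->fetch(PDO::FETCH_ASSOC)) {
    if ($row['taxonomy'] === 'category') $topics[] = $row['slug'];   // would map to /topics/{slug}/
    elseif ($row['taxonomy'] === 'post_tag') $tags[] = $row['slug']; // would map to /tag/{slug}/
}
echo 'topics: ', implode(', ', $topics), "\n"; // topics: cms-reviews
echo 'tags: ', implode(', ', $tags), "\n";     // tags: open-source
```

Only the rows whose object_id matches the bound ID come back, and the category/post_tag split is the same branch importTopicsAndTags() uses to decide between looking up a topic and creating a tag.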


Greetings,

I have been following this discussion -- an excellent example of a case study with highlights of how to accomplish several key goals in ProcessWire.

This last post highlights a few interesting points for me:

1. Emphasizes the advantage of having everything exist as a page in ProcessWire (in this case, tags).

2. How easy it is to use the API to implement functions that take care of major actions in ProcessWire.

3. How to migrate an existing CMS to ProcessWire. Might be dangerous if more people in the WordPress community knew about it!

Regarding 3: I come from the Joomla world. It seems that WordPress databases are more logical than Joomla databases. I think the migrate script for Joomla would be much more involved because simple "page" data is very fragmented in that CMS. But the same principles would apply.

As always, a very illuminating discussion!

Thanks,

Matthew


For authors, there were only about 6 of them at import time, so I created the authors as users in PW manually. I also added the "wpid" field to the "user" template, and populated the value of that manually. That was easy to find in WordPress just by editing the author and noting the ID in the URL. The WordPress wp_posts table has a field in it called post_author, which is the ID of the author. So assuming we've got a user in ProcessWire with a "wpid" that matches up to that, it's easy for us to assign the right PW user to each post. You'll see how this takes place in the code below.

Wrapping it up

Here is the same "import" code as in the first post, but I added all the code accounting for authors, topics, tags, and images back into it. This all just goes in a ProcessWire template file, and viewing the page triggers the import. Because it's aware of stuff that is already imported, it can be run multiple times without causing duplication. 

<!DOCTYPE html>
<html lang="en">
<head>
  <meta http-equiv="content-type" content="text/html; charset=utf-8" />
  <title>Import Posts</title>
</head>
<body>
  <table border='1' width='100%'>
  <thead>
    <tr>
      <th>New?</th>
      <th>ID</th>
      <th>Author</th>
      <th>Date</th>
      <th>Name</th>
      <th>Title</th>
      <th>Images</th>
      <th>Topics</th>
      <th>Changes</th>
    </tr>
</thead>
<tbody>
<?php

// get access to WordPress wpautop() function
include("/path/to/wordpress/wp-includes/formatting.php"); 

$wpdb = new PDO("mysql:dbname=wp_cmscritic;host=localhost", "user", "pass", 
  array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'UTF8'"));

$posts = wire('pages')->get('/posts/');

$sql = "
  SELECT * FROM wp_posts 
  WHERE post_type='post' 
  AND post_status='publish' 
  ORDER BY post_date
  ";

$query = $wpdb->prepare($sql);
$query->execute();

while($row = $query->fetch(PDO::FETCH_ASSOC)) {
  
  $post = $posts->child("wpid=$row[ID]"); // do we already have this post?
  $isNew = !$post->id;
  
  if($isNew) {
    // create a new post
    $post = new Page();
    $post->template = 'post';
    $post->parent = $posts;
  }
  
  $post->of(false);
  $post->name = wire('sanitizer')->pageName($row['post_name']);
  $post->title = $row['post_title'];
  // "date" is a custom field on the "post" template, not a built-in Page property
  $post->date = $row['post_date'];
  $post->summary = $row['post_excerpt'];
  $post->wpid = $row['ID'];

  // find the post author
  $author = wire('users')->get("wpid=$row[post_author]");
  // if we don't have this post author, assign one (Mike)
  if(!$author->id) $author = wire('users')->get("mike");
  // set the post author back to the page
  $post->createdUser = $author;
  
  // assign the body copy after adding <p> tags
  // the wpautop() function is from WordPress /wp-includes/formatting.php
  $post->body = wpautop($row['post_content']);

  // a new post needs an ID before importImages() can add images to it
  if($isNew) $post->save();
 
  // give detailed report about this post
  echo "<tr>" .
       "<td>" . ($isNew ? "Yes" : "No") . "</td>" .
       "<td>$row[ID]</td>" .
       "<td>$row[post_author]</td>" .
       "<td>$row[post_date]</td>" .
       "<td>$row[post_name]</td>" .
       "<td>$row[post_title]</td>" .
       "<td>" . importImages($post) . "</td>" .
       "<td>" . importTopicsAndTags($wpdb, $post) . "</td>" .
       "<td>" . implode('<br>', $post->getChanges()) . "</td>" .
       "</tr>";
 
  $post->save();

}

function importTopicsAndTags(PDO $wpdb, Page $page) {
  // see implementation in previous post
}

function importImages(Page $page) {
  // see implementation in previous post
}

?>
</tbody>
</table>
</body>
</html>


  • 2 weeks later...

This hasn't been asked, but I wanted to cover how the permissions and publish workflow work on the site. It has a very simple but nice setup, where authors can submit new posts but can't edit already published posts, nor can they edit unpublished posts by other authors. It gives Mike full control over any content that gets published on the site, while still allowing easy submission and edits for the authors.

Post workflow

All of the authors have a role called "author" with page-edit permission.

On the "post" template, the boxes for "edit" and "create" are checked for this "author" role. 

This site also makes use of the page-publish permission, which is an optional one in ProcessWire that you can add just by creating a new permission and naming it "page-publish". Once present, it modifies the behavior of the usual page-edit permission, so that one must also have page-publish in order to publish pages or edit already published pages.

The "author" role does not have page-publish permission. As a result, authors on the site can submit posts but can't publish them. Nor can they edit already published posts. In this manner, Mike has final say on anything that gets posted to the site. 

Post ownership

The default behavior in ProcessWire is that the Role settings control all access... meaning all users with role "author" would be able to do the same things, on the same pages. In this case, we don't want one author to be able to edit an unpublished/pending post created by another author. This was easily accomplished by adding a hook to /site/templates/admin.php:

/**
 * Prevent users from being able to edit pages created by other users of the same role
 *
 * This basically enforces an 'owner' for pages
 *
 */
wire()->addHookAfter('Page::editable', function($event) {
  if(!$event->return) return; // already determined user has no access
  if(wire('user')->isSuperuser()) return; // superuser always allowed
  $page = $event->object; 
  // if user that created the page is not the current user, don't give them access
  if($page->createdUser->id != wire('user')->id) $event->return = false; 
}); 
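Put together, the page-publish permission and the ownership hook boil down to a small decision. The sketch below models that decision in plain PHP so the combinations are easy to check; it is a hypothetical illustration, not ProcessWire's implementation. canEditPost() and the array shapes are made-up names, and it assumes the reviewer (Mike) has a superuser account, which is what the hook's early return relies on.

```php
<?php
// Hypothetical model of the access rules described above (not ProcessWire code):
//   1. page-edit is needed to edit a post at all;
//   2. with page-publish defined, editing a published post also needs page-publish;
//   3. the Page::editable hook limits non-superusers to posts they created.
function canEditPost(array $user, array $post): bool {
    if ($user['superuser']) return true; // the hook lets superusers through
    $perms = $user['permissions'];
    if (!in_array('page-edit', $perms, true)) return false;
    if ($post['published'] && !in_array('page-publish', $perms, true)) return false;
    return $post['createdBy'] === $user['id']; // ownership rule from the hook
}

// Usage, assuming the reviewer (Mike) is a superuser
$author   = ['id' => 7, 'superuser' => false, 'permissions' => ['page-edit']];
$reviewer = ['id' => 1, 'superuser' => true,  'permissions' => []];

$ownDraft     = ['published' => false, 'createdBy' => 7];
$othersDraft  = ['published' => false, 'createdBy' => 8];
$ownPublished = ['published' => true,  'createdBy' => 7];

var_dump(canEditPost($author, $ownDraft));      // bool(true)
var_dump(canEditPost($author, $othersDraft));   // bool(false)
var_dump(canEditPost($author, $ownPublished));  // bool(false)
var_dump(canEditPost($reviewer, $othersDraft)); // bool(true)
```

The "author" role only ever gets true for its own unpublished drafts, which matches the workflow: submit, then wait for Mike to publish.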

Planned workflow improvements

Currently an author has to let Mike know "hey my article is ready to be published, can you take a look?". This is done by email, I'm assuming. An addition I'd like to make is to add a Page reference field called "publish_status" where the author can select from: 

  • DRAFT: This is a work in progress (default)
  • PUBLISH: Ready for review and publishing
  • CHANGE: Changes requested - see editor notes
  • DELETE: Request deletion

Beyond that, there is also an "editor_notes" text field that only appears in the admin. It's a place where Mike and the author can communicate, if necessary, about the publish status. This editor_notes field doesn't appear on the front-end of the site. 

All this can be done in ProcessWire just by creating a new field and adding these as selectable page references. That's easy enough, but I want to make it so that it notifies both Mike (the reviewer) and the author by email, every time there is a change in publish status or to the editor_notes. This will be done via another hook in the /site/templates/admin.php: 

wire()->addHookAfter('Pages::saveReady', function($event) {
  // get the page about to be saved
  $page = $event->arguments(0);

  // if this isn't a post, don't continue
  if($page->template != 'post' || !$page->id) return;

  // if this post wasn't made by an "author" don't continue
  if(!$page->createdUser->hasRole('author')) return;

  $subject = '';
  $message = '';

  if($page->isChanged('publish_status') || $page->isChanged('editor_notes')) {
    // the publish status or editor notes have changed
    $subject = "CMSCritic post publish status";
    $notes = $page->isChanged('editor_notes') ? "Notes: $page->editor_notes" : "";
    $message = "
      Title: $page->title\n
      URL: $page->httpUrl\n
      Status: {$page->publish_status->title}\n
      $notes
      ";

  } else if($page->isChanged('status') && !$page->is(Page::statusUnpublished)) {
    // page was just published
    $subject = "CMSCritic post published";
    $message = "The post $page->httpUrl has been published!";
  }

  if($message) {
    $reviewer = wire('users')->get('mike'); 
    $author = $page->createdUser; 
    mail("$reviewer->email, $author->email", $subject, $message); 
    $event->object->message("Email sent: $subject"); 
  }

}); 
 

Mike, if you are reading this, does this sound useful to you? 
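The decision inside that saveReady hook (which email to send, if any) could also be pulled out into a plain function, so it can be checked without saving pages or sending mail. A minimal sketch, assuming PHP 7.1+; publishNotice() and its parameters are hypothetical names, and an array of changed field names stands in for $page->isChanged().

```php
<?php
// Hypothetical pure helper mirroring the branches of the saveReady hook:
// returns [subject, message], or null when no email should be sent.
function publishNotice(array $changed, string $title, string $url,
                       string $status, string $notes, bool $published): ?array {
    $statusChanged = in_array('publish_status', $changed, true);
    $notesChanged = in_array('editor_notes', $changed, true);

    if ($statusChanged || $notesChanged) {
        $lines = ["Title: $title", "URL: $url", "Status: $status"];
        if ($notesChanged) $lines[] = "Notes: $notes"; // only include notes when they changed
        return ['CMSCritic post publish status', implode("\n", $lines)];
    }
    if (in_array('status', $changed, true) && $published) {
        return ['CMSCritic post published', "The post $url has been published!"];
    }
    return null; // nothing the reviewer or author needs to hear about
}

// Usage: the reviewer edits the notes on a pending post
$notice = publishNotice(['editor_notes'], 'My Review', 'https://example.com/my-review/',
                        'CHANGE', 'Please add screenshots.', false);
echo $notice[0], "\n", $notice[1], "\n";
```

The hook would then only gather the inputs ($page->title, $page->httpUrl, $page->publish_status->title, and so on) and pass the returned subject and message to mail().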


  • 1 month later...
  • 7 months later...

I'm having trouble with dates coming over properly. All the dates are getting set to the current time of import, which is obviously not useful. On line 53 of the revised code, you have:
 

$post->date = $row['post_date'];

But I don't see the date method in the $page documentation. I tried:
 

$post->created = strtotime($row['post_date']);
$post->modified = strtotime($row['post_modified']);

But with the same results. Any suggestions?
 



Also, for others attempting this, I encountered this tidbit that may be useful to you:
The SQL to get the data starts:
  SELECT * FROM wp_posts 

But I had modified my WordPress table prefix for security reasons (as everyone should), so I had to change wp_posts to wp_xxx_posts.

That threw me for half an hour.


Seems like one should be able to modify the created and modified properties (correct term?).

I created a field for the date, imported into it, and then moved its values over to each post's created property (so dates are managed the same way as for new posts) with:

update pages,field_date set pages.created=field_date.data where pages.id=field_date.pages_id

Since I'm only doing a one-time import, I can now delete the date field.


Quiet mode will help you with the created part:

https://processwire.com/talk/topic/5109-page-save-silently/?p=49275

but to set modified you'll still need to use SQL, because as soon as you save the page, modified gets updated again. At least that is my experience. Maybe there is another workaround I haven't thought of.

https://processwire.com/talk/topic/651-set-created-by-creating-a-page-via-api/?p=5293
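Whichever of those routes is used, created and modified end up needing a Unix timestamp, and strtotime() reads a bare date string in PHP's default timezone, which may not match WordPress's. WordPress's wp_posts table also stores post_date_gmt (UTC) next to the local post_date, so one option is to convert that column explicitly. A minimal sketch; wpGmtToTimestamp() is a hypothetical helper, and the all-zero check guards against zeroed dates that WordPress can store for posts without a GMT time.

```php
<?php
// Hypothetical helper: turn a WordPress post_date_gmt value ("Y-m-d H:i:s",
// stored in UTC) into a Unix timestamp, independent of PHP's default timezone.
function wpGmtToTimestamp(string $gmt): ?int {
    if ($gmt === '0000-00-00 00:00:00') return null; // zeroed date: no GMT time recorded
    // "!" resets unspecified fields so nothing leaks in from the current time
    $dt = DateTimeImmutable::createFromFormat('!Y-m-d H:i:s', $gmt, new DateTimeZone('UTC'));
    return $dt === false ? null : $dt->getTimestamp();
}

$created = wpGmtToTimestamp('2013-05-01 12:00:00');
echo $created, "\n";              // 1367409600
echo gmdate('c', $created), "\n"; // 2013-05-01T12:00:00+00:00
```

The resulting integer is what would go into the created value before a quiet save, or into the SQL update for modified, as described in the replies above.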


  • 6 months later...
  • 3 months later...

Thank you for sharing this Ryan, great stuff!

I would really like to use PW for an upcoming project, but since it relies heavily on publishing news, I'm still considering WP to take advantage of all its scheduling options.

So my question: since CMSCritic posts very frequently, did you use a scheduling mechanism/module to publish on certain days and hours, through cron (or LazyCron)?

Since this is not built into PW, I searched for some tips on the forum, but besides the known module there's not much buzz around scheduling posts.

Thank you!

