Import Disqus content to Comments module? (.xml to comments)


woop
Hi! Has anyone had success with importing comments from Disqus to ProcessWire? I'm thinking about migrating my Disqus content since I need more control over form and logic.

I get a large .xml export from Disqus but I'm not sure how to get this into ProcessWire. It seems like comments aren't available as pages, so I can't use the CSV-to-pages plugin.

Here's an excerpt from the export:

<thread dsq:id="12312312312">
<id />
<forum>my-disqus-forum</forum>
<category dsq:id="123123" />
<link>http://example.com/article/231</link>
<title>Article title</title>
<message />
<createdAt>2006-11-29T23:38:43Z</createdAt>
<author>
<email>john@doe.com</email>
<name>John</name>
<isAnonymous>false</isAnonymous>
<username>Johnnyboy</username>
</author>
<ipAddress>127.0.0.1</ipAddress>
<isClosed>false</isClosed>
<isDeleted>false</isDeleted>
</thread>

Thanks for helping out!


Thanks! I don't have the time to finish this right now (going on vacation). I did manage to put together a basic parser in case anyone else is interested. Feel free to continue from here. I'll put together a full migration script when I find more time :)

$disquscomments = simplexml_load_file('disquscomments.xml');

# All threads
foreach ($disquscomments->thread as $t){
	echo $t->attributes("http://disqus.com/disqus-internals")['id'];
	echo $t->forum;
	echo $t->category->attributes("http://disqus.com/disqus-internals")['id'];
	echo $t->link;
	echo $t->title;
	echo $t->message;
	echo $t->createdAt;
	echo $t->author->email;
	echo $t->author->name;
	echo $t->author->isAnonymous;
	echo $t->author->username;
	echo $t->ipAddress;
	echo $t->isClosed;
	echo $t->isDeleted;
	echo "<hr>";
}

# All comments
foreach ($disquscomments->post as $c){
	echo $c->attributes("http://disqus.com/disqus-internals")['id'];
	echo $c->id;
	echo $c->message;
	echo $c->createdAt;
	echo $c->isDeleted;
	echo $c->isSpam;
	echo $c->author->email;
	echo $c->author->name;
	echo $c->author->isAnonymous;
	echo $c->author->username;
	echo $c->ipAddress;
	echo $c->thread->attributes("http://disqus.com/disqus-internals")["id"];
	// only replies have a <parent> element
	if (isset($c->parent)) echo $c->parent->attributes("http://disqus.com/disqus-internals")["id"];
	echo "<hr>";
}
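
Building on the parser above, here is a minimal sketch of how posts could be grouped under their thread's URL before importing. The inline sample XML is made up for illustration and only mimics the structure of the excerpt earlier in the thread; the namespace URI is the one used in the parser above, and `groupPostsByThread()` is a hypothetical helper name.

```php
<?php
// Namespace URI of the "dsq" prefix, as used in the parser above.
const DSQ_NS = 'http://disqus.com/disqus-internals';

// Hypothetical helper: groups post messages by the URL of their thread.
function groupPostsByThread(SimpleXMLElement $export): array
{
    // Map thread ID => article URL.
    $urls = [];
    foreach ($export->thread as $t) {
        $id = (string) $t->attributes(DSQ_NS)['id'];
        $urls[$id] = (string) $t->link;
    }

    // Collect each post's message under its thread's URL.
    $grouped = [];
    foreach ($export->post as $p) {
        $threadId = (string) $p->thread->attributes(DSQ_NS)['id'];
        if (!isset($urls[$threadId])) {
            continue; // post refers to a thread that is not in the export
        }
        $grouped[$urls[$threadId]][] = (string) $p->message;
    }
    return $grouped;
}

// Made-up sample data shaped like the excerpt above.
$xml = <<<XML
<disqus xmlns:dsq="http://disqus.com/disqus-internals">
  <thread dsq:id="1"><link>http://example.com/article/231</link></thread>
  <post dsq:id="10"><message>First!</message><thread dsq:id="1" /></post>
  <post dsq:id="11"><message>Second</message><thread dsq:id="1" /></post>
  <post dsq:id="12"><message>Orphan</message><thread dsq:id="99" /></post>
</disqus>
XML;

$grouped = groupPostsByThread(simplexml_load_string($xml));
print_r($grouped);
```

Grouping by URL first means each target page only has to be looked up once, however many comments it has.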

Maybe this could be a new third party plugin for the Migrator module (https://github.com/adrianbj/ProcessMigrator)

If you haven't seen it already, the first plugin for it is Nico's MigratorWordpress (https://github.com/NicoKnoll/MigratorWordpress) which also converts XML to JSON for use with Migrator. I'd be happy to help if you have any questions on how to implement.


Thanks for the tip. I'll check that out later. The thing I'm finding most confusing is the way a comment is created. It seems like it's not an ordinary page/template, which means that I can't add new fields to it in the admin. I bet there's a good reason for this, but it felt a bit un-processwirey not to be able to simply extend a comment template. I guess I'll have to hack the comment class instead, as described here: https://processwire.com/talk/topic/2092-additional-input-field-for-url-in-comment-form/ ?


@woop: I'd assume that Ryan was looking for the best possible performance when building the comments system. Comments being its own fieldtype with a hardcoded schema increases its performance and makes it possible to fine-tune everything for this specific use case.

The main difference from pages is that pages are usually something you create yourself, and it's quite rare for sites to go beyond thousands of pages (though sites with hundreds of thousands of pages or more definitely exist). Comments, on the other hand, are something that visitors (guest users) can create without your consent, which means there can be a lot of them. That, and the fact that they're almost always identical between different sites, is what makes a "hardcoded schema" possible in the first place.

If you require custom fields for Comments, you've got two choices at the moment:

  1. create your own Comments field based on what's included with ProcessWire (call it CommentsExtended or whatever you prefer) or
  2. hack the built-in module.

I'd suggest creating your own field (option 1), mainly because hacking built-in components is never a good idea in the long term (updates get tricky, etc.).
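
To illustrate how small that hardcoded schema is, here is a minimal sketch of mapping one parsed Disqus post onto the values a built-in comment would need. The property names (`text`, `cite`, `email`, `created`) are assumed from the core `Comment` class and should be verified against your ProcessWire version; `disqusPostToCommentData()` and the field name `comments` are hypothetical.

```php
<?php
// Hypothetical helper: converts one parsed Disqus post into comment values.
// The keys are assumed to match properties of ProcessWire's core Comment class.
function disqusPostToCommentData(array $post): array
{
    return [
        'text'    => trim(strip_tags($post['message'])), // messages may contain HTML
        'cite'    => $post['name'],                      // author display name
        'email'   => $post['email'],
        'created' => strtotime($post['createdAt']),      // ISO 8601 string => Unix timestamp
    ];
}

$data = disqusPostToCommentData([
    'message'   => '<p>Nice article!</p>',
    'name'      => 'John',
    'email'     => 'john@doe.com',
    'createdAt' => '2006-11-29T23:38:43Z',
]);
print_r($data);

// Inside ProcessWire the values could then be applied roughly like this
// (untested, assuming a Comments field named "comments" on the target page):
//   $c = new Comment();
//   foreach ($data as $key => $value) $c->set($key, $value);
//   $page->comments->add($c);
//   $page->save('comments');
```

Anything Disqus exports beyond these few values is where a custom field, and therefore option 1 or 2, would come in.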


@woop, it means roughly that instead of modifying files within /wire/modules/Fieldtype/FieldtypeComments/ directly, you'd copy the entire FieldtypeComments directory to /site/modules/FieldtypeCommentsExtended/, change the class names etc. there to match the new name, and then modify the code to suit your needs. Modifying code within /site/* is always safer and more future-proof than touching any code within /wire/*.


  • 2 months later...

Hi again! Here's my Disqus import script! Some fields don't work correctly, but the basic parsing and import work fine. Will update this post as my code improves. Feel free to chip in!

EDIT: Updated code, which runs fine now. Just successfully imported 16,000+ comments :)

<?php 
# Heavily inspired by: http://www.binarytides.com/disqus-comments-importer-script-in-php/

ini_set('max_execution_time', 0); // unlimited execution time, because of large amount of comments
ini_set('memory_limit', '512M');

$file = 'disquscomments.xml';
$doc = new DOMDocument();
$doc->load($file);

$thread_list = array();
$threads = $doc->getElementsByTagName('thread');

foreach($threads as $thread) {
	// getElementsByTagName('thread') also returns the <thread> references nested
	// inside each <post>; those have no <link> child, so skip them
	$link = $thread->getElementsByTagName('link')->item(0);
	if (!$link) continue;

	$comment = array();
	$comment['thread_id'] = $thread->getAttribute('dsq:id');
	$comment['url'] = $link->textContent;
	$path = parse_url($comment['url'], PHP_URL_PATH);
	$path = preg_replace("/(\/){2,}/", "/", $path); // collapse repeated slashes
	$path = $sanitizer->url($path);
	$match = $pages->get($path); // look the page up once instead of twice
	if ($match->id){
		$comment['page_id'] = $match->id;
	}

	$thread_list[$comment['thread_id']] = $comment;
}

$post_list = array();
$posts = $doc->getElementsByTagName('post');

foreach($posts as $post) {

	$comment = array();
	$comment['comment_id'] = $post->getAttribute('dsq:id');
	$comment['thread_id'] = $post->getElementsByTagName('thread')->item(0)->getAttribute('dsq:id');
	$comment['comment'] = $post->getElementsByTagName('message')->item(0)->nodeValue;
	$comment['created_at'] = $post->getElementsByTagName('createdAt')->item(0)->nodeValue;
	$comment['email'] = $post->getElementsByTagName('author')->item(0)->getElementsByTagName('email')->item(0)->nodeValue;
	$comment['name'] = $post->getElementsByTagName('author')->item(0)->getElementsByTagName('name')->item(0)->nodeValue;
	if ($post->getElementsByTagName('parent')->item(0)) {
		$comment['d_parent_id'] = $post->getElementsByTagName('parent')->item(0)->getAttribute('dsq:id');
	}
	if (isset($thread_list[$comment['thread_id']]) && isset($thread_list[$comment['thread_id']]['page_id'])){
		$thread = $thread_list[$comment['thread_id']];
		$comment['page_id'] = $thread['page_id']; // the corresponding PW page's ID
		$post_list[$comment['comment_id']] = $comment; // only accept posts whose thread maps to a PW page
	}

}
$postsadded = 0;
foreach($post_list as $post){

	if ($pages->get("disqus_id={$post['comment_id']}")->id) continue; // skip comments that were already imported

	$c = new Page();
	$c->setOutputFormatting(false);
	$c->template = $templates->get("mycomment");
	$c->username = $post['name'];
	$c->title = "temporary title";
	$c->publish_date = $post['created_at'];
	$c->disqus_id = $post['comment_id'];
	$c->body = $post['comment'];

	if (isset($post['d_parent_id']) && isset($post_list[$post['d_parent_id']])){
		// reply: nest it under the page created for its parent comment
		$parentPost = $post_list[$post['d_parent_id']];
		$savedParent = $pages->get("disqus_id={$parentPost['comment_id']}"); // only found if the parent was imported earlier
		if ($savedParent->id){
			$c->parent = $savedParent;
		} else {
			$c->parent = $page; // parent not imported yet: fall back to the page this script runs on
		}
	} elseif (isset($post['page_id'])){
		$c->parent = $post['page_id']; // root comment: child of the article page
	} else {
		continue;
	}

	$c->save(); // first save assigns the page ID
	$c->name = $c->id;
	$c->title = $c->id;
	$c->save();
	$postsadded++;
}
echo "<br>#######STATS#########<br>";
echo "added +{$postsadded} comments<br>";
echo "total of threads in disquscomments.xml: ".$threads->length."<br>";
echo "total of posts in disquscomments.xml: ".$posts->length."<br>";
echo "total of posts imported: ".count($pages->find('template=mycomment'));
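
One thing the script relies on is that a reply's parent has already been saved when the reply itself is processed; otherwise the reply ends up under the fallback page. A minimal sketch of one way to guarantee that order, assuming `$post_list` has the shape built above (keyed by Disqus comment ID, with an optional `d_parent_id`). `sortParentsFirst()` is a hypothetical helper and not part of the original script.

```php
<?php
// Hypothetical helper: reorders posts so every parent comes before its replies.
// Posts whose parent is missing from the list are treated as root comments.
function sortParentsFirst(array $postList): array
{
    $sorted = [];
    $visiting = []; // guards against circular parent references
    $visit = function (string $id) use (&$visit, &$sorted, &$visiting, $postList): void {
        if (isset($sorted[$id]) || isset($visiting[$id]) || !isset($postList[$id])) {
            return; // already placed, currently being placed, or not in the import
        }
        $visiting[$id] = true;
        $parentId = $postList[$id]['d_parent_id'] ?? null;
        if ($parentId !== null) {
            $visit((string) $parentId); // place the parent first
        }
        $sorted[$id] = $postList[$id];
    };
    foreach (array_keys($postList) as $id) {
        $visit((string) $id);
    }
    return $sorted;
}

// Made-up example: the reply chain 3 -> 2 -> 1 arrives in reverse order.
$posts = [
    '3' => ['comment_id' => '3', 'd_parent_id' => '2'],
    '2' => ['comment_id' => '2', 'd_parent_id' => '1'],
    '1' => ['comment_id' => '1'],
];
$sorted = sortParentsFirst($posts);
print_r(array_keys($sorted));
```

Running `$post_list = sortParentsFirst($post_list);` just before the import loop would be the natural place to use it.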