woop Posted August 7, 2014

Hi! Has anyone had success importing comments from Disqus into ProcessWire? I'm thinking about migrating my Disqus content since I need more control over form and logic. I get a large .xml export from Disqus, but I'm not sure how to get it into ProcessWire. It seems comments aren't available as pages, so I can't use the CSV to Pages plugin. Here's an excerpt from the export:

    <thread dsq:id="12312312312">
      <id />
      <forum>my-disqus-forum</forum>
      <category dsq:id="123123" />
      <link>http://example.com/article/231</link>
      <title>Article title</title>
      <message />
      <createdAt>2006-11-29T23:38:43Z</createdAt>
      <author>
        <email>john@doe.com</email>
        <name>John</name>
        <isAnonymous>false</isAnonymous>
        <username>Johnnyboy</username>
      </author>
      <ipAddress>127.0.0.1</ipAddress>
      <isClosed>false</isClosed>
      <isDeleted>false</isDeleted>
    </thread>

Thanks for helping out!
Craig Posted August 7, 2014

Yes, this should be possible. Have a look at this thread: Is It Possible To Add A 'comment' Programmatically? The only thing you need to do is write some code to parse your XML and use the API functions to get the pages to attach the comments to. It takes a bit of work, but it's certainly doable.
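As a rough illustration of the "parse, then use the API" idea, here is a minimal sketch of the mapping step. The helper name `disqusPostToCommentData` and the input array keys are hypothetical. The target keys (`text`, `cite`, `email`, `created`) and the commented-out API calls follow the approach described in the linked thread, and they assume a comments field named `comments`, so treat them as assumptions rather than a confirmed recipe.

```php
<?php
// Hypothetical helper: maps one parsed Disqus post (a plain array) to the
// values a ProcessWire comment would need. Input keys are illustrative.
function disqusPostToCommentData(array $post): array
{
    return [
        'text'    => trim(strip_tags($post['message'])), // Disqus messages contain HTML
        'cite'    => $post['name'],
        'email'   => $post['email'],
        'created' => strtotime($post['createdAt']),      // ISO 8601 string -> Unix timestamp
    ];
}

$data = disqusPostToCommentData([
    'message'   => '<p>Nice article!</p>',
    'name'      => 'John',
    'email'     => 'john@doe.com',
    'createdAt' => '2006-11-29T23:38:43Z',
]);

// Inside ProcessWire, and assuming a comments field named "comments", the
// values could then be applied roughly like this (not executed here):
//   $comment = new Comment();
//   foreach ($data as $key => $value) $comment->$key = $value;
//   $page->comments->add($comment);
//   $page->save('comments');
```

Keeping the mapping separate from the ProcessWire calls makes it possible to check the conversion on a few sample posts before touching any real pages.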
woop Posted August 7, 2014

Thanks! I don't have the time to finish this right now (going on vacation). I did manage to put together a basic parser in case anyone else is interested. Feel free to continue from here. I'll put together a full migration script when I find more time.

    $disquscomments = simplexml_load_file('disquscomments.xml');
    $dsq = "http://disqus.com/disqus-internals"; // namespace of the dsq: attributes

    # All threads
    foreach ($disquscomments->thread as $t) {
        echo $t->attributes($dsq)['id'];
        echo $t->forum;
        echo $t->category->attributes($dsq)['id'];
        echo $t->link;
        echo $t->title;
        echo $t->message;
        echo $t->createdAt;
        echo $t->author->email;
        echo $t->author->name;
        echo $t->author->isAnonymous;
        echo $t->author->username;
        echo $t->ipAddress;
        echo $t->isClosed;
        echo $t->isDeleted;
        echo "<hr>";
    }

    # All comments
    foreach ($disquscomments->post as $c) {
        echo $c->attributes($dsq)['id'];
        echo $c->id;
        echo $c->message;
        echo $c->createdAt;
        echo $c->isDeleted;
        echo $c->isSpam;
        echo $c->author->email;
        echo $c->author->name;
        echo $c->author->isAnonymous;
        echo $c->author->username;
        echo $c->ipAddress;
        echo $c->thread->attributes($dsq)["id"];
        // only replies have a <parent> element
        if (isset($c->parent)) {
            echo $c->parent->attributes($dsq)["id"];
        }
        echo "<hr>";
    }
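The `dsq:id` lookups in the parser above can be tried out without the full export. The sketch below runs the same SimpleXML calls against a small inline sample. The sample XML is simplified and made up for illustration, and the `dsqId` helper is a hypothetical name, not part of any library.

```php
<?php
// Namespace URI behind the dsq: prefix, as used in the parser above.
const DSQ_NS = 'http://disqus.com/disqus-internals';

// Hypothetical helper: returns the dsq:id of an element, or null if it has none.
function dsqId(SimpleXMLElement $el): ?string
{
    $attrs = $el->attributes(DSQ_NS);
    return isset($attrs['id']) ? (string) $attrs['id'] : null;
}

// Simplified, made-up sample; a real export contains many more fields.
$xml = <<<XML
<disqus xmlns:dsq="http://disqus.com/disqus-internals">
  <post dsq:id="100">
    <message>First</message>
    <thread dsq:id="1" />
  </post>
  <post dsq:id="101">
    <message>Reply</message>
    <thread dsq:id="1" />
    <parent dsq:id="100" />
  </post>
</disqus>
XML;

$export = simplexml_load_string($xml);
$rows = [];
foreach ($export->post as $post) {
    $rows[] = [
        'id'     => dsqId($post),
        'thread' => dsqId($post->thread),
        // top-level comments have no <parent> element
        'parent' => isset($post->parent) ? dsqId($post->parent) : null,
    ];
}
```

Casting to string in one place avoids passing `SimpleXMLElement` objects around, which are easy to mistake for plain values later on.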
adrian Posted August 7, 2014

woop wrote: "Thanks! I don't have the time to finish this right now (going on vacation). I did manage to put together a basic parser in case anyone else is interested. Feel free to continue from here. I'll put together a full migration script when I find more time"

Maybe this could be a new third-party plugin for the Migrator module (https://github.com/adrianbj/ProcessMigrator). If you haven't seen it already, the first plugin for it is Nico's MigratorWordpress (https://github.com/NicoKnoll/MigratorWordpress), which also converts XML to JSON for use with Migrator. I'd be happy to help if you have any questions on how to implement it.
woop Posted August 8, 2014

Thanks for the tip, I'll check that out later. The thing I find most confusing is the way a comment is created. It seems it's not an ordinary page/template, which means I can't add new fields to it in the admin. I bet there's a good reason for this, but it felt a bit un-processwirey not to be able to simply extend a comment template. I guess I'll have to hack the comment class instead, as described here: https://processwire.com/talk/topic/2092-additional-input-field-for-url-in-comment-form/ ?
teppo Posted August 8, 2014

@woop: I'd assume that Ryan was looking for the best possible performance when building the comments system. Comments being their own fieldtype with a hardcoded schema improves performance and makes it possible to fine-tune everything specifically for this use case.

The main difference from pages is that pages are usually something you create yourself, and it's quite rare for sites to go beyond thousands of pages (though sites with hundreds of thousands of pages or more definitely exist). Comments, on the other hand, are something that visitors (guest users) can create without your consent, which means there can be a lot of them. That, and the fact that they're almost always identical between different sites, is what makes a "hardcoded schema" possible in the first place.

If you require custom fields in Comments, you've got two choices at the moment: create your own Comments field based on what's included with ProcessWire (call it CommentsExtended or whatever you prefer), or hack the built-in module. I'd suggest creating your own field (option 1), mainly because hacking built-in components is never a good idea in the long term (updates get tricky, etc.).
woop Posted August 9, 2014

Thanks for the clarification, Teppo! Could you elaborate a bit more on option 1? I'm not sure I understand how to go about it.
teppo Posted August 10, 2014

@woop, it means roughly that instead of modifying files within /wire/modules/Fieldtype/FieldtypeComments/ directly, you'd copy the entire FieldtypeComments directory to /site/modules/FieldtypeCommentsExtended/, change the class names etc. there to match the new name, and then modify the code to fit your needs. Modifying code within /site/* is always safer and more future-proof than touching any code within /wire/*.
woop Posted October 30, 2014

Hi again! Here's my Disqus import script! Some fields don't work correctly, but the basic parsing and import works fine. I'll update this post as my code improves. Feel free to chip in!

EDIT: Updated code, which runs fine now. Just successfully imported 16,000+ comments.

    <?php
    # Heavily inspired by: http://www.binarytides.com/disqus-comments-importer-script-in-php/

    ini_set('max_execution_time', 0); // unlimited execution time, because of the large number of comments
    ini_set('memory_limit', '512M');

    $file = 'disquscomments.xml';
    $doc = new DOMDocument();
    $doc->load($file);

    // 1. Map each Disqus thread to the ProcessWire page that matches its URL
    $thread_list = array();
    $threads = $doc->getElementsByTagName('thread');
    foreach($threads as $thread) {
        // skip <thread> elements without a <link> (such as the references nested inside posts)
        if (!isset($thread->getElementsByTagName('link')->item(0)->textContent)) continue;
        $comment = array();
        $comment['thread_id'] = $thread->getAttribute('dsq:id');
        $comment['url'] = $thread->getElementsByTagName('link')->item(0)->textContent;
        $path = parse_url($comment['url'], PHP_URL_PATH);
        $path = preg_replace("/(\/){2,}/", "/", $path); // remove multiple slashes
        $path = $sanitizer->url($path);
        $target = $pages->get($path);
        if ($target->id) {
            $comment['page_id'] = $target->id;
        }
        $thread_list[$comment['thread_id']] = $comment;
    }

    // 2. Collect the posts whose thread could be matched to a page
    $post_list = array();
    $posts = $doc->getElementsByTagName('post');
    foreach($posts as $post) {
        $comment = array();
        $comment['comment_id'] = $post->getAttribute('dsq:id');
        $comment['thread_id'] = $post->getElementsByTagName('thread')->item(0)->getAttribute('dsq:id');
        $comment['comment'] = $post->getElementsByTagName('message')->item(0)->nodeValue;
        $comment['created_at'] = $post->getElementsByTagName('createdAt')->item(0)->nodeValue;
        $author = $post->getElementsByTagName('author')->item(0);
        $comment['email'] = $author->getElementsByTagName('email')->item(0)->nodeValue;
        $comment['name'] = $author->getElementsByTagName('name')->item(0)->nodeValue;
        if ($post->getElementsByTagName('parent')->item(0)) {
            $comment['d_parent_id'] = $post->getElementsByTagName('parent')->item(0)->getAttribute('dsq:id');
        }
        if (isset($thread_list[$comment['thread_id']]) && isset($thread_list[$comment['thread_id']]['page_id'])) {
            $thread = $thread_list[$comment['thread_id']];
            $comment['page_id'] = $thread['page_id']; // the corresponding PW page's ID
            $post_list[$comment['comment_id']] = $comment; // only accept posts with a page ID
        }
    }

    // 3. Create one "mycomment" page per post
    $postsadded = 0;
    foreach($post_list as $post) {
        if ($pages->get("disqus_id={$post['comment_id']}")->id) continue; // ignore already imported
        $c = new Page();
        $c->setOutputFormatting(false);
        $c->template = $templates->get("mycomment");
        $c->username = $post['name'];
        $c->title = "temporary title";
        $c->publish_date = $post['created_at'];
        $c->disqus_id = $post['comment_id'];
        $c->body = $post['comment'];
        // If there's a parent comment, use it as the parent page
        if (isset($post['d_parent_id']) && isset($post_list[$post['d_parent_id']])) {
            $parentPost = $post_list[$post['d_parent_id']];
            $savedParent = $pages->get("disqus_id={$parentPost['comment_id']}"); // must find already created page
            if ($savedParent->id) {
                $c->parent = $savedParent;
            } else {
                $c->parent = $page; // parent not imported yet: dump it under the current page
            }
        } elseif (isset($post['page_id'])) {
            $c->parent = $post['page_id']; // root comment
        } else {
            continue;
        }
        $c->save();
        // the page ID only exists after the first save, so name and title are set afterwards
        $c->name = $c->id;
        $c->title = $c->id;
        $c->save();
        $postsadded++;
    }

    echo "<br>#######STATS#########<br>";
    echo "added +{$postsadded} comments<br>";
    echo "total of threads in disquscomments.xml: ".$threads->length."<br>";
    echo "total of posts in disquscomments.xml: ".$posts->length."<br>";
    echo "total of posts imported: ".count($pages->find('template=mycomment'));
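One possible refinement, offered as a sketch and not as part of the script above: replies whose parent has not been saved yet end up under the current page. If the export happens to list a reply before its parent (an assumption, the ordering of the export isn't stated here), sorting the collected posts so that parents come first would avoid that fallback. The function name `orderParentsFirst` is hypothetical, and the array shape mirrors `$post_list`.

```php
<?php
/**
 * Orders posts so that every reply comes after its parent.
 * $posts is keyed by Disqus comment ID; 'd_parent_id' is optional,
 * mirroring the $post_list array built by the import script.
 */
function orderParentsFirst(array $posts): array
{
    $ordered = [];
    $visiting = []; // guards against accidental parent cycles
    $visit = function ($id) use (&$visit, &$ordered, &$visiting, $posts) {
        if (isset($ordered[$id]) || isset($visiting[$id]) || !isset($posts[$id])) {
            return; // already placed, in progress, or not part of the import
        }
        $visiting[$id] = true;
        $parentId = $posts[$id]['d_parent_id'] ?? null;
        if ($parentId !== null) {
            $visit($parentId); // place the parent before the reply
        }
        $ordered[$id] = $posts[$id];
    };
    foreach (array_keys($posts) as $id) {
        $visit($id);
    }
    return $ordered;
}

// Made-up sample: both replies are listed before the root comment.
$sample = [
    '101' => ['comment_id' => '101', 'd_parent_id' => '100'],
    '102' => ['comment_id' => '102', 'd_parent_id' => '101'],
    '100' => ['comment_id' => '100'],
];
$sorted = orderParentsFirst($sample);
$sortedIds = array_map('strval', array_keys($sorted));
```

In the import script this would amount to looping over `orderParentsFirst($post_list)` instead of `$post_list` in the final `foreach`, so that the `disqus_id` lookup for a parent finds a page that already exists.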