
Migration from WordPress to ProcessWire


peterpp


Hi Pravin

There's a manual way: https://processwire.com/talk/topic/3987-cmscritic-development-case-study/

Or use a new converter: install ProcessMigrator (https://github.com/adrianbj/ProcessMigrator), then MigratorWordpress (https://github.com/NicoKnoll/MigratorWordpress), and follow the instructions at the second link :)

The topic discussing the migrator is roughly here: https://processwire.com/talk/topic/4420-page-list-migrator/page-6



Thanks a lot... I will try.


  • 3 weeks later...

Hi.

First off, the work you guys have done with the WordPress migration tool is amazing; unfortunately, I am not able to get it to work :(

I am running PW 2.4. I have tried the modules in both a WAMP environment and ISP-provided environments, in both brand-new PW installations and existing ones.

I have loaded the blog.zip profile first, this works fine.

Then, when I try to import the XML from WordPress (3.9.1), all I get is this message:

"Missing required ZIP or JSON source"

There is a possibility that I have missed something vital.

These are the basic steps I take:

  • Install PW
  • Install the ProcessMigrator module
  • Install the MigratorWordpress module
  • Migrate the blog.zip profile to /home/
  • Try to migrate wordpress.xml to /home/blog (the root of the blog just created with blog.zip)

Any ideas?

Would really like to get this one working.

Thanks Ronnie


Hi Ronnie - I am taking a look at this now. It is possible that my recent changes to Migrator broke something in MigratorWordpress. I see that you also posted over in the Migrator thread, so I'll respond there when I have an answer/fix.

EDIT: Fixed in the latest version on GitHub.


  • 3 months later...

Hi Andi,

The WordPress migrator works great with kongondo's blog module. You just need to make sure the field and template names match; these are fully configurable in the WordPress migrator settings. If you have already installed the blog modules, these should be easy to figure out, but let us know if you have any questions.


Hi.

I have run into some issues with content that contains Swedish letters (Å, Ä, Ö).

My guess is that I need to call "setlocale" somewhere in the code to get this working.

This is what I have found so far.

This is the XML exported from WordPress.

Here the letters Å, Ä, Ö are present in their original, correct form.

<content:encoded><![CDATA[Idag provade jag att gå på gympa.]]></content:encoded>
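One way to confirm that the export itself is intact is to read `content:encoded` directly. A minimal sketch with SimpleXML, wrapping the line above in a stripped-down stand-in for the WordPress export file (the real file declares more namespaces and elements than this):

```php
<?php
// Stripped-down stand-in for a WordPress export (WXR) file.
$xml = <<<XML
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
<item>
<content:encoded><![CDATA[Idag provade jag att gå på gympa.]]></content:encoded>
</item>
</channel>
</rss>
XML;

$rss = simplexml_load_string($xml);

// content:encoded lives in its own namespace, so select it via children().
$body = (string) $rss->channel->item
    ->children('http://purl.org/rss/1.0/modules/content/')
    ->encoded;

echo $body, PHP_EOL; // Idag provade jag att gå på gympa.
```

If this prints the letters correctly, the problem lies somewhere after the XML is read.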

After running the XML file through the MigratorWordpress plugin, this is the resulting JSON.

My understanding is that this is correct for UTF-8-encoded JSON, meaning \u00F6 = ö.

Massa nya barn och fr\u00f6knar hade b\u00f6rjat s\u00e5
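That reading of the escapes is right: `\u00f6` is just JSON's ASCII-safe spelling of "ö", and PHP's `json_decode()` turns it back into the UTF-8 character, which suggests the JSON step itself loses nothing. A minimal check, wrapping the fragment above in quotes so it is a complete JSON string:

```php
<?php
// The escaped fragment from the exported JSON, wrapped in quotes
// so that it is a complete JSON string literal.
$fragment = '"Massa nya barn och fr\u00f6knar hade b\u00f6rjat s\u00e5"';

// json_decode() converts each \uXXXX escape back into a UTF-8 character.
$decoded = json_decode($fragment);

echo $decoded, PHP_EOL; // Massa nya barn och fröknar hade börjat så
```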

But then, when the content is created in PW, it gets all weird.

This is copied from the DB.

I am pretty sure that &Atilde;&yen; does not equal "å".

The collation on that DB table is "utf8_general_ci".

Peggy på besök.

So my guess is that the conversion goes wrong when the JSON file is being decoded.

But a funny thing is that this is only happening to body content; the title of a post is being translated correctly (see attached image).

Does anyone have an idea of where I should start?

Thanks / Ronnie


Hi Ronnie,

I don't know too much about UTF conversions (the curse of being a native English speaker), but could you please try changing this line

from:

return json_encode($this->getNew());

to:

return json_encode($this->getNew(), JSON_UNESCAPED_UNICODE);
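For reference, the flag only changes how non-ASCII characters are written into the JSON, not what they decode to. A small comparison using a Swedish post title as sample data (`JSON_UNESCAPED_UNICODE` needs PHP 5.4+):

```php
<?php
$post = array('title' => 'Lördagskväll med mostrarna');

// Default: non-ASCII characters are written as \uXXXX escapes.
$escaped = json_encode($post);
// {"title":"L\u00f6rdagskv\u00e4ll med mostrarna"}

// JSON_UNESCAPED_UNICODE: characters are written as raw UTF-8.
$raw = json_encode($post, JSON_UNESCAPED_UNICODE);
// {"title":"Lördagskväll med mostrarna"}

// Both spellings decode to the same data.
var_dump(json_decode($escaped, true) === json_decode($raw, true)); // bool(true)
```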

OR, if you don't have PHP 5.4+, then instead of the above change you need to add this function somewhere in the MigratorWordpress file:

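public function raw_json_encode($input) {

    // Replace every \uXXXX escape produced by json_encode() with the UTF-8
    // character it stands for. Only BMP characters are handled; characters
    // that JSON writes as surrogate pairs are not reassembled.
    return preg_replace_callback(
        '/\\\\u([0-9a-fA-F]{4})/',
        function ($matches) {
            return mb_convert_encoding(pack('H*', $matches[1]), 'UTF-8', 'UTF-16');
        },
        json_encode($input)
    );

}

and then, since it is a method of the class, call it with:

return $this->raw_json_encode($this->getNew());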

Please let us know if that works and we'll get it updated.


Just to follow up, I have made a change to the WordPress Migrator which should hopefully handle all non-Latin characters. I am waiting on Nico to accept my pull request, so until then you can grab my forked version at: https://github.com/adrianbj/MigratorWordpress

You should also grab the latest version of Migrator itself, as there is a fix for certain image filenames coming from WP that were not being properly cleaned and hence not being embedded into the RTE field.

Please let me know if that fixes your problems Ronnie.

On a side note, that commit also swaps out TinyMCE for CKEditor, so WP to PW migrations will no longer try to use TinyMCE.

I hope to find some time shortly (unless Nico beats me to it - please feel free to :)) to handle other page content structures from WP, in particular the type where you have Header and Paragraph columns in the page editor, which are stored in meta_key/meta_value in the exported XML file.


Hello Adrian.

First off, thanks a lot for your efforts and the reply.

I have tested the updated version of the "migrator wordpress" module, and it does fix the issue with the body content, GREAT WORK !

However, the title, which worked before, is now getting messed up.

For instance, this title:

Lördagskväll med mostrarna

Gets formatted like this:

L&ouml;rdagskv&auml;ll med mostrarna

&ouml; translates to "ö"

and

&auml; translates to "ä"

See more here:

http://www.tiger.se/dok/koder.html


Since the title was correct before, I guess it should be possible to merge the two versions :)


Glad to hear we are part way there. Unfortunately I am struggling to get this fully correct. I don't think I really want to be converting to htmlentities like I did - while it works in the body field, it is obviously causing problems in the title field.

I have tried numerous combinations now and still not quite succeeded. The closest I got was having everything appear correct, except for edit mode for the title.

Is there someone out there in non-English-speaking land who deals with these characters all the time and knows the right way to do this? The problem is getting the escaped Unicode characters from the JSON back to normal UTF-8 characters.

I could keep stumbling through, but someone must know the right way. Stack Overflow hasn't been very useful - all solutions come up short - either that, or I am missing something obvious.

Big thanks to anyone who can help out :)


I decided to commit the changes that got me close - everything works, except for the hexadecimal entities in edit mode for the title.

Not ideal, but I wanted to commit some other fixes for properly resizing images in RTE fields to match the dimensions from the WP source, so I thought I'd throw the encoding changes in there too in case you want to test.

Does anyone have any ideas on properly fixing the encoding? Please :)


Hi Adrian

Have a look here, as it sums it up very well: http://codex.wordpress.org/Converting_Database_Character_Sets

You need to convert the fields to their binary counterparts first, then convert to UTF-8, and then move the data back from the binary counterparts to the originals.
 

  • CHAR ⇾ BINARY
  • TEXT ⇾ BLOB
  • TINYTEXT ⇾ TINYBLOB
  • MEDIUMTEXT ⇾ MEDIUMBLOB
  • LONGTEXT ⇾ LONGBLOB
  • VARCHAR ⇾ VARBINARY
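The type mapping above can be turned into per-column statements. This is a rough sketch, assuming MySQL and a column whose bytes are already UTF-8 but labelled latin1 (the case the Codex article targets). The helper name `utf8ConversionSql` is hypothetical, it uses `utf8` as in this post (current MySQL setups usually prefer `utf8mb4`), and a real `MODIFY` must also restate `NOT NULL`, defaults and so on, which this ignores:

```php
<?php
// Text column type => binary counterpart, as in the list above.
const BINARY_COUNTERPART = array(
    'CHAR'       => 'BINARY',
    'VARCHAR'    => 'VARBINARY',
    'TINYTEXT'   => 'TINYBLOB',
    'TEXT'       => 'BLOB',
    'MEDIUMTEXT' => 'MEDIUMBLOB',
    'LONGTEXT'   => 'LONGBLOB',
);

// Hypothetical helper: returns the two ALTER statements for one column.
// $length is needed for CHAR/VARCHAR and should be null for the TEXT types.
// NOT NULL, DEFAULT, collation and index details are deliberately ignored.
function utf8ConversionSql($table, $column, $type, $length = null)
{
    $type = strtoupper($type);
    if (!array_key_exists($type, BINARY_COUNTERPART)) {
        throw new InvalidArgumentException("No binary counterpart for $type");
    }
    $size = ($length === null) ? '' : "($length)";

    return array(
        // Step 1: switch to the binary type, so MySQL keeps the raw bytes as they are.
        "ALTER TABLE `$table` MODIFY `$column` " . BINARY_COUNTERPART[$type] . "$size;",
        // Step 2: switch back to the original text type, now labelled as utf8.
        "ALTER TABLE `$table` MODIFY `$column` $type$size CHARACTER SET utf8;",
    );
}

foreach (utf8ConversionSql('wp_posts', 'post_content', 'LONGTEXT') as $sql) {
    echo $sql, PHP_EOL;
}
// ALTER TABLE `wp_posts` MODIFY `post_content` LONGBLOB;
// ALTER TABLE `wp_posts` MODIFY `post_content` LONGTEXT CHARACTER SET utf8;
```

Going through the binary type is what stops MySQL from transcoding the bytes a second time when the character set label changes.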

In 2007 we converted the complete UNESCO Bangkok website (about 5,500 pages and 16,000 content elements, multilingual in the 6 UNESCO languages) from latin1 to utf8, actually using WordPress. The original was a TYPO3 database, so we installed WordPress into that database and then ran the WordPress plugin for the conversion. Not sure if it still exists and still works. This plugin was able to convert all available tables and fields in that database, no matter whether they were TYPO3, WordPress or something else. It detected which ones were already utf8 and skipped those. The latin1 ones it first converted to binary, then performed the conversion before moving the data back.

The only problem we actually encountered was that some data was already utf8-encoded but the settings said latin1, so the plugin converted it again, which caused a double conversion and strange output.
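The doubled conversion is easy to reproduce in PHP: UTF-8 bytes that are read as latin1 and then encoded to UTF-8 a second time turn "å" into the classic "Ã¥". A minimal sketch with `mb_convert_encoding()` (mbstring extension assumed). Reversing one layer like this is only safe when you know the text really was double-encoded; applied to correct UTF-8 it would damage it:

```php
<?php
$original = 'Peggy på besök';

// Simulate double encoding: treat the UTF-8 bytes as latin1 characters
// and encode them to UTF-8 again.
$double = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');
echo $double, PHP_EOL; // Peggy pÃ¥ besÃ¶k

// Undo one layer: map each character back to a single latin1 byte,
// which restores the original UTF-8 byte sequence.
$repaired = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');
echo $repaired, PHP_EOL; // Peggy på besök
```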

https://wordpress.org/plugins/utf-8-database-converter/

http://naveensnayak.wordpress.com/2013/07/31/mysql-convert-to-utf8/

https://codex.wordpress.org/User:JeremyClarke/exampleSQLForUTF8Conversion

Andi


Thanks for the links, Andi, although they all seem to be about DB conversion - with Migrator we don't have access to the DB, just the XML exported from WP.

The biggest problem I am having is dealing with escaped Unicode characters when using json_decode. I can use the JSON_UNESCAPED_UNICODE option, but that doesn't actually seem to help much when it comes to decoding during import.

So at the moment I am not using that setting; instead, just before decoding, I am using:

preg_replace('/\\\u([0-9a-z]{4})/', '&#x$1;', $json);

This results in everything rendering well. Most characters come through as normal UTF-8, and those that don't get converted to hex entities, but I don't really want these stored in the database like this (they appear as hex in plain text fields and, of course, in the HTML view of RTE fields). I have tried several snippets to convert these back to UTF-8, but with no luck.

I feel like I might be close - just need the right combination of conversions in the right order, but it has eluded me so far!
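If the entity route is kept, PHP can turn numeric entities back into UTF-8 with `html_entity_decode()`. A sketch of both paths, assuming the replacement produces `&#xXXXX;` hex entities as described above, with a title from this thread as sample input. Note that plain `json_decode()` on the untouched JSON already returns UTF-8, so the entity detour only helps if something later in the import mangles raw UTF-8 characters:

```php
<?php
$json = '{"title":"L\u00f6rdagskv\u00e4ll"}';

// The workaround: turn each \uXXXX escape into a hex entity before decoding.
$withEntities = preg_replace('/\\\\u([0-9a-fA-F]{4})/', '&#x$1;', $json);
$title = json_decode($withEntities, true)['title']; // L&#x00f6;rdagskv&#x00e4;ll

// One way back to plain UTF-8: decode the numeric entities.
$fixed = html_entity_decode($title, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Or skip the regex entirely: json_decode() already turns \u00f6 into "ö".
$direct = json_decode($json, true)['title'];

echo $fixed, PHP_EOL;  // Lördagskväll
echo $direct, PHP_EOL; // Lördagskväll
```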


Hi Adrian

Yes, we always had database access, which makes things much easier!

Perhaps simply mention in the Migrator description that people should convert their sites to UTF-8 first, while they are still in WordPress or elsewhere, and then start the actual migration to ProcessWire with an XML file that already contains UTF-8 data.

The main problem we encountered was double-encoded content, and this can be a real headache. Converting to UTF-8 before exporting the data to XML would avoid it, or rather, the problem would then stay with whoever probably created that encoding mismatch in the first place. ;-)


  • 5 weeks later...

I am trying to move my old website to PW, and this would be a great help if only I knew what I am supposed to do with this module. Everything seems to work and no errors are reported, but it does not import anything. Am I supposed to create templates/fields beforehand?


encho - there is no need to create any templates/fields - that occurs automatically. I have personally only tested it with the blog section of a WP site so far, but I think it should also work with normal WP pages. I will actually be using it myself in the next couple of weeks, so if there is an issue with ordinary pages, or any other specific types of WP content, I will be keen to get these sorted out.

Could you perhaps PM me your WP xml export file so I can take a look?


  • 1 month later...

Just wanted to let everyone know that thanks to @Sephiroth we now have support for comment migration - you'll need to grab the latest commits for both Migrator and MigratorWordpress!

Next on my list is custom fields ...


Anyone with non-Latin WordPress content? Once I'm back at the office I want to look into it and read up on it. Thanks, all.

Thanks for wanting to help with this, but I just managed to fix it - talk about approaching things all a** backwards. It was a very simple problem and not at all related to what I thought. I should have clued in when I read Ronnie's original analysis of things:

But a funny thing is that this is only happening to body content; the title of a post is being translated correctly.

Anyway, it should all be working fine now - please let me know if anyone notices any further encoding problems!

You only need the new version of ProcessMigrator, NOT MigratorWordpress.


Nice work, Adrian. I'm glad you got it fixed; I will test it today. I have an XML file of over 40MB, which is quite large, so I'm going to run it through Migrator and see if there are any issues with large XML files. I have about 7GB of images from WordPress to migrate to ProcessWire, and I'd prefer not to duplicate them. My punishment for not switching to WordPress on time. Nice work.


Great info Adrian.

Will try it out when I find the time. What was the hiccup?

Thanks / Ronnie

It was all due to DOMDocument, which Migrator uses to convert image paths embedded in RTE textarea fields from assets/files/page_id/filename.jpg to assets/files/page_name/filename.jpg and back again. These conversions are actually not needed for MigratorWordpress, but they are necessary for export/import from one PW site to another, so that the images can be referenced by a meaningful path in the exported JSON and converted to the ID of the new page once imported into the new site. DOMDocument needs to load the HTML like so:

$dom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));

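Without the mb_convert_encoding call, loadHTML() assumes ISO-8859-1 when the markup declares no charset, so each multibyte UTF-8 character is misread as two Latin-1 characters and can come back out as entities such as &Atilde;&yen; (for "å"), which is where things start getting ugly.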
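A standalone sketch of the same idea. PHP 8.2 deprecated the `'HTML-ENTITIES'` target of `mb_convert_encoding()`, so this version swaps in `mb_encode_numericentity()`, which likewise turns every non-ASCII character into a numeric entity before `loadHTML()` sees the markup. The helper name `loadUtf8Html` is hypothetical, and the dom and mbstring extensions are assumed to be installed:

```php
<?php
// Hypothetical helper: load a UTF-8 HTML fragment into DOMDocument
// without mangling non-ASCII characters.
function loadUtf8Html($html)
{
    // Encode every code point from U+0080 upward as a numeric entity
    // (e.g. "å" becomes "&#229;") and leave plain ASCII untouched, so the
    // parser's default encoding assumption no longer matters.
    $ascii = mb_encode_numericentity($html, array(0x80, 0x10FFFF, 0, 0x1FFFFF), 'UTF-8');

    $dom = new DOMDocument();
    $dom->loadHTML($ascii);
    return $dom;
}

$dom = loadUtf8Html('<p>Peggy på besök.</p>');

// DOM text is UTF-8 internally, so the entities come back as real characters.
$text = $dom->getElementsByTagName('p')->item(0)->textContent;
echo $text, PHP_EOL; // Peggy på besök.
```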


Sounds great - definitely keen to hear how that goes - 7GB will be interesting - hope your internet connection is fast! I know that Joss was testing a largish file on his local dev setup and was having some issues, although it was working fine for me. It might be helpful to up your PHP memory settings. Long term I probably need to implement some way of batching things.
