
MODX content (including images) to ProcessWire


Peter Knight

I'm redeveloping a site from MODX to ProcessWire. The client has about 1200 pages and exporting the current database to a CSV and then importing content has been relatively easy using the CSV to pages module.

The client is fine with manually porting over 1200 images, but as he's just had his 3rd child and is a busy man, I was wondering if there is a way to pull the images from each MODX post and import them into the Images field of each new PW page.

Looking for general thoughts. I'll have to outsource this as my own database chops are minimal but having some insight can help me brief a dev.

Many thanks

Peter


I did this on a smaller scale. I'll see if I can find my code but it won't be til tomorrow unfortunately.

It basically just looked for the images in the HTML using preg_match_all, I think, imported the images to the new page, replaced the image URLs in the HTML and saved the updated HTML. It worked quite well, but I was doing it with few enough pages that I could check them one page at a time.

That way was a little less system-specific actually.
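
A rough sketch of the regex approach described above, for anyone briefing a dev. This is not the original code (which isn't posted in this thread); the function names find_image_urls and swap_image_urls are made up for illustration, the /site/assets/files/1234/ path is only an assumed example of where an imported image might end up, and the pattern only handles quoted src attributes.

```php
<?php
// Hypothetical helper: collect the src of every <img> tag in an HTML string.
// Only quoted src attributes are matched; unusual markup will be missed,
// and attributes such as data-src can also match.
function find_image_urls(string $html): array
{
    preg_match_all('/<img\b[^>]*?\bsrc\s*=\s*(["\'])(.*?)\1/is', $html, $matches);
    return array_values(array_unique($matches[2]));
}

// Hypothetical helper: swap old image URLs for new ones.
// $map is [old URL => new URL]; strtr() replaces all keys in a single pass.
function swap_image_urls(string $html, array $map): string
{
    return $map ? strtr($html, $map) : $html;
}

$html = '<p><img src="assets/images/a.jpg" alt=""> <img class="x" src=\'/assets/images/b.png\'></p>';
$urls = find_image_urls($html);

$map = [];
foreach ($urls as $old) {
    // Assumed new location, for illustration only. In practice this would be
    // the URL of the image after it has been imported into the new page.
    $map[$old] = '/site/assets/files/1234/' . basename($old);
}
$updated = swap_image_urls($html, $map);
```

With only a handful of pages, checking each result by eye (as described above) is probably the safest way to catch markup the pattern misses.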


It is totally doable if you follow Ryan's tips here: https://processwire.com/talk/topic/3987-cmscritic-development-case-study/

It will help if you kept the old ID somewhere when you imported the posts. Did you?

Yes, I have the page IDs. I presume that's useful for creating each new PW page with an ID matching the old one. Are the page IDs some kind of bridge or reference for fetching the images?


Thanks Pete. I'd be interested in seeing that although I wouldn't attempt it myself.

I was thinking of dumping all 1200+ into a containing page called "to be sorted" and that way my client can then manually move them into their correct locations over time.

I know it's still quite a bit of manual work, but the new structure and templates are still being designed, so I need a holding bay for the pages until we're ready to figure out where they go.

I've created a temporary field in PW called "old URL" which contains the previous full url (folder/folder/pagename) so he can easily identify where a post sat originally.


I've created a temporary field in PW called "old URL" which contains the previous full url (folder/folder/pagename) so he can easily identify where a post sat originally.

There you have your answer! Loop through all the pages in your PW install, get the old URL of each one, pass it to a DOM parser, find all the images inside the content section of that page, get their URLs, and store them in the page's image field.

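Pseudo-code:

// Uses the Simple HTML DOM parser (file_get_html)
foreach ($pages->find("template=all|imported|pages") as $p) {

    // old_url must be a full URL here; prepend the old domain if the field only holds the path
    $html = file_get_html($p->old_url);
    if (!$html) continue; // skip pages that could not be fetched

    foreach ($html->find("#content img") as $img) {

        $src = $img->src;

        // The image URL is now in $src.
        // See the cmscritic case study linked above for how to extract
        // the file name and use both the URL and the file name to
        // import the image into the page in $p.

    }
}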
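
For reference, here is a slightly fuller sketch of the same idea that runs without any extra library. It swaps simplehtmldom for PHP's built-in DOMDocument and DOMXPath, so it is a variation on the pseudo-code above rather than the same code. The helper names content_image_urls and old_site_url, the domain old-site.example and the sample markup are all made up for illustration, and it assumes relative image paths are relative to the old site's root.

```php
<?php
// Hypothetical helper: return the src of every <img> inside the element
// with the given id, using PHP's built-in DOM extension.
function content_image_urls(string $html, string $containerId = 'content'): array
{
    $dom = new DOMDocument();
    // Real-world pages are rarely valid HTML, so keep parser warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $urls = [];
    foreach ($xpath->query('//*[@id="' . $containerId . '"]//img[@src]') as $img) {
        $urls[] = $img->getAttribute('src');
    }
    return $urls;
}

// Hypothetical helper: turn a site-relative path into a full URL on the old site.
// Assumes relative paths are relative to the site root, not to the current page.
function old_site_url(string $base, string $path): string
{
    if (preg_match('#^https?://#i', $path)) {
        return $path; // already absolute
    }
    return rtrim($base, '/') . '/' . ltrim($path, '/');
}

// Made-up sample page: one image outside #content, one inside it.
$sample = '<html><body><img src="logo.png"><div id="content">'
    . '<p><img src="assets/images/baby.jpg"></p></div></body></html>';

$found = array_map(
    fn ($src) => old_site_url('http://old-site.example', $src),
    content_image_urls($sample)
);
```

Inside the loop over the imported pages, each URL in $found could then be handed to ProcessWire, for example with $p->images->add($url) followed by $p->save() with output formatting turned off. That assumes the image field is named images; the case study linked above covers the import step in more detail.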

diogo's pretty spot on there - I did all this back before I knew things like simplehtmldom (in his post above) existed. Once you have the content imported, you can iterate through all the image tags in it, use the article he links to in order to have PW pull the image files in, and replace the image URLs in the output with simplehtmldom as you go.

My code was so specific to the site I was working on I don't think it's worth posting but I will try and dig it out tomorrow to see if there are any other useful pointers that arose from the process.
