Copying pages or content from one Processwire instance to another


FrancisChung

Hi there,

I was wondering if there's a module that allows me to copy certain pages or content from one instance of Processwire to another instance?

Unfortunately, I can't copy a whole instance across as there needs to be some major cleanup job performed and it won't get done in time.

Otherwise, if anyone has any code samples that would be much appreciated.


@FrancisChung - I am going to be offline for about a month, but please let me know how Migrator works out for you. I use it a lot in my development process for migrating new branches of content from dev to live, but I do know that there are still some issues that need to be sorted out. As I mentioned in the Migrator support thread, it's definitely not abandoned, I just need to find a good chunk of time to address some bugs.

I would definitely recommend testing the migration from one dev site to another dev site first; if that goes as expected, it should be safe to migrate to a live site. Also be sure to use the inbuilt backup option just in case.

Hopefully sometime in the new year this module will get some more love.


Thanks for the feedback Adrian. 

I've had some mixed results with the Migrator. It worked for some of our content, but it didn't seem to work for the Blog articles I was trying to export.

But I've realised the Blog articles are complex objects that have a lot of dependencies, and I may not have specified them correctly.

The Blog articles I'm using are part of the ProcessBlog module.

All in all it's been positive, and I think it's probably best for me to stick to simple pages or objects when it comes to the Migrator.

I look forward to the bug fixes and I hope it gets the love and attention it deserves.


  • 8 months later...

Hi, I was wondering if there are alternative ways of doing this other than using Migrator?

Has a new module come out that allows this?
Or is it possible via the API to get pages from one site, programmatically go through each field, and create and save a new page on another site?


i had this problem some days ago and wrote a simple script on both websites. both online on different domains:

export on old website:

<?php 
$export = false;
if($export AND $input->get->export == 'blogitems') {
    echo "<?xml version='1.0' ?>";
    echo '<pages>';

    $parent = $pages->get('/blog');
    $results = $parent->children();
    //$results = $pages->find('id=3198'); // ohne pic
    //$results = $pages->find('id=3204'); // mit pic

    foreach($results as $p):
        $p->of(false);
        ?>
        <page>
            <title><?= htmlspecialchars($p->title, ENT_XML1) ?></title>
            <date><?= $p->blog_date ?: $p->created ?></date>
            <featured>1</featured>
            <pic><?= $p->main_slider_coverpic->count() ? $p->main_slider_coverpic->first()->httpUrl : '' ?></pic>
            <body><?= $p->body ?></body>
            <images><?php foreach($p->images as $image) {
                echo '<image>' . $image->httpUrl . '</image>';
            } ?></images>
            <files><?php foreach($p->files as $file) {
                echo '<file>' . $file->httpUrl . '</file>';
            } ?></files>
            <gallery><?php foreach($p->gallery as $image) {
                echo '<image>' . $image->httpUrl . '</image>';
            } ?></gallery>
        </page>
    <?php endforeach;
    echo '</pages>';
    die();
}

and then the import:

<?php
$import = false;
if($import AND $input->get->import == 'blogitems') {
    $items = simplexml_load_file('https://www.your-old-website.com/?export=blogitems');

    foreach($items as $page) {
        $p = new Page();
        $p->template = 'blogitem';
        $p->parent = '/news';

        // cast SimpleXML nodes to scalars before assigning them to page fields
        $p->title = (string) $page->title;
        $p->name = $sanitizer->pageNameUTF8((string) $page->title, true);
        while($pages->find('parent='.$p->parent.',name='.$p->name)->count() > 0) $p->name .= '-1';

        $p->date = (string) $page->date;
        $p->featured = (int) $page->featured;

        // get body html and remove root node
        // $p->body = substr($page->body->asXML(), 6, -7);
        // update: it's better to use base64_encode($p->body) in your export and then base64_decode($page->body) in your import

        $p->save();

        // add images
        if(strlen($page->pic)) $p->pic->add((string)$page->pic);
        foreach($page->images->image as $image) $p->images->add((string)$image);
        foreach($page->files->file as $file) $p->files->add((string)$file);
        foreach($page->gallery->image as $image) $p->gallery->add((string)$image);

        $p->save();
        echo 'new page <a href="' . $p->editUrl . '" target="_blank">' . $p->path . '</a><br>';
    }
    die();
}

of course that is not bulletproof but it's really simple and you can do whatever you want (export chunks by adding start=0, limit=10 or the like to your selector)

you can also try adrian's batch child editor. or csv importer. but for me the example above worked like a charm :)

ps: try it with limit=1 for testing ;)
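
A minimal sketch of the chunking idea, assuming you want to step through the export in fixed-size pieces. The helper names chunkSelector() and eachChunk() are made up for illustration; only the start= and limit= selector keywords come from the post above. The $find callback stands in for something like $pages->find($selector), so the snippet runs without ProcessWire.

```php
<?php
// Hypothetical helper: build the selector for chunk number $chunk (0-based).
function chunkSelector(string $base, int $chunk, int $perChunk): string
{
    $start = $chunk * $perChunk;
    return "$base, start=$start, limit=$perChunk";
}

// Hypothetical helper: walk all chunks until a short or empty one is returned.
// $find receives the selector and returns a countable list of items;
// $handle receives each non-empty chunk. Returns the total item count.
function eachChunk(string $base, int $perChunk, callable $find, callable $handle): int
{
    $total = 0;
    for ($chunk = 0; ; $chunk++) {
        $items = $find(chunkSelector($base, $chunk, $perChunk));
        $n = count($items);
        if ($n === 0) {
            break; // nothing left
        }
        $handle($items);
        $total += $n;
        if ($n < $perChunk) {
            break; // last, partially filled chunk
        }
    }
    return $total;
}
```

On a real site it is probably simpler to pass the chunk number as a GET parameter and run one chunk per request, which also keeps each request short.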


a module would be cool. but i think it will be difficult to handle all situations... the freedom of processwire will make it difficult to find some common standard in all installations / setups... i guess it will almost always be faster to use some simple unique script like mine above but i'm happy if you prove me wrong ;)

good luck with your import!


I agree with you on the difficulty in trying to create something that will handle all scenarios.  Always quicker to build something simple and custom fit.

I think the difficulty lies in how to identify and deal with complex and compound data types, whether they are built by the user or introduced by the module(s) the site is using. Is it straightforward to access those types using the API, for example? (I assume yes, but you never know.)

Also, if any new types are introduced to the PW ecosystem, the tool will have to be updated to handle them, so you'll always be catching up.

Probably the correct way to write something generic is to use Reflection and infer the capabilities and attributes of your target objects, a la how PW does with its API documentation (Btw, how cool is that? Self documenting APIs! Sehr Guile!)

 https://processwire.com/blog/posts/processwire-3.x-api-reference/ . 


i think what COULD make sense is some kind of helper module that makes it easy to handle some more advanced features (sometimes necessities) like dividing the import into chunks or providing some helper functions for handling pagename collisions, importing images and so on.

but i don't know if there are better solutions already providing such things like using lister pro or batch child editor, import csv and so on. for me it was just the quickest solution to code it on my own. i had only 168 pages in the blog and 165 persons to import and that went without even increasing execution time :) 
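
As an illustration of what such a helper could look like for pagename collisions, here is a small sketch. The function name uniquePageName() and the $exists callback are assumptions, not part of any existing module; on a real site the callback would wrap a lookup such as a $pages->find() count for the given parent and name. Unlike appending '-1' over and over, it counts upwards.

```php
<?php
// Hypothetical helper: return $base if it is free, otherwise $base-2, $base-3, ...
// $exists is a callback that returns true when a name is already taken.
function uniquePageName(string $base, callable $exists): string
{
    $name = $base;
    $n = 1;
    while ($exists($name)) {
        $n++;
        $name = $base . '-' . $n;
    }
    return $name;
}
```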


I certainly agree with your Helper Module sentiments.

I would actually go one step further and say a framework built on the PW API would be awesome as I sometimes find myself spending a lot of time trying to get certain tasks done, something a Framework could have easily mitigated.

I tried to incorporate your code and I got it working for most part.
I have a problem where I'm trying to populate a TextArea field from the XML file.

$p->content = $xml->content;

No errors but no data populated either. 
I'm guessing I may have to do some sort of casting or conversion?

Has anyone got any experience or ideas on this?


i would recommend using @adrian's awesome tracy console:

$items = simplexml_load_file('https://www.your-old-website.com/?export=blogitems');
foreach($items as $page) d($page->yourxmlnode->whatsoever);

on the other site you can easily see what's happening like this:

$p = $pages->get(2427); // your-test-page-id
$p->of(false);
$p->eventcancelled = 'your_console_output_from_other_site';
d($p->eventcancelled);

as this field is a checkbox it would modify the '<p>test</p>' to 1 and you would know on which side the problem exists:

2016-09-07 13_41_23-Program Manager.png

but still i have to add that i have no experience with more complex import/export scenarios. for example multilanguage. maybe that's already possible with the other solutions :)


<content>
    <p>Liebe Sonne komm gekrochen,</p>
    <p>denn mich friert` s an meine Knochen.</p>
    <p>Liebe Sonne, komm gerennt,</p>
    <p>denn mich friert` s an meine Händ.</p>
</content>

This is the content field I'm trying to update as currently represented in the XML file


2016-09-07 13_59_38-Edit Page_ Demo-Event • mustangs.do2.baumrock.com.png

PS: see how i did this with my body field

        // get body html and remove root node
        $p->body = substr($page->body->asXML(), 6, -7);

PPS: i had problems with the umlaut complaining about UTF8 but i had this problem on another spot and think that might be a problem with my setup and not related to this topic or tracy...
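
The offsets 6 and -7 above are the lengths of the <body> and </body> tags, so they only fit a node with that exact name. A sketch of a variant that works for any node name, assuming the DOM extension is available; the function name innerXml() is made up for illustration:

```php
<?php
// Hypothetical helper: return the markup inside a SimpleXML node,
// without the node's own opening and closing tags.
function innerXml(SimpleXMLElement $node): string
{
    $dom = dom_import_simplexml($node); // same node, as a DOMElement
    $out = '';
    foreach ($dom->childNodes as $child) {
        // serialise each child (elements and text) back to XML
        $out .= $dom->ownerDocument->saveXML($child);
    }
    return $out;
}
```

Used like $p->content = innerXml($xml->content); it also sidesteps the case where assigning the node itself leaves the field empty because the node only contains child elements.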


58 minutes ago, FrancisChung said:

I would actually go one step further and say a framework built on the PW API would be awesome as I sometimes find myself spending a lot of time trying to get certain tasks done, something a Framework could have easily mitigated.

I'm curious what you think a framework could migrate, that processwire can't.


5 hours ago, LostKobrakai said:

I'm curious what you think a framework could migrate, that processwire can't.

It's not so much migrate or migration but just building upon what the processwire API provides. I really do appreciate the "freedom" the API provides, but I also sometimes wished there was some structure or guidance to follow (i.e. a Framework). You'll have to excuse me coming from a .NET background and still struggling with a lot of PHP's idiosyncrasies and gotchas.

I'm not saying a Framework is useful for every scenario, but for some scenarios it would be much quicker to get a site going if we could leverage reusable, tested components a la a Framework. I'm sure there are plenty here that have built their own framework(s) over the years. Would be nice to see an official or a collaborative effort.


I thought I should post my implementation based on Bernhard's code.

It tries to work out the length of the first tag and modifies the ->asXML parameters accordingly.


The Import Code is as follows :

class ImportFromXML
{

    private $file;

    public function __construct($file)
    {
        $this->file = $file;
    }

    public function Execute()
    {
        $import = true;
        $xmlFile = $this->file;
        if ($import) {
            if (!file_exists($xmlFile))
                exit($xmlFile . ' failed to open');

            $items = simplexml_load_file($xmlFile);

            foreach ($items as $xml) {
                $p = new \ProcessWire\Page();
                //$p = new \Page();

                $p->of(false);

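                // wire() expects the API variable name as a string; a bare word is an undefined constant (an Error in PHP 8)
                $p->template = wire('templates')->get("id=" . $xml->template);
                $p->parent = wire('pages')->get("id=" . $xml->parent);
                $p->title = (string) $xml->title;
                //Struggle to call Sanitizer  $p->name = wire('sanitizer')->pageNameUTF8($xml->title, true);
                $p->name = (string) $xml->title;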
                //while ($xmls->find('parent=' . $p->parent . ',name=' . $p->name)->count() > 0) $p->name .= '-1';

                //$p->content = $this->PopulateContent($xml->content);
                $p->content = $this->PopulateMarkup($xml->content);
                //$p->content_intro = $this->PopulateIntro($xml->content_intro);
                $p->content_intro = $this->PopulateMarkup($xml->content_intro);
                $p->author = $xml->author;
                $p->content_path = $xml->content_path;
                $p->seo_title = $xml->seo_title;
                $p->seo_keywords = $xml->seo_keywords;
                $p->seo_description = $xml->seo_description;
                $p->seo_image = $xml->seo_image;
                $p->seo_custom = $xml->seo_custom;
                $p->image = $xml->image;
                $p->image_alt = $xml->image_alt;
                $p->keywords = $xml->keywords;
                $p->title_nav = $xml->title_nav;
                $p->seo_section_title = $xml->seo_section_title;

                //$p->save();

                // try creating PageArray first ????
                //$cats = new \PageArray();
                $cats = new \ProcessWire\PageArray();

                foreach ($xml->category->id as $id) {

                    //$cat = wire(pages)->get("name='" . $catname . "', parent='/Categories/'");

                    $pageid = (string) $id;
                    $cat = wire('pages')->get($pageid);

                    // a failed lookup returns a NullPage, whose id is 0
                    if ($cat->id)
                        $cats->add($cat);
                        //$p->categories->add($cat);
                }

                $p->category->import($cats);


                $p->save();
                echo 'new page <a href="' . $p->editUrl . '" target="_blank">' . $p->path . '</a><br>';
            }
            //die();
        }
    }


    private function PopulateMarkup($node)
    {
        $xml = $node->asXML();

        //Check for tags
        if($xml != strip_tags($xml)) {
            //$startTag  = strpos($node,"<");
            $endTag = strpos($xml,">");

            if ($endTag > 0)
                return substr($xml, $endTag+1, -1*($endTag+2));
        }

        // no markup found: return the plain text content as a string
        return (string) $node;
    }
}

 

And the Export :

 

class ExportToXML  {

    private $pages ;
    
    public function __construct($pages)
    {
        $this->pages=$pages;
    }

    public function Execute()
    {
        $export = true;
        if($export) {

            echo "<?xml version='1.0' ?>";
            echo '<pages>';

            //$parent = $pages->get('/blog');
            $results = $this->pages;
            //$results = $pages->find('id=3198'); // ohne pic
            //$results = $pages->find('id=3204'); // mit pic

            foreach($results as $p):
                $p->of(false);
                ?>
                <page>
                    <name><?= $p->name ?></name>
                    <title><?= htmlspecialchars($p->title, ENT_XML1) ?></title>
                    <template><?= $p->template->id ?></template>
                    <parent><?= $p->parent ?></parent>
                    <category><?php foreach($p->category as $category) {
                            echo '<id>' . $category->id . '</id>';
                        } ?></category>
                    <content><?= $p->content ?></content>
                    <content_intro><?= $p->content_intro ?></content_intro>
                    <author><?= $p->author ?></author>
                    <content_path><?= $p->content_path ?></content_path>
                    <seo_title><?= $p->seo_title ?></seo_title>
                    <seo_keywords><?= $p->seo_keywords ?></seo_keywords>
                    <seo_description><?= $p->seo_description ?></seo_description>
                    <seo_image><?= $p->seo_image ?></seo_image>
                    <seo_custom><?= $p->seo_custom ?></seo_custom>
                    <seo_canonical><?= $p->seo_canonical ?></seo_canonical>
                    <image><?= $p->image ?></image>
                    <image_alt><?= $p->image_alt ?></image_alt>
                    <keywords><?= $p->keywords ?></keywords>
                    <title_nav><?= $p->title_nav ?></title_nav>
                    <seo_section_title><?= $p->seo_section_title ?></seo_section_title>
                </page>
            <?php endforeach;
            echo '</pages>';
            //die();
            /**
             *  <date><?= $p->created ?></date>
                <featured>1</featured> */
            }
        }
}

 

In particular, I had a field that was a PageArray linking to other Pages.

 

                    <category><?php foreach($p->category as $category) {
                            echo '<id>' . $category->id . '</id>';
                        } ?></category>

 

Hope it helps someone out!


  • 1 month later...

just because i needed this again today:

if you are dealing with any sort of tags (HTML data) in your fields, then the easiest solution is to base64_encode($var) your data in the export and then base64_decode($var) it in your import.
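
a tiny round-trip sketch of why this helps: base64 output contains only letters, digits, +, / and =, so it can never break the XML, whatever the HTML looks like. the function names exportBody() and importBody() are made up for illustration and the XML is reduced to a single <body> node:

```php
<?php
// Hypothetical export side: wrap the encoded HTML in a minimal XML document.
function exportBody(string $html): string
{
    return '<page><body>' . base64_encode($html) . '</body></page>';
}

// Hypothetical import side: parse the XML and decode the body again.
function importBody(string $xml): string
{
    $page = simplexml_load_string($xml);
    // cast the node to a string before decoding
    return base64_decode((string) $page->body);
}
```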

i had to import some pages with inline images today and if you know how to do it, that is also quite easy and straightforward. the problem is that you have some html like img src="/site/assets/files/12345/your-image.jpg" in your field and the ID will change after the import!

sample export xml - note the tag <pid> holding the old id

echo "<?xml version='1.0' ?>";
echo '<pages>';

// find pages
$results = $pages->find('parent=/your-parent/');
$results->add($pages->find('parent=/something-else/'));

foreach($results as $p):
    $p->of(false);
    ?>
    <page>
        <title><?= htmlspecialchars($p->get('headline|title'), ENT_XML1) ?></title>
        <date><?= $p->created ?></date>
        <featured>1</featured>
        <pid><?= $p->id ?></pid>
        <pic><?= $p->coverpic->count() ? $p->coverpic->first()->httpUrl : '' ?></pic>
        <body><?= base64_encode($p->body) ?></body>
        <images><?php foreach($p->images as $image) {
            echo '<image>' . $image->httpUrl . '</image>';
        } ?></images>
        <files><?php foreach($p->attachments as $file) {
            echo '<file>' . $file->httpUrl . '</file>';
        } ?></files>
        <gallery><?php foreach($p->gallery as $image) {
            echo '<image>' . $image->httpUrl . '</image>';
        } ?></gallery>
    </page>
<?php endforeach;
echo '</pages>';
die();

and then the import:

$items = simplexml_load_file('your-url-of-export-data');

foreach($items as $page) {
    $p = new Page();
    $p->template = 'blogitem';
    $p->parent = '/news';

    // cast SimpleXML nodes to scalars before assigning them to page fields
    $p->title = (string) $page->title;
    $p->name = $sanitizer->pageName((string) $page->title, true);
    while($pages->find('parent='.$p->parent.',name='.$p->name)->count() > 0) $p->name .= '-';

    $p->date = (string) $page->date;
    $p->featured = (int) $page->featured;

    // decode the base64 encoded body html from the export
    $p->body = base64_decode((string) $page->body);
    $p->save();

    // change images in body field
    $re = '/src="\/site\/assets\/files\/' . $page->pid . '\//';
    $p->body = preg_replace($re, 'src="/site/assets/files/' . $p->id . '/', $p->body);
    $p->save();

    // add images
    if(strlen($page->pic)) $p->pic->add((string)$page->pic);
    foreach($page->images->image as $image) $p->images->add((string)$image);
    foreach($page->files->file as $file) $p->files->add((string)$file);
    foreach($page->gallery->image as $image) $p->gallery->add((string)$image);

    $p->save();
    echo 'new page <a href="' . $p->editUrl . '" target="_blank">' . $p->path . '</a><br>';
}
die();

just set your ckeditor field settings properly before your import and all images will be recreated on your new site! :)

2016-11-01 22_00_57-Edit Field_ body • svt.dev.png

remark: this will only replace images from the same page and not any images that are linked from a different page with different page-id. that would need some extra mapping of old-id --> new-id
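
a rough sketch of that extra mapping, assuming you collect an array of old-id => new-id while importing (for example from the <pid> tag and the id of each newly saved page). the function name remapAssetIds() is made up for illustration; ids that are not in the map are left untouched:

```php
<?php
// Hypothetical helper: rewrite /site/assets/files/<old-id>/ paths in $html
// using $idMap (old page id => new page id).
function remapAssetIds(string $html, array $idMap): string
{
    $result = preg_replace_callback(
        '#(/site/assets/files/)(\d+)(/)#',
        function (array $m) use ($idMap) {
            $old = (int) $m[2];
            $new = $idMap[$old] ?? $old; // unknown ids stay as they are
            return $m[1] . $new . $m[3];
        },
        $html
    );
    // preg_replace_callback returns null on a regex error; fall back to the input
    return $result ?? $html;
}
```

this would have to run in a second pass after all pages are imported, because only then is the map complete.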

