Peter Falkenberg Brown Posted May 20, 2020

==> Edit: May 23, 2020 - I ended up creating a small script to fulfill these functions, and posted it below. It may not be the best solution, but it worked for my narrow use-case. If anyone has tips and recommendations, I'm delighted to learn new and better ways to do things.

==============================================

Hi Folks,

I have a project in which I want to copy sections of pages (i.e. articles) from one PW domain to a 2nd PW domain on the same server, using a PW PHP shell script. I'm not copying the entire site: just selected groups of pages. I have root access, and have successfully connected to both PW domains using the multi-instance method.

Copying all of the page fields of an article to a new page in the target section (differently named) is no problem at all. But then I thought about images and files...

The new article will have a different page id, so the files and images will be in a different subdir. It's easy enough in the script to use the Linux copy command to copy the files, and it's easy to change the ID in the img src code in the body text. But of course I have to actually attach all of those images to each new article, so that they're available in the PW editor. I also want to bring in all of the size variations. In one article, I have a total of almost 200 image files in the ./files/ID subdir.

I've looked at PageImages: https://processwire.com/api/ref/pageimages/ and the corresponding page for files.

Do you think that using those two classes is the best way to copy the images from the old article to the new one? And will doing something like:

$pageimages = $pageimages->add($item);

actually *copy* the image from the source directory to the new directory?

Edit: Oh, one more thing. The pages use identical field templates in both PW instances, so if there was a "Copy whole page to new Instance" kind of function that included all of the fields and all of the settings, that would be great. I haven't found that in my searching around, but then again I might have missed it.

The only difference is that in the new instance / domain, the pages will live under a different parent url (section), e.g.

$old_section_parent = '/columns/';
$new_section_parent = '/writings/essays/';

... and the page ID of course, which influences the images and files subdir name.

Thanks!
Peter
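The body-text step mentioned in the question (changing the ID in the img src paths) can be sketched in plain PHP, without touching the ProcessWire API. This is only a minimal sketch: the function name `rewrite_file_ids` and the sample IDs are hypothetical, and it assumes both sites keep their assets under the default `/site/assets/files/` URL.

```php
<?php
// Hypothetical helper (not part of ProcessWire): swap the source page's
// files directory for the target page's directory inside body HTML.
function rewrite_file_ids(string $body, string $filesUrl, int $oldId, int $newId): string
{
    // Normalize so the URL always ends with exactly one slash.
    $filesUrl = rtrim($filesUrl, '/') . '/';

    // The trailing slash after the ID keeps ID 1234 from matching inside 12345.
    return str_replace(
        $filesUrl . $oldId . '/',
        $filesUrl . $newId . '/',
        $body
    );
}

// Sample body text with made-up IDs: one link belongs to page 1234,
// the other to an unrelated page 12345 and must stay untouched.
$body = '<p><img src="/site/assets/files/1234/photo.300x0.jpg" alt=""></p>'
      . '<p><a href="/site/assets/files/12345/other.pdf">PDF</a></p>';

echo rewrite_file_ids($body, '/site/assets/files/', 1234, 5678), "\n";
```

Matching on the full `/site/assets/files/ID/` string rather than on the bare number avoids rewriting unrelated digits elsewhere in the text. Attaching the images to the new page so they appear in the editor is a separate step that needs the ProcessWire API.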
LuisM Posted May 20, 2020

Hi @Peter Falkenberg Brown

Sounds like you should have a look at this: https://processwire.com/blog/posts/multi-instance-pw3/ and hook into your Pages::saveReady method on Site1 to move your partials to Site2.

What comes to mind when trying to copy your files/images via CLI is the mentioned problem with IDs. I think you should avoid manually moving the files and let PW do the handling.

I'm in a hurry right now. But what about a "copyPage" function inside the hook, which calls your 2nd PW instance and creates the new page in Instance2 with the values of Instance1?
kongondo Posted May 20, 2020 (edited)

For these kinds of things, I use the (some features still experimental) module Pages Export/Import, in-built field export and in-built template export.

Pages Export/Import works a treat in most cases but you might need to experiment with it a bit first. Sometimes stuff fails in multilingual sites. If you are exporting too many pages and their assets [images, files] (the zip option) at a time, you could run out of memory. In this case, I export in batches. For non-asset exports (JSON option) it rarely fails.

The module will allow you to export from any part of the tree, recursively if you want. It will also allow you to import to any part of the tree, optionally updating or skipping existing identical pages, etc.

This core module is not installed by default; you'll have to install it manually.

If you decide to go this route, please make sure you experiment on a dev/local site first. Also back up your databases on the remote sites before starting, but I am sure you know this already.

Edited May 20, 2020 by kongondo
adrian Posted May 20, 2020

@Peter Falkenberg Brown - I still use https://github.com/adrianbj/ProcessMigrator for things like this - it works a treat with images (including updating paths to RTE embedded images). It's not regularly maintained and some users have had issues over the years but I still find it much more reliable than the PW core export/import tools.
Peter Falkenberg Brown Posted May 20, 2020

Thank you, Gentlemen... I'll look at those modules.

As a point of knowledge, is there a raw function, or a set of coding steps, that would grab all of the images in one page, including the variations? And then, once one has created and saved a new page in the second instance, a function or set of steps to save the previously grabbed objects / variables, etc, to the new page, while copying the images from the old location to the new one? Given that the modules above exist, I would assume so.

For example, if you get a title value from the old page, you then set the new page's field to that value. It's simple and logical. Are there corresponding code steps to do the same for images and all of their related variations and meta data (like descriptions), etc?

It seems like https://processwire.com/api/ref/pageimages/ might include those "code steps," but I'm not sure. I want to dig in and learn the manual method, if I can.

Thanks!
Peter
Peter Falkenberg Brown Posted May 21, 2020

==> Edit: May 23, 2020 - see second version of this script, posted below on 5/23. I cleaned it up and added the function to copy non-image files.

=======================

Hi All,

I dug in, testing my code against one page that had two images with 2 variations and descriptions. I got to the point where my code successfully copied that page to a different PW instance; set a few data fields (not all of them); added the images and descriptions; and copied the variation image files (and chowned them to the new domain user, since I ran the script as root). It also replaced the files/ID value in the body text. It seems to work, after multiple tests. I've pasted the code below.

=> My questions now are:

* Assuming I'm coding it by hand, rather than using a 3rd party module:

1. Is there an easier way to do it?
2. Do any errors or tasks-not-done pop out at anyone?
3. Does my code look correct?

Thanks!
Peter

(see code below) -- I deleted this version. See code further down in thread.
Peter Falkenberg Brown Posted May 21, 2020

Note that the code above is a rough first draft, and doesn't include code to copy files, like PDFs. I've already started to change the structure of the file to accommodate that extra step, etc. The code above is just meant as a proof of concept, to see if I'm going in the right direction.

Feedback from PW gurus is deeply appreciated. (Or anyone else, too!)

Peter
adrian Posted May 21, 2020

@Peter Falkenberg Brown - sorry, I don't understand why you are trying to reinvent the wheel here - there are a LOT of problems to solve and Migrator works for this already.
Peter Falkenberg Brown Posted May 21, 2020

Hi @adrian,

I don't think I was really trying to reinvent the wheel. I was honed in on a more specific, "narrow-band" problem: the technical requirements for copying images in pages into a replicated PW page. I wanted to pursue that partly to understand what it took to do that one task from one PW website to another, but also for other migrations, from other CMS's, into a PW install. In other words, I'm seeking to increase my knowledge of those image-related steps, rather than just use a plug-in module.

I'll look at Migrator too. When you mention a lot of problems to solve, are you talking about the image copy task, or something else?

Thanks,
Peter
Peter Falkenberg Brown Posted May 21, 2020

Hi @adrian,

Yes, I can see that your Migrator module looks really fantastic. I'll take a look at it. I still want to deepen my knowledge of the image steps, however. I did notice that your list of migrated items includes the cropped images, which I did not address, partly because I didn't have any cropped images.

Thanks again!
Peter
Peter Falkenberg Brown Posted May 21, 2020

Hi @adrian,

I installed your excellent Migrator module, and saved a zip file for the one page that I've been testing. After inspecting the zip file, I noted that you do indeed bring over the template file and the field structure. In my case, my destination install has identical fields already, but in some cases the fields or template files might conflict, so overwriting would be a problem.

I didn't see a selection to only migrate content, and not fields and templates. It seems to me that your module is perfect for a certain use case, but not necessarily for mine, which only involves migrating content. Unless I missed something.

Peter
Peter Falkenberg Brown Posted May 21, 2020

Hi @kongondo,

You wrote:

Quote: For these kinds of things, I use the (some features still experimental) module Pages Export/Import, in-built field export and in-built template export.

I looked for that in my Modules setup and also on the PW modules page, and couldn't find it. Do you have a link to it?

Thanks,
Peter
Ivan Gretsky Posted May 21, 2020

It is described here.
kongondo Posted May 21, 2020

Hi Peter,

A couple of things:

18 hours ago, Peter Falkenberg Brown said:

    $image_dir = '/site/assets/files';
    $new_domain_file_dir = '/home/PRIMARY/public_html/site/assets/files/';
    $old_domain_file_dir = '/home/SECONDARY/public_html/site/assets/files/';
    $old_domain_path = '/home/SECONDARY/public_html';

Having bootstrapped ProcessWire, why not just use it to give you access to paths? That will make your code more portable and consistent.

    $image_dir = $config->paths->files;
    $old_domain_file_dir = $SECONDARY->config->paths->files;
    // etc

This:

18 hours ago, Peter Falkenberg Brown said:

    $old_results = $SECONDARY->wire->pages->find("id=SOME_ID");

If getting one, you might just as well use a get, no?

    // @note: no need for wire as well
    $old_results = $SECONDARY->pages->get("id=SOME_ID");
kongondo Posted May 21, 2020

13 minutes ago, Peter Falkenberg Brown said: I looked for that in my Modules setup and also on the PW modules page, and couldn't find it. Do you have a link to it?

It should be under 'not installed' core modules (modules/install/): Pages Export/Import
Peter Falkenberg Brown Posted May 21, 2020

Thanks, @Ivan Gretsky and @kongondo,

My old site is on PW 3.0.42, whereas the module is in 3.0.72. My new site is on 3.0.148. Looks like I'll have to upgrade.

Peter
Peter Falkenberg Brown Posted May 21, 2020

Hi @kongondo,

Quote:

    $image_dir = $config->paths->files;
    $old_domain_file_dir = $SECONDARY->config->paths->files;
    // etc

Great suggestion! So much better.

On your comment about the find v. get -- you are correct, but that was just a test bit of code. My real export will have multiple pages via a find.

Thanks VERY much for your help!
Peter
adrian Posted May 21, 2020

43 minutes ago, Peter Falkenberg Brown said: but in some cases the fields or template files might conflict, so overwriting would be a problem.

Check the import settings where it says "APPEND will not change settings of existing fields, nor the content of existing pages."
adrian Posted May 21, 2020

35 minutes ago, Ivan Gretsky said: It is described here.

I actually found that quite buggy still and IIRC it doesn't handle RTE embedded images (i.e. it doesn't rewrite their paths to the new location) like Migrator does. Not saying Migrator is perfect by any means, but the core page export/import was pretty broken in my experience.
Peter Falkenberg Brown Posted May 21, 2020

1 hour ago, adrian said: Check the import settings where it says "APPEND will not change settings of existing fields, nor the content of existing pages."

Thanks @adrian! I'll give it a whirl.

Best regards,
Peter
Peter Falkenberg Brown Posted May 21, 2020

Hi @adrian,

I ran the exporter and importer on 1 page, and had the following issues. (I used Append as you suggested.)

- The template file was still overwritten.
- The parent page was imported, probably because the parent page didn't exist in the target site (by design). (I only wanted to migrate the pages under one section to a new section in the target site.)
- The images and PDFs were in the zip file but they did not upload to the files/ID dir.
- The links in the body text still had the old id in the source.
- When I selected "Edit Imported Content" it failed and produced an error: "Missing required ZIP or JSON Source". (When I didn't select that, the second time, it worked, but with the issues above.)

Yours,
Peter
Peter Falkenberg Brown Posted May 24, 2020

Hi All,

I've brought my wee script forward to the point of functionality for my use, minus a couple of things I'll be adding, like a routine to create a list of 301 redirects that I'll be pasting into my .htaccess file on the "old" domain.

I'm also going to write a separate small script that steps through the pages to migrate and creates a list of the hrefs in each page's body text, so that I can see if they have to be edited. In my experience, hrefs can have so many variations, including links to external pages, links to internal PW pages that have been moved *internally*, and links to PW pages that have moved to an external site (e.g. the site being migrated to, using a different url). So I'm not trying to incorporate href modification into the script at this time.

I realize that Adrian @adrian has created a very complex and thorough blackbox module for migrations. My script is in no way a replacement for his *much more* sophisticated module. As I mentioned, and as I've noted in the script, I wrote this for a very narrow use-case that may only be useful to me. No idea, actually.

I'm eager to dig deeper and deeper into ProcessWire, and move from my own procedural code method into object-oriented code. Lots of catching up to do, since for the last 13 years I've been managing servers, etc, and didn't code all day, every day.

But my oh my, I LOVE ProcessWire. What a joy to use.

EDIT: May 26, 2020: I've added the routines listed above to v1.1, pasted in a new response below, dated May 26. I also fixed an error with the chown action. See post further down, for the script.

Peter
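The two helper routines described in this post (listing the hrefs in a page's body text, and producing a 301 redirect line for .htaccess) can be sketched as standalone PHP functions. This is a rough sketch under stated assumptions, not the script from this thread: the function names `list_hrefs` and `redirect_rule`, the sample URLs, and `example.com` are all hypothetical, and the rule format simply mirrors an Apache `RewriteRule` with `[L,NC,R=301]` flags.

```php
<?php
// Hypothetical helper: collect href values from body HTML, skipping links
// into the assets directory (those are rewritten separately).
function list_hrefs(string $html, string $skip = '/site/assets/files/'): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input, so bail out early
    }

    $dom = new DOMDocument();
    // Editor markup is often not strictly valid; collect libxml warnings
    // quietly instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if ($href === '' || strpos($href, $skip) !== false) {
            continue;
        }
        $hrefs[] = $href;
    }
    return $hrefs;
}

// Hypothetical helper: build one RewriteRule line that sends the old page
// URL to the same page name under the new section on the new domain.
function redirect_rule(string $oldUrl, string $newDomain, string $newParent, string $pageName): string
{
    // Escape regex metacharacters (for example dots) in the old path.
    $pattern = preg_quote(trim($oldUrl, '/'), '#');
    $parent  = trim($newParent, '/');
    $target  = rtrim($newDomain, '/') . '/' . ($parent === '' ? '' : $parent . '/') . $pageName . '/';

    return "RewriteRule ^{$pattern}/?\$ {$target} [L,NC,R=301]";
}

$body = '<p><a href="/columns/older/">Older</a> and '
      . '<a href="/site/assets/files/1234/a.pdf">PDF</a></p>';

print_r(list_hrefs($body));
echo redirect_rule('/columns/essay/', 'https://example.com', '/writings/essays/', 'essay'), "\n";
```

Keeping the href list as a report rather than rewriting links automatically matches the reasoning above: the variations are too many to handle blindly, so the list is reviewed by hand.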
szabesz Posted May 24, 2020

Hello @Peter Falkenberg Brown,

Thanks for sharing your scripts, I appreciate that. Still, they are kinda long, so could you please "hide" them in a spoiler (by using the eye icon in the toolbar)? That would make this thread readable and the browser responsive.
Peter Falkenberg Brown Posted May 24, 2020

Hi @szabesz,

Good idea... I deleted the 1st version of the script and put the second one in a spoiler.

I couldn't figure out how to delete an empty spoiler or code block. Do you know how?

Edit: I figured it out: just go below the empty box and hit the backspace key until it's gone.

Thanks again,
Peter
Peter Falkenberg Brown Posted May 27, 2020

Hi All,

I've added routines to the script to create a file of hrefs in the body field, and also a file of 301 redirects for an .htaccess file. There are so many edge cases with internal hrefs that I decided to edit them by hand, based on the list created. (For example, an href that links to a second internal PW page on the target domain that PW's url system on the source domain doesn't know about.)

I also moved the chown command to include pages that didn't have images, because the file dir has to be owned by the domain and running the script as root ends up making the file dir owned by root.

Again, I realize that this script may only serve a very small group of people who need to do exactly what I needed it for: to migrate many sections of pages from one PW domain to another PW domain that had a different section structure.

I've successfully and quickly moved many sections and pages as of now, so I'm confident that the script is doing what I needed. If someone else finds it useful, that's grand.

Here it is, below.

Peter

#!/usr/bin/php -q
<?php namespace ProcessWire;

###################################################################################################
/*

copy.section.php
a script for ProcessWire installations, to migrate sections of pages.

Version 1.1 - no license, no guarantees, use at own risk.

I recommend running this script with the 'tee' command so that you can
review any potential errors, i.e.:

./copy.section.php | tee output.txt

peterbrown@worldcommunity.com - May 26, 2020
https://datavarius.com

#################################################################################################

NOTE: @adrian has written a very nice module for migrating sections of sites, here:
https://github.com/adrianbj/ProcessMigrator

NOTE: I wrote this script to specifically copy a set of child pages under a section
in one domain over to a new domain, for a narrow use-case that will not fit
everyone's needs. It is undoubtedly missing complex solutions. It is not even close
to the sophistication of Adrian's module. My second motivation to write it was
that it was a learning exercise.

=> This is a "quick and dirty script," i.e. it's written with procedural code to
   simply get the job done quickly and correctly.

=> This script was written for sections of pages that share the same field template.
   Of course it could be modified for other scenarios.

=> This script also creates two files: one for hrefs in the body field and
   a file of 301 redirects for an .htaccess file.

=> This should be run as root, from the shell prompt.

For multi-instance notes, see article at:
https://processwire.com/blog/posts/multi-instance-pw3/

*/
###################################################################################################

# SET UP

# NOTE: please read through the script before using.
# This was custom written for MY use, and you'll have to change some things, e.g.
# you'll need to add your page fields below in the section "add fields here."

use DomDocument;

$now       = time();
$date_time = date('Y-m-d-H-i-s');

#.......................................................................
# primary vars to change

# primary site (site to import TO)
$site1_root_index_file = '/home/USER/public_html/index.php';

# secondary site (site to import FROM)
$site2_web_doc_root = '/home/USER/public_html/';
$site2_full_domain  = 'http://DOMAIN.com/';

# this is for chowning the new file dir with its files
$site1_group_user = 'USER.';

# used for 301 directives
$site1_domain = 'https://DOMAIN.com';

$site1_section_parent = '/URL/URL/';
$site2_section_parent = '/URL/URL/';

# set this to 'yes' to run through list and make href and redirects files and then stop
# without making new pages
$stop_after_rewrites_hrefs = 'yes';

#.......................................................................
# secondary vars to inspect / change

$site2_query_string = "parent=$site2_section_parent, publish_date<=$now, sort=publish_date";

$page_template   = 'TEMPLATE';
$new_page_status = 'published';

$error_msg = '';

$hrefs_file = './hrefs_' . $date_time . '.txt';
$hrefs_fh   = fopen($hrefs_file, "w") or die("Unable to open file.");

$redirects_file = './redirects_' . $date_time . '.txt';
$redirects_fh   = fopen($redirects_file, "w") or die("Unable to open file.");

# END OF SET UP
###################################################################################################

# connect to primary site
include("$site1_root_index_file");

# connect to secondary site
$site2 = new ProcessWire($site2_web_doc_root, $site2_full_domain);

#..............................................................................
# extra variable set up

$image_dir          = $config->urls->files;
$site1_file_dir     = $config->paths->files;
$site2_file_dir     = $site2->config->paths->files;
$site2_web_doc_root = rtrim($site2_web_doc_root, '/');
$has_images_files   = 'no';

#..............................................................................
# start of script

print "\nselecting SITE2 pages\n\n";

$site2_results = $site2->wire->pages->find($site2_query_string);

foreach( $site2_results as $result )
{
    $has_images_files = 'no';

    $site2_page_id   = $result->id;
    $site2_page_name = $result->name;
    $site2_url       = $result->url;
    $site2_headline  = $result->headline;
    $body            = $result->body;

    #........................................................
    # get rewrite rule

    $site2_url_trimmed = ltrim($site2_url, '/');

    $rewrite_rule = "RewriteRule ^$site2_url_trimmed?$ $site1_domain" .
                    "$site1_section_parent" .
                    "$site2_page_name/ [L,NC,R=301]\n";

    fwrite($redirects_fh, $rewrite_rule);
    print "\n\n$rewrite_rule\n";

    #........................................................
    # get hrefs

    $dom = new DOMDocument();
    $dom->loadHTML($body);
    $tags = $dom->getElementsByTagName('a');

    $out = '';

    foreach ($tags as $tag)
    {
        $href = $tag->getAttribute("href");

        if (strpos( $href, '/site/assets/files/' ) !== false)
        {
            continue;
        }

        $out .= 'HREF: ' . $href . "\n";
        $out .= "==================================================================\n\n";
    }

    if ( $out != '' )
    {
        $out = "\n\n$site2_headline\n" . $out;
        fwrite($hrefs_fh, $out);
        print $out;
    }

    #........................................................

    if ( $stop_after_rewrites_hrefs == 'yes' )
    {
        continue;
    }

    #........................................................
    # add new page to site1

    print "\n
SITE2 ID: $site2_page_id
SITE2 PAGE NAME: $site2_page_name
SITE2 URL: $site2_url
SITE2 HEADLINE: $site2_headline
=======================================
\n";

    $site1_page = new Page();
    $site1_page->template = $wire->templates->get("$page_template");

    # check for duplicate name
    $check_name_dupe = $wire->pages->get( "parent=$site1_section_parent,name=$site2_page_name" );

    if ( $check_name_dupe->id )
    {
        $error_msg .= "
Problem Creating Page.
$site2_page_name exists in section: $site1_section_parent
==============================================================
";
        continue;
    }

    # save site1 page
    #........................................................

    $site1_page->of(false);
    $site1_page->parent = $wire->pages->get("$site1_section_parent");
    $site1_page->name   = $site2_page_name;
    $site1_page->status("$new_page_status");
    $site1_page->save();

    #........................................................

    $site1_page_id = $site1_page->id;

    #........................................................
    # add other fields

    # I thought about coding something more complex
    # that would be blind to fields and process all field types
    # as they should be processed, but it was quicker and simpler
    # to just list the fields with the code below and then copy in the
    # fields that I actually wanted to migrate.
    # note that my template doesn't have complex fields like repeaters, etc,
    # so I didn't need to do anything that special except with images and files.

    # (I placed this code at the top and ran it with an exit command.)

    # $template = $templates->get("$page_template");
    # foreach ( $template->fields as $f )
    # {
    #     print "$f->name - $f->type\n";
    # }
    # exit;

    #........................................................
    # add YOUR fields here:

    $site1_page->title    = $result->title;
    $site1_page->headline = $result->headline;
    # ...

    $site1_page->save();

    #.........................................
    # add images

    if ( count($result->images) )
    {
        # the page has one or more images
        $has_images_files = 'yes';

        foreach( $result->images as $image )
        {
            $image_name        = $image->name;
            $image_description = $image->description;
            $image_url         = $image->url;

            echo "$image_name\n";
            echo "$image_url\n";
            echo "$image_description\n";

            $image_path_and_file = $site2_web_doc_root . $image_url;

            $site1_page->images->add("$image_path_and_file");

            # now set image description
            $site1_page->images->$image_name->description = $image_description;
        }

        $site1_page->save();
    }

    #.........................................
    # add files

    if ( count($result->files) )
    {
        # the page has one or more files
        $has_images_files = 'yes';

        foreach( $result->files as $file )
        {
            $file_name        = $file->name;
            $file_description = $file->description;
            $file_url         = $file->url;

            echo "$file_name\n";
            echo "$file_url\n";
            echo "$file_description\n";

            $file_path_and_file = $site2_web_doc_root . $file_url;

            $site1_page->files->add("$file_path_and_file");

            # now set file description
            $site1_page->files->$file_name->description = $file_description;
        }

        $site1_page->save();
    }

    #.............................................
    # prep vars

    $site1_page_image_dir = $site1_file_dir . $site1_page_id;
    $site2_page_image_dir = $site2_file_dir . $site2_page_id;

    if ( $has_images_files == 'yes' )
    {
        # replace files id with site1 id in body field
        # /site/assets/files/NUMBER/

        $site2_body_id_string = $image_dir . "$site2_page_id/";
        $site1_body_id_string = $image_dir . "$site1_page_id/";

        $body = str_replace($site2_body_id_string, $site1_body_id_string, $body);

        $site1_page->body = $body;
        $site1_page->save();

        # copy any image variation files and chown site1 dir/files to site1 owner
        echo "\nCopying image variation files.\n";
        passthru("/bin/cp -v -n $site2_page_image_dir/* $site1_page_image_dir");
    }
    else
    {
        $site1_page->body = $body;
        $site1_page->save();
    }

    # we chown file dirs even if there are no images, to set the dir to the user
    echo "\nchowning $site1_page_image_dir to $site1_group_user\n";
    passthru("/bin/chown -v -R $site1_group_user $site1_page_image_dir");

    $site1_page->of(true);
}

print "\n\n============================================\n\n";
print $error_msg;
print "\n\n============================================\n\n";

fclose($hrefs_fh);
fclose($redirects_fh);

print "Done!\n";

exit;

###################################################################################################