a-ok posted September 30, 2019

This is only slightly related to PW, but I'm in need of some advice/help. I stupidly managed to accidentally `rm -rf` the LIVE assets folder on a site that has over two years' worth of data. Let's not go there.

I had a cron job set up to back everything up every night (keeping the latest 7), but the .tar.gz files all fail with "truncated tar archive" and "unexpected end of file" errors.

I've managed to export the image fields from the database as .csv files, and my plan is to combine them into one .csv with two columns (ID and image filename). I also have a large, unorganised dump of ALL the images, but they:

- haven't had their filenames sanitised the way PW does on upload, and
- are all in one directory, so they aren't attached to any page ID.

My question is: how feasible would it be to use PW's API to:

1. sanitise all the filenames in a directory (does anyone know what kind of sanitising is applied to images?), and
2. run some sort of query (I'm on OSX) that matches each filename in the CSV against the filenames in the directory (the image dump) and moves the image into a folder named after the matching ID? (This is a long shot.)

Any advice is appreciated.
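The CSV-merging step could look something like this plain PHP function, which needs no ProcessWire at all. It assumes each exported file has a header row with `pages_id` and `data` columns (the page ID and the stored filename, which is what ProcessWire's image field tables hold, as far as I know); if your export uses other column names, change them. The name `merge_image_csvs` is made up for illustration.

```php
<?php
// Merge several exported image-field CSVs into one "id,filename" CSV.
// Assumption: each export starts with a header row that includes "pages_id"
// and "data" columns; change the names below if your export differs.
function merge_image_csvs(array $inputPaths, string $outputPath): int
{
    $out = fopen($outputPath, 'w');
    if ($out === false) {
        throw new RuntimeException("Cannot write $outputPath");
    }
    fputcsv($out, ['id', 'filename'], ',', '"', '\\');
    $rows = 0;
    foreach ($inputPaths as $path) {
        $in = fopen($path, 'r');
        if ($in === false) {
            throw new RuntimeException("Cannot read $path");
        }
        $header = fgetcsv($in, 0, ',', '"', '\\');
        $idCol = is_array($header) ? array_search('pages_id', $header, true) : false;
        $fileCol = is_array($header) ? array_search('data', $header, true) : false;
        if ($idCol === false || $fileCol === false) {
            fclose($in);
            throw new RuntimeException("$path has no pages_id/data header");
        }
        // One output row per image, so a page with three images gets three rows
        while (($row = fgetcsv($in, 0, ',', '"', '\\')) !== false) {
            if (!isset($row[$idCol], $row[$fileCol]) || $row[$fileCol] === '') {
                continue; // skip blank or short lines
            }
            fputcsv($out, [$row[$idCol], $row[$fileCol]], ',', '"', '\\');
            $rows++;
        }
        fclose($in);
    }
    fclose($out);
    return $rows;
}
```

A page with several images simply appears on several rows, which fits a one-row-per-image matching step.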
Robin S posted October 1, 2019

The first thing would be to check with your host whether they have a backup of the file system. If they can't help you, then I suggest:

1. Copy the site to your local machine. Do everything that follows locally; don't do anything else on the live site, because you only risk losing more data.
2. Create clones of all your image fields (Add New Field > Type > Clone existing field...) and add them to the same templates your original fields are on. This keeps the images you add separate from the existing field values.
3. Create another new image field to hold all the unorganised images. Create a new template, add that image field to it, and create a new page using the template; we'll title it "Unorganised Images".
4. Add all the unorganised images to the new page. Personally I would do this using the API together with glob or DirectoryIterator, but you might be able to do it via the admin if you allow a lot of memory and set a long max execution time. This step will sanitize the image filenames.
5. Create a DB backup now, in case you make errors in the next step and need to return to this state.
6. Use an API script (via Admin Actions, the Tracy console, or just a template file) to loop over all pages that have image fields (apart from your Unorganised Images page), and over all the images in those fields. For each image in any of your original image fields, look for an image with the same name on the Unorganised Images page. If you find a match, add the image to the clone of that image field. You might want to log the page title / image field name / image filename for any images that can't be matched.
7. Check the results.
When you've done the best you can, delete the original image fields, rename the cloned image fields to the original field names, delete the page/template/field used for Unorganised Images, and redeploy to the live site.
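Step 4 with `DirectoryIterator` might start with a helper like this hypothetical `list_images()`, which only collects file paths and so runs without ProcessWire. The extension list is an assumption. The commented lines show roughly where, in a script with ProcessWire loaded, each path would be handed to an image field; the page ID and the field name `images` are placeholders, and with thousands of files you may want to save in batches to keep memory down.

```php
<?php
// Collect image files from a flat directory, sorted byte-wise by path.
// The extension list is an assumption; extend it to match your dump.
function list_images(string $dir, array $exts = ['jpg', 'jpeg', 'png', 'gif', 'webp']): array
{
    $paths = [];
    foreach (new DirectoryIterator($dir) as $file) {
        if ($file->isDot() || !$file->isFile()) {
            continue; // skip ".", ".." and subdirectories
        }
        if (in_array(strtolower($file->getExtension()), $exts, true)) {
            $paths[] = $file->getPathname();
        }
    }
    sort($paths, SORT_STRING);
    return $paths;
}

// Hypothetical use inside a bootstrapped ProcessWire script (not run here):
// $page = $pages->get(1287);          // the "Unorganised Images" page (placeholder ID)
// $page->of(false);
// foreach (list_images('/path/to/dump') as $path) {
//     $page->images->add($path);      // "images" is a placeholder field name
// }
// $page->save();
```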
Robin S posted October 1, 2019

As an alternative to step 2 (adding the matched images to cloned fields), in step 6 you could copy the matched files to the page's folder in /site/assets/files/. That's probably the easier approach.

Edit: here's some API code you could execute in the Tracy console:

```php
// Find all image fields
$image_fields = $fields->find("type=FieldtypeImage");

// Find all templates that contain those image fields
$tpls = new TemplatesArray();
foreach($image_fields as $image_field) {
    /* @var Field $image_field */
    $tpls->add($image_field->getTemplates());
}

// The page that contains the unorganised images - update ID to suit
$unorganised = $pages(1287);

// Find all pages with missing images
$pages_with_missing_images = $pages->find("template=$tpls, id!=$unorganised");

// Loop over those pages
foreach($pages_with_missing_images as $p) {
    // Get the directory for the page
    $page_dir = $p->filesManager()->path;
    // Loop over the image fields
    foreach($image_fields as $image_field) {
        // Continue if the page doesn't contain the image field
        if(!$p->fields->has($image_field)) continue;
        // Get images as Pageimages object regardless of formatted value
        foreach($p->getUnformatted($image_field->name) as $image) {
            // Look for a matching image among the unorganised images
            $match = $unorganised->images->get("name={$image->basename}");
            if($match) {
                // Copy to the page directory if a match is found
                $files->copy($match->filename, $page_dir);
            } else {
                // Display (or log) a message about unmatched images
                d("No match found for image '{$image->basename}' in field '{$image_field->name}' on page '{$p->title}'");
            }
        }
    }
}
```

Edited October 1, 2019 by Robin S: added some API code.
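Without the ProcessWire API, the same copy-to-page-folder idea can be sketched in plain PHP. This assumes the default layout of one folder per page ID under `/site/assets/files/` (not extended paths), an `$expected` map of page ID to filenames built from the database export, and a `$dump` map that is already keyed by the sanitized filename (however you compute it). `restore_files()` is a hypothetical name, and it defaults to a dry run.

```php
<?php
// Copy matched images from a flat dump into per-page folders named by ID.
// $expected: page ID => list of filenames the database says the page has.
// $dump:     sanitized filename => path of that file in the dump (assumed prebuilt).
// Returns "id/filename" entries that had no match in the dump.
function restore_files(array $expected, array $dump, string $filesRoot, bool $dryRun = true): array
{
    $unmatched = [];
    foreach ($expected as $id => $names) {
        $pageDir = rtrim($filesRoot, '/') . '/' . $id;
        foreach ($names as $name) {
            if (!isset($dump[$name])) {
                $unmatched[] = "$id/$name";
                continue;
            }
            if ($dryRun) {
                echo "would copy {$dump[$name]} -> $pageDir/$name\n";
                continue;
            }
            // 0755 is an assumption; match your server's directory permissions
            if (!is_dir($pageDir) && !mkdir($pageDir, 0755, true)) {
                throw new RuntimeException("Cannot create $pageDir");
            }
            if (!copy($dump[$name], "$pageDir/$name")) {
                throw new RuntimeException("Copy failed for $name");
            }
        }
    }
    return $unmatched;
}
```

Running it once with `$dryRun = true` and reading the unmatched list before copying for real gives roughly the same safety net as the logging in the Tracy version.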
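a-ok posted October 1, 2019

Thanks so much, @Robin S.

Using Automator on the Mac, I'm going to take the .csv as a Finder item and then run a bash script that creates a folder for each ID and copies the corresponding image into it. A folder can hold multiple images. As the DB is fully intact and only the image files are missing, this should work in theory. I just need to run something on all the images to sanitize their names the way PW did when the user uploaded them.

Would you say the API approach you've kindly shared is better, i.e. more effective and more foolproof?

My bash script:

```bash
#!/bin/bash
# Usage: sort-images.sh /path/to/ids.csv (one "ID,filename" pair per line;
# the images must sit in the same folder as the CSV)
csv="$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"  # absolute path, so the cd below can't break it
cd "$(dirname "$csv")" || exit 1
while IFS= read -r line || [ -n "$line" ]; do   # -r keeps backslashes; || handles a missing final newline
    line=${line%$'\r'}                  # strip a Windows-style line ending, if any
    FolderName=${line%%,*}              # text before the first comma (the page ID)
    ImageName=${line#*,}                # text after the first comma (the filename)
    [ -f "$ImageName" ] || { echo "No file for: $line" >&2; continue; }
    mkdir -p "$FolderName"              # -p: no error when the folder already exists
    cp "$ImageName" "$FolderName/"      # copy, not move: one image may belong to several pages
done < "$csv"
```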
bernhard posted October 1, 2019

I'd create a PHP script that bootstraps PW and run it from the command line:

- You have the same PW version your live system runs on, and therefore the same sanitizer methods.
- You can do a dry run that only echoes the filename conversions before you actually copy any files.
- You can watch everything in real time.

+1 to Robin S's advice: first check with your host whether they have a backup of the file system, and if they can't help, copy the site to your local machine and do everything there, not on the live site, because you only risk losing more data.
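The dry-run idea can be sketched as a hypothetical `copy_cleaned()` function. It writes cleaned copies into a separate folder instead of renaming in place, so the original dump is never touched, and it reports files whose cleaned names collide. In a CLI script that bootstraps ProcessWire by including the site's `index.php`, `$clean` could wrap that install's own filename sanitizing; which method to wrap is left open here, so any `string => string` callable works.

```php
<?php
// Copy every file in $srcDir to $outDir under a cleaned name.
// $clean is any callable string => string. With $dryRun = true nothing is
// copied; the planned "old -> new" names are only printed.
function copy_cleaned(string $srcDir, string $outDir, callable $clean, bool $dryRun = true): array
{
    $names = scandir($srcDir);
    if ($names === false) {
        throw new RuntimeException("Cannot read $srcDir");
    }
    sort($names, SORT_STRING); // fixed order, so collisions are reported predictably
    $plan = [];
    $taken = [];   // cleaned name => original name that claimed it first
    $skipped = [];
    foreach ($names as $old) {
        if (!is_file("$srcDir/$old")) {
            continue; // skips ".", ".." and subdirectories
        }
        $new = $clean($old);
        if (isset($taken[$new])) {
            $skipped[] = $old;
            echo "COLLISION: $old and {$taken[$new]} both become $new (skipped $old)\n";
            continue;
        }
        $taken[$new] = $old;
        $plan[$old] = $new;
        echo ($dryRun ? '[dry run] ' : '') . "$old -> $new\n";
        if (!$dryRun && !copy("$srcDir/$old", "$outDir/$new")) {
            throw new RuntimeException("Copy failed for $old");
        }
    }
    return ['plan' => $plan, 'skipped' => $skipped];
}
```

Typical use is to call it with `$dryRun = true`, read the printed lines and any collisions, then run it again with `false`.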
a-ok posted October 1, 2019

Thanks everyone. Do you think it's possible to sanitize the filenames of a folder of images on my local machine? I'm guessing that's what @Robin S meant in step 4: "Add all the unorganised images to the new page. [...] This step will sanitize the image filenames."

I'll look into it!
a-ok posted October 1, 2019

Does anyone know what sanitising is applied to image filenames? I thought it was `$sanitizer->filename()`, but that doesn't lowercase them.
Robin S posted October 1, 2019

In reply to a-ok's question about how image filenames are sanitised, the method that sets the filename is here: https://github.com/processwire/processwire/blob/655c4cdd245fa4990d010c06ccfcfd37a04c0fde/wire/core/Pagefiles.php#L554-L599
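For a quick local pass, here is a rough approximation of what that method appears to do (as far as I can tell it is `Pagefiles::cleanBasename()`): lowercase the name, then replace every character outside `a-z`, `0-9`, `-`, `_` and `.` with `_`. The lowercasing would explain why `$sanitizer->filename()` on its own doesn't match. The real method may differ in details such as accented characters, extra dots in the name, name length, and the `-1`, `-2` suffixes added when a name is already taken, so treat `approx_clean_basename()` (a made-up name) as a first pass and review the leftovers by hand.

```php
<?php
// Rough approximation of ProcessWire's image filename cleaning.
// Assumption: based on a reading of the linked Pagefiles method, not a copy of it.
// It does NOT transliterate accents, shorten long names, or add the
// "-1", "-2" suffixes PW uses when a file of the same name already exists.
function approx_clean_basename(string $basename): string
{
    // Lowercase first (multibyte-aware when mbstring is available)
    $basename = function_exists('mb_strtolower') ? mb_strtolower($basename) : strtolower($basename);
    // Split off the extension at the last dot
    $dot = strrpos($basename, '.');
    $name = $dot === false ? $basename : substr($basename, 0, $dot);
    $ext = $dot === false ? '' : substr($basename, $dot);
    // Replace anything outside the allowed set with underscores
    $name = preg_replace('/[^-_.a-z0-9]/', '_', $name);
    $ext = preg_replace('/[^.a-z0-9]/', '_', $ext);
    return $name . $ext;
}
```

For example, `approx_clean_basename("My Photo (1).JPG")` returns `my_photo__1_.jpg`, and `IMG_0042.jpeg` becomes `img_0042.jpeg`.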