Assets folder data loss solution


a-ok

This is only loosely related to PW, but I'm in need of some advice/help.

I stupidly managed to accidentally rm -rf the LIVE assets folder on a site that has over 2 years' worth of data. Let's not go there. I had a cron job set up to back everything up every night (keeping the latest 7), but the .tar.gz files all error out with truncated tar archive and unexpected end of file errors.
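A truncated archive like this tends to be discovered only at restore time. As a minimal sketch (not the original cron job, which isn't shown in this thread), a backup script could verify each archive right after writing it. The function name `verify_archive` and the example paths are hypothetical; `gzip -t` and `tar -tzf` are the standard test and list options.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the .tar.gz is complete and listable.
verify_archive() {
    archive=$1
    # gzip -t tests the compressed stream, catching "unexpected end of file".
    gzip -t "$archive" 2>/dev/null || return 1
    # tar -tzf reads every member header, catching a truncated tar stream.
    tar -tzf "$archive" >/dev/null 2>&1 || return 1
}

# Example use in a nightly job (paths are placeholders):
# tar -czf /backups/assets.tar.gz -C /var/www/site assets
# verify_archive /backups/assets.tar.gz || echo "backup failed verification" >&2
```

A check like this only confirms the archive is readable, not that it contains everything expected, so an occasional test restore is still worthwhile.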

I've managed to export the image fields from the database as CSV files, and my plan is to combine them all into one CSV with two columns (ID and image filename). I also have a large unorganised dump of ALL the images, but they:

  1. Haven't had their filenames sanitised by PW's upload process, and
  2. Are all in one directory and thus not attached to any page ID

My question is... how feasible do you think it would be to use PW's API to:

  1. Sanitise all the filenames in a directory (does anyone know what kind of sanitising PW applies to image filenames?) and
  2. Execute some sort of query (I'm on OSX) that would match each filename in the CSV with the filename in a directory (the image dump) and move the image into a folder named after the same ID (this is a long shot)

Any advice is appreciated.


The first thing would be to check with your host to see if they have a backup of the file system.

If they can't help you then I suggest...

1. Copy the site to your local machine. Everything else that follows you do locally - don't do anything else on the live site because you only risk losing more data.

2. Create clones of all your image fields (Add New Field > Type > Clone existing field...). Add them to the same templates that your original fields are on. This is so that when you add the new images you keep them separate from the existing field values.

3. Create another new image field that will hold all the unorganised images. Create a new template and add the image field to the template. Create a new page using the template - we'll title this "Unorganised Images".

4. Add all the unorganised images to the new page. Personally I would do this using the API together with glob or DirectoryIterator, but you might be able to do it via the admin if you allow a lot of memory and set a long max execution time. This step will sanitize the image filenames.

5. Create a DB backup now in case you make some errors in the next step and need to recover to this state.

6. Use an API script (using Admin Actions, in the Tracy console, or just in a template file) to loop over all pages that have image fields (apart from your page with the unorganised images), and loop over all the images in those image fields. For each image in any of your original image fields, look for an image of the same name in your Unorganised Images page. If you find a match then add the image to the clone of that image field. You might want to log the page title / image field name / image filename for any images that can't be matched.

7. Check the results. When you've done the best you can then you can delete the original image fields, rename the cloned image fields according to the original field names, delete the page/template/field used in Unorganised Images, and redeploy to the live site.


As an alternative to step 2 and adding the matched images to cloned fields, in step 6 you could copy the matched files to the page's folder in /site/assets/files/. That's probably the easier approach.

Edit: here's some API code you could execute in the Tracy console...

// Find all image fields
$image_fields = $fields->find("type=FieldtypeImage");

// Find all templates that contain those image fields
$tpls = new TemplatesArray();
foreach($image_fields as $image_field) {
	/* @var Field $image_field */
	$tpls->add($image_field->getTemplates());
}

// The page that contains the unorganised images - update ID to suit
$unorganised = $pages(1287);

// Find all pages with missing images
$pages_with_missing_images = $pages->find("template=$tpls, id!=$unorganised");

// Loop over those pages
foreach($pages_with_missing_images as $p) {
	// Get the directory for the page
	$page_dir = $p->filesManager()->path;
	// Loop over the image fields
	foreach($image_fields as $image_field) {
		// Continue if the page doesn't contain the image field
		if(!$p->fields->has($image_field)) continue;
		// Get images as Pageimages object regardless of formatted value
		foreach($p->getUnformatted($image_field->name) as $image) {
			// Look for a matching image among the unorganised images
			$match = $unorganised->images->get("name={$image->basename}");
			if($match) {
				// Copy to the page directory if a match is found
				$files->copy($match->filename, $page_dir);
			} else {
				// Display (or log) a message about unmatched images
				d("No match found for image '{$image->basename}' in field '{$image_field->name}' on page '{$p->title}'");
			}
		}
	}
}

 

Edited by Robin S
Added some API code

Thanks so much, @Robin S

Using Automator for Mac I'm going to open a Finder file (the .csv) and then run a bash script that creates a folder for each ID and moves the corresponding image into that folder. A folder can have multiple images.

As the DB is all intact it’s only the image links that are missing so this, in theory, should work. I just need to run something on all the images to sanitize their filenames as PW did when the user uploaded them.

Would you say using the API, as you’ve kindly shared, is a better approach, and more effective and foolproof?

My bash script:

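cd "${1%/*}" || exit 1

# Each CSV line is "ID,filename"
while IFS= read -r line
do
     FolderName=${line%,*}
     ImageName=${line#*,}
     # -p: an ID can appear on several lines, so the folder may already exist
     mkdir -p "$FolderName"
     mv "$ImageName" "$FolderName"
done < "$1"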
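Before moving anything, it may be worth doing a dry run that only reports what would happen. The sketch below assumes the same two-column CSV (ID, filename) with no header row and no commas inside filenames; the function name `dry_run` and its two arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical dry run: print planned moves and list CSV rows with no matching file.
# Usage: dry_run ids.csv /path/to/image-dump
dry_run() {
    csv=$1
    dump=$2
    missing=0
    # The "|| [ -n "$id" ]" part handles a last line without a trailing newline.
    while IFS=, read -r id name || [ -n "$id" ]; do
        # Strip carriage returns in case the CSV has Windows line endings.
        name=$(printf '%s' "$name" | tr -d '\r')
        [ -n "$id" ] || continue
        if [ -f "$dump/$name" ]; then
            echo "WOULD MOVE $name -> $id/"
        else
            echo "MISSING $name (page $id)"
            missing=$((missing + 1))
        fi
    done < "$csv"
    echo "$missing file(s) missing"
}
```

The filenames in the CSV are the sanitized ones stored by PW, so files in the dump that still carry their original upload names will show up as MISSING until they have been sanitized the same way.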

I'd create a PHP script where you bootstrap PW and execute it from the command line:

That way you have the same PW version that your live system runs on, and therefore the same sanitizer methods. You can do a dry run and only echo the filename conversions before you really copy files over. And you can watch everything in real time.

 

7 hours ago, Robin S said:

The first thing would be to check with your host to see if they have a backup of the file system.

If they can't help you then I suggest...

1. Copy the site to your local machine. Everything else that follows you do locally - don't do anything else on the live site because you only risk losing more data.

+1


Thanks everyone.

Do you think it's possible to sanitize the filenames of a folder of images on my local machine? I'm guessing that's what @Robin S meant when he said:

10 hours ago, Robin S said:

4. Add all the unorganised images to the new page. Personally I would do this using the API together with glob or DirectoryIterator, but you might be able to do it via the admin if you allow a lot of memory and set a long max execution time. This step will sanitize the image filenames.

I'll look into it!

