Hey folks, so in a project I'm working on there are 750+ news articles in WP that are image-heavy in recent years. WP has a habit of storing the original images full-size - well, ProcessWire does too but I always set max dimensions on the image fields to resize during upload to prevent 2mb+ images on my disk. This had resulted in an uploads folder some 25GB in size from 2003-2022.
Disk space is, fortunately, cheap nowadays, however I soon realized after mulling it over with Ryan that the easiest and most sane option for importing all those articles was to scrape in the post content HTML into a PW CKEditor field (thanks SimpleHTMLDOM for making this so simple!) but leave the images alone - WP had already done resizes for various lightboxes and galleries so it was just the original, huge images I had to contend with.
The solution - on my own copy of the WP uploads dir - was to run this command and watch it go:
find ./ -type f \( -iname \*.jpg -o -iname \*.jpeg -o -iname \*.png \) -exec mogrify -quality 90 -verbose -resize 1600x1600\> {} \;
It's basically searching all files and subfolders for .jpg, .jpeg and .png files and resizing to no more than 1600px landscape/portrait whilst preserving aspect ratio (no cropping) and 90% image quality. I chose those sizes as the various links in galleries and lightboxes were loading the original file, not one of the WP resized ones, so now we can be fairly sure that we're loading <500kb of image maximum instead of sometimes 10mb+ (some folks have really nice cameras ? ).
I'm sure @horst can probably tell me if that command could be better. I think mogrify is usually for a batch of files for example, whereas the find command is serving them up individually so convert might be more appropriate than mogrify, but this command iterates through the files quickly and only resizes which ones it needs to and is happily chugging away right now so I'm happy with it.
One thing to note is that any resizing can be a bit CPU-heavy, so if you have the chance to do this on a local server first or out of "normal" hours for site visitors then that's recommended as the server may slow down for the duration.