qubism Posted August 20, 2021

Hey Girls & Boys,

I'm currently importing data with some basic scraping from another page and adding new pages with the ProcessWire API. The data is mostly text, which works fine, but some images can't be downloaded from the other page. My guess is that it's due to the filename. Here's the error:

ProcessWire\WireException File could not be downloaded (https://************.com/content/v1/5e8119f5232766b98/1613689-FJ73MYGC/Außenansicht+2+Galerie.jpg) 400 Bad Request: (tried: curl)

My code snippet (it throws the error with or without the sanitizer):

if ($html->find('img.thumb-image', 0)->{'data-src'}) {
    $image = $html->find('img.thumb-image', 0)->{'data-src'};
    $p->article_thumb = $sanitizer->url($image);
}

Is there a way to do this? Thanks for your time.

Edit: Found a solution. urlencode() encoded the whole URL and made the API upload empty images. So I kept the URL as-is up to the last slash and only encoded the filename, like so:

$image = $html->find('img.thumb-image', 0)->{'data-src'};
$imageURL = $image;
// position right after the last slash, i.e. where the filename starts
$pos = strrpos($imageURL, '/') + 1;
// keep the base URL untouched, encode only the filename
$result = substr($imageURL, 0, $pos) . urlencode(substr($imageURL, $pos));
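The fix in the edit above can be wrapped in a small reusable function. This is a sketch only: the name encodeUrlFilename() is hypothetical, example.com stands in for the elided host, and it assumes the URL has no query string or fragment (anything after a "?" or "#" would end up inside the encoded "filename").

```php
<?php
// Hypothetical helper mirroring the approach above: encode only the filename
// (the part after the last "/") and leave scheme, host and path untouched.
// Assumes the URL has no query string or fragment.
function encodeUrlFilename(string $url): string
{
    $pos = strrpos($url, '/');
    if ($pos === false) {
        // No slash at all: treat the whole string as a filename.
        return urlencode($url);
    }
    $base = substr($url, 0, $pos + 1);   // everything up to and including the last "/"
    $filename = substr($url, $pos + 1);  // the filename only
    return $base . urlencode($filename);
}

// Placeholder URL (the real host is elided in the error message above).
echo encodeUrlFilename('https://example.com/content/v1/Außenansicht 2.jpg'), PHP_EOL;
// https://example.com/content/v1/Au%C3%9Fenansicht+2.jpg
```

Unlike the inline version, the explicit check for a missing slash avoids relying on strrpos() returning false and being silently added to 1.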
Robin S Posted August 20, 2021

Try using urlencode() on the image URL to deal with spaces and other potentially problematic characters.
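One thing to keep in mind with this suggestion: applied to the complete URL, both urlencode() and rawurlencode() also encode ":" and "/", so the result is a single opaque string instead of a URL with an encoded filename. A small demonstration with a placeholder URL (example.com is an assumption, not the real source):

```php
<?php
// Placeholder URL standing in for the elided host in the thread.
$url = 'https://example.com/images/Galerie 2.jpg';

// Both functions encode ":" and "/" as well, so the URL structure is lost.
$encodedWhole = urlencode($url);
$rawEncodedWhole = rawurlencode($url);

echo $encodedWhole, PHP_EOL;    // https%3A%2F%2Fexample.com%2Fimages%2FGalerie+2.jpg
echo $rawEncodedWhole, PHP_EOL; // https%3A%2F%2Fexample.com%2Fimages%2FGalerie%202.jpg
```

This is why encoding has to be limited to the path segment (in practice, the filename) rather than the whole string.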
JayGee Posted August 20, 2021

Also check that the source server hasn't blocked image hotlinking or the downloading of images without a referrer. I once had an issue along these lines with an import script.
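To test the referrer theory outside ProcessWire, one option is to fetch the same image with plain PHP, once without and once with a Referer header, and compare. This is a rough diagnostic sketch: contextWithReferer() is a hypothetical helper, the URLs are placeholders, and it does not change how ProcessWire itself downloads files.

```php
<?php
// Hypothetical diagnostic helper: build an HTTP stream context that sends a
// Referer header, so a request with and without it can be compared.
function contextWithReferer(string $referer)
{
    return stream_context_create([
        'http' => [
            'method' => 'GET',
            'header' => "Referer: {$referer}\r\n",
        ],
    ]);
}

// Usage (needs network access; URLs are placeholders):
// $without = @file_get_contents('https://example.com/image.jpg');
// $with    = @file_get_contents('https://example.com/image.jpg', false,
//                contextWithReferer('https://example.com/gallery-page'));
// file_get_contents() returns false on an HTTP error status, so if only
// $with succeeds, the server is probably checking the referrer.
```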
qubism Posted August 21, 2021 Author

7 hours ago, Robin S said:
Try using urlencode() on the image URL to deal with spaces and other potentially problematic characters.

I tried that, and also rawurlencode(), but the API just uploads a blank image with the whole absolute URL as the filename.

7 hours ago, Guy Incognito said:
Also check that the source server hasn't blocked image hotlinking or the downloading of images without a referrer. I once had an issue along these lines with an import script.

Most images work, though, so I guess that's not the problem.

Another option would be to check for special characters in the filename and skip those images, so ProcessWire at least doesn't throw an error. Not optimal, but still something.
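The "skip images with special characters" fallback could look roughly like this. It is a sketch under one assumption: that "special" means anything outside the RFC 3986 unreserved set (ASCII letters, digits, "-", ".", "_", "~"). The function name hasSafeFilename() is hypothetical.

```php
<?php
// Hypothetical check: true only if the filename (last path segment) consists of
// RFC 3986 "unreserved" characters. Query string and fragment are ignored.
function hasSafeFilename(string $url): bool
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return false; // malformed URL or no path at all
    }
    $pos = strrpos($path, '/');
    $filename = $pos === false ? $path : substr($path, $pos + 1);
    return $filename !== '' && preg_match('/^[A-Za-z0-9._~-]+$/', $filename) === 1;
}

// Usage inside the import loop (sketch):
// if (hasSafeFilename($image)) {
//     $p->article_thumb = $image;
// } // else: log the URL and skip this image
```

It is deliberately strict: a filename such as "Außenansicht+2+Galerie.jpg" is rejected, which matches the goal of skipping rather than guessing an encoding.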
horst Posted August 21, 2021

When using urlencode(), check whether the + signs are converted too! If they aren't, additionally use str_replace("+", "%20", $URL)!
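The difference behind this tip can be checked directly. Per the PHP manual, urlencode() uses form encoding, where a space becomes "+", while rawurlencode() follows RFC 3986 and produces "%20". The filenames below are made-up examples.

```php
<?php
$filename = 'Galerie 2.jpg'; // example filename with a space

$form = urlencode($filename);                                 // Galerie+2.jpg
$formFixed = str_replace('+', '%20', urlencode($filename));   // Galerie%202.jpg
$rfc = rawurlencode($filename);                               // Galerie%202.jpg

// A literal "+" in the original name is encoded as "%2B" by urlencode(), so
// after encoding, any remaining "+" came from a space and the str_replace()
// step only affects spaces.
$literalPlus = urlencode('Galerie+2.jpg');                    // Galerie%2B2.jpg

echo $form, PHP_EOL, $formFixed, PHP_EOL, $rfc, PHP_EOL, $literalPlus, PHP_EOL;
```

In other words, urlencode() followed by the str_replace() gives the same result as calling rawurlencode() on the filename.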