Macrura Posted February 11, 2017
I'm working on a module that scans a textarea field, imports any external images it finds, and replaces the references to them in the editor with the local versions. This is the function that is hooked after Page save. Modules directory: http://modules.processwire.com/modules/import-external-images/ GitHub: https://github.com/outflux3/ImportExternalImages
adrian Posted February 11, 2017
If I understand the issue correctly, you can just add the image and then get its name from the last added image. There is no need to clean the name; just let PW do it when you ->add() it. I do the same thing here: https://github.com/adrianbj/ProcessMigrator/blob/0dbeacdf4d1d4a8060d1d513ca9cb2eced3e540a/ProcessMigrator.module#L2577-L2595 As you'll see further up in that function, I am using virtually the same code as you for extracting external images from HTML, importing them locally, and setting the new URL in the HTML.
Macrura Posted February 11, 2017
Cool, yes, thanks. I almost have this working using your examples and will post the final code once it's tested...

Edit: this is the new, improved version for my specific usage. It could still be improved a lot (basically by copying the migrator code): adding a try/catch in case the external image can't be retrieved, and some additional checks for the image field and the RTE field. For most applications I can just add this to my SiteUtilities module and enable it for specific templates and fields...

    public function importExternalImages($event) {

        $page = $event->arguments[0];
        $html = $page->body;

        // return early if no images are embedded in the html
        if (strpos($html, '<img') === false) return;

        $dom = new \DOMDocument();
        $dom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));
        $images = $dom->getElementsByTagName('img');
        if (!$images->length) return; // not needed?

        $assetsPath = $this->pages->get($page->id)->filesManager()->url();
        $extCount = 0;

        foreach ($images as $image) {
            $img_url = $image->getAttribute('src');

            // skip anything that is not an absolute (external) URL
            if (!filter_var($img_url, FILTER_VALIDATE_URL)) continue;

            $page->images->add($img_url);

            // use the title attribute, or else the alt attribute, as the description
            if ($image->getAttribute('title') != '') {
                $page->images->last()->description = $image->getAttribute('title');
            } elseif ($image->getAttribute('alt') != '') {
                $page->images->last()->description = $image->getAttribute('alt');
            }

            // resize image to make a version that matches the size originally in the RTE;
            // check that the size differs from the downloaded version before resizing
            if ($image->getAttribute('width') && $image->getAttribute('width') != $page->images->last()->width) {
                $imgForRte = $page->images->last()->size($image->getAttribute('width'), 0);
            } else {
                $imgForRte = $page->images->last();
            }

            $image->setAttribute('src', $imgForRte->url);
            $extCount++;
        }

        if (!$extCount) return;

        $page->of(false);

        // strip the doctype, html and body wrappers that DOMDocument adds
        //$page->body = $dom->saveHTML();
        $page->body = preg_replace('/^<!DOCTYPE.+?>/', '', str_replace(
            array('<html>', '</html>', '<body>', '</body>'),
            array('', '', '', ''),
            $dom->saveHTML()
        ));

        $page->save('body');
        $this->message("image links updated to local images.");
        $page->save('images');
        $this->message("external images imported to page");
    }

Thanks again for your help and advice!
adrian Posted February 11, 2017
Macrura said: "could be improved a lot (basically copy the migrator code)"
I won't take that as too much of an insult.
Macrura Posted February 11, 2017
Right: if I copied your code more closely, then my function would be improved a lot...
adrian Posted February 11, 2017
Macrura said: "right, if i copy your code more closely, then my function would be improved a lot..."
Oh, I wouldn't say that at all. Lots of the code in Migrator is very messy. I was just joking.
Macrura Posted February 11, 2017
Thanks again. I learned a lot from studying the migrator as well as your other modules... I'm surprised this hasn't come up before, as I would think it quite common for clients to paste HTML with external image references into the editor. One client of mine was totally confused: they would paste in reviews with all of the images, not realizing the images were being pulled from external sites, and then they couldn't click the images in the editor.
adrian Posted February 11, 2017
Macrura said: "I'm surprised this hasn't come up before, as i would think it quite common for clients to paste in HTML to the editor with external image references;"
I agree, and I think you have a great little module here that will likely become a default install for me.
Macrura Posted February 11, 2017
OK, thanks. I will work on it, make it configurable, and do some more dummy testing...
LostKobrakai Posted February 12, 2017
Macrura said: "I'm surprised this hasn't come up before, as i would think it quite common for clients to paste in HTML to the editor with external image references;"
Hopefully their own content and not simply taken from others.
B3ta Posted February 12, 2017
Looking forward to checking out the plugin. Put me on the beta testers list, please... I will need to import a few thousand images, and this small plugin will save my life.
Macrura Posted February 13, 2017
OK, sure. You can use it as is now if you hardcode the field names. (I will set up a proof-of-concept module on GitHub shortly so there is a fully installable module.)
Macrura Posted February 15, 2017
Here is a pre-release version on GitHub: https://github.com/outflux3/ImportExternalImages It still needs some cleaning up, but it works fine so far in limited testing on 3 sites...
mciccone Posted April 11, 2017
We are in the process of transferring a LARGE number of websites from an older proprietary CMS to PW, and we typically have a huge number of embedded files and images inside the content, so this will be a very useful module for us. How difficult would it be to also import files (i.e. PDF files linked from <a> tags) from external sites, and is this functionality you have considered? Just looking at the code, I think it would be fairly straightforward?
Macrura Posted April 11, 2017
It depends on how you set up the logic to replace the anchor tags, but it should be doable for sure. It would require more changes to the module, though: you'd need config to select the files field, possibly a setting to enable looking for files, and then all of the logic to cycle through the anchor tags and check whether they have a specific extension. If you were going to allow file imports, you'd want to make the file extensions configurable, in case you wanted to import some other file types.
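The anchor-tag scan described above could look roughly like the following. This is only a minimal sketch in plain PHP, not code from the module: the function name `findExternalFileLinks()` and the idea of passing the allowed extensions in as an array are assumptions for illustration, and the actual download into a files field is left out.

```php
<?php
// Hypothetical helper, not part of the ImportExternalImages module.
// Returns the href of every <a> tag that points to an absolute URL
// whose path ends in one of the allowed file extensions.
function findExternalFileLinks(string $html, array $allowedExtensions): array
{
    if (stripos($html, '<a') === false) return []; // nothing that could be a link

    $dom = new DOMDocument();
    // The XML prefix hints UTF-8 to the parser; @ hides warnings from sloppy pasted markup.
    @$dom->loadHTML('<?xml encoding="UTF-8">' . $html);

    $allowed = array_map('strtolower', $allowedExtensions);
    $found = [];

    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');

        // Skip relative or local links; only absolute URLs count as external.
        if (!filter_var($href, FILTER_VALIDATE_URL)) continue;

        // Compare the extension of the URL path, ignoring case and any query string.
        $path = (string) parse_url($href, PHP_URL_PATH);
        $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));

        if ($ext !== '' && in_array($ext, $allowed, true)) {
            $found[] = $href;
        }
    }

    return $found;
}
```

Called as `findExternalFileLinks($page->body, ['pdf', 'docx'])`, it would return the candidate URLs; adding each one to a files field and rewriting the href would then mirror what the image code does for src attributes.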
adrian Posted September 4, 2018
Hey @Macrura - just wanted to let you know that I just started using this module and it's very handy. I don't have enough pages to bother with creating an import script, so copy and paste is the way to go, and this has made things much quicker. Thanks!
Pete Posted September 23, 2018
Great module. Any chance you could look into making it scan the field's different language versions, as these sometimes have different images?
cb2004 Posted November 8, 2018
I discovered this module today. Fantastic.
Pretobrazza Posted April 15, 2020
Hi, I'm importing a fairly large Joomla site that has existed since 2009. The module is a lifesaver, BUT I encounter 2 problems where nothing happens:
(1) When the extension ends in .JPG (all caps), the image gets skipped.
(2) If even one image width is expressed as a percentage, e.g. 30%, the whole import process gets aborted.
I'd be VERY happy to find out where I can adjust the script so that the module would run flawlessly. Secondly, can someone point me to a script that opens and saves all children of a parent one by one, so that all the images get imported automatically with this module? Thanks in advance!
Macrura Posted April 15, 2020
@Pretobrazza, OK:
(1) I just made a small modification which converts the extension to all lowercase before it checks the field settings. Each field has a list of allowed extensions; I'm guessing you don't have both jpg and JPG, so hopefully this will solve it.
(2) ProcessWire expects an integer for image width, so if there is a % in the width, I guess the image sizer throws a fatal error. I don't think it is technically correct to have anything other than an integer in the width attribute, but I have updated the module to skip the resize for any images that have a percent in the width attribute.
(3) You can run a simple script in Tracy Debugger:

    $p = $pages->get([page id]);
    foreach($p->children as $c) {
        $c->of(false);
        $c->save();
    }

@Pete - sorry for not replying to your post; I wasn't following this topic for some reason, so I didn't see it. I'll look into the multilanguage thing, so that it scans all of the languages. I'm guessing that right now it doesn't work at all for importing images in a multilanguage RTE field.
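For anyone patching their own copy, fixes (1) and (2) can be sketched as two small plain-PHP helpers. This is an illustration under assumptions, not the code that was committed to the module: the function names `normalizedExtension()` and `resizeWidthFromAttribute()` are hypothetical.

```php
<?php
// Hypothetical helpers illustrating fixes (1) and (2); the names are assumptions.

// (1) Lowercase the extension of the URL path before comparing it with the
// field's list of allowed extensions, so .JPG is treated like .jpg.
function normalizedExtension(string $url): string
{
    $path = (string) parse_url($url, PHP_URL_PATH);
    return strtolower(pathinfo($path, PATHINFO_EXTENSION));
}

// (2) Return a pixel width only when the width attribute is a plain positive
// integer; return null for values such as "30%" so the resize can be skipped.
function resizeWidthFromAttribute(string $widthAttr): ?int
{
    $widthAttr = trim($widthAttr);
    if (!preg_match('/^\d+$/', $widthAttr)) return null;

    $width = (int) $widthAttr;
    return $width > 0 ? $width : null;
}
```

In the import loop, a null result from `resizeWidthFromAttribute()` would mean using the downloaded image as is instead of calling size().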
Pretobrazza Posted April 16, 2020
@Macrura - You really deserve a gold medal a) for the speed with which you responded and b) for how the module sailed through all the articles, pulling in all the images smoothly. On (3), however, I wasn't successful, as somehow the script didn't work in Tracy Debugger's console. But not to worry: I went to Batch Child Editor, where I unpublished the articles in bulk and published them again. In the process your module pulled in all the images. ;) Once again, thank you very much!
adrian Posted November 27, 2021
@Macrura - just came across an issue. I am importing content which includes images with cache-busting parameters, e.g.:

    <img src="https://oldsite.com/image.jpg?1631735176" />

We need to remove that ?1631735176 for things to work. A simple solution is:

    $html = $page->$ta_field;
    $html = preg_replace('/(.*)(.jpe?g|.gif|.png)(\?[\d]+)(.*)/', '$1$2$4', $html);

Thanks!
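One thing to watch with the pattern above: because of the greedy (.*) groups, it appears to rewrite only the last matching image on each line of HTML, and the unescaped dots match any character. A slightly tighter variant is sketched below; the function name `stripCacheBusters()` is hypothetical and this is not necessarily what ended up in the module.

```php
<?php
// Hypothetical helper: removes a purely numeric query string (e.g. ?1631735176)
// that directly follows a .jpg/.jpeg/.gif/.png extension. Every occurrence is
// handled, and extensions are matched case-insensitively.
function stripCacheBusters(string $html): string
{
    // The lookahead requires the digits to end at a quote, whitespace, ">" or the
    // end of the string, so URLs like image.jpg?123&w=200 are left untouched.
    $result = preg_replace('/(\.(?:jpe?g|gif|png))\?\d+(?=["\'\s>]|$)/i', '$1', $html);

    // preg_replace() returns null on a regex error; fall back to the input then.
    return $result ?? $html;
}
```

It would be applied to the field's HTML before the image URLs are extracted, in the same place as the one-liner above.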
Macrura Posted November 27, 2021
I added it, but GitHub seems to be having a bad day. I also noticed a whole bunch of other new stuff that wasn't committed to the repo, mostly stuff that is attributed to you. I pushed the latest up to GitHub, and hopefully it will work the same; I haven't had a chance to test it yet today, but will do so soon...
Pretobrazza Posted February 13, 2023
@Macrura Hi, in a site under development, I imported 3420 articles with img links to a Joomla/SEBLOD website. Now, as I did before, I want to import those images into their respective pages in PW. In 2020, I went to Batch Child Editor, where I unpublished the articles in bulk and published them again; in the process your module pulled in all the images. This time, I seemingly cannot do this anymore. So, amongst many other trials, I created a template in the admin and an empty page where I put the script below. The script runs, but none of the images are pulled over. I see that your module uses a hook after save. How can I make your module convert those pages in bulk?

    $p = $pages->get(2356);
    $p->children->find("limit=3420");
    try {
        foreach($p->children as $c) {
            $c->of(false);
            $c->save();
        }
        echo "Children saved successfully!";
    } catch (Exception $e) {
        echo $e->getMessage();
    }

Kind regards, Bernard
Macrura Posted February 13, 2023
@Pretobrazza - Hi Bernard - Assuming that there are image references in the body field and you have configured the relevant settings, the next thing is to see where it might be failing. Could you try saving the specific field, as in $c->save("body"), and see if that works? Otherwise I may need to do some testing and try to replicate your setup. - Marc