
Import External Images


Macrura


I'm working on a module that scans a textarea field for external images, imports them into the page, and replaces the references to them in the editor with the local versions. The import runs in a hook after Page save.

Modules directory:

http://modules.processwire.com/modules/import-external-images/

GitHub:

https://github.com/outflux3/ImportExternalImages


If I understand the issue correctly, you can just add the image and then get the image name from the last added image. There's no need to clean the name; just let PW do it when you ->add() it.

I do the same thing here: https://github.com/adrianbj/ProcessMigrator/blob/0dbeacdf4d1d4a8060d1d513ca9cb2eced3e540a/ProcessMigrator.module#L2577-L2595

As you'll see further up in that function, I'm using virtually the same code as you for extracting external images from HTML, importing them locally, and setting the new URL in the HTML.


Cool, thanks! I almost have this working using your examples; I'll post the final code once it's tested.

Edit: this is the new, improved version for my specific use case. It could be improved a lot (basically by copying the Migrator code), e.g. with a try/catch in case the external image can't be retrieved, plus some additional checks for the image field and RTE field. For most applications I can just add this to my SiteUtilities module and enable it for specific templates and fields.

    public function importExternalImages($event) {

        $page = $event->arguments[0];
        $html = $page->body;
        if(strpos($html, '<img') === false) return; // return early if no images are embedded in the HTML

        // parse the RTE HTML; suppress libxml warnings about HTML5 tags or sloppy pasted markup
        $libxmlErrors = libxml_use_internal_errors(true);
        $dom = new \DOMDocument();
        // note: the 'HTML-ENTITIES' target of mb_convert_encoding() is deprecated as of PHP 8.2
        $dom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));
        libxml_clear_errors();
        libxml_use_internal_errors($libxmlErrors);

        $images = $dom->getElementsByTagName('img');
        if(!$images->length) return;

        $page->of(false); // output formatting off before modifying the page

        $extCount = 0;
        foreach($images as $image) {
            $img_url = $image->getAttribute('src');
            // only absolute URLs pass; local paths like /site/assets/files/... are skipped
            if(!filter_var($img_url, FILTER_VALIDATE_URL)) continue;

            // PW downloads the file and sanitizes the filename on add()
            // (no try/catch yet: a failed download will throw)
            $page->images->add($img_url);
            $pageimage = $page->images->last();

            // use the title attribute, falling back to alt, as the image description
            if($image->getAttribute('title') != '') {
                $pageimage->description = $image->getAttribute('title');
            } elseif($image->getAttribute('alt') != '') {
                $pageimage->description = $image->getAttribute('alt');
            }

            // resize to match the size originally used in the RTE, but only when the width
            // attribute is a plain integer and differs from the downloaded version
            $width = $image->getAttribute('width');
            if(ctype_digit($width) && (int) $width !== $pageimage->width) {
                $imgForRte = $pageimage->size((int) $width, 0);
            } else {
                $imgForRte = $pageimage;
            }
            $image->setAttribute('src', $imgForRte->url);

            $extCount++;
        }

        if(!$extCount) return;

        // saveHTML() wraps the fragment in a doctype, <html> and <body>; strip them again
        $page->body = trim(preg_replace('/^<!DOCTYPE.+?>/', '', str_replace(array('<html>', '</html>', '<body>', '</body>'), '', $dom->saveHTML())));

        $page->save('body');
        $this->message("image links updated to local images.");

        $page->save('images');
        $this->message("external images imported to page");
    }
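A minimal standalone sketch of the same DOM round trip, assuming PHP 7.4+ and nothing from ProcessWire; rewriteImageSources() is a hypothetical helper name, not part of the module. It swaps in a different technique for the wrapper problem: the fragment is wrapped in a single <div> so the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD flags can be used, instead of stripping <!DOCTYPE>, <html> and <body> with str_replace/preg_replace afterwards. It also uses mb_encode_numericentity() in place of the deprecated 'HTML-ENTITIES' conversion.

```php
<?php

/**
 * Hypothetical helper: rewrite the src of every <img> in an HTML fragment
 * using $rewrite, returning the fragment without any <!DOCTYPE>/<html>/<body>
 * wrapper. $rewrite receives the old src and returns the new one (or null to keep it).
 */
function rewriteImageSources(string $html, callable $rewrite): string
{
    // encode non-ASCII characters as numeric entities so libxml does not
    // misread UTF-8 (replaces the deprecated 'HTML-ENTITIES' conversion)
    $encoded = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $dom = new DOMDocument();
    // wrap the fragment in one root element so NOIMPLIED is safe to use
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML('<div>' . $encoded . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        $new = $rewrite($src);
        if ($new !== null && $new !== $src) {
            $img->setAttribute('src', $new);
        }
    }

    // serialize only the children of the wrapper <div>, not the wrapper itself
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

In a hook, the callback could return the URL of the Pageimage that was just added for that src.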

Thanks again for your help and advice!


Thanks again; I learned a lot from studying the Migrator as well as your other modules.

I'm surprised this hasn't come up before, as I would think it's quite common for clients to paste HTML with external image references into the editor. One of my clients was totally confused: they would paste in these reviews with all of the images, not realizing the images were being pulled from external sites, and then they couldn't click the images in the editor.


Macrura said:

I'm surprised this hasn't come up before, as I would think it quite common for clients to paste in HTML to the editor with external image references.

I agree and think you have a great little module here that will likely become a default install for me.


  • 1 month later...

We are in the process of transferring a LARGE number of websites from an older proprietary CMS to PW, and we typically have a huge amount of embedded files/images inside the content, so this will be a very useful module for us. How difficult would it be to also import files from external sites (i.e. PDF files linked in <a> tags), or is this functionality you have already considered? Just looking at the code, I think it would be fairly straightforward.


It depends on how you set up the logic to replace the anchor tags, but it should definitely be doable. It would require more changes to the module, though: you'd need config to select the files field, possibly a setting to enable looking for files, and then the logic to cycle through the anchor tags and check whether they have a specific extension. If you were going to allow file imports, you'd probably want the file extensions to be configurable, in case you wanted to import other file types.
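A rough sketch of the anchor-scanning part described above, assuming a configurable list of extensions; findExternalFileLinks() is a hypothetical name and this is not code from the module. It only collects the absolute http(s) URLs to import; adding them to a files field and rewriting the href would follow the same pattern as the image code.

```php
<?php

/**
 * Hypothetical helper: collect external file links (e.g. PDFs) from an HTML
 * fragment. $extensions is the configurable list of extensions to import.
 * Returns unique absolute URLs whose path ends in one of those extensions.
 */
function findExternalFileLinks(string $html, array $extensions): array
{
    $allowed = array_map('strtolower', $extensions);

    $dom = new DOMDocument();
    $encoded = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML('<div>' . $encoded . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        // relative links (local files) fail this check and are skipped
        if (!filter_var($href, FILTER_VALIDATE_URL)) continue;
        $scheme = strtolower((string) parse_url($href, PHP_URL_SCHEME));
        if ($scheme !== 'http' && $scheme !== 'https') continue;

        // take the extension from the URL path (ignoring any query string), case-insensitively
        $path = (string) parse_url($href, PHP_URL_PATH);
        $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
        if ($ext !== '' && in_array($ext, $allowed, true)) {
            $urls[$href] = true; // keyed by URL to de-duplicate
        }
    }
    return array_keys($urls);
}
```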


  • 1 year later...
  • 3 weeks later...
  • 1 month later...
  • 1 year later...

Hi,

I'm importing a fairly large Joomla site that has existed since 2009. The module is a lifesaver, but I've run into two problems where nothing happens:

  • When the extension is .JPG (all caps), the image gets skipped.
  • If even one image's width is expressed as a percentage (e.g. 30%), the whole import process gets aborted.

I'd be VERY happy to find out where I can adjust the script so that the module runs flawlessly. 

Secondly, can someone point me to a script that opens and saves all children of a parent one by one, so that all the images get imported automatically by this module? 

Thanks in advance! 


@Pretobrazza,

OK

(1) I just made a small modification that converts the extension to lowercase before checking it against the field settings (each field has a list of allowed extensions). I'm guessing you don't have both jpg and JPG in that list, so hopefully this solves it.
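A minimal sketch of that kind of case-insensitive check, assuming the field's allowed extensions come as a space-separated string such as "gif jpg jpeg png" (I believe that is how PW file/image fields store them, but treat it as an assumption). isAllowedExtension() is a hypothetical name, not the module's actual code; any query string is ignored when reading the extension.

```php
<?php

/**
 * Hypothetical helper: is the file behind $filenameOrUrl allowed by a
 * space-separated extension list such as "gif jpg jpeg png"?
 * Both sides are lowercased, so IMG_0001.JPG matches "jpg".
 */
function isAllowedExtension(string $filenameOrUrl, string $allowedList): bool
{
    // parse_url() also accepts bare filenames; PHP_URL_PATH drops any ?query
    $path = (string) parse_url($filenameOrUrl, PHP_URL_PATH);
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    // split the allowed list on any whitespace and normalise its case too
    $allowed = preg_split('/\s+/', strtolower(trim($allowedList)), -1, PREG_SPLIT_NO_EMPTY);

    return $ext !== '' && in_array($ext, $allowed, true);
}
```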

(2) ProcessWire expects an integer for the image width, so if there is a % in the width, I guess the ImageSizer throws a fatal error. I don't think it is technically correct to have anything other than an integer in the width attribute, but I have updated the module to skip the resize for any image that has a percent in its width attribute.
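A sketch of the kind of guard described in (2), assuming the width attribute is read as a string from the DOM; pixelWidth() is a hypothetical helper, not the module's code. It returns an integer only for plain digit values and null for things like "30%" or "auto", so the caller can skip the resize and keep the downloaded image as-is.

```php
<?php

/**
 * Hypothetical helper: return the width attribute as a pixel integer, or null
 * when it is missing or not a plain integer (e.g. "30%", "auto", "").
 */
function pixelWidth(string $widthAttr): ?int
{
    $w = trim($widthAttr);
    // only plain digits are treated as pixels; ctype_digit('') is false
    if (!ctype_digit($w)) {
        return null;
    }
    $px = (int) $w;
    // a width of 0 is useless for resizing, so treat it like "no width"
    return $px > 0 ? $px : null;
}
```

In the hook, the resize would then only run when pixelWidth() returns a value that differs from the downloaded image's width.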

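(3) You can run a simple script in the Tracy Debugger console:

$parentId = 1234; // placeholder: replace with the ID of the parent page
$p = $pages->get($parentId);
foreach($p->children as $c) {
    $c->of(false); // output formatting must be off before saving
    $c->save();    // each save fires the module's after-save hook
}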

@Pete - sorry for not replying to your post; I wasn't following this topic for some reason, so I didn't see it. I'll look into the multilanguage issue so that the module scans all of the languages. I'm guessing that right now it doesn't work at all for importing images in a multilanguage RTE field.


@Macrura - You really deserve a gold medal, a) for the speed of your response and b) for how the module sailed through all the articles, pulling in all the images smoothly.

On (3), however, I wasn't successful: somehow the script didn't work in Tracy Debugger's console. But not to worry... I went to Batch Child Editor, where I unpublished the articles in bulk and published them again. In the process, your module pulled in all the images. ;) 

Once again, Thank you very much! 


  • 1 year later...

@Macrura - I just came across an issue. I am importing content that includes images with cache-busting parameters, e.g.:

<img src="https://oldsite.com/image.jpg?1631735176"  />

We need to remove that ?1631735176 for things to work.

A simple solution is:

$html = $page->$ta_field;
// escaped dots; each occurrence is matched on its own, so several images on one line are all cleaned
$html = preg_replace('/(\.(?:jpe?g|gif|png))\?\d+/i', '$1', $html);
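An alternative sketch, assuming you'd rather clean each image URL just before it is downloaded than run a regex over the whole field: stripCacheBuster() is a hypothetical helper that uses parse_url() and only drops the query string when it is purely numeric, so URLs like image.jpg?w=300 are left alone. Unlike the regex, it does not depend on a list of extensions.

```php
<?php

/**
 * Hypothetical helper: remove a purely numeric cache-busting query string
 * (e.g. "?1631735176") from an image URL before it is downloaded.
 * Non-numeric queries such as "?w=300" are left untouched.
 */
function stripCacheBuster(string $url): string
{
    $query = parse_url($url, PHP_URL_QUERY);
    // parse_url() gives null when there is no query (false for badly malformed URLs)
    if (!is_string($query) || !ctype_digit($query)) {
        return $url;
    }
    // cut at the first "?", dropping the query and anything after it (e.g. a #fragment)
    return substr($url, 0, strpos($url, '?'));
}
```

In the import loop, that would mean calling it on $img_url before $page->images->add().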

Thanks!


I added it, but GitHub seems to be having a bad day. I also noticed a whole bunch of other new stuff that hadn't been committed to the repo, mostly changes attributed to you. I pushed the latest version up to GH, and hopefully it will work the same; I haven't had a chance to test it yet today, but will do so soon.


  • 1 year later...

@Macrura 
Hi,

In a site under development, I imported 3420 articles whose img tags link to images on a Joomla/SEBLOD website. As I did before, I now want to import those images into their respective pages in PW. 

In 2020, I used Batch Child Editor to unpublish the articles in bulk and publish them again, and in the process your module pulled in all the images. This time, that no longer seems to work.
So, among many other attempts, I created a template in the admin and an empty page using it, containing the script below. The script runs, but none of the images are pulled over. I see that your module uses an after-save hook. How can I get your module to convert those pages in bulk? 


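$p = $pages->get(2356);
try {
    // pass the selector to children(); a separate find() call's result would be discarded
    foreach($p->children("limit=3420") as $c) {
        $c->of(false);
        $c->save(); // should fire the module's after-save hook
    }
    echo "Children saved successfully!";
} catch (\Exception $e) {
    // leading backslash so the catch also works in a file declaring namespace ProcessWire
    echo $e->getMessage();
}  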

Kind regards,
Bernard 


@Pretobrazza - Hi Bernard -

Assuming there are image references in the body field and you have configured the relevant module settings, the next thing to check is where it might be failing.

Could you try saving the specific field, as in $c->save("body"), and see if that works?

Otherwise I may need to do some testing and try to replicate your setup.

- Marc

