Macrura

Import External Images


I'm working on a module that scans a textarea field for external images, imports them to the page, and replaces the references to them in the editor with the local versions. The function is hooked to run after Page save.

Modules directory:

http://modules.processwire.com/modules/import-external-images/

Github:

https://github.com/outflux3/ImportExternalImages


If I understand the issue correctly, you can just add the image and then get the image name from the last added image. There's no need to clean the name; just let PW do it when you ->add() it.

I do the same thing here: https://github.com/adrianbj/ProcessMigrator/blob/0dbeacdf4d1d4a8060d1d513ca9cb2eced3e540a/ProcessMigrator.module#L2577-L2595

As you'll see further up in that function - I am using virtually the same code as you for extracting external images from HTML, importing them locally and setting the new URL in the HTML.
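A minimal standalone sketch of the extraction step both posts describe, assuming PHP's bundled DOM extension. The function name `findExternalImageUrls` is hypothetical and not part of ProcessWire or either module. It collects every `<img>` `src` that validates as an absolute URL, the same `FILTER_VALIDATE_URL` test used in the hook code in this thread, so relative (already local) paths are skipped.

```php
<?php
// Hypothetical helper: list the absolute (external) image URLs in an HTML fragment.
function findExternalImageUrls(string $html): array
{
    if (stripos($html, '<img') === false) {
        return []; // nothing to scan
    }
    $dom = new DOMDocument();
    // Suppress libxml warnings about imperfect editor HTML.
    $previous = libxml_use_internal_errors(true);
    // The XML encoding hint makes libxml read the fragment as UTF-8.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($dom->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        // Relative paths such as /site/assets/files/... fail this check.
        if (filter_var($src, FILTER_VALIDATE_URL)) {
            $urls[] = $src;
        }
    }
    return $urls;
}

$html = '<p><img src="https://example.com/a.jpg" alt="A"> <img src="/site/assets/files/1/b.jpg"></p>';
print_r(findExternalImageUrls($html));
```

Each URL returned here is what would then be passed to `->add()` on the images field.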


Cool, yes, thanks! I almost have this working using your examples, and will post the final code once it's tested...

Edit: this is the new, improved version for a specific use case. It could be improved a lot (basically copy the migrator code), e.g. with the try/catch in case the external image can't be retrieved, plus some additional checks for the image field and RTE field. For most applications I can just add this to my SiteUtilities module and enable it for some specific templates and fields...

    public function importExternalImages($event) {

        $page = $event->arguments[0];
        $html = $page->body;
        // Return early if no images are embedded in the HTML.
        if(strpos($html, '<img') === false) return;

        $dom = new \DOMDocument();
        // Suppress libxml warnings about imperfect editor markup.
        libxml_use_internal_errors(true);
        $dom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));
        libxml_clear_errors();
        $images = $dom->getElementsByTagName('img');
        if(!$images->length) return;

        // Turn off output formatting before modifying the images field.
        $page->of(false);

        $extCount = 0;
        foreach($images as $image) {
            $img_url = $image->getAttribute('src');
            // Skip anything that is not an absolute URL (i.e. already-local images).
            if(!filter_var($img_url, FILTER_VALIDATE_URL)) continue;
            $page->images->add($img_url);
            // PW sanitizes the filename on add(), so just grab the newly added image.
            $newImage = $page->images->last();

            // Use the title (or, failing that, the alt text) as the image description.
            if($image->getAttribute('title') != '') {
                $newImage->description = $image->getAttribute('title');
            } elseif($image->getAttribute('alt') != '') {
                $newImage->description = $image->getAttribute('alt');
            }

            // Make a resized version matching the size originally used in the RTE,
            // but only if the width attribute is a plain positive integer that
            // differs from the downloaded version (skips values like "30%").
            $width = $image->getAttribute('width');
            if(ctype_digit($width) && (int) $width > 0 && (int) $width !== $newImage->width) {
                $imgForRte = $newImage->size((int) $width, 0);
            } else {
                $imgForRte = $newImage;
            }
            $image->setAttribute('src', $imgForRte->url);

            $extCount++;
        }

        if(!$extCount) return;

        // Strip the DOCTYPE, <html> and <body> wrappers that loadHTML() adds.
        $page->body = preg_replace('/^<!DOCTYPE.+?>/', '', str_replace(array('<html>', '</html>', '<body>', '</body>'), '', $dom->saveHTML()));

        $page->save('body');
        $this->message("Image links updated to local images.");

        $page->save('images');
        $this->message("External images imported to page.");
    }

Thanks again for your help and advice!
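The `str_replace()`/`preg_replace()` line in the function strips the `<!DOCTYPE>`, `<html>` and `<body>` wrappers that `loadHTML()` adds. One commonly used alternative, sketched here as a standalone function (the name `innerBodyHtml` is hypothetical, not part of the module), is to serialize only the children of `<body>` by passing each node to `DOMDocument::saveHTML()`, so no wrapper markup has to be removed afterwards.

```php
<?php
// Hypothetical helper: return the HTML inside <body>, without the
// DOCTYPE/<html>/<body> wrappers that loadHTML() adds to a fragment.
function innerBodyHtml(DOMDocument $dom): string
{
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $node) {
        // saveHTML() with a node argument serializes just that node.
        $out .= $dom->saveHTML($node);
    }
    return $out;
}

$dom = new DOMDocument();
$dom->loadHTML('<p>Hello <img src="/site/assets/files/1/a.jpg"></p>');
$img = $dom->getElementsByTagName('img')->item(0);
$img->setAttribute('src', '/site/assets/files/1/a.300x0.jpg');
echo innerBodyHtml($dom);
```

This avoids relying on the exact form of the DOCTYPE string, at the cost of one `saveHTML()` call per top-level node.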

27 minutes ago, Macrura said:

could be improved a lot (basically copy the migrator code)

I won't take that as too much of an insult :)


Right, if I copy your code more closely, then my function would be improved a lot... :rolleyes:

1 minute ago, Macrura said:

Right, if I copy your code more closely, then my function would be improved a lot... :rolleyes:

Oh, I wouldn't say that at all. Lots of the code in Migrator is very messy - I was just joking :)


Thanks again. I learned a lot from studying the Migrator as well as your other modules...

I'm surprised this hasn't come up before, as I would think it quite common for clients to paste HTML into the editor with external image references. One of my clients was totally confused because they would paste in these reviews with all of the images, not realizing the images were being pulled from external sites, and then they couldn't click the images in the editor.

21 minutes ago, Macrura said:

I'm surprised this hasn't come up before, as I would think it quite common for clients to paste HTML into the editor with external image references.

I agree and think you have a great little module here that will likely become a default install for me.


OK, thanks. I will work on making it configurable and do some more dummy testing...

9 hours ago, Macrura said:

I'm surprised this hasn't come up before, as I would think it quite common for clients to paste HTML into the editor with external image references.

Hopefully their own content and not simply taken from others :D


Looking forward to checking out the plugin. Please put me on the beta testers list...

I will need to import a few thousand images, and this small plugin will save my life.


OK, sure. You can use it as-is now if you hard-code the field names. (I will set up a proof-of-concept module on GitHub shortly so there is a fully installable module.)


We are in the process of transferring a LARGE number of websites from an older proprietary CMS to PW, and we typically have a huge amount of embedded files/images inside the content, so this will be a very useful module for us. How difficult would it be to also import files from external sites (i.e. PDF files linked in <a> tags), or is this functionality you have considered? Just looking at the code, I think it would be fairly straightforward?


It depends on how you set up the logic to replace the anchor tags, but it should be doable. It would require more changes to the module, though: you'd need a config setting to select the files field, possibly a setting to enable looking for files, and then all of the logic to cycle through the anchor tags and check whether they have a specific extension. If you were going to allow file imports, you'd want to make the file extensions configurable, in case you wanted to import other file types.
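A rough sketch of the anchor-scanning step described above, assuming the allowed extensions come from a configurable list. The function name `findExternalFileLinks` and its `$allowedExtensions` parameter are hypothetical, not existing module settings. It keeps absolute `href` values whose URL path ends in one of the listed extensions, compared case-insensitively.

```php
<?php
// Hypothetical helper: find external <a href> links to files with allowed extensions.
function findExternalFileLinks(string $html, array $allowedExtensions): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects an empty string
    }
    $allowed = array_map('strtolower', $allowedExtensions);
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if (!filter_var($href, FILTER_VALIDATE_URL)) {
            continue; // skip relative links and #fragments
        }
        // Only the URL path is inspected, so query strings like ?v=2 are ignored.
        $path = (string) parse_url($href, PHP_URL_PATH);
        $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
        if ($ext !== '' && in_array($ext, $allowed, true)) {
            $links[] = $href;
        }
    }
    return $links;
}

$html = '<p><a href="https://example.com/docs/Report.PDF">Report</a> '
      . '<a href="https://example.com/page.html">Page</a> '
      . '<a href="/local/file.pdf">Local</a></p>';
print_r(findExternalFileLinks($html, ['pdf', 'docx']));
```

Each returned URL would then be added to the (configurable) files field and the `href` rewritten to the local file URL, mirroring what the image code does with `src`.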


Hey @Macrura - just wanted to let you know that I just started using this module and it's very handy. I don't have enough pages to bother with creating an import script, so copy paste is the way to go and this has made things much quicker.

Thanks!


Great module! Any chance you could look into making it scan the field's different language versions, since these sometimes have different images?


Hi,

I'm importing a fairly large Joomla site that has existed since 2009. The module is a lifesaver, BUT I have run into two problems where nothing happens:

  • When the file extension is .JPG (all caps), the image gets skipped.
  • If even one image width is expressed as a percentage (e.g. 30%), the whole import process is aborted.

I'd be VERY happy to find out where I can adjust the script so that the module runs flawlessly.

Also, can someone point me to a script that opens and saves all children of a parent one by one, so that all the images get imported automatically by this module?

Thanks in advance! 


@Pretobrazza,

OK

(1) I just made a small modification that converts the extension to lowercase before checking it against the field settings (each field has a list of allowed extensions). I'm guessing you don't have both jpg and JPG in that list, so this should hopefully solve it.

(2) ProcessWire expects an integer for the image width, so if there is a % in the width, I guess the image sizer throws a fatal error. I don't think it is technically correct to have anything other than an integer in the width attribute, but I have updated the module to skip the resize for any image that has a percent sign in its width attribute.

(3) You can run a simple script in the Tracy Debugger console:

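$parentId = 0; // set this to the ID of the parent page
$p = $pages->get($parentId);
foreach($p->children as $c) {
    $c->of(false); // turn off output formatting before saving
    $c->save();    // saving should trigger the module's after-save hook
}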
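The checks described in (1) and (2) can be sketched in isolation as two standalone functions. This is not the module's actual code: the names `hasAllowedExtension` and `rteResizeWidth` are hypothetical, and the allowed extensions are assumed to arrive as a space-separated string such as `gif jpg jpeg png`. Lowercasing both sides lets `PHOTO.JPG` match `jpg`, and the width helper only returns a pixel width when the `width` attribute is a plain positive integer that differs from the downloaded image, so values like `30%` skip the resize.

```php
<?php
// Hypothetical helper for (1): case-insensitive check of a URL's file
// extension against a space-separated list such as "gif jpg jpeg png".
function hasAllowedExtension(string $url, string $allowedList): bool
{
    $path = (string) parse_url($url, PHP_URL_PATH);
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    $allowed = array_map('strtolower', preg_split('/\s+/', trim($allowedList), -1, PREG_SPLIT_NO_EMPTY));
    return $ext !== '' && in_array($ext, $allowed, true);
}

// Hypothetical helper for (2): returns the target width in pixels,
// or null to use the image as downloaded (no resize).
function rteResizeWidth(string $widthAttr, int $downloadedWidth): ?int
{
    $widthAttr = trim($widthAttr);
    // ctype_digit() rejects "", "30%", "300px" and negative values.
    if (!ctype_digit($widthAttr)) {
        return null;
    }
    $width = (int) $widthAttr;
    if ($width === 0 || $width === $downloadedWidth) {
        return null; // nothing to resize
    }
    return $width;
}

var_dump(hasAllowedExtension('https://example.com/img/PHOTO.JPG', 'gif jpg jpeg png')); // bool(true)
var_dump(rteResizeWidth('30%', 1200)); // NULL
var_dump(rteResizeWidth('300', 1200)); // int(300)
```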

@Pete - sorry for not replying to your post; I wasn't following this topic for some reason, so I didn't see it. I'll look into the multilanguage issue so that it scans all of the languages. I'm guessing that right now it doesn't work at all for importing images in a multilanguage RTE field.


@Macrura - You really deserve a gold medal a) for the speed you responded and b) for how the module sailed through all the articles, pulling in all the images smoothly.

On (3), however, I wasn't successful: somehow the script didn't work in Tracy Debugger's console. But not to worry... I went to Batch Child Editor, where I unpublished the articles in bulk and published them again. In the process, your module pulled in all the images. ;)

Once again, Thank you very much! 
