Jump to content

Image upload fails because PW gets confused about the file name


Jan Romero
 Share

Recommended Posts

Hi,

I would post an issue on GitHub, but I’d like to understand the problem better.

Because us Germans are very diligent and conscientious about that sort of thing, my users do things like upload images containing a copyright symbol in the file name…

ProcessWire wants to sanitize this and puts it through, among other things, Sanitizer->nameFilter(). This is called with Sanitizer::translate in the $beautify argument (what is beautify?), so it puts the file name through mb_strtolower() (why is it only lowercased if Sanitizer::translate is passed and multibyte support is on?). After that, because the $beautify argument was anything, not necessarily Sanitizer::translate, the file name is also put through iconv("UTF-8", "ASCII//TRANSLIT//IGNORE", $value). That’s the point at which “©” is converted to “(C)”, capital C. More things happen, but the capital C stays and becomes part of the file name in the page’s assets dir.

Then a bunch of other things happen and later PW wants to check the file with filemtime(), looks for the name in all lower and throws because it doesn’t find it. So somewhere in the process, PW handled the file name or path in two seperate ways, one time resulting in an all lower case version and one time still containing the C.

Now I’d just make a pull request where I do another mb_strtolower() after iconv, but I figure there is a reason why those things happen in separate code paths. Obviously the nameFilter() method is not expected to only ever give all-lower results. Here is the snippet:

		if($beautify && $needsWork) {
			if($beautify === self::translate && $this->multibyteSupport) {
				$value = mb_strtolower($value);

				if(empty($replacements)) {
					$configData = $this->wire('modules')->getModuleConfigData('InputfieldPageName');
					$replacements = empty($configData['replacements']) ? InputfieldPageName::$defaultReplacements : $configData['replacements'];
				}

				foreach($replacements as $from => $to) {
					if(mb_strpos($value, $from) !== false) {
						$value = mb_eregi_replace($from, $to, $value);
					}
				}
			}

			//this happens independently of the lowercasing
			if(function_exists("\\iconv")) {
				$v = iconv("UTF-8", "ASCII//TRANSLIT//IGNORE", $value);
				if($v) $value = $v;
			}
			$needsWork = strlen(str_replace($allowed, '', $value));
		}

So, maybe the iconv() business should go inside the preceding if block? Maybe it should just go above? Maybe the capital C is expected from this method and the fix is somewhere else entirely.

I’ve also tested ®, ™ and ℗. ℗ actually works because it’s just stripped at some point. ® and ™ both end up in caps and throw.

Thanks for reading ;‍)

  • Like 2
Link to comment
Share on other sites

6 hours ago, Jan Romero said:

why is it only lowercased

This one is because you cannot use case sensitive filenames on systems like windows. If you would have myFileName.jpg and then you would get and save MYfileNAME.JPG, the latter would overwrite the first.

And for the issue in general, I would file an issue on Github, so that Ryan get recognition of it. Maybe he than knows what is the best place to fix that. 

  • Like 3
Link to comment
Share on other sites

  • 2 weeks later...

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...