Jan Romero Posted April 21, 2020 Share Posted April 21, 2020 Hi, I would post an issue on GitHub, but I’d like to understand the problem better. Because us Germans are very diligent and conscientious about that sort of thing, my users do things like upload images containing a copyright symbol in the file name… ProcessWire wants to sanitize this and puts it through, among other things, Sanitizer->nameFilter(). This is called with Sanitizer::translate in the $beautify argument (what is beautify?), so it puts the file name through mb_strtolower() (why is it only lowercased if Sanitizer::translate is passed and multibyte support is on?). After that, because the $beautify argument was anything, not necessarily Sanitizer::translate, the file name is also put through iconv("UTF-8", "ASCII//TRANSLIT//IGNORE", $value). That’s the point at which “©” is converted to “(C)”, capital C. More things happen, but the capital C stays and becomes part of the file name in the page’s assets dir. Then a bunch of other things happen and later PW wants to check the file with filemtime(), looks for the name in all lower and throws because it doesn’t find it. So somewhere in the process, PW handled the file name or path in two seperate ways, one time resulting in an all lower case version and one time still containing the C. Now I’d just make a pull request where I do another mb_strtolower() after iconv, but I figure there is a reason why those things happen in separate code paths. Obviously the nameFilter() method is not expected to only ever give all-lower results. Here is the snippet: if($beautify && $needsWork) { if($beautify === self::translate && $this->multibyteSupport) { $value = mb_strtolower($value); if(empty($replacements)) { $configData = $this->wire('modules')->getModuleConfigData('InputfieldPageName'); $replacements = empty($configData['replacements']) ? InputfieldPageName::$defaultReplacements : $configData['replacements']; } foreach($replacements as $from => $to) { if(mb_strpos($value, $from) !== false) { $value = mb_eregi_replace($from, $to, $value); } } } //this happens independently of the lowercasing if(function_exists("\\iconv")) { $v = iconv("UTF-8", "ASCII//TRANSLIT//IGNORE", $value); if($v) $value = $v; } $needsWork = strlen(str_replace($allowed, '', $value)); } So, maybe the iconv() business should go inside the preceding if block? Maybe it should just go above? Maybe the capital C is expected from this method and the fix is somewhere else entirely. I’ve also tested ®, ™ and ℗. ℗ actually works because it’s just stripped at some point. ® and ™ both end up in caps and throw. Thanks for reading ;) 2 Link to comment Share on other sites More sharing options...
horst Posted April 22, 2020 Share Posted April 22, 2020 6 hours ago, Jan Romero said: why is it only lowercased This one is because you cannot use case sensitive filenames on systems like windows. If you would have myFileName.jpg and then you would get and save MYfileNAME.JPG, the latter would overwrite the first. And for the issue in general, I would file an issue on Github, so that Ryan get recognition of it. Maybe he than knows what is the best place to fix that. 3 Link to comment Share on other sites More sharing options...
Jan Romero Posted May 1, 2020 Author Share Posted May 1, 2020 Okaydokes, I've opened an issue here https://github.com/processwire/processwire-issues/issues/1166 2 Link to comment Share on other sites More sharing options...
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now