urlSegments: Underscores together with uppercase characters cause 404 redirect


ruuskju


Hi,

I have a template with URL segments allowed and no other restrictions. My test template is simple: it just reads the first URL segment and echoes it.
The problem is that if I use _ (or - or .) together with uppercase characters, my template file is never processed. The 404 redirect happens before it runs.

This works:
/url/aa_aa (or /url/aa-aa)

But these fail, and so do the same patterns with - or . in place of _:
/url/Aa_aa
/url/aA_aa
/url/aa_Aa
/url/aa_aA

So it doesn't matter where the uppercase character is. 

<?php namespace ProcessWire;

// Echo the first URL segment, if one was given
if($input->urlSegment1) {
  echo $input->urlSegment1;
}

ProcessWire version 3.0.210 with PHP 8.2

$config->maxUrlSegments = 8;
$config->pageNameCharset = 'UTF8';
$config->pageNameWhitelist = '-_.abcdefghijklmnopqrstuvwxyz0123456789æåäßöüđжхцчшщюяàáâèéëêěìíïîõòóôøùúûůñçčćďĺľńňŕřšťýžабвгдеёзийклмнопрстуфыэęąśłżź-FISEN';

My site is multilingual, which might have something to do with this. In my 404 log I get entries like these:

2023-03-03 12:35:19	41	/url/aa-aA	page doesn't exist [IP: 127.xxx.xxx.xxx] [UA: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36]
2023-03-03 12:35:24	41	/fi/url/aa-aA	page doesn't exist [IP: 127.xxx.xxx.xxx] [UA: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36]

Any help appreciated. We actually have this bug in production at the moment...


It appears that PagesPathFinder.php checks for bad names at #519, and if it finds one, the response is set to 400. Even though we have set $config->pageNameCharset to UTF8 and defined a $config->pageNameWhitelist, it still doesn't work. Without pageNameCharset it works, so the behavior differs depending on whether UTF8 is enabled.

Bad names are detected with $sanitizer->pageNameUTF8($name), which always returns a lowercase name. When that result is compared with the original name containing uppercase characters, the two differ, so the original ends up in the $badNames array. We had to move fast, so we changed the assignment to $namePrev = strtolower($name);

foreach($parts as $n => $name) {
  $lastPart = $name;
  if(ctype_alnum($name)) continue;
  // $namePrev = $name; ORIGINAL
  $namePrev = strtolower($name); // QUICK FIX
  $name = $sanitizer->pageNameUTF8($name);
  $parts[$n] = $name;
  if($namePrev !== $name) $badNames[$n] = $namePrev;
}

if($result['response'] < 400 && count($badNames)) {
  $result['response'] = 400; // 400=Bad request
  $this->addResultError('pathBAD', 'Path contains invalid character(s)');
}
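
To make the comparison easier to follow outside of ProcessWire, here is a minimal standalone sketch of the same loop. Note that standInPageNameUTF8() and findBadNames() are hypothetical names, and the stand-in only lowercases ASCII; it is an assumption about the one behavior described above, not the real $sanitizer->pageNameUTF8() implementation.

```php
<?php
// Hypothetical stand-in for $sanitizer->pageNameUTF8().
// Assumption: the only relevant effect is that the name comes back lowercased.
function standInPageNameUTF8(string $name): string {
    return strtolower($name);
}

// Mirrors the loop quoted above. $lowercaseFirst toggles the quick fix.
function findBadNames(array $parts, bool $lowercaseFirst): array {
    $badNames = [];
    foreach ($parts as $n => $name) {
        // Purely alphanumeric parts skip the check, whatever their case
        if (ctype_alnum($name)) continue;
        $namePrev = $lowercaseFirst ? strtolower($name) : $name;
        $name = standInPageNameUTF8($name);
        // Any difference from the sanitized name marks the part as bad
        if ($namePrev !== $name) $badNames[$n] = $namePrev;
    }
    return $badNames;
}

var_dump(findBadNames(['url', 'AABB'], false));  // empty: skipped by ctype_alnum()
var_dump(findBadNames(['url', 'aa_Aa'], false)); // [1 => 'aa_Aa']: flagged as bad
var_dump(findBadNames(['url', 'aa_Aa'], true));  // empty: quick fix applied
```

If the stand-in assumption holds, this would also explain why /url/AABB works: it passes ctype_alnum() and never reaches the comparison, while aa_Aa does reach it because of the underscore.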

10 hours ago, ruuskju said:

together with uppercase characters

According to the docs, uppercase letters are not valid for URL segments: https://processwire.com/docs/admin/setup/templates/#allow-url-segments

Quote

URL segments must follow the same format as page names. Meaning, they can be any combination of lowercase ASCII letters (a-z), numbers (0-9), dashes, underscores and periods.
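
As a rough illustration of the format quoted above, a check along these lines could be used to test a segment yourself. The function name matchesDocumentedFormat() is made up for this example, and the pattern only covers the ASCII rule from the docs; it does not take $config->pageNameCharset or $config->pageNameWhitelist into account.

```php
<?php
// Hypothetical helper: true if the segment contains only lowercase ASCII
// letters, digits, dashes, underscores and periods, as the docs describe.
function matchesDocumentedFormat(string $segment): bool {
    return preg_match('/\A[-_.a-z0-9]+\z/', $segment) === 1;
}

var_dump(matchesDocumentedFormat('aa_aa')); // bool(true)
var_dump(matchesDocumentedFormat('aa_Aa')); // bool(false)
var_dump(matchesDocumentedFormat('AABB'));  // bool(false)
```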

 


And isn't the idea of the sanitizer here to turn the value into a safe one, i.e. to lowercase it? For example, /url/AABB works and is sanitized to /url/aabb. So the only difference is the underscore character, and it should work in UTF8 mode, shouldn't it?
