
Sanitizer->pageName vs actual page names


Hurme


Just a short question about $sanitizer->pageName.

If I have a page title with Scandinavian letters in it, for example "Pääskynen", the page name will be "paaskynen", but if I use $sanitizer->pageName on it, I get "pskynen" instead.

Is it possible to get what the backend sanitizer does with page names, or is there another sanitizer that works in a similar way (changing ä to a instead of dropping letters)?


I think the backend uses pageName with the 'beautify' argument/option.

You can either use this:

https://processwire.com/api/ref/sanitizer/page-name-translate/

Quote

Name filter for ProcessWire Page names with transliteration

This is the same as calling pageName with the Sanitizer::translate option for the $beautify argument.

Or this, with beautify argument:

https://processwire.com/api/ref/sanitizer/page-name/

Quote

Sanitize as a ProcessWire page name

  • Page names by default support lowercase ASCII letters, digits, underscore, hyphen and period.

  • Because page names are often generated from a UTF-8 title, UTF-8 to ASCII conversion will take place when $beautify is enabled.

  • You may optionally omit the $beautify and/or $maxLength arguments and substitute the $options array instead.

  • When substituted, the beautify and maxLength options can be specified in $options as well.

  • If $config->pageNameCharset is "UTF8" then non-ASCII page names will be converted to punycode ("xn-") ASCII page names, rather than converted, regardless of $beautify setting.
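To make the difference concrete, here is a toy sketch in plain PHP. It is not ProcessWire's Sanitizer code: `toyPageName()` and its small transliteration map are hypothetical stand-ins for the real `$beautify` / `Sanitizer::translate` handling. It only illustrates why dropping non-ASCII bytes turns "Pääskynen" into "pskynen", while transliterating first gives "paaskynen".

```php
<?php
// Hypothetical toy sanitizer for illustration only; ProcessWire's real
// Sanitizer::pageName() is more elaborate and uses its own translation table.

// Tiny transliteration map (assumption: only a few Nordic letters covered).
const TOY_TRANSLIT = [
    'ä' => 'a', 'Ä' => 'A',
    'ö' => 'o', 'Ö' => 'O',
    'å' => 'a', 'Å' => 'A',
];

function toyPageName(string $title, bool $translate = false): string
{
    if ($translate) {
        // Replace each multibyte letter with its ASCII base letter.
        $title = strtr($title, TOY_TRANSLIT);
    }
    // ASCII-only lowercasing; UTF-8 multibyte sequences are left untouched.
    $title = strtolower($title);
    // Drop every byte outside the allowed set. Without transliteration this
    // removes the UTF-8 bytes of "ä", which is how "pskynen" comes about.
    $title = preg_replace('/[^a-z0-9 _.-]/', '', $title);
    // Collapse runs of spaces into a single hyphen.
    return preg_replace('/ +/', '-', trim($title));
}

echo toyPageName('Pääskynen'), "\n";       // pskynen
echo toyPageName('Pääskynen', true), "\n"; // paaskynen
```

The point of the sketch is the ordering: transliteration has to happen before the "allowed characters" filter, otherwise the non-ASCII letters are simply discarded.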

 

Edited by kongondo
Clarity

  • 1 year later...

I have been trying to do the same as @Hurme, without luck.

My page titles contain `'`, `&` and other special characters, e.g. the page title "A Young Doctor's Notebook & Other Stories".

I have tried pageName, pageNameTranslate and pageNameUTF8 in different variations, passing "true" and/or "$beautify", but the results are always something like:

  • a-young-doctor-039-s-notebook-amp-other-stories
  • a-young-doctor--039-s-notebook--amp--other-stories

Pasting the page title "A Young Doctor's Notebook & Other Stories" into the name field in the dashboard results in a-young-doctors-notebook-other-stories, which is what I am trying to achieve.

(A bonus would be if "&" was translated to "and" as well.)


Testing with a string in the console works fine. And thanks for the input on the InputfieldPageName module settings; I'll try that.

The problem occurs when I work with the page title ($page->title) rather than a literal string. In a template, $sanitizer->pageNameTranslate($page->title) outputs "a-young-doctor-039-s-notebook-amp-other-stories".

($sanitizer->pageNameTranslate("A Young Doctor's Notebook & Other Stories") works fine).

Update:

Turning off the "HTML Entity Encoder" setting for the page title field works. Is that recommended?
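The `039` and `amp` fragments are HTML entities: with output formatting on, the title's entity encoder turns `'` into `&#039;` and `&` into `&amp;` before the sanitizer ever sees the text. The self-contained sketch below reproduces that in plain PHP. It assumes the HTML Entity Encoder behaves like `htmlspecialchars()` with `ENT_QUOTES`, and `toySlug()` is a hypothetical, simplified sanitizer, not ProcessWire's. Decoding the entities first, which plays the role of reading the unformatted title, gives the expected name.

```php
<?php
// Hypothetical simplified sanitizer (not ProcessWire's implementation):
// apostrophes are removed, every other run of non-alphanumerics becomes "-".
function toySlug(string $text): string
{
    $text = strtolower($text);
    $text = str_replace("'", '', $text);
    $text = preg_replace('/[^a-z0-9]+/', '-', $text);
    return trim($text, '-');
}

$raw = "A Young Doctor's Notebook & Other Stories";

// Assumption: the entity encoder works like htmlspecialchars() with ENT_QUOTES,
// so ' becomes &#039; and & becomes &amp; in the formatted value.
$formatted = htmlspecialchars($raw, ENT_QUOTES | ENT_HTML401, 'UTF-8');
echo toySlug($formatted), "\n";
// a-young-doctor-039-s-notebook-amp-other-stories

// Decoding the entities first restores the original text.
$decoded = html_entity_decode($formatted, ENT_QUOTES | ENT_HTML401, 'UTF-8');
echo toySlug($decoded), "\n";
// a-young-doctors-notebook-other-stories
```

Rather than turning the encoder off for the whole field, the same effect can be had at the call site by sanitizing the raw value, e.g. `$page->getUnformatted('title')`.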


I have been experiencing the same issue, and the solutions are not working for me. Let's say I have a product titled "Blick & Docker". In my template file, I'd like to sanitize that title with $sanitizer and use it as a URL segment that will end up as /products/blick-docker/. This would match the page name of this product, so I could find that product page from the URL segment. Currently, I am ending up with /blick-amp-docker/, which leads to a 404 because it doesn't match my product page at that URL segment. I know I could use a regex to remove 'amp', but that won't work for products that genuinely have 'amp' in their names. Any ideas, please? Thanks.

Edited by kongondo
Fixed typo

Why do you want to have an ampersand in your URLs? It's a special character used for separating GET (query string) parameters, so I don't think it's a good idea to have it in URLs, and I guess that's at least one reason why the sanitizer removes it from page names. Why not just use "blick-and-docker" as the URL segment?
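If you do want the "blick-and-docker" form (this would also cover the "&" to "and" bonus asked for earlier in the thread), one option is to spell out the ampersand before sanitizing. A minimal sketch in plain PHP, assuming the title is raw text that has not been entity-encoded; `ampToAnd()` and `toyName()` are hypothetical helpers, and in a real template you would pass the rewritten title to `$sanitizer->pageName()` instead of `toyName()`.

```php
<?php
// Hypothetical helper: replace a literal "&" (and surrounding spaces) with
// " and " so it survives sanitizing instead of being dropped.
// Assumes raw text, i.e. "&" has NOT already been encoded as "&amp;".
function ampToAnd(string $title): string
{
    return preg_replace('/\s*&\s*/', ' and ', $title);
}

// Hypothetical simplified name filter standing in for $sanitizer->pageName().
function toyName(string $text): string
{
    $text = strtolower(str_replace("'", '', $text));
    return trim(preg_replace('/[^a-z0-9]+/', '-', $text), '-');
}

echo toyName('Blick & Docker'), "\n";           // blick-docker
echo toyName(ampToAnd('Blick & Docker')), "\n"; // blick-and-docker
```

Note that the admin will still generate "blick-docker" on its own, so if you adopt this rule, the stored page name and the generated link both need to go through it, or the URL segment lookup will fail in the other direction.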


1 hour ago, bernhard said:

Why do you want to have an ampersand in your URLs? It's a special character used for separating GET (query string) parameters, so I don't think it's a good idea to have it in URLs, and I guess that's at least one reason why the sanitizer removes it from page names. Why not just use "blick-and-docker" as the URL segment?

Good point. I wasn't clear, and I also had a typo in there. What I need is for "blick-docker" to match what is generated in the page name field. As you point out, & in the URL is not desirable. However, I am facing the inconsistency that @snobjorn mentioned. Here it is again, plus a strange inconsistency I have seen:

  1. User creates a product called "Blick & Docker" <- I have no control over this naming.
  2. ProcessWire takes that product's title and gives the page the name "blick-docker".
  3. In the frontend, there is a list of products, and their links to the single product pages are generated dynamically from their titles. However, in the frontend, $sanitizer turns the title of this specific product into "blick-amp-docker" (see code below).
  4. Finding that page by its URL segment fails, since #3 does not match #2.

 

<?php namespace ProcessWire;

$productName = $sanitizer->pageName($product->title, $beautify = true);// "blick-amp-docker"

What I'd like is a $sanitizer method that gives me "blick-docker", i.e. one that matches the sanitizer used in the page name field. Maybe that is done at the JavaScript level and hence doesn't exist in $sanitizer? I have tried all the things suggested here (except changing the page name settings, as I would like as little effort as possible from the user). What is strange is that pageNameTranslate also adds 'amp' when used in a template file in the frontend. When used in Tracy, there is no 'amp'!

EDIT: Please ignore this. Output formatting on/off, duh!

I hope this is clearer. Thanks!

Edited by kongondo
Nothing to see here; sorry

On 3/29/2022 at 7:59 AM, bernhard said:
$sanitizer->pageNameTranslate($page->getUnformatted('title'))

Sorry, this works, of course it does! Thanks @bernhard. There is nothing strange about $sanitizer as I suggested above; it was just me. Sorry for wasting your time.

Edited by kongondo
