Non-(a-z) character support in PageAutocomplete or in the "Add New" Page inputfield


7 replies to this topic

#1 seddass

seddass

    Distinguished Member

  • Members
  • 56 posts
  • 8

Posted 26 May 2012 - 12:51 PM

Hi all,

I am trying to create tag functionality for hotel features, like "pool, sauna, etc.", with new tags created automatically when they are entered for the first time. The catch is that the tag titles need to be in Cyrillic characters.

I was hoping that PageAutocomplete would be ideal for a tag system, but it doesn't support searching with Cyrillic characters on keypress.

Unfortunately, the "Create new" feature in the Page input field doesn't support saving items with Cyrillic characters in their title, because it can't automatically replace them with their (a-z) equivalents for the "name" field.

1. Is there an easy way to use the PageName character replacement feature when using "Create new" in the Page input field? Don't you think it would be great if sanitizer->name supported such replacement internally?

2. And something related: someday I was going to ask whether PW will support non-(a-z) characters in URLs. I know that there are standards and that Cyrillic characters are not among the allowed characters. However, when searching for something in Cyrillic, many of the Google results contain Cyrillic characters in their URLs. Perhaps, to stay competitive from an SEO point of view, we should be allowed to use non-(a-z) characters in the URL? The same goes for language-specific characters in German and other languages. What do you think?

Thanks

#2 MadeMyDay

MadeMyDay

    Sr. Member

  • Members
  • 139 posts
  • 125

Posted 27 May 2012 - 02:49 AM

Hi seddass,

Go to the modules overview and look for the page name input field. Click on it; there you can define the rules for character replacement. Try including the Cyrillic characters there.

#3 seddass

seddass

    Distinguished Member

  • Members
  • 56 posts
  • 8

Posted 27 May 2012 - 10:10 AM

Thanks MadeMyDay, most of the Cyrillic characters are already there by default.

Meanwhile, I have found that the PageName input field replacement was NOT enabled by default in $sanitizer->pageName(). I modified the Pages->setupNew() method to enable it, and this allowed me to use the "Create new" feature with non-(a-z) characters.
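
To illustrate why the replacement step matters, here is a minimal sketch in Python (not ProcessWire's PHP code). The `page_name` function, its `translate` flag, and the tiny `REPLACEMENTS` table are all hypothetical stand-ins; the real table lives in the page name input field's module settings. It only shows the general idea: translate known characters first, then reduce whatever is left to a-z, 0-9 and hyphens.

```python
import re

# Hypothetical, deliberately incomplete replacement table for illustration.
# The real rules are configured in the page name input field module.
REPLACEMENTS = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t",
    "у": "u", "ж": "zh", "ч": "ch", "ш": "sh",
}


def page_name(title: str, translate: bool = True) -> str:
    """Build an (a-z, 0-9, -) name from a title, optionally translating first."""
    text = title.lower()
    if translate:
        # Swap each known character for its a-z equivalent.
        text = "".join(REPLACEMENTS.get(ch, ch) for ch in text)
    # Any run of characters still outside a-z / 0-9 becomes one hyphen.
    name = re.sub(r"[^a-z0-9]+", "-", text)
    return name.strip("-")


print(page_name("Сауна"))                   # sauna
print(page_name("Сауна", translate=False))  # empty string: nothing survives
```

With translation switched off, a purely Cyrillic title leaves nothing usable for the name, which is consistent with the "Create new" failure described above.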

#4 ryan

ryan

    Hero Member

  • Administrators
  • 5,812 posts
  • 3144

  • Location: Atlanta, GA

Posted 29 May 2012 - 10:25 AM

Meanwhile I have found that the PageName input field replacement was NOT enabled by default in $sanitizer->pageName().


Thanks, I will make the same change in the core, replacing the second 'true' param with 'Sanitizer::translate' in the setupNew() function. The translate option was added to the sanitizer pretty recently.

#5 seddass

seddass

    Distinguished Member

  • Members
  • 56 posts
  • 8

Posted 29 May 2012 - 12:16 PM

Thanks Ryan!

I would like to bring up the second part of my post again, about using PW with characters other than the allowed (a-z-.) ones in URLs. It seems that Google prioritizes such sites over their competitors. Do you think it will be possible in a future PW release, and would it be worth the effort?

#6 ryan

ryan

    Hero Member

  • Administrators
  • 5,812 posts
  • 3144

  • Location: Atlanta, GA

Posted 31 May 2012 - 10:28 AM

While I know UTF-8 is possible in the query string of URLs, I had thought that domains/paths in URLs were limited to a subset of ASCII characters (at least if we're trying to be standards compliant). I could be wrong about that, but honestly have not seen UTF-8 domains/paths before. (Or if I have, I didn't recognize them as such.) Do you know of another open source CMS that supports this? I could take a closer look to see what's involved in the implementation and security of that, but would like to have other examples, as this is something I'd not heard of before.

Regarding Google and prioritization, is there any research/documentation that supports the theory that it prioritizes sites using UTF-8 in URLs? I guess that would surprise me, but I always have an open mind. :) You've got me curious.

#7 apeisa

apeisa

    Hero Member

  • Moderators
  • 2,531 posts
  • 861

  • Location: Vihti, Finland

Posted 31 May 2012 - 03:57 PM

Characters other than a-z are supported, but I'm not sure how. It might be at the browser level. If I go to http://fi.wikipedia.org/wiki/ääkköset it all works and looks nice... but when I copy and paste the URL from the address bar (Chrome), I get this: http://fi.wikipedia....g/wiki/Ääkköset

EDIT: I mean I get this:
http://fi.wikipedia.org/wiki/%C3%84%C3%A4kk%C3%B6set

Edited by apeisa, 31 May 2012 - 03:58 PM.


#8 netcarver

netcarver

    Sr. Member

  • Members
  • 428 posts
  • 341

  • Location: UK

Posted 31 May 2012 - 04:18 PM

Antti,

As far as I can tell, URIs are all represented in a subset of ASCII characters (see RFC3986) but allow other characters (including Unicode characters) to be embedded by percent-encoding them into the URI. Browsers understand this: they decode URIs to display the correct characters in the address bar, and they let you type Unicode characters into the address, converting them on submission using URL encoding. You can do this yourself in PHP using urlencode() or rawurlencode().

Looks like copy and paste out of chrome is pulling the encoded string out of the address bar.
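
For anyone who wants to try the round trip, here is a small sketch in Python rather than PHP, purely for illustration. `urllib.parse.quote` plays roughly the role that rawurlencode() plays in PHP, and the path is the Wikipedia example from the post above.

```python
from urllib.parse import quote, unquote

path = "/wiki/Ääkköset"

# quote() encodes the text as UTF-8 and percent-encodes every byte that is
# not allowed as-is; "/" is left alone by default.
encoded = quote(path)
print(encoded)  # /wiki/%C3%84%C3%A4kk%C3%B6set

# unquote() reverses the process, which is what the browser does for display.
decoded = unquote(encoded)
print(decoded)  # /wiki/Ääkköset
```

The encoded form matches what came out of the address bar, so the server only ever sees ASCII in the path.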

Edited to add: Just found the relevant part of the article I linked...

The generic URI syntax mandates that new URI schemes that provide for the representation of character data in a URI must, in effect, represent characters from the unreserved set without translation, and should convert all other characters to bytes according to UTF-8, and then percent-encode those values. This requirement was introduced in January 2005 with the publication of RFC 3986. URI schemes introduced before this date are not affected.
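
The rule in that quote can be sketched by hand in a few lines. This is only an illustration in Python under my reading of the quoted text, with a hypothetical helper name `percent_encode`; in practice you would use the library function. Unreserved characters (letters, digits, and `-._~` per RFC 3986) pass through untouched; everything else is converted to UTF-8 bytes and each byte is written as `%XX`.

```python
import string

# The RFC 3986 unreserved set: ASCII letters, digits, and - . _ ~
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def percent_encode(text: str) -> str:
    """Percent-encode every character outside the unreserved set."""
    parts = []
    for ch in text:
        if ch in UNRESERVED:
            # Unreserved characters are represented without translation.
            parts.append(ch)
        else:
            # Everything else: UTF-8 bytes, each written as %XX.
            parts.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(parts)


print(percent_encode("Ääkköset"))  # %C3%84%C3%A4kk%C3%B6set
print(percent_encode("хотел"))     # %D1%85%D0%BE%D1%82%D0%B5%D0%BB
```

Note that this version also encodes "/" and spaces, so it is meant for a single path segment, not a whole URL.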


Edited by netcarver, 31 May 2012 - 04:24 PM.

Steve ☧



