seddass Posted May 26, 2012

Hi all,

I am trying to create tag functionality for hotel features such as "pool", "sauna", etc., with new tags created automatically the first time they are entered. The catch is that the tag titles need to be in Cyrillic characters.

I was hoping that PageAutocomplete would be ideal for a tag system, but it doesn't support searching with Cyrillic characters on keypress. Unfortunately, the "Create new" feature in the Page input field doesn't support saving items with Cyrillic characters in their title, because it can't automatically replace them with their (a-z) equivalents for the "name" field.

1. Is there an easy way to use the PageName character replacement feature when using "Create new" in the Page input field? Don't you think it would be great if sanitizer->name supported such replacement internally?

2. And something related: someday I will ask whether PW will support non-(a-z) characters in URLs. I know that there are standards and that Cyrillic characters are not among the allowed characters. However, when searching for something in Cyrillic, many of the Google results contain Cyrillic characters in their URLs. Perhaps we should be competitive from an SEO point of view and be allowed to use non-(a-z) characters in the URL? The same goes for specific characters in German and other languages.

What do you think?

Thanks
MadeMyDay Posted May 27, 2012

Hi seddass,

Go to the modules overview and look for the page name input field. Click on it; there you can define the rules for the character replacement. Try to include the Cyrillic characters there.
seddass Posted May 27, 2012 (Author)

Thanks MadeMyDay, most of the Cyrillic characters are already there by default.

Meanwhile I have found that the PageName input field replacement was NOT enabled by default in $sanitizer->pageName(). I modified the Pages->setupNew() method to enable it, and this allowed me to use the "Create new" feature with non-(a-z) characters.
ryan Posted May 29, 2012

Quote: "Meanwhile I have found that the PageName input field replacement was NOT enabled by default in $sanitizer->pageName()."

Thanks, I will make the same change in the core, replacing the second 'true' param with 'Sanitizer::translate' in the setupNew() function. The translate option was added to the sanitizer pretty recently.
seddass Posted May 29, 2012 (Author)

Thanks Ryan!

I would like to bring up the second part of my post again, about using PW with characters other than the allowed (a-z-.) ones in URLs. It seems that Google prioritizes such sites over their competitors. Do you think this will be possible in a future PW release, and would it be worth the effort?
ryan Posted May 31, 2012

While I know UTF-8 is possible in the query string of URLs, I had thought that domains/paths in URLs were limited to a subset of ASCII characters (at least if we're trying to be standards compliant). I could be wrong about that, but honestly have not seen UTF-8 domains/paths before. (Or if I have, I didn't recognize it as that.) Do you know of another open source CMS that supports this? I could take a closer look to see what's involved in the implementation and security of that, but would like to have other examples, as this is something I'd not heard of before.

Regarding Google and prioritization, is there any research/documentation that supports the theory that it prioritizes sites using UTF-8 in URLs? I guess that would surprise me, but I always have an open mind. You've got me curious.
apeisa Posted May 31, 2012 (edited May 31, 2012)

Characters other than a-z are supported, but I'm not sure how. It might be at the browser level. If I go to http://fi.wikipedia.org/wiki/ääkköset it all works and looks nice... but when I copy & paste the URL from the address bar (Chrome), I get this: http://fi.wikipedia....g/wiki/Ääkköset

EDIT: I mean I get this: http://fi.wikipedia.org/wiki/%C3%84%C3%A4kk%C3%B6set
netcarver Posted May 31, 2012 (edited May 31, 2012)

Antti,

As far as I can tell, URIs are all represented in a subset of ASCII characters (see RFC 3986) but allow other characters (including Unicode characters) to be embedded by percent-encoding them into the URI. Browsers understand this: they decode URIs to display the correct characters in the address bar, and they let you type Unicode characters into the address, converting them on submission using URL encoding. You can do this yourself in PHP using urlencode() or rawurlencode().

Looks like copy and paste out of Chrome is pulling the encoded string out of the address bar.

Edited to add: Just found the relevant part of the article I linked...

Quote: "The generic URI syntax mandates that new URI schemes that provide for the representation of character data in a URI must, in effect, represent characters from the unreserved set without translation, and should convert all other characters to bytes according to UTF-8, and then percent-encode those values. This requirement was introduced in January 2005 with the publication of RFC 3986. URI schemes introduced before this date are not affected."
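To make the percent-encoding step above concrete, here is a minimal sketch. It uses Python's standard `urllib.parse.quote` rather than the PHP `rawurlencode()` mentioned in the post, purely for illustration; the path is the Wikipedia example from earlier in the thread, written with escapes so the exact code points are unambiguous.

```python
from urllib.parse import quote, unquote

# "/wiki/Ääkköset", spelled with escapes for the precomposed characters.
path = "/wiki/\u00c4\u00e4kk\u00f6set"

# quote() converts non-ASCII characters to UTF-8 bytes and percent-encodes
# each byte; "/" is left untouched by default.
encoded = quote(path)
print(encoded)  # /wiki/%C3%84%C3%A4kk%C3%B6set

# unquote() reverses the process, which is roughly what the browser does
# when it shows the readable form in the address bar.
print(unquote(encoded) == path)  # True
```

The encoded form matches what came out of Chrome's address bar above: each of Ä, ä and ö becomes two percent-encoded UTF-8 bytes.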
nfil Posted October 8, 2013

On 5/27/2012 at 7:49 AM, MadeMyDay said: "Hi seddass, Go to modules overview and look for the page name input field. Click on it, there you can define the rules for the char replacement. Try to include the Cyrillic characters there."

Thank you seddass! I was looking for this.

In the module's Page Name settings I added a few Latin characters for PW 2.3 auto-generated URLs. I was creating a page with the title "sopa de cação" and the generated URL was sopa-de-cac-o.

These are the pt-pt characters I added to the module's Page Name settings:

ã=a
õ=o

Could the PW installation come with these two (ã, õ) already included?

Just for reference:

// Latin
'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä' => 'A', 'Å' => 'A', 'Æ' => 'AE', 'Ç' => 'C',
'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E', 'Ì' => 'I', 'Í' => 'I', 'Î' => 'I', 'Ï' => 'I',
'Ð' => 'D', 'Ñ' => 'N', 'Ò' => 'O', 'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O', 'Ö' => 'O', 'Ő' => 'O',
'Ø' => 'O', 'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ü' => 'U', 'Ű' => 'U', 'Ý' => 'Y', 'Þ' => 'TH',
'ß' => 'ss',
'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a', 'ä' => 'a', 'å' => 'a', 'æ' => 'ae', 'ç' => 'c',
'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e', 'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i',
'ð' => 'd', 'ñ' => 'n', 'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'o', 'ő' => 'o',
'ø' => 'o', 'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u', 'ű' => 'u', 'ý' => 'y', 'þ' => 'th',
'ÿ' => 'y',
// Latin symbols
'©' => '©',
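The effect of a missing replacement rule can be reproduced with a small sketch. This is not ProcessWire's code: `page_name` and both replacement maps are hypothetical, written in Python for illustration, and the cleanup rules (lowercase, map characters, turn anything outside a-z, 0-9, ".", "_" and "-" into a dash, collapse dashes) are an assumption about how such a sanitizer typically behaves.

```python
import re
import unicodedata

def page_name(title, replacements):
    """Hypothetical page-name sanitizer; not ProcessWire's implementation."""
    # Normalize to precomposed characters so map keys like "ã" match.
    name = unicodedata.normalize("NFC", title).lower()
    # Swap each mapped character for its ASCII equivalent.
    for char, ascii_equiv in replacements.items():
        name = name.replace(char, ascii_equiv)
    # Anything still outside a-z, 0-9, ".", "_" and "-" becomes a dash.
    name = re.sub(r"[^a-z0-9._-]", "-", name)
    # Collapse runs of dashes and trim them from both ends.
    return re.sub(r"-{2,}", "-", name).strip("-")

# Assumed maps: one without the ã rule, one with ã=a added.
partial = {"\u00e7": "c"}                 # ç=c only
full = {"\u00e7": "c", "\u00e3": "a"}     # ç=c and ã=a

print(page_name("sopa de ca\u00e7\u00e3o", partial))  # sopa-de-cac-o
print(page_name("sopa de ca\u00e7\u00e3o", full))     # sopa-de-cacao
```

With the partial map, the unmapped ã falls through to the dash rule, which is the sopa-de-cac-o result reported above; adding ã=a yields sopa-de-cacao.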
xweb Posted February 18, 2014

I would like to second this last post. If those two characters could be included in the default install, that would be great.

Thanks
AM