PW 3.0.12: Support for extended (UTF8) page names/URLs


ryan

You guys have to do it through Google translate. I'll give you something straight from the target audience of this update.

ProcessWire now supports Cyrillic URLs! As of today, sites in the .рф zone can be built in full on this wonderful system. Our thanks to Ryan for his care. Hooray, comrades!


  • 2 weeks later...

My pages_path table has a path field with ascii_general_ci collation, so German umlauts cannot be stored in this field. In the page_path_history table the 'path' field is utf8_general_ci, which is fine for umlauts.

Skipping the page-path module leaves me with the pages table and its name field, which is also ascii_general_ci.

What is the best way to handle this situation?


@WillyC

I started looking at the tables because it did not work. Every time I use an umlaut I get this message:

Session: Warning, the chosen name "bäckerei-testmann" is already in use and was changed to "backerei-testmann".

It says that this name already exists and had to be changed to a name without the umlaut, but it is not a previously used name (it is a brand new fantasy name). I think the error message is wrong: the umlaut just could not be written into the ASCII field.

If nobody else is seeing similar effects, it is possibly just my setup, which was upgraded from 2.7x through various steps to the latest version.

By the way, Soma ran into a similar (?) issue once:

https://processwire.com/talk/topic/6688-illegal-mix-of-collations-ascii-general-ciimplicit-and-utf8-general-cicoercible/


Soma's issue is totally different (and also over a year old) in that umlauts would need to be stored as umlauts in his case. Page names, on the other hand, are encoded as ASCII characters using Punycode encoding, which is also used for internationalized domain names (IDN).


Werner, I tried creating a page with the name "bäckerei-testmann", as well as changing an existing page to have that name, and it seems to work fine here. Double check that you've followed all the instructions in the blog post, as it sounds like something may be missing. However, the error message you mentioned indicates that maybe there really is a page with that name already in there, perhaps a temporary one that you created but never saved (i.e. queued for deletion). Experiment with other page names to see if you can duplicate the issue. Of course, double check that your PW version is 3.0.12 as well, as this won't work on earlier versions.

The guys mentioned it above already, but just wanted to repeat that PW always stores page names as ASCII so it's not going to matter what the collation is, and ascii is the correct one that it should have. UTF-8 page names are converted to and from ascii via punycode, just like IDNs.
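
As a rough illustration of the punycode round trip described above, here is a minimal Python sketch. It uses Python's built-in punycode codec rather than ProcessWire's own code, and the helper names to_ascii_name and to_utf8_name are hypothetical. The assumption is that the stored name is the punycode form with an xn-- prefix, as with IDNs.

```python
ACE_PREFIX = "xn--"  # the same prefix that internationalized domain names use


def to_ascii_name(name: str) -> str:
    """Return an ASCII-only form of a page name (hypothetical helper)."""
    if name.isascii():
        return name  # plain ASCII names are stored unchanged
    return ACE_PREFIX + name.encode("punycode").decode("ascii")


def to_utf8_name(stored: str) -> str:
    """Reverse to_ascii_name for display in a URL (hypothetical helper)."""
    if not stored.startswith(ACE_PREFIX):
        return stored
    return stored[len(ACE_PREFIX):].encode("ascii").decode("punycode")


stored = to_ascii_name("ryan-krämer")
print(stored)                # xn--ryan-krmer-w5a
print(to_utf8_name(stored))  # ryan-krämer
```

Because the stored value contains only ASCII characters, an ascii_general_ci column can hold it without any collation change.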


Thanks Ryan, I verified the instructions step by step. I found two things:

- My system does not like having a whitelist in config.php: it messes up my admin layout and blocks the log-off button. As you wrote, it is not necessary to have it there, so I left it out.

- Secondly, I tried to create a new page in my TAG area (based on a different template than the 'faulty' one), and it worked.

So here is what we have:

Template shop:

Session: Warning, the chosen name "ryan-krämer" is already in use and was changed to "ryan-kramer" (reported as already in use, but I swear I never tried this one before)

Template tag:

Session: The page /tags/ryan-krämer, which uses the template tag, was created (so here everything went well; it ended up in the pages table as xn--ryan-krmer-w5a)

So the questions are: why do the two templates act differently, and what triggers the 'already in use' status for the shop template?


Horst, thank you for putting me on the right track.

The difference was indeed in the templates, but I couldn't see it there. There is no 'UTF-8' switch or anything like that in the templates area.

I then went through my modules and found a suspect: it is called PageAutoName, and after turning it off everything was OK. Maybe it happened because this module was not approved for my current version (3.0.12), or it just can't cope with extended page names.

Anyhow, this situation shows the trade-off between being 'lazy' and using these comfortable helpers (modules) and somehow losing control over the process (who is doing what). In that context it would be helpful to see in the template view which modules influence the processing of that template. Right now you have to go through all modules and check which templates each one is active for (or is there another way of finding that out?).

Thanks all for the patience!


Right now you have to go through all modules and check which templates each one is active for (or is there another way of finding that out?).

With Adrian's TracyDebugger module, we get a list of loaded modules in the Debug Mode panel. Maybe this is the tool you are looking for?


Also, whenever something weird is going on, it is useful to have a look at the hooks section of AdminDebugTools (I believe you can call it from within TracyDebugger too).

There you may spot whether multiple or different modules hook into the same methods, or into "nearly" the same methods, which always has the potential to interfere. :)


For Chinese characters, I just copied the characters I needed to use in a page name and appended them to my $config->pageNameWhitelist, like this:

$config->pageNameCharset = 'UTF8';
$config->pageNameWhitelist = '-_.abcdefghijklmnopqrstuvwxyz0123456789' . 
  'æåäßöüđжхцчшщюяàáâèéëêěìíïîõòóôøùúûůñçčćďĺľńňŕřšťýžабвгдеёзийкл' . 
  'мнопрстуфыэęąśłżź健康長壽·繁榮昌盛';

You'll have to make sure that your /site/config.php file is UTF-8 encoded, which it should be by default. But depending on what editor you are using, it's always possible it's not. 
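
If copying characters by hand gets tedious, the collection step can be scripted. The following is only a sketch in Python, not part of ProcessWire: build_whitelist is a hypothetical helper, and it assumes you already have the intended page names as plain strings. It appends every character not yet in the base list, in first-seen order.

```python
# Base characters taken from the whitelist shown above.
BASE = "-_.abcdefghijklmnopqrstuvwxyz0123456789"


def build_whitelist(names, base=BASE):
    """Return base plus each new character found in names (hypothetical helper)."""
    seen = set(base)
    extra = []
    for name in names:
        for ch in name.lower():
            # Skip whitespace and anything already collected.
            if ch not in seen and not ch.isspace():
                seen.add(ch)
                extra.append(ch)
    return base + "".join(extra)


# Example input: page names you intend to use.
print(build_whitelist(["健康長壽", "bäckerei-testmann", "繁榮昌盛"]))
```

The printed string would then be pasted into $config->pageNameWhitelist by hand, in a /site/config.php that is saved as UTF-8.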


  • 1 year later...

For Chinese characters, in real-world use a page name may contain any Chinese character.

Listing the Chinese characters in use in that config parameter is therefore not practical.


10 hours ago, Gideon So said:

Hi Adrian,

I found a 5000-word list and copied all the characters into config.php. Don't forget to copy them into the .htaccess file too. But from time to time I need to add more.

Gideon

I wonder whether this is a good implementation.

How do other PHP frameworks or CMSs handle UTF-8 URLs? Do they also put thousands of characters in a list?
