Localised names for languages.


netcarver

I know I'm coming to this late and that Ryan has put a lot of effort into the multilingual features in PW. I'm not sure if this will be of any use or interest for presenting a language-select feature in PW, but here's a gist of most ISO 639-1 languages, each named in its own script, with direction information for those that render right-to-left. It's encoded in UTF-8, so it should render correctly if a font on your system covers the necessary code points. It also includes a few regional variations (e.g. for Taiwanese vs mainland Chinese).

I just pulled this out of my, now rather outdated, Multi-Lingual Publishing (MLP) Pack for Textpattern CMS.

Edited by netcarver

Thanks for posting that Netcarver. Definitely useful and interesting. Bookmarked and plan to revisit for sure. Also very impressive the work that you did with the MLP pack in Textpattern!

We've kind of skipped over the issue of language names in PW, and are letting people use whatever names they want for their languages (whether iso/codes, names, or something else). Though the <html> lang attribute, and PHP setlocale() codes are set with the translation files. But the intention here was just to broaden the utility of the language capabilities. For instance, one could have a language called "greenhorn" that provides lots of additional details for everything, and another called "coder" that has API/usage notes. There are all kinds of possibilities for using the language capabilities beyond languages themselves. Not that I'm suggesting it's something we need regularly, but just wanted to keep it open ended for when we do.

I've not done anything with regard to right-to-left languages, and don't have a good understanding of what is necessary to support them. The question has come up before. What is your experience in supporting these languages? I'm curious what steps we would need to take (if any?) to properly support them.

Thanks,

Ryan

We've kind of skipped over the issue of language names in PW, and are letting people use whatever names they want for their languages (whether iso/codes, names, or something else). Though the <html> lang attribute, and PHP setlocale() codes are set with the translation files. But the intention here was just to broaden the utility of the language capabilities. For instance...

Ah, that opens up quite a lot of flexibility that my scheme didn't have without resorting to adding non-standard entries to the array I posted. I can see how that could work well for languages tailored to specific roles or domain knowledge, mainly in the admin interface I would have thought.

BTW, I did things this way in the MLP Pack as there was a lot of utility in combining the codes used for the HTML lang & dir attributes with the localized script names in public-facing applications, not just in the admin interface. On public-facing multilingual sites I wanted new visitors to be able to quickly locate and switch to their language of choice (if available), which is probably still best done with the local script name rather than something more arbitrary.

There are other ways to present such a selection than using a string. For instance, some sites use flag icons to indicate language choice. I personally think this is a bad idea. Flags intrinsically represent nations, not languages, and there is seldom a one-to-one relationship between them (at least not now). For residents of the many multilingual nations (like India, Canada, Malaysia, Indonesia, Singapore etc.) using flags as language indicators just doesn't make sense. Linux installers take the local script name route too, presenting a big menu of language choices using the name of each language in its native script. That kind of covers the first base: what to call (label) your language choice.

The other base is the use of standard language codes -- which it sounds like you have covered as data in the translation files. Using the standard language codes also allows you to implement automatic language selection. For example, most browsers will submit an Accept-Language request header using 2 character codes. Server side you can pull these codes out of the headers and use them to auto-select the correct (or best) language for your visitor's browser and fall back on a site-wide default as a last resort. Even then you'll probably still need a language selector of some kind that makes sense to the visitor as often the visitor will not have configured (or cannot configure) their Accept-Language headers to correctly identify their preferred language.
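As a rough illustration of the header-based selection described above, here is a minimal PHP sketch. The function name pick_language() and the example language list are made up for illustration; this is not how ProcessWire or the MLP Pack does it. It assumes the header holds comma-separated language tags such as "de-CH" with optional ";q=" weights, and that the site's languages are keyed by lowercase codes.

```php
<?php
/**
 * Pick the best available site language for an Accept-Language header value.
 * Hypothetical helper for illustration only.
 *
 * @param string   $header    Raw header value, e.g. "de-CH,de;q=0.9,en;q=0.8"
 * @param string[] $available Lowercase language codes the site offers
 * @param string   $default   Site-wide fallback language
 */
function pick_language(string $header, array $available, string $default): string
{
    $candidates = [];
    foreach (explode(',', $header) as $position => $part) {
        $pieces = explode(';', trim($part));
        $tag = strtolower(trim($pieces[0]));
        if ($tag === '' || $tag === '*') {
            continue; // nothing usable in this entry
        }
        $q = 1.0; // a tag without an explicit weight counts as q=1
        foreach (array_slice($pieces, 1) as $param) {
            $param = trim($param);
            if (stripos($param, 'q=') === 0) {
                $q = (float) substr($param, 2);
            }
        }
        $candidates[] = ['tag' => $tag, 'q' => $q, 'pos' => $position];
    }

    // Highest weight first; ties keep the order the browser sent.
    usort($candidates, function ($a, $b) {
        return ($b['q'] <=> $a['q']) ?: ($a['pos'] <=> $b['pos']);
    });

    foreach ($candidates as $candidate) {
        if ($candidate['q'] <= 0) {
            continue; // q=0 means "not acceptable"
        }
        if (in_array($candidate['tag'], $available, true)) {
            return $candidate['tag'];
        }
        // Fall back from a regional tag ("de-ch") to its primary code ("de").
        $primary = explode('-', $candidate['tag'])[0];
        if (in_array($primary, $available, true)) {
            return $primary;
        }
    }
    return $default; // last resort: the site-wide default
}
```

A call might look like pick_language($_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? '', ['en', 'de', 'ar'], 'en'). The result is only a first guess, so a visible language selector is still needed.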

Apologies if most of that covers ground you have nailed in PW, just dumping it in case.

I've not done anything with regard to right-to-left languages, and don't have a good understanding of what is necessary to support them. The question has come up before. What is your experience in supporting these languages? I'm curious what steps we would need to take (if any?) to properly support them.

The RTL support in Firefox seems pretty good to me but I can't really speak for other browsers. RTL layout is triggered by marking up an applicable place in your HTML with a dir="rtl" attribute (the other values are "ltr" and "auto"). You can apply this markup to multiple page elements as required and even nest them. So if you had an English document which quoted some RTL language like Urdu (ur) then you could do something like <body lang="en" dir="ltr"> and then mark all the block quotes with lang="ur" dir="rtl".
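The nesting described above could be generated along these lines. This is only a sketch: RTL_LANGUAGES, text_direction() and quote_block() are hypothetical names, the list of RTL codes is deliberately short and not exhaustive, and the sample text is the Arabic greeting that appears later in this thread rather than Urdu.

```php
<?php
// Hypothetical, non-exhaustive list of right-to-left language codes.
const RTL_LANGUAGES = ['ar', 'fa', 'he', 'ur'];

/** Return "rtl" or "ltr" for a language tag such as "ur" or "en-GB". */
function text_direction(string $lang): string
{
    $primary = strtolower(explode('-', $lang)[0]);
    return in_array($primary, RTL_LANGUAGES, true) ? 'rtl' : 'ltr';
}

/** Wrap quoted text in a blockquote carrying its own lang and dir attributes. */
function quote_block(string $text, string $lang): string
{
    return sprintf(
        '<blockquote lang="%s" dir="%s">%s</blockquote>',
        htmlspecialchars($lang, ENT_QUOTES, 'UTF-8'),
        text_direction($lang),
        htmlspecialchars($text, ENT_QUOTES, 'UTF-8')
    );
}

// An English page that quotes an RTL language inside it.
echo '<body lang="en" dir="ltr">', "\n";
echo '<p>An English paragraph introducing the quote.</p>', "\n";
echo quote_block('مرحبا', 'ar'), "\n";
echo '</body>', "\n";
```

Keeping the direction lookup next to the language code means templates only ever pass the code around and never hard-code dir values.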

I didn't take RTL support any further than this and several RTL language users in the Textpattern community seemed happy enough with it.

I think for the sites that adopt flags, they do it because it's pretty, looks international, and can catch the eye quicker. Also, maybe there isn't a technical 1-to-1 relationship among many, but the communication is still there (though I'm sure there are exceptions). For something that involves a very large number of languages, then I agree flags lead to inaccurate generalizations. But when I go to a site in another language, I see the US or British flag a lot quicker than I see the word "English", so I tend to have a slight usability preference for the flags (as a user). Even if they aren't perfect, they are a familiar landmark in a sea of unfamiliar language where it's easier to spot landmarks than words. For me the worst cases are when sites use language names in a dropdown selection, and the label on the dropdown is not one that I can read to even know what it's for. :)

For example, most browsers will submit an Accept-Language request header using 2 character codes. Server side you can pull these codes out of the headers and use them to auto-select the correct (or best) language for your visitor's browser and fall back on a site-wide default as a last resort.

I'm a little afraid of this just from an SEO standpoint. This clearly violates Google's webmaster guidelines (delivering different textual content at the same URL based on client-side factors), but maybe they have an exception for a case like this? (at least that would make sense)

You can apply this markup to multiple page elements as required and even nest them. So if you had an English document which quoted some RTL language like Urdu (ur) then you could do something like <body lang="en" dir="ltr"> and then mark all the block quotes with lang="ur" dir="rtl".

It sounds like RTL support has more to do with the markup than anything else? Are you aware of any factors on the data storage side?

Thanks for sharing your expertise and experience in these areas -- glad to have you here,

Ryan

Hi Ryan,

thanks for the reply.

I think for the sites that adopt flags, they do it because it's pretty, looks international, and can catch the eye quicker. Also, maybe there isn't a technical 1-to-1 relationship among many, but the communication is still there (though I'm sure there are exceptions).

For something that involves a very large number of languages, then I agree flags lead to inaccurate generalizations. But when I go to a site in another language, I see the US or British flag a lot quicker than I see the word "English", so I tend to have a slight usability preference for the flags (as a user). Even if they aren't perfect, they are a familiar landmark in a sea of unfamiliar language where it's easier to spot landmarks than words. For me the worst cases are when sites use language names in a dropdown selection, and the label on the dropdown is not one that I can read to even know what it's for. :)

Yeah, they certainly draw the eye so I do get this as a positional anchor in unfamiliar text.

I guess I'm pretty biased as I lived in Malaysia for a number of years and there are four commonly used languages there (English, Malay, Chinese and Tamil) but one flag. Yes, you could choose the US or UK flag for English, the Malaysian for Malay, the Chinese for Chinese and the Indian for Tamil, but for sites aimed at a domestic audience that would force many viewers to identify with outside nations when the majority of visitors would be Malaysian.

Despite my personal preferences, the use of a consistent markup of language codes as CSS classes gave site designers using the MLP Pack the flexibility to implement a number of alternate selection schemes. In fact, many sites that were made with the MLP Pack simply used CSS to add flag icons to their rendering of the site's language choices. If you keep all your flag icons in a common directory and name them according to the language code it's trivial to do.
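To make the language-codes-as-CSS-classes idea concrete, here is a small PHP sketch. It is not the MLP Pack's actual output: the "lang-" class prefix, the "/en/" style URLs, the "/flags" directory and both function names are assumptions made for this example.

```php
<?php
// Site languages keyed by code, each labelled in its own script.
$languages = [
    'en' => 'English',
    'ms' => 'Bahasa Melayu',
    'zh' => '中文',
    'ta' => 'தமிழ்',
];

/** Render a language menu where each item carries a class named after its code. */
function language_menu(array $languages): string
{
    $items = [];
    foreach ($languages as $code => $name) {
        $items[] = sprintf(
            '<li class="lang-%1$s"><a href="/%1$s/" lang="%1$s" hreflang="%1$s">%2$s</a></li>',
            htmlspecialchars($code, ENT_QUOTES, 'UTF-8'),
            htmlspecialchars($name, ENT_QUOTES, 'UTF-8')
        );
    }
    return "<ul class=\"languages\">\n" . implode("\n", $items) . "\n</ul>";
}

/** Build one CSS rule per code, assuming icons are stored as <dir>/<code>.png. */
function flag_css(array $codes, string $dir = '/flags'): string
{
    $rules = [];
    foreach ($codes as $code) {
        $rules[] = sprintf(
            '.lang-%1$s a { background: url(%2$s/%1$s.png) no-repeat left center; padding-left: 24px; }',
            $code,
            $dir
        );
    }
    return implode("\n", $rules);
}

echo language_menu($languages), "\n";
echo flag_css(array_keys($languages)), "\n";
```

Because the markup only exposes the codes, a designer who prefers plain text labels can simply leave the generated CSS out.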

I'm a little afraid of this just from an SEO standpoint. This clearly violates Google's webmaster guidelines (delivering different textual content at the same URL based on client-side factors), but maybe they have an exception for a case like this? (at least that would make sense)

Yes, and it also stops your browser caching the incorrect language output for a given URL. The use-case for the above was for when visitors would omit a language code from the URL. If the language code was present, they'd always get the same text. Looking back on it, it does violate Google's guidelines for URLs with missing codes.

It sounds like RTL support has more to do with the markup than anything else? Are you aware of any factors on the data storage side?

Not aware of any. If I remember correctly, you still treat all your strings internally as left-to-right but the browser just starts over on the right-hand side of the containing block and works over leftward when rendering the strings in rtl blocks.

[Edited to add: I may be remembering this wrong as a non-RTL language user myself. I'll see if I can chase up some RTL language users from the Textpattern CMS forum and check it out.]

So this php...

$out = "abcdefg";
echo '<div dir="ltr">' . $out . '</div>';
echo '<div dir="rtl">' . $out . '</div>';

...would render text like...

abcdefg

gfedcba

I do remember there being some very strange looking rtl output for strings with punctuation, though I can't give any concrete examples here.

Edited to add: The above example is incorrect. Following further testing (see below) the output is as follows...

abcdefg

abcdefg

and the direction of the string itself seems to depend on the code-points used in the string with the browser automatically rendering them as required.

Edited by netcarver
Thanks this is good to know. Based on that, it sounds like we probably don't need to do anything for RTL support on the front end, since we don't get involved with markup generation there. But we probably do need to make some updates on the admin side. I'm going to hold off on this until we have someone that's interested in making a language pack that requires RTL, but looking forward to the opportunity.

Ok, I got some follow up from a helpful RTL language user from the Textpattern forum & things are different to what I initially memory dumped.

I started off by having Google translate the greeting "Hello"[en] into [ar] and got "مرحبا" as the result. My friend confirmed that "م" is the first letter (looks slightly different in the phrase as it joins to the next char) and "ا" is the last character. I then put together a small test page that puts out both the English and Arabic with various combinations of the dir and lang attributes to see what effects these have on the renderings and looked at the output in Firefox and Chromium.

A summary of the results...

  • Both browsers render the test output identically.
  • The dir="xyz" attribute only seems to control the alignment of the output, not the direction in which the string is rendered: the browsers themselves seem to be figuring out the correct way to render a sequence of characters.
  • The lang="xyz" does not change the alignment, nor the directionality of the rendered output but does make a difference to the font selected to do the rendering.
  • It seems that the editor you use to view the source file does make a difference to what you see in the PHP. On my Linux box gedit renders the Arabic as RTL within the quotation marks, whilst Vim renders it LTR.
  • A hexdump of the PHP file shows that the Arabic string is indeed stored with the first character following the opening quote mark and the last character before the closing quote. So, whilst Vim shows it literally, it looks backward (to an Arabic reader).
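The storage-order finding can be checked from PHP itself without a hexdump. The sketch below splits the greeting into characters in the order they are stored and prints each code point with its UTF-8 bytes. code_point() is a hypothetical helper written for this example (it avoids depending on the mbstring extension), and the file is assumed to be saved as UTF-8.

```php
<?php
/** Decode a single UTF-8 encoded character into its Unicode code point. */
function code_point(string $char): int
{
    $bytes = array_values(unpack('C*', $char));
    $count = count($bytes);
    if ($count === 1) {
        return $bytes[0]; // plain ASCII
    }
    // Leading byte: drop the length-marker bits, keep the payload bits.
    $cp = $bytes[0] & (0xFF >> ($count + 1));
    // Continuation bytes each contribute their low six bits.
    for ($i = 1; $i < $count; $i++) {
        $cp = ($cp << 6) | ($bytes[$i] & 0x3F);
    }
    return $cp;
}

$greeting = "مرحبا";

// Split into characters in storage (logical) order.
$chars = preg_split('//u', $greeting, -1, PREG_SPLIT_NO_EMPTY);

foreach ($chars as $index => $char) {
    printf("%d  U+%04X  bytes=%s\n", $index, code_point($char), bin2hex($char));
}
```

Index 0 should come out as U+0645 (م) and the final entry as U+0627 (ا), which matches the first and last letters confirmed above: the string is stored in reading order and the display order is left to whatever renders it.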

Hope that helps.

Great info and tests, thanks Netcarver! I'm thinking RTL support should come pretty easily for ProcessWire. One person had asked about Hebrew a while back, but I didn't know enough at the time to provide a good answer. Now I do. It sounds like you are a VIM user too?

Great info and tests, thanks Netcarver! I'm thinking RTL support should come pretty easily for ProcessWire. One person had asked about Hebrew a while back, but I didn't know enough at the time to provide a good answer. Now I do. It sounds like you are a VIM user too?

Glad that helps & yes, I use gvim most of the time.

I will have to give that a try. But my like of VI/VIM goes well beyond the editor itself. It's the fact that it's available and installed everywhere. I can connect to nearly any server and it's already there. :)
