
TinyMCE converting html entities to characters



Hi All,

I've noticed that when I paste HTML into the HTML code view of the body field, and that HTML contains HTML entities (for example, the entity for a curly apostrophe ’), and then click Update, PW / TinyMCE converts the entity into a character.

I'm not talking about the primary display view of the body field. What I mean is that when I go *back* into the code view, the entity is gone, and all I see are the characters.

I thought it might be my install, with some misconfiguration, but I also tested it on the PW Skyscraper demo, in the body field, and it did the same thing.

My goal is to paste text containing curly quotes, etc., from LibreOffice Writer into the body field and have TinyMCE convert those characters to entities. If that can't be done, I at least want to be able to paste already-converted HTML into the code view and NOT have TinyMCE convert the entities back to characters, as it seems to be doing.

I've Googled around, and tried various TinyMCE config settings, to no avail.

Any clues?

Thanks!

Peter


What settings have you tried and how?

In PW, TinyMCE is set to entity_encoding: 'raw'. This happens in \wire\modules\Inputfield\InputfieldTinyMCE\InputfieldTinyMCE.js, line 35. This means that all characters will be stored in non-entity form except these XML default entities: & < > " (see http://www.tinymce.com/wiki.php/Configuration:entity_encoding )

Have you tried setting it to 'named'? Mind you, since PW is a UTF-8 project I don't think this setting is desirable, but if you use it you could (and probably should) pair it with the entities option http://www.tinymce.com/wiki.php/Configuration:entities to define your own list.

For example, this is what Drupal uses in its TinyMCE config ('named' is the default, so they don't set it explicitly):

    // The default entity_encoding ('named') converts too many characters in
    // languages (like Greek). Since Drupal supports Unicode, we only convert
    // HTML control characters and invisible characters. TinyMCE always converts
    // XML default characters '&', '<', '>'.
    'entities' => '160,nbsp,173,shy,8194,ensp,8195,emsp,8201,thinsp,8204,zwnj,8205,zwj,8206,lrm,8207,rlm',
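To see what Drupal's list actually covers, here is a small Python sketch (a hypothetical helper, not part of Drupal, TinyMCE or PW) that reads the string as code,name pairs and looks up each code point's Unicode name. It only uses the standard unicodedata module.

```python
import unicodedata

# The 'entities' value from the Drupal config above: numeric code, then entity name.
DRUPAL_ENTITIES = "160,nbsp,173,shy,8194,ensp,8195,emsp,8201,thinsp,8204,zwnj,8205,zwj,8206,lrm,8207,rlm"

def describe(spec: str) -> list[tuple[str, str]]:
    """Pair each entity name with the official Unicode name of its code point."""
    parts = spec.split(",")
    return [
        (parts[i + 1], unicodedata.name(chr(int(parts[i]))))
        for i in range(0, len(parts) - 1, 2)
    ]

for name, uname in describe(DRUPAL_ENTITIES):
    print(f"&{name}; -> {uname}")
```

Every entry turns out to be a space, a soft hyphen, a zero-width joiner/non-joiner or a direction mark, i.e. characters that are invisible or hard to tell apart in the editor, which matches the comment in the Drupal config.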

Dear SiNNuT,

In the body field's Input settings, under 'Additional TinyMCE Settings', I added this:

apply_source_formatting:true
entity_encoding:named
entities:'169,copy,8482,trade,ndash,8212,mdash,8216,lsquo,8217,rsquo,8220,ldquo,8221,rdquo,8364,euro'

Then I went to a page, clicked the Edit HTML Source icon, and added this (with the apostrophe typed as an HTML entity):

It’s not clear if this will work.

When I hit Update, it displayed a single curly quote, but when I went back into the HTML source, the entity had been replaced by a literal curly quote.

I understand the concepts, so I guess either I'm doing something wrong, or the InputfieldTinyMCE.js file needs to be edited, which I realize would be an edit to the core...

Thanks for your help!

Peter


I've just tested it on a clean install (from the latest dev version).

Setting:

entity_encoding:named
entities:8212,mdash,8221,rdquo,233,eacute

in 'Additional TinyMCE Settings' works as expected. It replaces only the defined special characters with their named entities upon opening and editing a page, including when working in the HTML source modal window. It leaves everything else alone, so if I enter a '€', for example, it will not convert it to its named equivalent.
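The behaviour described above can be approximated in a few lines of Python. This is only a rough emulation written for illustration, not TinyMCE's code: it assumes 'raw' encodes just the XML defaults (& < > "), that 'named' additionally encodes only the characters in the entities list, and, as a simplification, that 'numeric' encodes every non-ASCII character. The functions encode and parse_entities are hypothetical names.

```python
# XML default characters that are encoded in every mode (per the entity_encoding docs).
XML_DEFAULTS = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}

def parse_entities(spec: str) -> dict[int, str]:
    """Turn a 'code,name,code,name' string into {code point: entity name}."""
    parts = [p.strip() for p in spec.split(",") if p.strip()]
    return {int(parts[i]): parts[i + 1] for i in range(0, len(parts) - 1, 2)}

def encode(text: str, mode: str = "raw", entities: str = "") -> str:
    """Rough emulation of entity_encoding 'raw', 'named' (custom list) or 'numeric'."""
    names = parse_entities(entities)
    out = []
    for ch in text:
        if ch in XML_DEFAULTS:
            out.append(XML_DEFAULTS[ch])
        elif mode == "named" and ord(ch) in names:
            out.append(f"&{names[ord(ch)]};")
        elif mode == "numeric" and ord(ch) > 127:  # assumption: all non-ASCII
            out.append(f"&#{ord(ch)};")
        else:
            out.append(ch)  # stored as the (UTF-8) character itself
    return "".join(out)
```

With the test list from this post, encode("café €", "named", "8212,mdash,8221,rdquo,233,eacute") returns "caf&eacute; €": the é is listed and gets encoded, the € is not listed and is left alone.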

So you don't need to mess with a 'core' setting in InputfieldTinyMCE.js.

Some reasons why things might not work for you:

  • Lose the single quotes around the entities setting; I don't think they should be there.
  • You have two named entities following each other (trade,ndash), whereas the list should be a repetition of: odd position (numeric code), even position (entity name).
  • Try clearing your browser cache and logging in again, and see what that does.

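The second point is easy to miss by eye. Below is a minimal Python sketch of a checker for the odd/even rule; check_entities is a hypothetical helper, not something TinyMCE or PW provides. It also flags the surrounding quotes from the first point.

```python
def check_entities(spec: str) -> list[str]:
    """List problems in a TinyMCE-style 'entities' string (expects code,name pairs)."""
    problems = []
    if len(spec) >= 2 and spec[0] == spec[-1] and spec[0] in "'\"":
        problems.append("value is wrapped in quotes")
        spec = spec[1:-1]
    parts = [p.strip() for p in spec.split(",")]
    for i, part in enumerate(parts):
        expect_number = i % 2 == 0  # 1-based odd positions hold numeric codes
        if expect_number and not part.isdigit():
            problems.append(f"item {i + 1} ({part!r}) should be a numeric code")
        elif not expect_number and part.isdigit():
            problems.append(f"item {i + 1} ({part!r}) should be an entity name")
    if len(parts) % 2:
        problems.append("list has an odd number of items")
    return problems
```

Run on the value from the earlier post, the first problems reported are the quotes and "item 5 ('ndash') should be a numeric code"; after that every item is shifted by one, which is why a single missing number breaks the rest of the list. Inserting 8211 (the code point for ndash) before ndash fixes the order.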
PS: I wouldn't bother with the apply_source_formatting setting; it seems to have been removed or at least deprecated ( http://www.tinymce.com/wiki.php/Configuration3x:apply_source_formatting )


Dear SiNNuT,

I removed the quotes and added the missing numeric code, but it didn't work until I cleared my cache; then it did.

I'm not sure whether it was the quotes or the out-of-order list, but I'd bet it was the list. I was missing one number.

Thank you! I really appreciate it. It now works like a charm.

You wrote:

Have you tried setting it to 'named'? Mind you, PW being a UTF-8 project i don't think this setting is desirable, but you could (and probably should) pair it with the entities option http://www.tinymce.c...ration:entities to make your own list.

Could you say more about the UTF-8 / named entity issue and its impact? Would it make a difference if I used "numeric" instead of "named"?

I always thought that entities were preferable to raw characters, because someone pasting in an article from a PC might paste characters that weren't viewable on a Mac, and vice versa. I've seen curly quotes look all garbled on web pages and in emails, so I thought that entities bypassed the problem.

I use LibreOffice on a Win7 machine, by the way.

Thanks!

Peter


Curly quotes looking garbled has nothing to do with whether they are stored as raw characters or as their HTML entity equivalents; garbling happens when text is decoded with a different character set than the one it was encoded in. I'm not saying that it is very wrong to store HTML entities in the db, but it is just not necessary, and there are some downsides to it.

http://stackoverflow.com/questions/9299152/do-i-need-to-use-html-entities-when-storing-data-in-the-database
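A quick way to see this in Python (standard library only): the entity and the raw character are the same thing once parsed, and the familiar "â€™" garbage appears only when UTF-8 bytes are decoded as Windows-1252 (cp1252 is used here as one example of a wrong charset).

```python
import html

# The named and the numeric entity both parse to the same character, U+2019.
apostrophe = html.unescape("&rsquo;")
assert apostrophe == html.unescape("&#8217;") == "\u2019"

# What a UTF-8 database or page actually stores for that character.
utf8_bytes = apostrophe.encode("utf-8")
print(utf8_bytes)                   # b'\xe2\x80\x99'

# Garbling: the same bytes read back with the wrong character set.
print(utf8_bytes.decode("cp1252"))  # â€™
```

As long as the bytes are stored and served as UTF-8 and the page declares UTF-8, the raw character displays correctly.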

There's a lot more to read if you Google.

PS: maybe you could mark this thread as solved.


Dear SiNNuT,

By the way, based on your excellent point that ProcessWire stores everything as UTF-8, and the StackOverflow link you posted, which in turn led me to this article (from 2003!):

"The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)", by Joel Spolsky

http://www.joelonsoftware.com/articles/Unicode.html

I educamated myself, and said, "Duh", and realized, as you so correctly pointed out, that I don't need the entities anyway.

I just need to make sure I have this line in my templates:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

When I checked, I was already using that line. So, I'm removing my entity declaration from TinyMCE.

Sigh. One learns something every day. One hopes.

Well... better to do it correctly now, than to continue in ignorance. :-)

Addendum: and then one reads an article like this:

http://line25.com/articles/10-html-entity-crimes-you-really-shouldnt-commit

which says that one should use HTML entities. Lots of opinions.

But I just checked: after removing my TinyMCE entity declarations, purging my cache and logging in again, I was able to paste a copyright symbol into the body text, and it displayed correctly. No entity needed.

Thank you again, for your help.

Peter


Ah, that Joel Spolsky article is something of a classic and should be mandatory reading for every (new) (web) developer who doesn't know much about this subject.

That other article is a bit misleading, I think. It doesn't advocate storing/using HTML entities in the database. It's really about using the right character for the job, and in UTF-8 the characters that HTML entities stand for are just ordinary characters: http://www.fileformat.info/info/charset/UTF-8/list.htm

And remember, the so-called htmlspecialchars (crime 1 in that article) are always converted by TinyMCE, even with the 'raw' setting: http://www.tinymce.com/wiki.php/Configuration:entity_encoding

So my recommendation: keep the TinyMCE setting at 'raw'. (If you want, you can always apply PHP's htmlentities function to your output, for example in a template file.)
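For illustration, here is a rough Python analogue of what htmlentities does on output (a sketch, not PHP's implementation; to_entities is a hypothetical name). It relies on the standard html.entities.codepoint2name table of HTML 4 entity names, which also covers & < > ". PHP's flags, such as ENT_QUOTES, additionally control how quotes are handled; this sketch leaves the single quote alone.

```python
from html.entities import codepoint2name

def to_entities(text: str) -> str:
    """Replace every character that has an HTML 4 named entity with that entity."""
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))  # e.g. 169 -> 'copy', 38 -> 'amp'
        out.append(f"&{name};" if name else ch)
    return "".join(out)
```

Note that because & < > " are converted too, applying this (or PHP's htmlentities) to a whole rich-text body field would also escape its HTML tags, so it suits plain-text values rather than TinyMCE markup.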
