Peter Falkenberg Brown

TinyMCE converting html entities to characters


Hi All,

I've noticed that when I paste HTML text into the HTML code view of the body field, and that HTML text contains HTML entities, such as "&rsquo;" (the entity for ’), and then click update, PW / TinyMCE converts the entity into a character.

I'm not talking about the primary display view of the body field. What I mean is that when I go *back* into the code view, the entity is gone, and all I see are the characters.

I thought it might be my install, with some misconfiguration, but I also tested it on the PW Skyscraper demo, in the body field, and it did the same thing.

My goal is to paste text from LibreOffice Writer, that has curly quotes, etc, in the body field, and have TinyMCE convert those characters to entities. If that can't be done, I at least want to be able to paste converted HTML text into the code view, and NOT have TinyMCE convert them back, as it seems to be doing.

I've Googled around, and tried various TinyMCE config settings, to no avail.

Any clues?

Thanks!

Peter


What settings have you tried and how?

In PW, TinyMCE is set to entity_encoding: 'raw'. This happens in \wire\modules\Inputfield\InputfieldTinyMCE\InputfieldTinyMCE.js, line 35. It means that all characters will be stored in non-entity form except these XML default entities: &amp; &lt; &gt; &quot; (see http://www.tinymce.com/wiki.php/Configuration:entity_encoding )

Have you tried setting it to 'named'? Mind you, PW being a UTF-8 project, I don't think this setting is desirable, but you could (and probably should) pair it with the entities option http://www.tinymce.com/wiki.php/Configuration:entities to make your own list.

For example, this is what Drupal uses in its TinyMCE config ('named' is the default, so they don't set it explicitly):

    // The default entity_encoding ('named') converts too many characters in
    // languages (like Greek). Since Drupal supports Unicode, we only convert
    // HTML control characters and invisible characters. TinyMCE always converts
    // XML default characters '&', '<', '>'.
    'entities' => '160,nbsp,173,shy,8194,ensp,8195,emsp,8201,thinsp,8204,zwnj,8205,zwj,8206,lrm,8207,rlm',
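
To make the difference between the two modes concrete, here is a minimal JavaScript sketch. It is not TinyMCE's own code: the function names (parseEntities, encodeRaw, encodeNamed) are hypothetical, and it only imitates the documented effect of 'raw' versus 'named' combined with a custom entities list of number,name pairs.

```javascript
// Illustrative sketch only; these helpers are hypothetical, not TinyMCE APIs.

// Characters that are escaped in every mode (the XML default entities).
const XML_DEFAULTS = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;' };

// Turn "8217,rsquo,233,eacute" into a Map from character to "&name;".
function parseEntities(setting) {
  const parts = setting.split(',').map((s) => s.trim());
  const map = new Map();
  for (let i = 0; i + 1 < parts.length; i += 2) {
    map.set(String.fromCodePoint(Number(parts[i])), `&${parts[i + 1]};`);
  }
  return map;
}

// 'raw': only the XML defaults are escaped, everything else stays a character.
function encodeRaw(text) {
  return text.replace(/[&<>"]/g, (ch) => XML_DEFAULTS[ch]);
}

// 'named' with a custom list: XML defaults plus only the listed characters.
function encodeNamed(text, setting) {
  const custom = parseEntities(setting);
  return Array.from(text, (ch) => XML_DEFAULTS[ch] || custom.get(ch) || ch).join('');
}

const sample = 'It\u2019s 5 < 6 at the caf\u00e9';
console.log(encodeRaw(sample));                             // It’s 5 &lt; 6 at the café
console.log(encodeNamed(sample, '8217,rsquo,233,eacute'));  // It&rsquo;s 5 &lt; 6 at the caf&eacute;
```

Characters that are not in the list (a '€', for instance) pass through unchanged in both functions, which is the point of supplying a short custom list.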


Dear SiNNuT,

In the body field's Input settings, under 'Additional TinyMCE Settings', I added this:

apply_source_formatting:true
entity_encoding:named
entities:'169,copy,8482,trade,ndash,8212,mdash,8216,lsquo,8217,rsquo,8220,ldquo,8221,rdquo,8364,euro'

Then, I went to a page and clicked on the Edit HTML Source icon and added this:

It&rsquo;s not clear if this will work.

When I hit update, it displayed a single curly quote, but when I went back into the HTML source, the &rsquo; code had been replaced by a curly quote.

I understand the concepts: I guess I'm either doing something wrong, or, does the InputfieldTinyMCE.js file need to be edited? Which I realize would be an edit to the core...

Thanks for your help!

Peter


I've just tested it on a clean install (from the latest dev version)

Setting:

entity_encoding:named
entities:8212,mdash,8221,rdquo,233,eacute

in 'Additional TinyMCE Settings' works as expected. It replaces only the defined special characters with their named entities upon opening and editing a page, including when working with the HTML source modal window. It leaves the rest untouched, so if I enter a '€', for example, it will not convert it to the named equivalent.

So you don't need to mess with a 'core' setting in InputfieldTinyMCE.js.

Some reasons why things might not work for you:

  • lose the single quotes around the entities setting, I don't think they should be there
  • you've got two named entities following each other (trade,ndash: ndash is missing its number, 8211), where the list should be a repetition of: odd(numeric),even(named)
  • try clearing your browser cache, log in again and see what that does.
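
The first two points can be checked mechanically. The sketch below is a hypothetical helper (checkEntitiesList is not part of TinyMCE or ProcessWire) that assumes the entities value is a flat, comma-separated list of number,name pairs and reports where that pattern breaks.

```javascript
// Hypothetical helper, not part of TinyMCE or ProcessWire.
// Returns a list of human-readable problems found in an `entities` value.
function checkEntitiesList(setting) {
  const problems = [];
  let value = setting.trim();

  // Surrounding quotes end up inside the first and last tokens.
  if (/^['"]/.test(value) || /['"]$/.test(value)) {
    problems.push('remove the quotes around the list');
    value = value.replace(/^['"]|['"]$/g, '');
  }

  // Walk the tokens, expecting number, name, number, name, ...
  let expectNumber = true;
  for (const token of value.split(',').map((t) => t.trim())) {
    const isNumber = /^\d+$/.test(token);
    if (expectNumber && !isNumber) {
      // A name with no code before it; keep expecting a number next.
      problems.push(`"${token}" has no character code in front of it`);
    } else if (!expectNumber && isNumber) {
      // Two codes in a row; this one now waits for its name.
      problems.push(`the code before ${token} has no entity name`);
    } else {
      expectNumber = !expectNumber;
    }
  }
  if (!expectNumber) {
    problems.push('the list ends with a code that has no entity name');
  }
  return problems;
}

const posted = "'169,copy,8482,trade,ndash,8212,mdash,8216,lsquo,8217,rsquo,8220,ldquo,8221,rdquo,8364,euro'";
console.log(checkEntitiesList(posted));
console.log(checkEntitiesList('8212,mdash,8221,rdquo,233,eacute'));
```

Run against the list posted earlier in the thread, it flags the quotes and the stray ndash; the second list comes back clean.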

PS I wouldn't bother with the apply_source_formatting setting; it seems to be removed or at least deprecated ( http://www.tinymce.com/wiki.php/Configuration3x:apply_source_formatting )


Dear SiNNuT,

I removed the quotes, and replaced the missing numeric, but it didn't work until I cleared my cache; and then it did.

I'm not sure if it was the quotes or the out of order list, but I'd bet it was the list. I was missing one number.

Thank you! I really appreciate it. It now works like a charm.

You wrote:

Have you tried setting it to 'named'? Mind you, PW being a UTF-8 project i don't think this setting is desirable, but you could (and probably should) pair it with the entities option http://www.tinymce.c...ration:entities to make your own list.

Could you elucidate more about the UTF-8 / named entity issue, and its impact? Would it make a difference if I used "numeric" instead of named?

I always thought that entities were preferable to raw characters, because someone pasting in an article from a PC might paste characters that weren't viewable on a Mac, and vice versa. I've seen curly quotes look all garbled on web pages, and in emails, so I thought that the entities bypassed the problem.

I use LibreOffice on a Win7 machine, by the way.

Thanks!

Peter


Curly quotes looking garbled has nothing to do with whether it's a raw character or its HTML entity equivalent. I'm not saying that it is very, very wrong to store HTML entities in the db, but it is just not necessary, and there are some downsides to it.
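
Garbling of this kind typically comes from bytes written in one character encoding being read back in another. As a rough illustration (a sketch, not a diagnosis of any particular page), the JavaScript below encodes a right curly quote as UTF-8 and then decodes the same bytes as Windows-1252. The decodeAsCp1252 helper is hypothetical and its lookup table is deliberately partial, covering only the bytes this demo needs.

```javascript
// Partial Windows-1252 table for the 0x80-0x9F range.
// Only the two entries needed for this demo are included (assumption: demo use only).
const CP1252_HIGH = { 0x80: '\u20ac', 0x99: '\u2122' };

// Hypothetical helper: read each byte as one Windows-1252 character.
function decodeAsCp1252(bytes) {
  return Array.from(bytes, (b) => CP1252_HIGH[b] || String.fromCharCode(b)).join('');
}

const curly = '\u2019';                              // right single curly quote
const utf8Bytes = new TextEncoder().encode(curly);   // bytes E2 80 99

const garbled = decodeAsCp1252(utf8Bytes);                    // wrong charset on the reading side
const correct = new TextDecoder('utf-8').decode(utf8Bytes);   // matching charset

console.log(garbled);  // â€™
console.log(correct);  // ’
```

An entity such as &rsquo; is plain ASCII, so it survives this mismatch, which is why entities can appear to fix the problem. The stored character itself is fine as long as the page is written and read as UTF-8.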

http://stackoverflow.com/questions/9299152/do-i-need-to-use-html-entities-when-storing-data-in-the-database

There's a lot more to read if you google.

PS maybe you could mark this thread as solved.


Dear SiNNuT,

By the way, based on your excellent comment that ProcessWire stores everything as UTF-8, and the link at Stack Overflow that you listed, which then led me to this article (from 2003!):

"The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)", by Joel Spolsky

http://www.joelonsoftware.com/articles/Unicode.html

I educamated myself, and said, "Duh", and realized, as you so correctly pointed out, that I don't need the entities anyway.

I just need to make sure I have this line in my templates:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

When I checked, I was already using that line. So, I'm removing my entity declaration from TinyMCE.

Sigh. One learns something every day. One hopes.

Well... better to do it correctly now, than to continue in ignorance. :-)

Addendum: and then one reads an article like this:

http://line25.com/articles/10-html-entity-crimes-you-really-shouldnt-commit

which says that one should use HTML entities. Lots of opinions.

But I just checked, and after removing my TinyMCE entity declarations, purging my cache, and re-logging in, I was able to paste a copyright symbol in the body text, and it displayed correctly -- no entity needed.

Thank you again, for your help.

Peter


Ah... that article by Joel Spolsky is something of a classic and should be mandatory reading for every (new) (web) developer who doesn't know much about this subject.

That other article is a bit misleading, I think. It doesn't advocate storing/using HTML entities in the database. It's really about using the right character for the job; and in UTF-8 the characters behind the HTML entities are just ordinary characters. http://www.fileformat.info/info/charset/UTF-8/list.htm

And remember, the so-called htmlspecialchars (crime 1 on that link) are always converted by TinyMCE, even with the 'raw' setting: http://www.tinymce.com/wiki.php/Configuration:entity_encoding

So my recommendation: keep the TinyMCE setting at 'raw'. (If you want, you can always use the PHP htmlentities() function on your output, for example in a template file.)
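
The idea of converting at output time rather than in the stored content can be sketched in JavaScript as well. The toNumericEntities function below is a hypothetical stand-in, not a port of PHP's htmlentities(): it assumes the stored text is raw UTF-8 and emits numeric character references for everything outside ASCII, plus the usual escapes for the four XML default characters.

```javascript
// Hypothetical output-time encoder; the stored text itself stays raw UTF-8.
const XML_ESCAPES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;' };

function toNumericEntities(text) {
  // Array.from iterates by code point, so characters outside the BMP stay intact.
  return Array.from(text, (ch) => {
    if (XML_ESCAPES[ch]) return XML_ESCAPES[ch];
    const codePoint = ch.codePointAt(0);
    return codePoint > 127 ? `&#${codePoint};` : ch;
  }).join('');
}

const stored = '\u00a9 2013 Peter\u2019s site';
console.log(toNumericEntities(stored));  // &#169; 2013 Peter&#8217;s site
```

Because the conversion happens only when rendering, the database keeps the plain characters and remains searchable and editable as ordinary text.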
