
CKEditor: HTML Purifier


patrick

Using the HTMLPurifier demo page, if you change the defined Doctype to either of the standard HTML Doctypes, the BR tag will not use the XHTML formatted version. That said, there may be other unforeseen consequences in making this change. If this is simply an aesthetic preference, I would personally recommend leaving it alone. If there's another reason for it, I'm thinking that a call to str_replace on your data would be far simpler.

If you do decide to customize HTMLPurifier, a good place to start for examples would be here and here.

Here's an example set call that would change the Doctype to HTML Transitional, adjusted from the PW API Documentation example:

$wireData = $markupHTMLPurifier->set('HTML.Doctype', 'HTML 4.01 Transitional');

 


In the context of CKEditor the <br /> tag is probably caused by the CKEditor settings rather than HTML Purifier.

Using this answer as a reference, you could put the following in /site/modules/InputfieldCKEditor/config.js:

// When CKEditor instance ready
CKEDITOR.on('instanceReady', function(event) {
	// Output self-closing tags the HTML5 way, like <br>
	event.editor.dataProcessor.writer.selfClosingEnd = '>';
});

 


1 hour ago, Robin S said:

In the context of CKEditor the <br /> tag is probably caused by the CKEditor settings rather than HTML Purifier.

Using this answer as a reference, you could put the following in /site/modules/InputfieldCKEditor/config.js:

// When CKEditor instance ready
CKEDITOR.on('instanceReady', function(event) {
	// Output self-closing tags the HTML5 way, like <br>
	event.editor.dataProcessor.writer.selfClosingEnd = '>';
});

 

Hi Robin

Thanks for your answer. I tried it.

With the above solution in the CKEditor config, br tags are shown as <br> in CKEditor's source view, but unfortunately they are still saved in the database as <br />.


2 hours ago, BrendonKoz said:

Using the HTMLPurifier demo page, if you change the defined Doctype to either of the standard HTML Doctypes, the BR tag will not use the XHTML formatted version. That said, there may be other unforeseen consequences in making this change. If this is simply an aesthetic preference, I would personally recommend leaving it alone. If there's another reason for it, I'm thinking that a call to str_replace on your data would be far simpler.

If you do decide to customize HTMLPurifier, a good place to start for examples would be here and here.

Here's an example set call that would change the Doctype to HTML Transitional, adjusted from the PW API Documentation example:

$wireData = $markupHTMLPurifier->set('HTML.Doctype', 'HTML 4.01 Transitional');

 

Hi Brendon

Thanks for your answer.

The reason is to make the source validate as HTML5 in the W3C validator. I was thinking about str_replace too, but was wondering whether there is a nicer way to have it saved in the database as HTML5 in the first place.

I tried the following in the admin.php:

$wire->addHookAfter('MarkupHTMLPurifier::initConfig', function(HookEvent $event) {
    $def = $event->arguments(1);
    $this->settings->set('HTML.Doctype', 'HTML 4.01 Transitional');
});

but that doesn't seem to do the trick.


On 10/10/2022 at 9:00 PM, patrick said:

I tried the following in the admin.php:

$wire->addHookAfter('MarkupHTMLPurifier::initConfig', function(HookEvent $event) {
    $def = $event->arguments(1);
    $this->settings->set('HTML.Doctype', 'HTML 4.01 Transitional');
});

but that doesn't seem to do the trick

@patrick, try this:

$wire->addHookAfter('MarkupHTMLPurifier::initConfig', function(HookEvent $event) {
	$settings = $event->arguments(0);
	$settings->set('HTML.Doctype', 'HTML 4.01 Transitional');
});

For this to take effect you'll also need to clear the HTML Purifier cache, which you can do by executing the following once (the Tracy Debugger console is useful for this sort of thing):

$purifier = new MarkupHTMLPurifier();
$purifier->clearCache();

 


1 hour ago, Robin S said:

@patrick, try this:

$wire->addHookAfter('MarkupHTMLPurifier::initConfig', function(HookEvent $event) {
	$settings = $event->arguments(0);
	$settings->set('HTML.Doctype', 'HTML 4.01 Transitional');
});

For this to take effect you'll also need to clear the HTML Purifier cache, which you can do by executing the following once (the Tracy Debugger console is useful for this sort of thing):

$purifier = new MarkupHTMLPurifier();
$purifier->clearCache();

 

Hi Robin

Cool, this is working. Many thanks for taking the time!

The only problem now: due to the HTML4 value, other elements are getting converted; for example, <figure> gets deleted (which shouldn't happen). Hopefully there will be an HTML5 value for HTML.Doctype in the future.

Have a nice day, and greetings from Switzerland to New Zealand
Patrick


Unfortunately, from discussions I've seen, either HTMLPurifier is unlikely to implement a supported HTML5 Doctype, or it will be quite a while before one arrives. They're taking contributions, but the amount of work (and surrounding understanding) seems daunting. The removal of unsupported elements is one of the unforeseen consequences I mentioned. I'd still recommend using a call to str_replace instead if you do decide to stick with this.
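As a rough illustration of the str_replace route, here is a minimal sketch. The function name html5LineBreaks is hypothetical (it is not part of ProcessWire or HTMLPurifier), and it assumes the only void element you care about is <br>, written as either <br /> or <br/>:

```php
<?php
// Hypothetical helper: convert XHTML-style <br /> and <br/> to HTML5-style <br>.
// It only touches these exact strings; other void elements are left as they are.
function html5LineBreaks(string $html): string {
    return str_replace(['<br />', '<br/>'], '<br>', $html);
}

// Example with a plain string; in practice you would pass in your field value.
echo html5LineBreaks('<p>One<br />Two<br/>Three</p>');
// prints: <p>One<br>Two<br>Three</p>
```

Where you call it (on output in a template, or in a hook before saving) is up to you; this sketch deliberately leaves that part out.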

That said, using <br/> or <br /> is not invalid, and is allowed. You're seeing info messages in the W3C validator service, not notices or warning messages. The message exists because HTML5, unlike XHTML, doesn't require attribute values to be wrapped in quotes, and if an element with an unquoted value ends with a trailing slash directly after that value, with nothing separating the two, the slash is treated as part of the attribute value rather than as the end of the tag. See their example for more info:

https://github.com/validator/validator/wiki/Markup-»-Void-elements#trailing-slashes-directly-preceded-by-unquoted-attribute-values

This was a great exercise in how to customize the HTMLPurifier and CKEditor components, but I'd personally recommend, at least in this instance, not making that change. That said, it's completely up to you, and you have a working solution for your target goal now!


On 10/11/2022 at 11:23 PM, patrick said:

The only problem now: due to the HTML4 value, other elements are getting converted; for example, <figure> gets deleted (which shouldn't happen).

One more option...

You could copy MarkupHTMLPurifier from /wire/modules/Markup/ to /site/modules/ and then select it as the copy you want to use.

Then edit HTMLPurifier.standalone.php to replace this code with:

return '<' . $token->name . ($attr ? ' ' : '') . $attr . '>';

Seems to solve the slash issue without affecting HTML5 elements like <figure>


6 hours ago, Robin S said:

One more option...

You could copy MarkupHTMLPurifier from /wire/modules/Markup/ to /site/modules/ and then select it as the copy you want to use.

Then edit HTMLPurifier.standalone.php to replace this code with:

return '<' . $token->name . ($attr ? ' ' : '') . $attr . '>';

Seems to solve the slash issue without affecting HTML5 elements like <figure>

Hi Robin

Thanks a lot for your answer.

I copied MarkupHTMLPurifier to /site/modules/ and made the changes. I like this approach and I will stick with your solution!

Thanks again and have a nice day!

