Jump to content

htmLawed: flexible HTML cleaner and sanitizer


Valery
 Share

Recommended Posts

Hello!

I've been migrating a Joomla site to PW. As I was at it, I could not help notice that the text of the articles I was importing into PW was messy.  The site owner confessed that he simply pasted his text from MS Word into Joomla's editor, which resulted in very dirty HTML full of inline CSS and custom class names.

So this got me thinking of tools I could use to clean this dirty HTML. After some search I stumbled across a handy PHP tool named htmLawed developed by Santosh Patnaik.

Quote

htmLawed is a PHP script to process text with HTML markup to make it more compliant with HTML standards and with administrative policies. It works by making HTML well-formed with balanced and properly nested tags, neutralizing code that introduces a security vulnerability or is used for cross-site scripting (XSS) attacks, allowing only specified HTML tags and attributes, and so on. Such lawing in of HTML code ensures that it is in accordance with the aesthetics, safety and usability requirements set by administrators.

  htmLawed is highly customizable, and fast with low memory usage. Its free and open-source code is in one small file. It does not require extensions or libraries, and works in older versions of PHP as well. It is a good alternative to the HTML Tidy application.

 

What I liked about htmLawed was its flexibility in filtering and cleaning HTML.  My goal was to strip off the style attributes from <p>'s in my HTML, which was way below htmLawed's capabilities.  It can do a lot more -- reading the list of features got me hooked up!

The htmLawed download contains just two files: the software itself, and a script to measure performance. Including htmLawed in a PW template is a matter of one include(), and after that you are ready to go.

Using any HTML formatter adds a certain memory footprint, so using htmLawed for outputting each and every piece of HTML might not be the best idea; but nevertheless it's a handy and flexible tool for sanitizing and beautifying any HTML before saving it in your PW page.

 

htmLawed is free, well-documented and flexible. Give it a try if you feel you need more control over HTML that your users post.

http://www.bioinformatics.org/phplabware/internal_utilities/htmLawed/

  • Like 4
Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...