Zeka Posted April 5, 2019
Hi. I'm migrating a WP site to PW. There are several thousand posts with very "dirty" HTML. I'm looking for a good way to automatically remove all unwanted styles and replace all H1-H4 tags with P tags. Is there something in PW that can help? Thanks in advance.
-------------
It seems that HTML Purifier can do this with some additional configuration.
wbmnfktr Posted April 5, 2019
In my experience it depends on how you import the content. If you copy and paste, just paste as plain text, or take the extra step of pasting into a text editor of your choice first. If you use a database import or a migrator module, you can rely on the textarea field settings, which already take care of a lot of extra markup. You might have to iterate over all pages via the API to trigger the cleanup, or push it manually from the settings. You won't get rid of wrong H tags, but most of the other clutter will go.
Valery Posted April 5, 2019
Hey there! While not a part of PW, you might want to check out htmLawed, an advanced HTML filter/purifier.
pwired Posted April 5, 2019
Quote: posts with very "dirty" HTML
A lot of "dirty" HTML is generated by those fancy visual editors for WP. Try uninstalling them, or maybe there is a cleanup plugin for that purpose.
Zeka Posted April 5, 2019
@wbmnfktr, @Valery, @pwired Thanks for the suggestions. I managed to get the desired result with HTML Purifier:

$dirty = pages(11896)->archive_wysiwyg;
$purifier = $sanitizer->purifier();
// Remove empty elements, including ones that contain only &nbsp;
$purifier->set('AutoFormat.RemoveEmpty', true);
$purifier->set('AutoFormat.AutoParagraph', true);
$purifier->set('AutoFormat.RemoveEmpty.RemoveNbsp', true);
// Allow no CSS properties, which strips inline styles
$purifier->set('CSS.AllowedProperties', array());
// Drop all class attributes plus fixed image dimensions
$purifier->set('HTML.ForbiddenAttributes', array('*@class', 'img@width', 'img@height'));
// Strip <span> and <strong> tags
$purifier->set('HTML.ForbiddenElements', array('span', 'strong'));
// Map every heading level to a plain <p>
$settings = $purifier->getConfig();
$def = $settings->getHTMLDefinition();
$def->info_tag_transform['h1'] = new HTMLPurifier_TagTransform_Simple('p');
$def->info_tag_transform['h2'] = new HTMLPurifier_TagTransform_Simple('p');
$def->info_tag_transform['h3'] = new HTMLPurifier_TagTransform_Simple('p');
$def->info_tag_transform['h4'] = new HTMLPurifier_TagTransform_Simple('p');
$def->info_tag_transform['h5'] = new HTMLPurifier_TagTransform_Simple('p');
$def->info_tag_transform['h6'] = new HTMLPurifier_TagTransform_Simple('p');
$clean = $purifier->purify($dirty);
// do something
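If HTML Purifier isn't available, the two core transformations asked about in this thread (turning headings into paragraphs and stripping inline styles) can be roughly approximated with PHP's bundled DOM extension. This is a swapped-in technique, not the method used above, and it is only a minimal sketch: it assumes ext-dom is enabled, `cleanHtml()` is a hypothetical helper name, and it does none of HTML Purifier's whitelisting, empty-element removal or auto-paragraphing. Any other attributes on a heading (such as `id`) are dropped along with the tag.

```php
<?php
// Sketch using PHP's DOM extension (NOT HTML Purifier): turns <h1>..<h6>
// into <p> and strips style/class attributes from every element.
function cleanHtml(string $dirty): string
{
    $doc = new DOMDocument();
    // Silence libxml warnings about tags it doesn't know (e.g. HTML5 elements).
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a full document; the meta tag tells libxml it is UTF-8.
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
        . $dirty . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Drop inline styles and classes everywhere.
    foreach ($xpath->query('//*[@style or @class]') as $el) {
        $el->removeAttribute('style');
        $el->removeAttribute('class');
    }

    // XPath results are a snapshot, so replacing nodes inside the loop is safe.
    foreach ($xpath->query('//h1|//h2|//h3|//h4|//h5|//h6') as $heading) {
        $p = $doc->createElement('p');
        // Move the heading's children (text and inline markup) into the new <p>.
        while ($heading->firstChild !== null) {
            $p->appendChild($heading->firstChild);
        }
        $heading->parentNode->replaceChild($p, $heading);
    }

    // Serialize only the contents of <body>, i.e. the cleaned fragment.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $clean = '';
    foreach ($body->childNodes as $child) {
        $clean .= $doc->saveHTML($child);
    }
    return $clean;
}
```

For example, `cleanHtml('<h2 style="color:red">Title</h2><p class="x">Body <span style="font-weight:bold">text</span></p>')` should yield `<p>Title</p>` followed by `<p>Body <span>text</span></p>` (exact whitespace between elements may vary with the PHP/libxml version). In a PW migration you would presumably run it over each page's field value inside an API loop, the same way `$purifier->purify()` is used above.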
wbmnfktr Posted April 5, 2019 Share Posted April 5, 2019 $settings = $purifier->getConfig(); $def = $settings->getHTMLDefinition(); $def->info_tag_transform['h1'] = new HTMLPurifier_TagTransform_Simple('p'); $def->info_tag_transform['h2'] = new HTMLPurifier_TagTransform_Simple('p'); $def->info_tag_transform['h3'] = new HTMLPurifier_TagTransform_Simple('p'); $def->info_tag_transform['h4'] = new HTMLPurifier_TagTransform_Simple('p'); $def->info_tag_transform['h5'] = new HTMLPurifier_TagTransform_Simple('p'); $def->info_tag_transform['h6'] = new HTMLPurifier_TagTransform_Simple('p'); $clean = $purifier->purify($dirty); That part is awesome. Should bookmark this for later migrations. 1 Link to comment Share on other sites More sharing options...