[SOLVED] Best way to clean up HTML and replace H tags with P


Zeka

Hi.

I'm migrating a WP site to PW. There are several thousand posts with very "dirty" HTML. I'm looking for a good way to automatically remove all unwanted styles and replace all H1-H4 tags with P tags.

Is there something in PW that can help?

Thanks in advance. 

-------------

It seems that HTML Purifier can do it with some additional configuration. 

-------------

In my experience it depends on how you import the content.

Via copy and paste: just paste as plain text, or go the extra route and paste into a text editor of your choice first.

When using a database import or a migrator module, you might want to rely on the textarea field settings. They already take care of a lot of extra markup. You might have to iterate over all pages via the API to make the cleanup happen, or trigger it manually from the settings. You won't get rid of the wrong H tags, but most of the other clutter will go.

[Two attached screenshots, dated 2019-04-05]

-------------

@wbmnfktr, @Valery, @pwired

Thanks for the suggestions.

I managed to get the desired result with HTML Purifier. 

// Raw markup from the imported field (page ID and field name are specific to this site)
$dirty = pages(11896)->archive_wysiwyg;
// ProcessWire's HTML Purifier wrapper
$purifier = $sanitizer->purifier();
// Drop empty elements (including ones holding only &nbsp;) and wrap loose text in <p>
$purifier->set('AutoFormat.RemoveEmpty', true);
$purifier->set('AutoFormat.AutoParagraph', true);
$purifier->set('AutoFormat.RemoveEmpty.RemoveNbsp', true);
// Allow no CSS properties at all, so inline styles are stripped
$purifier->set('CSS.AllowedProperties', array());
// Forbid class attributes on every element, plus width/height on images
$purifier->set('HTML.ForbiddenAttributes', array('*@class', 'img@width', 'img@height'));
// Disallow <span> and <strong>
$purifier->set('HTML.ForbiddenElements', array('span', 'strong'));
// Fetch the HTML definition and map every heading level to <p>
$settings = $purifier->getConfig();
$def = $settings->getHTMLDefinition();
$def->info_tag_transform['h1'] = new HTMLPurifier_TagTransform_Simple('p');
$def->info_tag_transform['h2'] = new HTMLPurifier_TagTransform_Simple('p');
$def->info_tag_transform['h3'] = new HTMLPurifier_TagTransform_Simple('p');
$def->info_tag_transform['h4'] = new HTMLPurifier_TagTransform_Simple('p');
$def->info_tag_transform['h5'] = new HTMLPurifier_TagTransform_Simple('p');
$def->info_tag_transform['h6'] = new HTMLPurifier_TagTransform_Simple('p');

$clean = $purifier->purify($dirty);
// do something with $clean
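
Since the original question was about several thousand posts, here is a rough sketch of how the same configuration might be applied in bulk. It is untested against a real site and rests on a few assumptions: it runs inside ProcessWire (a template file or a bootstrapped script) where $pages and $sanitizer exist, the installed version provides $pages->findMany(), and the selector 'template=post' is only a placeholder. The function names headingTags, buildPurifier and cleanPages are made up for this example. Only archive_wysiwyg and the purifier settings come from the post above.

```php
<?php
// Sketch only: the three functions below are hypothetical helpers, not part of ProcessWire.

/**
 * Heading tag names for levels $from..$to, e.g. headingTags(1, 4) for H1-H4.
 */
function headingTags(int $from = 1, int $to = 6): array
{
    $tags = [];
    for ($level = $from; $level <= $to; $level++) {
        $tags[] = 'h' . $level;
    }
    return $tags;
}

/**
 * Configure the purifier with the settings from the post above and map
 * every tag in $tags to <p>. Call this once: all set() calls must happen
 * before getHTMLDefinition() is requested.
 */
function buildPurifier($sanitizer, array $tags)
{
    $purifier = $sanitizer->purifier();
    $purifier->set('AutoFormat.RemoveEmpty', true);
    $purifier->set('AutoFormat.AutoParagraph', true);
    $purifier->set('AutoFormat.RemoveEmpty.RemoveNbsp', true);
    $purifier->set('CSS.AllowedProperties', array());
    $purifier->set('HTML.ForbiddenAttributes', array('*@class', 'img@width', 'img@height'));
    $purifier->set('HTML.ForbiddenElements', array('span', 'strong'));

    $def = $purifier->getConfig()->getHTMLDefinition();
    foreach ($tags as $tag) {
        // Leading backslash so the class also resolves inside a namespaced file
        $def->info_tag_transform[$tag] = new \HTMLPurifier_TagTransform_Simple('p');
    }
    return $purifier;
}

/**
 * Purify $field on every page matching $selector and save it back when it
 * changed. Returns the number of pages that were modified.
 */
function cleanPages($pages, $purifier, string $selector, string $field): int
{
    $changed = 0;
    // findMany() is intended for large result sets
    foreach ($pages->findMany($selector) as $page) {
        $dirty = $page->getUnformatted($field);
        $clean = $purifier->purify($dirty);
        if ($clean !== $dirty) {
            $page->setAndSave($field, $clean);
            $changed++;
        }
    }
    return $changed;
}
```

Usage would look something like this (try it on a copy of the site or a handful of pages first, since it overwrites the field):

$purifier = buildPurifier($sanitizer, headingTags(1, 4));
echo cleanPages($pages, $purifier, 'template=post', 'archive_wysiwyg') . " pages cleaned";

Passing headingTags(1, 4) converts only H1-H4 as asked in the first post, while headingTags() covers H1-H6 like the snippet above.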

 

-------------

$settings = $purifier->getConfig();
$def = $settings->getHTMLDefinition();
$def->info_tag_transform['h1'] = new HTMLPurifier_TagTransform_Simple('p');
$def->info_tag_transform['h2'] = new HTMLPurifier_TagTransform_Simple('p');
$def->info_tag_transform['h3'] = new HTMLPurifier_TagTransform_Simple('p');
$def->info_tag_transform['h4'] = new HTMLPurifier_TagTransform_Simple('p');
$def->info_tag_transform['h5'] = new HTMLPurifier_TagTransform_Simple('p');
$def->info_tag_transform['h6'] = new HTMLPurifier_TagTransform_Simple('p');
$clean = $purifier->purify($dirty);

That part is awesome. Should bookmark this for later migrations.
