MarcC, posted April 26, 2013
I have a client who had a previous site where they were pasting analytics code (two types) straight into TinyMCE, so now all of their bodytext fields are kind of polluted with that stuff. What's a good way to remove it? I'm assuming it'd be best to do this with the PW API. These are script tags bracketed by HTML comments.
Macrura, posted April 27, 2013
Maybe use a regex when you run the search/replace with the API. I'm no regex guru, but I did find this, which might be a start: `/<script.*>.*<script src='.+gaAddons.js' type='.+'><\/script>/s`
Or maybe an HTML DOM parser: http://simplehtmldom.sourceforge.net/
Also, I'm not sure how many pages there are, but in some cases I have used phpMyAdmin in inline edit mode, and it's pretty fast for getting through a lot of content.
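Along the lines of the DOM-parser suggestion, here is a minimal sketch that uses PHP's built-in DOM extension (DOMDocument and DOMXPath) instead of the Simple HTML DOM library linked above. Swapping in the built-in extension is an assumption, not what was suggested, and `stripScriptsAndComments()` is a hypothetical name. It parses one field value, drops every `<script>` element and HTML comment node, and serializes what is left.

```php
<?php
// Hypothetical helper: strip <script> elements and HTML comments from a
// body-field fragment using PHP's built-in DOM extension (ext-dom).
function stripScriptsAndComments(string $html): string
{
    $doc = new DOMDocument();
    // libxml warns about tags it doesn't know; collect warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a full document; the meta tag makes libxml read
    // the input as UTF-8 rather than its default ISO-8859-1.
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
        . $html . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Snapshot all script elements and comment nodes inside <body>, then detach them.
    $xpath = new DOMXPath($doc);
    $nodes = iterator_to_array($xpath->query('//body//script | //body//comment()'));
    foreach ($nodes as $node) {
        $node->parentNode->removeChild($node);
    }

    // Serialize only the children of <body>, i.e. the original fragment minus the removals.
    $body = $doc->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Because the markup goes through a parse/serialize round trip, libxml may normalize it slightly (for example `<br />` comes back as `<br>`), so compare a few cleaned pages against the originals before saving everything.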
diogo, posted April 27, 2013
Find one regexp that matches comments and one that matches script tags, and remove them with preg_replace() from all the bodytext fields. If you already imported everything into PW, you could do something like:

```php
// $mypages: the pages to clean, e.g. a PageArray returned by $pages->find()
// $pattern_for_script_tags / $pattern_for_html_comments: placeholders for your regexes
foreach ($mypages as $p) {
    $p->of(false); // output formatting must be off before modifying and saving
    $p->body = preg_replace($pattern_for_script_tags, '', $p->body); // use your field name, e.g. bodytext
    $p->body = preg_replace($pattern_for_html_comments, '', $p->body);
    $p->save();
}
```
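Before running a loop like that over every page, a dry run that only reports what each pattern would strip can catch a bad regex early. A minimal sketch in plain PHP, runnable outside ProcessWire; `previewRemovals()` is a hypothetical helper, and the two patterns are assumptions (they are the lazy patterns from Ryan's reply in this thread), so swap in whatever you settle on.

```php
<?php
// Hypothetical helper: report what each pattern would strip from one field
// value, without modifying anything.
function previewRemovals(string $html, array $patterns): array
{
    $found = [];
    foreach ($patterns as $name => $pattern) {
        // preg_match_all() fills $m[0] with every full match, in order.
        if (preg_match_all($pattern, $html, $m) > 0) {
            $found[$name] = $m[0];
        }
    }
    return $found;
}

// Assumed patterns: lazy, case-insensitive, dot matches newlines.
$patterns = [
    'script'  => '{<script[^>]*>.*?</script>}is',
    'comment' => '{<!--.*?-->}is',
];

// Made-up sample value standing in for a page's body field.
$body = "<p>Intro</p>\n<!-- GA -->\n<script>\nvar _gaq = [];\n</script>\n<p>Outro</p>";
foreach (previewRemovals($body, $patterns) as $name => $matches) {
    echo $name, ': ', count($matches), " match(es)\n";
}
```

Inside the ProcessWire loop you could call it on each `$p->body` instead of assigning and saving, then run the real pass once the reported matches look right.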
ryan, posted April 29, 2013

```php
$body = preg_replace('{<script[^>]*>.*?</script>}is', '', $body);
$body = preg_replace('{<!--.*?-->}is', '', $body);
```

The key here is to change the default "greedy" matching to "lazy" matching by putting a question mark after the .*, giving .*? That ensures it matches only up to the closest closing tag rather than the (default) furthest one, so it won't wipe out legitimate copy. Also, the "s" at the very end lets the match span as many lines as needed; without it, the dot doesn't match newlines, so it would only match opening and closing tags on the same line. The "i" makes the match case-insensitive, so <SCRIPT> is caught as well.
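To see both points in action, here is a small sketch on a made-up body value: legitimate copy sitting between two multi-line analytics snippets. It compares the lazy pattern above with a greedy `.*` variant and with a variant missing the `s` modifier; the sample string is invented for illustration.

```php
<?php
// Made-up body value: real copy between two multi-line analytics snippets.
$body = "<script>\ntrackA();\n</script>\n<p>Legitimate copy</p>\n<script>\ntrackB();\n</script>";

// Greedy .* runs from the first <script> to the LAST </script>,
// so the paragraph in between is deleted too (result is an empty string).
$greedy = preg_replace('{<script[^>]*>.*</script>}is', '', $body);

// Lazy .*? stops at the nearest </script> after each opening tag,
// so only the two snippets are removed.
$lazy = preg_replace('{<script[^>]*>.*?</script>}is', '', $body);

// Without the s modifier the dot does not match "\n", so these
// multi-line snippets never match and the string comes back unchanged.
$noDotAll = preg_replace('{<script[^>]*>.*?</script>}i', '', $body);

echo "greedy: ", var_export($greedy, true), "\n";
echo "lazy:   ", var_export($lazy, true), "\n";
echo "no s:   ", ($noDotAll === $body ? 'unchanged' : 'changed'), "\n";
```

The greedy result shows why a pattern built around a bare `.*` can over-match when a page contains more than one snippet.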
MarcC, posted April 29, 2013
Awesome, thanks everybody. And thanks for the explanation, Ryan. I'm looking forward to improving my regexing.