MarcC Posted April 26, 2013
I have a client whose previous site had analytics code (two types) pasted directly into TinyMCE, so now all of their bodytext fields are polluted with it. What's a good way to remove it, assuming it's best to do this with the PW API? The snippets are script tags bracketed by HTML comments.
Macrura Posted April 27, 2013
Maybe a regex when you run the search/replace with the API. I'm no regex guru, but I did find this, which may be a start:
/<script.*>.*<script src='.+gaAddons.js' type='.+'><\/script>/s
Or maybe an HTML DOM parser: http://simplehtmldom.sourceforge.net/
Also, I'm not sure how many pages there are, but in some cases I have used phpMyAdmin in inline mode, and it's pretty fast for getting through a lot of content.
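The DOM-parser idea above can be sketched without any third-party library. This is only a rough illustration that swaps in PHP's built-in DOM extension (DOMDocument and DOMXPath) in place of the simplehtmldom library linked in the post; the function name and the "wrap" id are made up for the example, and it assumes the bodytext is a reasonably well-formed UTF-8 HTML fragment.

```php
<?php
// Hypothetical helper (not from the thread): strips <script> elements and
// HTML comments from an HTML fragment using PHP's built-in DOM extension.
function strip_scripts_and_comments_dom(string $html): string
{
    $dom = new DOMDocument();

    // Wrap the fragment in a single known container; the XML prolog is a
    // common way to hint UTF-8 to the underlying HTML parser.
    $wrapped = '<?xml encoding="utf-8"?><div id="wrap">' . $html . '</div>';

    // Collect parser warnings quietly instead of emitting them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($wrapped);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);

    // Copy the matches into an array first, so removing nodes does not
    // disturb the list while it is being iterated.
    $nodes = iterator_to_array($xpath->query('//script | //comment()'));
    foreach ($nodes as $node) {
        $node->parentNode->removeChild($node);
    }

    // Serialize only the children of the wrapper, not the wrapper itself.
    $wrap = $xpath->query('//div[@id="wrap"]')->item(0);
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$sample = '<p>Hello</p><!-- GA start --><script type="text/javascript">var x = 1;</script><!-- GA end --><p>World</p>';
$clean = strip_scripts_and_comments_dom($sample);
```

A parser-based approach like this may re-serialize the markup slightly differently from how it was stored, so it is worth comparing a few pages before and after.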
diogo Posted April 27, 2013
Find a regexp that matches comments and script tags and remove them with preg_replace() from all the bodytext fields. If you have already imported everything into PW, you could do something like:
foreach($mypages as $p) {
    $p->of(false); // turn output formatting off before changing and saving field values
    $p->body = preg_replace($pattern_for_script_tags, '', $p->body);
    $p->body = preg_replace($pattern_for_html_comments, '', $p->body);
    $p->save();
}
ryan Posted April 29, 2013
$body = preg_replace('{<script[^>]*>.*?</script>}is', '', $body);
$body = preg_replace('{<!--.*?-->}is', '', $body);
The key here is to change the default "greedy" matching to "lazy" matching by following the .* with a question mark: .*? That ensures it will match only up to the closest closing tag rather than the [default] furthest one, so it won't wipe out legitimate copy. Also, the "s" at the very end lets the match span as many lines as needed. Without it, the pattern would only match opening and closing tags on the same line.
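The greedy-versus-lazy difference and the effect of the "s" modifier can be checked with a small standalone script. This is a minimal sketch: the sample $body string and the variable names $greedy, $lazy and $noS are invented for illustration, while the two lazy patterns are the ones from the post above.

```php
<?php
// Made-up sample bodytext: legitimate copy sits between two analytics snippets.
$body = "<p>Intro copy.</p>\n"
      . "<!-- analytics A -->\n<script type=\"text/javascript\">\n  var a = 1;\n</script>\n<!-- /analytics A -->\n"
      . "<p>Legitimate copy.</p>\n"
      . "<!-- analytics B -->\n<script src=\"tracker.js\"></script>\n<!-- /analytics B -->\n"
      . "<p>Closing copy.</p>";

// Greedy .* runs to the LAST </script>, so the paragraph between the two
// snippets is removed along with them.
$greedy = preg_replace('{<script[^>]*>.*</script>}is', '', $body);

// Lazy .*? stops at the NEAREST </script> (or -->), so each snippet is
// removed separately and the copy in between is kept.
$lazy = preg_replace('{<script[^>]*>.*?</script>}is', '', $body);
$lazy = preg_replace('{<!--.*?-->}is', '', $lazy);

// Without the "s" modifier the dot does not match newlines, so the
// multi-line first script is left in place.
$noS = preg_replace('{<script[^>]*>.*?</script>}i', '', $body);

var_dump(strpos($greedy, 'Legitimate copy') !== false); // bool(false)
var_dump(strpos($lazy, 'Legitimate copy') !== false);   // bool(true)
var_dump(strpos($noS, 'var a = 1') !== false);          // bool(true)
```

Running the patterns against a copy of a few real bodytext values first, and inspecting the result, is a cheap way to confirm nothing legitimate is lost before saving any pages.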
MarcC (Author) Posted April 29, 2013
Awesome, thanks everybody. And thanks for the explanation, Ryan. I'm looking forward to improving my regexing.