elabx
Posted August 21, 2018

Hi! I wanted to share this and ask for opinions. I see a lot of regex approaches to finding and changing links, but the DOMDocument class seems very useful. Is there a downside to it? Any other comments to improve this (naming convention, code style)?

    <?php

    /**
     * Textformatter to output local links inside RTA textarea fields, as their language equivalents
     * by Eduardo San Miguel
     *
     * ProcessWire 2.x
     * Copyright (C) 2011 by Ryan Cramer
     * Licensed under GNU/GPL v2, see LICENSE.TXT
     *
     * http://www.processwire.com
     * http://www.ryancramer.com
     *
     */

    class TextformatterLanguageLinks extends Textformatter implements Module {

        public static function getModuleInfo() {
            return array(
                'title' => "Multilanguage links",
                'version' => "100",
                'summary' => "Textformatter to output local links inside RTA textarea fields, as their language equivalents",
                'author' => "Eduardo San Miguel",
                'requires' => array(
                    "PHP>=5.4.0",
                    "ProcessWire>=2.5.28"
                ),
                'icon' => 'languages'
            );
        }

        public function format(&$str) {
            $dom = new DOMDocument();
            $dom->loadHTML($str);
            $tags = $dom->getElementsByTagName('a');

            foreach ($tags as $tag) {
                $href = $tag->getAttribute("href");

                // Sanitize the href so it can be treated as a page path
                $path = wire("sanitizer")->path($href);
                if(!$path) continue;

                // Skip links that don't resolve to a page; this also keeps a
                // URL from a previous iteration from being reused
                $possiblePage = wire('pages')->get($path);
                if(!$possiblePage->id) continue;

                $langPageUrl = $possiblePage->localUrl(wire("user")->language);

                // Only replace when a language URL was returned
                if($langPageUrl) {
                    $str = str_replace($href, $langPageUrl, $str);
                }
            }
        }
    }

Edit: I already found bugs because I wasn't sanitising the assumed path in the href attributes.
BrendonKoz
Posted August 24, 2018

So, food for thought in terms of REGEX vs DOMDocument:

Benefits of using REGEX:
- Essentially faster/more efficient for processing of the data
- Doesn't care about valid source structure as it's parsing straight text, not XML nodes
- Implementation is unlikely to change

Detriments of REGEX:
- Writing a perfect implementation of a REGEX when dealing with HTML to handle all use-cases without experiencing any edge-cases is difficult (might "greedily" match more than intended)
- It definitely works, but the developer argument is: is it the best (most appropriate) tool for the job?
- Without a good knowledge of REGEX, harder to understand the underlying code if changes/updates are required

Benefits of using DOMDocument:
- Written specifically for the purposes of this type of task (searching/modifying the DOM)
- DOMDocument shouldn't ever be "greedy" over what it matches, like REGEX unintentionally tends to do

Detriments of DOMDocument:
- May require valid HTML, but with iterations of HTML, what exactly is considered valid?
- Would different versions of PHP handle the DOM differently with version differences? Potential of implementation changes.
- loadHTML() may modify your source - what goes in might not be what comes out
- Character encodings may cause unforeseen issues (don't they always!)
- Without a good knowledge of PHP's approach to using DOMDocument, the code process can get rather difficult to understand if changes/updates are required

Some further reading from someone else with more thorough testing:
https://blog.futtta.be/2014/05/01/php-html-parsing-performance-shootout-regex-vs-dom/
https://blog.futtta.be/2014/04/17/some-html-dom-parsing-gotchas-in-phps-domdocument/

Realistically it's a judgment call. Speed and server efficiency versus (one would hope) better valid modifications/detections. I don't think there's really a right or wrong solution.
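To make the "greedy" point and the loadHTML() caveat a bit more concrete, here is a minimal sketch. The sample markup and variable names are made up for illustration and are not taken from the module above; the exact wrapper that saveHTML() produces may also vary with the installed libxml version.

```php
<?php

// Sample fragment with two links on one line (illustrative only).
$html = '<p><a href="/about/">About</a> and <a href="/contact/">Contact</a></p>';

// Greedy: .* runs on to the LAST double quote in the string.
preg_match('/href="(.*)"/', $html, $greedy);
// $greedy[1] is '/about/">About</a> and <a href="/contact/'

// Lazy: .*? stops at the FIRST closing double quote.
preg_match('/href="(.*?)"/', $html, $lazy);
// $lazy[1] is '/about/'

// DOMDocument reads each attribute from the parsed tree, so there is
// no pattern to get wrong.
$dom = new DOMDocument();
$dom->loadHTML($html);

$hrefs = [];
foreach ($dom->getElementsByTagName('a') as $a) {
    $hrefs[] = $a->getAttribute('href');
}

// The round trip is not lossless: loadHTML() treats the fragment as a full
// document, so saveHTML() typically returns it wrapped in a doctype plus
// <html><body> elements rather than the original string.
$roundTrip = $dom->saveHTML();
```

This is presumably one reason the module above only uses the DOM to find the href values and then runs str_replace() on the original string, instead of writing the parsed document back out.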
Some shared hosting servers don't install the DOMDocument PHP extension by default though, so you'd want to check for the existence of the class during your module's install method.

P.S. - Thanks for asking the question -- I knew DOMDocument was slower, but haven't compared in a while. The articles I linked above were an interesting read.
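The existence check mentioned above might look something like this. It is only a sketch: `domDocumentAvailable()` and `requireDomDocument()` are hypothetical helper names, and the call from a module's install method is left out so the snippet runs on its own.

```php
<?php

/**
 * Hypothetical helper: reports whether PHP's DOM extension, which provides
 * the DOMDocument class, is available on this server.
 */
function domDocumentAvailable()
{
    return extension_loaded('dom') && class_exists('DOMDocument');
}

/**
 * Hypothetical guard: throws if DOMDocument is missing, so an installer
 * could stop with a readable message instead of failing later in format().
 */
function requireDomDocument()
{
    if (!domDocumentAvailable()) {
        throw new RuntimeException(
            'This module requires the PHP DOM extension (DOMDocument).'
        );
    }
}
```

Return type declarations are left off on purpose, since the module above lists PHP>=5.4.0 as its minimum.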
elabx
Posted August 29, 2018

Wow! Amazing, educational answer, thanks a lot. This kind of conversation really helps anyone move forward. Thanks for your dedication!