
Textformatter for local language links


elabx

Hi! Wanted to share this and ask for opinions:

  • I see a lot of regex approaches to finding/changing links, but the DOMDocument class seems very useful. Is there a downside to it?
  • Any other comments to improve this (naming conventions, code style)?

 

<?php
/**
 * Textformatter to output local links inside RTA textarea fields, as their language equivalents
 * by Eduardo San Miguel
 *
 * ProcessWire 2.x
 * Copyright (C) 2011 by Ryan Cramer
 * Licensed under GNU/GPL v2, see LICENSE.TXT
 *
 * http://www.processwire.com
 * http://www.ryancramer.com
 *
 */
class TextformatterLanguageLinks extends Textformatter implements Module
{
    
    public static function getModuleInfo()
    {
        return array(
            'title' => "Multilanguage links",
            'version' => "100",
            'summary' => "Textformatter to output local links inside RTA textarea fields, as their language equivalents",
            'author' => "Eduardo San Miguel",
            'requires' => array(
                "PHP>=5.4.0",
                "ProcessWire>=2.5.28"
            ),
            'icon' => 'languages'
        );
    }
    
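    public function format(&$str) 
    {
        // Nothing to do when the text contains no links
        if (strpos($str, '<a') === false) return;

        $dom = new DOMDocument();
        // Collect parser warnings instead of emitting them for imperfect HTML fragments
        libxml_use_internal_errors(true);
        // The XML prolog hints that the fragment is UTF-8
        $dom->loadHTML('<?xml encoding="utf-8" ?>' . $str);
        libxml_clear_errors();

        foreach ($dom->getElementsByTagName('a') as $tag) {
            $href = $tag->getAttribute('href');
            $path = wire('sanitizer')->path($href);
            if (!$path) continue; // external or otherwise invalid path

            $possiblePage = wire('pages')->get($path);
            if (!$possiblePage->id) continue; // no local page at this path

            // Assigned fresh on every iteration, so no stale URL leaks from a previous link
            $langPageUrl = $possiblePage->localUrl(wire('user')->language);
            if ($langPageUrl) {
                // Replace only inside href attributes (assumes double-quoted attributes),
                // so a short path such as "/" cannot clobber unrelated text
                $str = str_replace('href="' . $href . '"', 'href="' . $langPageUrl . '"', $str);
            }
        }
    }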
    
}

Edit: Already found bugs because I wasn't sanitising the assumed path in the href attributes.


So, food for thought in terms of REGEX vs DOMDocument:

Benefits of using REGEX:

  • Generally faster and more efficient at processing the data
  • Doesn't care about valid source structure, since it parses plain text rather than DOM nodes
  • Implementation is unlikely to change

Detriments of REGEX:

  • Writing a REGEX that handles every HTML use case without hitting edge cases is difficult (it might "greedily" match more than intended)
  • It definitely works, but the developer argument is: is it the best (most appropriate) tool for the job?
  • Without a good knowledge of REGEX, the underlying code is harder to understand if changes/updates are required
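
To illustrate the "greedy" pitfall, here is a minimal sketch. The sample markup and the three patterns are made up for this example and are not taken from any existing module:

```php
<?php
// Sample markup with two links (invented for this illustration).
$html = '<a href="/one/">One</a> and <a href="/two/">Two</a>';

// Greedy: (.*) runs on to the LAST '">' in the string.
preg_match('/<a href="(.*)">/', $html, $greedy);

// Lazy: (.*?) stops at the FIRST '">'.
preg_match('/<a href="(.*?)">/', $html, $lazy);

// Negated character class: the capture can never cross a double quote.
preg_match('/<a href="([^"]*)"/', $html, $strict);

echo $greedy[1], PHP_EOL; // /one/">One</a> and <a href="/two/
echo $lazy[1], PHP_EOL;   // /one/
echo $strict[1], PHP_EOL; // /one/
```

Even the lazy and negated-class versions still assume double-quoted attributes and a fixed attribute order, which is the kind of edge case that makes a "perfect" pattern hard to write.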


Benefits of using DOMDocument:

  • Written specifically for the purposes of this type of task (searching/modifying the DOM)
  • DOMDocument works on parsed nodes, so it shouldn't ever be "greedy" about what it matches, the way a REGEX can unintentionally be
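
As a rough sketch of what working on nodes looks like, the hypothetical helper below (`rewriteLinks()` and its `$map` argument are assumptions for illustration, not part of PHP or ProcessWire) changes `href` attributes through the DOM and then serializes only the original fragment:

```php
<?php
// Hypothetical helper: rewrite <a href> values found in a caller-supplied map.
function rewriteLinks(string $html, array $map): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of emitting them
    // Wrap the fragment in a div so it has one known root; the XML prolog hints UTF-8.
    $dom->loadHTML('<?xml encoding="utf-8" ?><div>' . $html . '</div>');
    libxml_clear_errors();

    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if (isset($map[$href])) {
            $a->setAttribute('href', $map[$href]); // touches exactly this attribute
        }
    }

    // Serialize only the children of the wrapper div, not the implied <html>/<body>.
    $root = $dom->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo rewriteLinks('<p><a href="/about/">About</a></p>', ['/about/' => '/es/acerca/']);
```

Because each link is addressed as a node, there is no pattern that could accidentally span two tags. The trade-off is that the returned markup is re-serialized by the parser rather than being the original string.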

Detriments of DOMDocument:

  • May require valid HTML, but with successive iterations of HTML, what exactly is considered valid? Different PHP versions might also handle the DOM differently, so the implementation could change.
  • loadHTML() may modify your source - what goes in might not be what comes out
  • Character encodings may cause unforeseen issues (don't they always!)
  • Without a good knowledge of PHP's DOMDocument API, the code can be rather difficult to understand if changes/updates are required
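
A quick way to see the "what goes in might not be what comes out" point for yourself is a round trip like the sketch below. The input fragment is invented, and the accented character is only there so you can inspect how your own PHP/libxml build treats encoding when no hint is given:

```php
<?php
// Invented fragment; "é" lets you observe encoding handling on your setup.
$in = '<p>Café <a href="/about/">About</a></p>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of emitting them
$dom->loadHTML($in);              // no flags, no encoding hint
libxml_clear_errors();

$out = $dom->saveHTML();

// The parser adds a doctype plus <html> and <body> wrappers around the fragment,
// so the serialized document is not the string that went in.
var_dump($out === $in);                      // bool(false)
var_dump(strpos($out, '<body>') !== false);  // bool(true)

echo $out; // inspect the rest (for example the accented character) yourself
```

This is one reason to treat the parsed document as a way to find links while leaving the original string as the thing you modify, or else to serialize only the nodes you care about.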

Some further reading from someone else with more thorough testing:
https://blog.futtta.be/2014/05/01/php-html-parsing-performance-shootout-regex-vs-dom/
https://blog.futtta.be/2014/04/17/some-html-dom-parsing-gotchas-in-phps-domdocument/

Realistically it's a judgment call: speed and server efficiency versus (one would hope) more reliable detection and modification. I don't think there's really a right or wrong solution. Some shared hosting servers don't enable PHP's DOM extension by default, though, so you'd want to check that the DOMDocument class exists during your module's install method.
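
A minimal sketch of such a check, assuming a hypothetical helper name (`domExtensionAvailable()` is not a PHP or ProcessWire function). In a module, I believe the same test could go in the install method and throw an exception to abort installation, but treat that wiring as an assumption:

```php
<?php
// Hypothetical helper: true when PHP's DOM extension and its DOMDocument class are usable.
function domExtensionAvailable(): bool
{
    return extension_loaded('dom') && class_exists('DOMDocument');
}

if (domExtensionAvailable()) {
    echo 'DOMDocument is available', PHP_EOL;
} else {
    // A module could refuse to install here, or fall back to a REGEX implementation.
    echo 'DOMDocument is missing; enable the PHP DOM extension', PHP_EOL;
}
```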

P.S. - Thanks for asking the question -- I knew DOMDocument was slower, but hadn't compared them in a while. The articles linked above were an interesting read.
