
Textformatter to convert www.domain.com to hyperlinks


apeisa


I built a simple Textformatter to convert text like www.something.com or www.something.com/site/index.html into:

<a href='http://www.something.com'>www.something.com</a>

Please test and comment (my regexp skills aren't that great...).

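<?php
/**
 * Converts plain-text links such as www.domain.com into anchors.
 */
class TextformatterConvertLinks extends Textformatter implements Module {

	public static function getModuleInfo() {
		return array(
			'title' => 'Text links to anchors',
			'version' => 100,
			'summary' => "Convert text links like www.domain.com to hyperlinks",
		);
	}

	public function format(&$str) {
		// Cheap early exit: nothing to do without a "www." somewhere
		if(strpos($str, 'www.') === false) return;

		// First alternative: consume existing <a>...</a> elements, then discard
		// them via (*SKIP)(*FAIL) so they are never linked a second time.
		// Second alternative: "www." + host + 2-4 char TLD + optional path, as
		// long as it is not preceded by a word char, quote, "=", "/" or "."
		// (so the "www." inside "http://www.example.com" is left alone).
		// [^\s<] stops the match at whitespace or at the start of an HTML tag,
		// and the match may now end at the end of the string.
		$regex = '#<a\b[^>]*>.*?</a>(*SKIP)(*FAIL)' .
			'|(?<![\w"\'=/.])(www\.[^\s<]+\.\w{2,4}(?:/[^\s<]*)?)#is';

		// preg_replace_callback() replaces each match in place, unlike
		// str_replace(), which rewrote every identical occurrence at once
		$str = preg_replace_callback($regex, function($m) {
			$url = $m[1];
			$endChar = '';
			// Keep sentence punctuation that follows the URL outside the link
			$lastChar = substr($url, -1);
			if($lastChar === '.' || $lastChar === ',') {
				$url = substr($url, 0, -1);
				$endChar = $lastChar;
			}
			// Escape for the attribute and link text without double-encoding
			$url = htmlspecialchars($url, ENT_QUOTES, 'UTF-8', false);
			return "<a href='http://$url'>$url</a>$endChar";
		}, $str);
	}

	public function ___install() {
	}

	public function ___uninstall() {
	}
}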

I like the idea of this, but one potential issue: not all URLs begin with 'www'. For instance, you can't access processwire.com from www.processwire.com (it'll redirect you). So the proper way to link processwire.com is just http://processwire.com. Another example would be http://store.di.net, which is something different from www.di.net.

I think it would be better to have it auto-link URLs that start with http:// or https://. That way there's not much chance of it auto-linking things that it shouldn't. The regex would have to check that the http:// doesn't have a quote or equals sign in front of it (indicating an already-linked URL). This could be done by making sure the http:// is either at the beginning of the source string (no characters preceding it) or that the preceding character matches [^\w"'=], meaning it is not a word character, a double quote, a single quote, or an equals sign. I think that could be placed in a negative lookbehind, (?<![\w"'=]), which covers both cases at once and keeps the preceding character out of the match.

http://www.regular-expressions.info/lookaround.html
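A minimal PHP sketch of the scheme-based approach, assuming a standalone helper named linkSchemeUrls() (hypothetical, not part of ProcessWire or the module above). It uses the negative lookbehind (?<![\w"'=]), so a URL sitting inside a quoted href attribute is skipped. It would still match a URL used as the visible text of an existing anchor, so it is not a complete guard against double-linking.

```php
<?php
// Hypothetical helper (not part of ProcessWire): links http:// and https://
// URLs unless the scheme is directly preceded by a word character, a quote
// or "=", which suggests the URL already sits inside an attribute value.
function linkSchemeUrls(string $str): string {
    return preg_replace_callback(
        '#(?<![\w"\'=])https?://[^\s<]+#i',
        function (array $m): string {
            // Move trailing sentence punctuation (and ")") outside the link;
            // note this would also clip a URL that legitimately ends in ")".
            $url = rtrim($m[0], '.,;:!?)');
            $tail = substr($m[0], strlen($url));
            $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8', false);
            return '<a href="' . $safe . '">' . $safe . '</a>' . $tail;
        },
        $str
    );
}

echo linkSchemeUrls('See http://processwire.com.'), "\n";
// prints: See <a href="http://processwire.com">http://processwire.com</a>.
```

Text that already contains href="http://..." passes through unchanged, because the quote before the scheme fails the lookbehind.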

--

edit: looks like the forum has trouble with its URL matching :)


I like Ryan's idea. I dislike situations where the same address is auto-linked over and over; it's far better to let the author decide which of the links in their writing should be linked, and Ryan's suggestion would take care of that.

Could your module be given an option to also hide the scheme from the linked text? So "http://processwire.com" would become

<a href="http://processwire.com">processwire.com</a>
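One way the requested option might work, sketched with a hypothetical helper anchorWithoutScheme() (not an existing module setting): the href keeps the full URL, and only the visible text has a leading http:// or https:// removed.

```php
<?php
// Hypothetical helper: the href keeps the full URL, while the visible
// link text drops a leading "http://" or "https://" (case-insensitive).
function anchorWithoutScheme(string $url): string {
    $label = preg_replace('#^https?://#i', '', $url);
    $href = htmlspecialchars($url, ENT_QUOTES, 'UTF-8', false);
    $text = htmlspecialchars($label, ENT_QUOTES, 'UTF-8', false);
    return '<a href="' . $href . '">' . $text . '</a>';
}

echo anchorWithoutScheme('http://processwire.com'), "\n";
// prints: <a href="http://processwire.com">processwire.com</a>
```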

I did think about supporting http:// and I want to add that. But what the client wants and expects in this case is URLs that start with www., so I implemented that first. I do agree that http:// would be a nice addition. Still, is there any real danger that a string starting with www., followed by some characters, a dot, and 2-4 more characters, is not meant to be a URL?

Wouldn't typing http:// be more of an annoyance for many non-tech-savvy users than using the link tool?


> Wouldn't typing http:// be more of an annoyance for many non-tech-savvy users than using the link tool?

Isn't this kind of a standard though? When I type out a text-based email message, I don't expect the email client to auto-link it unless I put an http:// or https:// in front of it. I suppose it depends on the email client.

There's no way to tell for certain whether something is a URL if it doesn't have a scheme. Consider the forum user here named aw.be (I'm curious to see if IPB links that). :) But if it comes down to client-specific stuff, then of course it's safe to do whatever the particular client needs. I'm just speaking in general, non-client-specific terms.

I would agree, though, that it's a fairly safe bet to link any text that starts with "www." and is otherwise consistent with the format of a URL. But it also creates an expectation of URLs being auto-linked… and once you take out the "www." there's no sure way of telling whether something is meant to be a hostname. Ultimately the scheme is what tells us whether it's a URL. Otherwise it could just as easily be a filename, a page name, a username, or any number of other things.
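PHP's own parse_url() illustrates the point: without a scheme (or a leading //) it does not recognize a host at all and treats the whole string as a path. The helper name hostOf() is just for illustration.

```php
<?php
// Returns the host part of $s, or null when parse_url() finds none.
// Without a scheme (or a leading "//"), parse_url() treats the whole
// string as a path, so "www.processwire.com" has no host.
function hostOf(string $s): ?string {
    $host = parse_url($s, PHP_URL_HOST);
    return is_string($host) ? $host : null;
}

foreach (['http://processwire.com', 'www.processwire.com', 'aw.be'] as $s) {
    echo $s, ' => ', hostOf($s) ?? '(no host)', "\n";
}
// prints:
// http://processwire.com => processwire.com
// www.processwire.com => (no host)
// aw.be => (no host)
```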


I have to disagree with this. The services and software most people use (Facebook, IM clients, MS Word, Google Docs, etc.) do convert www.something.fi to a link, and I think many people pretty much expect that behavior. I did my fair share of testing with those, and in each case it's the www. or http:// that makes the link: something.fi is not a link, but www.something.fi and http://something.fi are.

(and as an exception to the rule, IP.Board doesn't treat www.something.fi as a link)
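The rule observed in those apps can be sketched as a token-level predicate. The function name shouldAutoLink() is hypothetical, and the 2-4 letter TLD limit mirrors the heuristic from the original module rather than any standard.

```php
<?php
// Hypothetical predicate: link a token if it starts with a scheme, or with
// "www." followed by a host and a 2-4 letter TLD (plus an optional path).
// A bare "something.fi" is deliberately not linked.
function shouldAutoLink(string $token): bool {
    if (preg_match('#^https?://\S+#i', $token)) {
        return true;
    }
    return (bool) preg_match('#^www\.[^\s/]+\.[a-z]{2,4}(?:/\S*)?$#i', $token);
}

var_dump(shouldAutoLink('www.something.fi'));    // bool(true)
var_dump(shouldAutoLink('something.fi'));        // bool(false)
var_dump(shouldAutoLink('http://something.fi')); // bool(true)
```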
