apeisa Posted February 23, 2011

I have a few additions for $sanitizer->url(). Currently it allows values like "I am url" -> "iamurl" and "www.url.com" -> "www.url.com". I added a parameter that lets us reject relative URLs, so these values always come out as "" and "http://www.url.com" respectively. Of course this could be much better, but it suits my needs and I think it is often needed:

public function url($value, $allowRelative = true) {
    if(!strlen($value)) return '';
    if(!strpos($value, '://')) {
        // URL is missing protocol, or is local/relative
        $dotPos = strpos($value, ".");
        $slashPos = strpos($value, "/");
        if($dotPos !== false) {
            // something like: www.company.com/about or company.com
            $value = "http://$value";
        } else if($dotPos === false) {
            if (!$allowRelative) {
                // We don't allow relative urls, so return blank
                $value = '';
            } else {
                // relative URL like: /about/ or about/
                // leave it alone
            }
        }
    }
    $value = filter_var($value, FILTER_SANITIZE_URL);
    return $value ? $value : '';
}

Ryan, please review this code and make any needed corrections. Also, what would be the best way to contribute code? Through GitHub? (I am posting this here for now for just this reason.)
ryan Posted February 23, 2011

The only issue I see is that dots can be in page names and filenames. So that leaves the question of whether "company.com" or "sitemap.xml" is a domain name or a relative path/file... This is a problem in the existing url() function too, and I'm just not sure how to solve it. I think I'll err on the side of assuming a domain name unless the path starts with a ".", as in "./sitemap.xml" or "../../sitemap.xml".

I like your addition of the allowRelative option. GitHub is great, or forum and/or email is fine too. Whatever you prefer.

Thanks, Ryan
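The heuristic described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `looksLikeDomain` is a hypothetical helper name, not part of ProcessWire, and it covers just the "leading dot" rule from this post.

```php
<?php
// Hypothetical helper (not ProcessWire API): guesses whether a value
// without a protocol is a bare domain name rather than a relative path.
function looksLikeDomain(string $value): bool {
    // A protocol is already present, so there is nothing to guess
    if (strpos($value, '://') !== false) return false;
    // No dot at all: treat as a relative path such as "about/" or "/about/"
    if (strpos($value, '.') === false) return false;
    // A leading "." marks an explicit relative path ("./x" or "../x")
    if ($value[0] === '.') return false;
    // Otherwise err on the side of assuming a domain name
    return true;
}

var_dump(looksLikeDomain('company.com'));       // bool(true)
var_dump(looksLikeDomain('./sitemap.xml'));     // bool(false)
var_dump(looksLikeDomain('../../sitemap.xml')); // bool(false)
var_dump(looksLikeDomain('sitemap.xml'));       // bool(true)
```

Note that a bare "sitemap.xml" is still treated as a domain name, which is exactly the ambiguity this post describes; writing it as "./sitemap.xml" is the way to mark it as a relative file.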
ryan Posted February 23, 2011

I've got to do more testing, but here's the solution I came up with that I think accomplishes what you want. I added an extra path() function to the Sanitizer class to handle the relative URLs. The class file is also attached (in a ZIP) if you want to try it.

<?php

/**
 * Return the given path if valid, or blank if not.
 *
 * Path is validated per ProcessWire "name" convention of ascii only [-_./a-z0-9]
 * As a result, this function is primarily useful for validating ProcessWire paths,
 * and won't always work with paths outside ProcessWire.
 *
 * @param string $value Path
 * @return string
 *
 */
public function path($value) {
    if(!preg_match('{^[-_./a-z0-9]+$}iD', $value)) return '';
    if(strpos($value, '/./') !== false || strpos($value, '//') !== false) $value = '';
    return $value;
}

/**
 * Returns a valid URL, or blank if it can't be made valid
 *
 * Performs some basic sanitization like adding a protocol to the front if it's missing,
 * but leaves alone local/relative URLs.
 *
 * URL is not required to conform to ProcessWire conventions unless a relative path is given.
 *
 * Please note that URLs should always be entity encoded in your output. <script> is technically
 * allowed in a valid URL, so your output should always entity encode any URLs that came from
 * user input.
 *
 * @param string $value URL
 * @param bool $allowRelative Whether to allow relative URLs
 * @return string
 * @todo add TLD validation
 *
 */
public function url($value, $allowRelative = true) {

    if(!strlen($value)) return '';

    // this filter_var sanitizer just removes invalid characters that don't appear in domains or paths
    $value = filter_var($value, FILTER_SANITIZE_URL);

    if(!strpos($value, ".") && $allowRelative) {
        // if there's no dot (or it's in position 0) and relative paths are allowed,
        // we can safely assume this is a relative path.
        // relative paths must follow ProcessWire convention of ascii-only,
        // so they are passed through the $sanitizer->path() function.
        return $this->path($value);
    }

    if(!strpos($value, '://')) {
        // URL is missing protocol, or is local/relative
        if($allowRelative) {
            // determine if this is a domain name
            // regex legend: (www.)? company. com ( .uk or / or : or # or end)
            if(preg_match('{^([^\s_.]+\.)?[^-_\s.][^\s_.]+\.([a-z]{2,6})([./:#]|$)}i', $value, $matches)) {
                // most likely a domain name
                // $tld = $matches[2]; // TODO add TLD validation to confirm it's a domain name
                $value = filter_var("http://$value", FILTER_VALIDATE_URL);
            } else {
                // most likely a relative path
                $value = $this->path($value);
            }
        } else {
            // relative urls aren't allowed, so add the protocol and validate
            $value = filter_var("http://$value", FILTER_VALIDATE_URL);
        }
    }

    return $value ? $value : '';
}

Let me know if you think anything is missing here. I tried to duplicate what you added, and also account for the relative paths vs. domain issue.

Thanks, Ryan

Sanitizer-php.zip
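To see how the path check and the `!strpos()` test behave in isolation, here is a minimal standalone sketch. Both function names are hypothetical: `sanitizePath` is a free-function copy of the path() method above, and `hasNoDotPastStart` just wraps the `!strpos($value, ".")` expression used in url(). Neither is ProcessWire API.

```php
<?php
// Hypothetical standalone copy of the path() method shown above
function sanitizePath(string $value): string {
    // Only ASCII letters, digits, and the characters - _ . / are allowed
    if (!preg_match('{^[-_./a-z0-9]+$}iD', $value)) return '';
    // Reject "/./" segments and doubled slashes
    if (strpos($value, '/./') !== false || strpos($value, '//') !== false) return '';
    return $value;
}

// Hypothetical wrapper: !strpos() is true when the dot is missing (false)
// OR when it sits at position 0, because both values are falsy in PHP
function hasNoDotPastStart(string $value): bool {
    return !strpos($value, '.');
}

var_dump(sanitizePath('/about/'));               // string(7) "/about/"
var_dump(sanitizePath('/a//b'));                 // string(0) ""
var_dump(hasNoDotPastStart('./sitemap.xml'));    // bool(true)
var_dump(hasNoDotPastStart('company.com'));      // bool(false)
```

The second function is why a value such as "./sitemap.xml" takes the relative-path branch in url(): its dot is at position 0, so `!strpos()` treats it the same as a value with no dot at all.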
apeisa Posted February 23, 2011

Big thanks! Looks great and I will test this tomorrow at work.
apeisa Posted February 24, 2011

It works. I only tested with allowRelative=false, but that side seems to work nicely. Thank you!
ryan Posted February 24, 2011

Great, thanks. I will test some more here and then commit it. Good idea about adding the allowRelative option!