
Posted

I have a few additions for $sanitizer->url().

Currently it turns values like "I am url" into "iamurl" and leaves "www.url.com" as "www.url.com". I added a parameter that lets us reject relative URLs, so these always come out as "" and "http://www.url.com".

Of course this could be much better, but it suits my needs, and I think it's often needed:

public function url($value, $allowRelative = true) {

	if(!strlen($value)) return '';

	if(strpos($value, '://') === false) {
		// URL is missing protocol, or is local/relative

		if(strpos($value, '.') !== false) {
			// something like: www.company.com/about or company.com
			$value = "http://$value";

		} else if(!$allowRelative) {
			// no dot, so this looks like a relative URL (/about/ or about/);
			// relative URLs aren't allowed, so return blank
			$value = '';
		}
		// otherwise it's an allowed relative URL, so leave it alone
	}

	$value = filter_var($value, FILTER_SANITIZE_URL); 
	return $value ? $value : '';
}
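To see what $allowRelative does to the inputs mentioned above, here is a minimal sketch of the same logic as a standalone function, so it can be tried outside ProcessWire. The name sanitizeUrlSketch is hypothetical and not part of the Sanitizer class. Note that the "I am url" case comes from FILTER_SANITIZE_URL, which strips disallowed characters such as spaces instead of rejecting the value, and it keeps the original case.

```php
<?php
// Hypothetical standalone version of the url() method above (not ProcessWire API).
// Adds "http://" when there is a dot but no protocol; otherwise keeps or drops
// the value depending on $allowRelative.
function sanitizeUrlSketch(string $value, bool $allowRelative = true): string
{
    if ($value === '') {
        return '';
    }

    if (strpos($value, '://') === false) {
        if (strpos($value, '.') !== false) {
            // Looks like a domain, e.g. "www.url.com"
            $value = "http://$value";
        } elseif (!$allowRelative) {
            // No dot and relative URLs are not allowed: reject
            return '';
        }
    }

    // Removes characters not allowed in URLs (spaces included)
    $value = filter_var($value, FILTER_SANITIZE_URL);
    return $value ? $value : '';
}
```

With relative URLs allowed, "I am url" becomes "Iamurl" and "/about/" is kept; with $allowRelative set to false, both become "", while "www.url.com" becomes "http://www.url.com" either way.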

Ryan, please check this code and make any needed corrections. Also, what would be the best way to contribute code? Through GitHub? (That's the reason I'm posting this here for now.)

Posted

The only issue I see is that dots can appear in page names and filenames, which leaves the question of whether "company.com" or "sitemap.xml" is a domain name or a relative path/file. This is a problem in the existing url() function too; I'm just not sure how to solve it. I think I'll err on the side of assuming a domain name unless the path starts with a ".", like "./sitemap.xml" or "../../sitemap.xml". I like your addition of the allowRelative option.
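The rule described above can be sketched as a small classifier. This is an illustration under assumptions only: the function name classifyUrlLike is hypothetical, and treating a leading "/" as relative is an added assumption (the post only names a leading "."), since a root-relative path like "/sitemap.xml" would otherwise be taken for a domain.

```php
<?php
// Hypothetical helper illustrating the heuristic: a value without a protocol is
// treated as a domain if it contains a dot, unless it starts with "." (./x, ../x).
// Treating a leading "/" as relative is an extra assumption, not from the post.
function classifyUrlLike(string $value): string
{
    if (strpos($value, '://') !== false) {
        return 'absolute';
    }
    if ($value !== '' && ($value[0] === '.' || $value[0] === '/')) {
        return 'relative';
    }
    return strpos($value, '.') !== false ? 'domain' : 'relative';
}
```

Note that a bare "sitemap.xml" still comes out as 'domain'; under this rule a filename is only recognized as relative when written as "./sitemap.xml".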

GitHub is great, but the forum or email is fine too. Whatever you prefer.

Thanks,

Ryan

Posted

I've got to do more testing, but here's a solution that I think accomplishes what you want. I added a path() function to the Sanitizer class to handle relative URLs. The class file is also attached (in a ZIP) if you want to try it.

<?php
/**
* Return the given path if valid, or blank if not. 
*
* Path is validated per ProcessWire "name" convention of ascii only [-_./a-z0-9]
* As a result, this function is primarily useful for validating ProcessWire paths,
* and won't always work with paths outside ProcessWire. 
*
* @param string $value Path 
*
*/
public function path($value) {
	if(!preg_match('{^[-_./a-z0-9]+$}iD', $value)) return '';
	if(strpos($value, '/./') !== false || strpos($value, '//') !== false) $value = '';
	return $value;
}
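To check which values path() accepts, here is the same logic as a standalone function (sanitizePathSketch is a hypothetical name). It rejects "/./" and "//" but not ".." segments, so "../../sitemap.xml" passes; that suits relative URLs, but it also means this check is not a guard against directory traversal.

```php
<?php
// Hypothetical standalone copy of Sanitizer::path() for experimenting.
// Accepts only ASCII [-_./a-z0-9] (case-insensitive) and rejects "/./" and "//".
function sanitizePathSketch(string $value): string
{
    if (!preg_match('{^[-_./a-z0-9]+$}iD', $value)) {
        return '';
    }
    if (strpos($value, '/./') !== false || strpos($value, '//') !== false) {
        return '';
    }
    return $value;
}

$samples = ['/about/contact/', '../../sitemap.xml', '/about/./contact/', '//example.com', '/a b/', '/über/'];
foreach ($samples as $sample) {
    echo var_export($sample, true), ' => ', var_export(sanitizePathSketch($sample), true), PHP_EOL;
}
```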

/**
* Returns a valid URL, or blank if it can't be made valid 
*
* Performs some basic sanitization like adding a protocol to the front if it's missing, but leaves alone local/relative URLs. 
*
* URL is not required to conform to ProcessWire conventions unless a relative path is given.
*
* Please note that URLs should always be entity encoded in your output. <script> is technically allowed in a valid URL, so 
* your output should always entity-encode any URLs that came from user input. 
*
* @param string $value URL
* @param bool $allowRelative Whether to allow relative URLs
* @return string
* @todo add TLD validation
*
*/
public function url($value, $allowRelative = true) {

	if(!strlen($value)) return '';

	// this filter_var sanitizer just removes invalid characters that don't appear in domains or paths
	$value = filter_var($value, FILTER_SANITIZE_URL);

	if(!strpos($value, ".") && $allowRelative) {
		// if there's no dot (or it's in position 0) and relative paths are allowed, 
		// we can safely assume this is a relative path.
		// relative paths must follow ProcessWire convention of ascii-only, 
		// so they are passed through the $sanitizer->path() function.
		return $this->path($value); 
	}

	if(!strpos($value, '://')) {
		// URL is missing protocol, or is local/relative

		if($allowRelative) {
			// determine if this is a domain name 
			// regex legend:       (www.)?      company.         com       ( .uk or / or : or # or end)
			if(preg_match('{^([^\s_.]+\.)?[^-_\s.][^\s_.]+\.([a-z]{2,6})([./:#]|$)}i', $value, $matches)) {
				// most likely a domain name
				// $tld = $matches[2]; // TODO add TLD validation to confirm it's a domain name
				$value = filter_var("http://$value", FILTER_VALIDATE_URL); 

			} else {
				// most likely a relative path
				$value = $this->path($value); 
			}

		} else {
			// relative urls aren't allowed, so add the protocol and validate
			$value = filter_var("http://$value", FILTER_VALIDATE_URL);
		}
	}

	return $value ? $value : '';
}
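For the @todo on TLD validation: in the domain regex above, the TLD is capture group 2 (group 1 is the optional "www." prefix and group 3 the trailing separator or end). A minimal sketch, assuming a small hand-written allowlist; the function name hasKnownTld and the list are hypothetical, and a real check would need a complete, maintained TLD list.

```php
<?php
// Hypothetical TLD check built on the domain regex from url().
// $knownTlds is a tiny illustrative allowlist, not a complete list of TLDs.
function hasKnownTld(string $value, array $knownTlds = ['com', 'net', 'org', 'uk']): bool
{
    $regex = '{^([^\s_.]+\.)?[^-_\s.][^\s_.]+\.([a-z]{2,6})([./:#]|$)}i';
    if (!preg_match($regex, $value, $matches)) {
        return false; // no "name.tld" shape at all, e.g. "about/"
    }
    // $matches[2] holds the TLD, e.g. "com" in "www.company.com/about"
    return in_array(strtolower($matches[2]), $knownTlds, true);
}
```

With a check like this, "sitemap.xml" (which the regex alone accepts, with "xml" as the TLD) would no longer be mistaken for a domain, while "company.co.uk" still passes because its last label, "uk", is on the list.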

Let me know if you think anything is missing. I tried to replicate what you added, and also to account for the relative-path vs. domain-name issue.
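On the docblock's point about entity-encoding URLs in output: characters like < and " can survive FILTER_SANITIZE_URL, so a URL from user input should be escaped when printed. A minimal sketch using PHP's built-in htmlspecialchars(); the helper name linkTag and the sample URL are made up for illustration.

```php
<?php
// Hypothetical helper: escape a (sanitized) URL and label before putting them in HTML.
// ENT_QUOTES also encodes single quotes, which matters inside attribute values.
function linkTag(string $url, string $label): string
{
    $safeUrl = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
    $safeLabel = htmlspecialchars($label, ENT_QUOTES, 'UTF-8');
    return "<a href=\"$safeUrl\">$safeLabel</a>";
}

$userUrl = 'http://example.com/?q="><script>alert(1)</script>';
echo linkTag($userUrl, 'Example'), PHP_EOL;
```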

Thanks,

Ryan

Sanitizer-php.zip

Posted

Great, thanks. I will test some more here and then commit it. Good idea about adding the allowRelative option!
