
sanitizer->email() for IDN Domain


sujag

Is there a way to sanitize email addresses from IDN domains?

$connInfo = $sanitizer->email($connInfoRaw);

For an address such as post@kirchlein-im-grünen.de, this returns:
'post@kirchlein-im-grnen.de'
This is obviously wrong. $sanitizer->email() has no option to allow IDN.
Any ideas or workarounds for fixing this? Or at least for getting a warning? Since this is a mass import, there may be more cases than the single one I know of.

The gist of this issue is that $sanitizer->email() is just a very thin wrapper around PHP's filter_var() with FILTER_SANITIZE_EMAIL. Since FILTER_SANITIZE_EMAIL doesn't, at least as far as I know, support non-ASCII letters, neither does $sanitizer->email().
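To illustrate the behaviour described above, here is a minimal sketch in plain PHP, without ProcessWire. It calls filter_var() directly and compares the result with the input, which is also a cheap way to get a warning during a mass import. The variable names are made up for this example.

```php
<?php
// Plain PHP, no ProcessWire: FILTER_SANITIZE_EMAIL drops every byte that is
// not an ASCII letter, digit, or one of the allowed punctuation characters,
// so the UTF-8 bytes of "ü" are silently removed.
$raw = 'post@kirchlein-im-grünen.de';
$sanitized = filter_var($raw, FILTER_SANITIZE_EMAIL);

echo $sanitized, PHP_EOL; // post@kirchlein-im-grnen.de

// If sanitizing changed the value, flag the record instead of importing it.
if ($sanitized !== $raw) {
    echo "Warning: address was altered by sanitizing: {$raw}", PHP_EOL;
}
```

During an import, the same comparison could be made between $connInfoRaw and the return value of $sanitizer->email(), assuming the raw value has already been trimmed.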

In my opinion it would make sense to handle this at core level, but in the short term your best bet might be writing your own function for sanitizing these addresses, or alternatively using some existing library. One relatively straightforward solution might be converting the address to ASCII (PHP has idn_to_ascii() for this), validating that, and if it's valid then assuming that the original address is valid as well.
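The idn_to_ascii() approach could be sketched roughly as follows. This is only an illustration under a few assumptions: sanitizeIdnEmail() is a hypothetical helper (not part of ProcessWire), PHP's intl extension is installed, and an empty string is returned for anything that fails, which is a convention chosen for this sketch.

```php
<?php
// Hypothetical helper, not a ProcessWire API. Requires the intl extension.
// Converts the domain part to ASCII (Punycode), validates that form, and if
// it is valid returns the original address unchanged; otherwise returns ''.
function sanitizeIdnEmail(string $email): string
{
    $email = trim($email);
    $at = strrpos($email, '@');
    if ($at === false) {
        return '';
    }

    $local = substr($email, 0, $at);
    $domain = substr($email, $at + 1);
    if ($local === '' || $domain === '') {
        return '';
    }

    // Only the domain is converted; idn_to_ascii() returns false on failure.
    $asciiDomain = idn_to_ascii($domain, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    if ($asciiDomain === false) {
        return '';
    }

    // Validate the ASCII form and assume the original is valid if it passes.
    $valid = filter_var($local . '@' . $asciiDomain, FILTER_VALIDATE_EMAIL);

    return $valid === false ? '' : $email;
}

$result = sanitizeIdnEmail('post@kirchlein-im-grünen.de');
if ($result === '') {
    echo 'Warning: address rejected', PHP_EOL;
} else {
    echo $result, PHP_EOL;
}
```

Note that this only handles non-ASCII characters in the domain; a non-ASCII local part would still be rejected by FILTER_VALIDATE_EMAIL. Whether to store the original or the converted ASCII form is a separate decision that depends on what your mailer expects.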

(By the way: feel free to open a feature request for this at https://github.com/processwire/processwire-requests. Or alternatively an issue at https://github.com/processwire/processwire-issues, though technically the current implementation "works"; it just doesn't support international characters, which are a later addition to the spec.)


You can use https://doc.nette.org/en/utils/validators#toc-isemail. I think that should work.

Or just copy the single method? https://github.com/nette/utils/blob/35f77d3cef633ea75c1a620f40211b1494bfe00b/src/Utils/Validators.php#L304-L319

$isEmail = function(string $value): bool
{
	$atom = "[-a-z0-9!#$%&'*+/=?^_`{|}~]"; // RFC 5322 unquoted characters in local-part
	$alpha = "a-z\x80-\xFF"; // superset of IDN
	return (bool) preg_match(<<<XX
		(^
			("([ !#-[\\]-~]*|\\\\[ -~])+"|$atom+(\\.$atom+)*)  # quoted or unquoted
			@
			([0-9$alpha]([-0-9$alpha]{0,61}[0-9$alpha])?\\.)+  # domain - RFC 1034
			[$alpha]([-0-9$alpha]{0,17}[$alpha])?              # top domain
		$)Dix
		XX, $value);
};
db($isEmail("post@kirchlein-im-grünen.de")); // true

 
