
Sanitizer->textarea attacking my heart <3


Can

Buenos dias amigos,
 
We just noticed that our contact form, and surely our comment form too (custom built / not FieldtypeComments), occasionally strips whole paragraphs of the user's content. (A girl told us that she had written more than she saw in the quote in our reply.)
 
I'm cleaning input right away using $sanitizer->textarea
So I tested a little and could narrow it down to strip_tags which is part of $sanitizer->text
Commenting it out kept all paragraphs of my test string which were mainly lorem ipsum. With it enabled only the first line would come through.
After a lot of searching and trying to PM users here in the forum in hope they would reply soon, I found the actual issue.
I started the test mail with a line dedicated to my girlfriend (because she would read it) and ended it with a heart <3
strip_tags thinks it's the beginning of a tag and therefore strips not only the heart itself but everything after it. O.o
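
For anyone who wants to reproduce this outside ProcessWire, here is a minimal sketch in plain PHP. The test message is made up (not the original mail), and it calls strip_tags directly instead of going through $sanitizer->textarea:

```php
<?php
// Made-up test message: a dedication ending in a heart, then a second paragraph.
$message = "For my girl <3\nSecond paragraph of lorem ipsum.";

// strip_tags() treats "<3" as the start of a tag. Since no ">" ever
// closes it, everything from the "<" to the end of the string is dropped.
$clean = strip_tags($message);

echo json_encode($clean), "\n"; // "For my girl "
```

Only the text before the heart survives, which matches what the form did to the real mail.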
 
By the way, FieldtypeComments is using strip_tags, too. And I just commented on the newest blog post about 3.0.9 and my "<3 Processwire" got stripped, too.
 
Then I wasn't sure how to sanitize the input. I didn't want to lose any more content, because since our crowdfunding started we're getting a huge load of mails every day.
 
I thought about entities/entitiesMarkdown, but when applying it directly to the input I would need to unentities on output, which doesn't make sense because everything like <a onclick="alert('fooo')">click</a> would stay intact.. Many people suggest htmlentities for user input, but only when outputting.
But I don't want/need any tags except for hearts and stuff because we're hippies (quote of my girl^^)
 
Right now I'm using
$sanitizer->purify($str, array('HTML.Allowed' => ''));

which works at the moment, but maybe there are other options?

 
Ah, someone mentioned not sanitizing input at all, but storing it as is in the db and only escaping it (e.g. htmlentities) on output..
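
A rough sketch of that "store raw, escape on output" idea in plain PHP. The helper name escapeForHtml and the sample string are made up for illustration, and the database step is left out:

```php
<?php
// Hypothetical helper: nothing is stripped, the text is only escaped
// at the moment it is written into an HTML page.
function escapeForHtml(string $raw): string
{
    // ENT_QUOTES also escapes single quotes, so the result is safe
    // inside attribute values as well.
    return htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
}

// Stored exactly as the user typed it (sample input, not real data).
$stored = "Love <3 <a onclick=\"alert('fooo')\">click</a>";

// The heart survives as "&lt;3" and the link is shown as literal text
// instead of being rendered as markup.
echo escapeForHtml($stored), "\n";
```

The trade-off is that every output path has to remember to escape, and plain-text mails would get the stored text unescaped.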
 
I was quite astonished that strip_tags still considers <3 to be HTML, even though emoticons have existed for decades..
 
What do you think, or what is your way of dealing with user input?
 
Saludos and good night
Can
 
Ah, as far as I know it's not possible to declare <3 as a valid tag to strip_tags because it's not an actual tag, right? At least it didn't work in my testing..

Not sure if you read this: http://stackoverflow.com/questions/5055845/php-strip-tags-allow-3-text-hearts

As you said, it's not possible to use the allowableTags option of the textarea sanitizer either because it's not actually a real tag.

You could convert "<3" to "&lt;3" during your form submission (before you sanitize) and then it will be preserved.
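
A minimal sketch of that replacement, assuming plain strip_tags as a stand-in for the textarea sanitizer (the sample input and variable names are made up):

```php
<?php
// Sample submission containing a heart and a real tag.
$input = "I <3 ProcessWire\nSecond <b>paragraph</b>.";

// Protect the heart before sanitizing by turning its "<" into an entity.
$protected = str_replace('<3', '&lt;3', $input);

// Stand-in for the sanitizer: real tags are removed, the entity is untouched.
$clean = strip_tags($protected);
echo json_encode($clean), "\n"; // "I &lt;3 ProcessWire\nSecond paragraph."

// For a plain-text mail the entity can be decoded again at the end.
$plain = html_entity_decode($clean, ENT_QUOTES, 'UTF-8');
echo json_encode($plain), "\n"; // "I <3 ProcessWire\nSecond paragraph."
```

On an HTML page the "&lt;3" can be output as is; the decode step only makes sense for plain-text output.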


Wow, that's fast! :D

Yeah read it, haven't tried htmlspecialchars, but it's not what I want..

You're right, that would be a way.. thanks for sharing adrian, maybe I'll give it a shot :)

Are there other things strip_tags would consider a tag and strip?


strip_tags removes anything after a "<" until the closing ">". If it is never closed, then the rest of the content will be removed. I think the best option might be the purify sanitizer like you are already using.

I get the feeling though that the textarea sanitizer should probably replace strip_tags with purify - obviously the way it currently works is not satisfactory because it will also delete <$20 and anything else similar and obviously valid.


I think so, too. And the comment system should change this as well..

Actually I figured out that it strips everything where a non-whitespace character follows an opening bracket,

so your example gets stripped whereas < $20 wouldn't.. just for the record ;-)
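
A quick way to check the whitespace rule in plain PHP (the sample strings are made up, and this calls strip_tags directly rather than the sanitizer):

```php
<?php
// A "<" followed by a non-whitespace character opens a "tag";
// a "<" followed by whitespace is kept as literal text.
$samples = [
    'under <$20 today',  // no space after "<": rest of the string is lost
    'under < $20 today', // space after "<": string stays intact
];

$results = [];
foreach ($samples as $sample) {
    $results[$sample] = strip_tags($sample);
    echo json_encode($sample), ' => ', json_encode($results[$sample]), "\n";
}
// "under <$20 today" => "under "
// "under < $20 today" => "under < $20 today"
```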

Alright.. I really need to hit the sack now^^

So love <3 to all you PW lovers  ;)


Just a bit of a followup here. Have you tried the stripTags option for the textarea sanitizer?

echo $sanitizer->textarea('I <span style="color:red"><3</span> ProcessWire', array('stripTags' => false));

That will allow all tags, including <3 to be submitted.

Of course I still think the purifier might be a better/safer option, but thought I'd mention in case you didn't know about it. It's actually not listed on the $sanitizer docs page, but you can see it here in the source code: https://github.com/ryancramerdesign/ProcessWire/blob/b95e36a8d3071139bea5ed72b8b5025b876df976/wire/core/Sanitizer.php#L430


Forgot to mention it, it was too late. I tried the option already, but I want tags to be stripped, so right now the purifier ($sanitizer->purify) seems to be the best option.

Thanks for mentioning bbcode textformatter mr-fan, but for now I stick to plain text for mails and comments. URLs are auto linkified on output anyways..

I have the project's whole ProcessWire install open in Sublime, so cmd+t plus the desired class, module, whatever brings me straight to what I want to know. Still the fastest way. :)
