sanitise broken HTML


fruid

I have a form where some user input is sent to the server via ajax and then returned to the frontend and displayed.
That input is sanitised on the server like so

$entry['message'] = $input->post->textarea('message'); // the server receives a formdata object that stores the user input as stringified JSON 

which works fine when it's proper HTML with a < and a corresponding > or < and a corresponding /> 

so

<h1>John Doe</h1>

is stripped to 

John Doe

So far so good, but what about broken HTML tags?
If the user sends something malformed like <h1John Doe </h1
the whole script breaks, the input is not processed properly, data is lost and the ajax response is empty too.
How can I sanitise this and avoid this behaviour?

Should that be done in the frontend before sending to the server anyway? Frontend uses Vanilla JS.

The input is used to send an automatic email later on and though the email is sent, it's completely broken. 
I mainly need to avoid that of course, so I guess I can just check for empty values before that happens.
However, the ajax response needs to have proper markup too and then I wonder if there are any other dangers? Cause I'm also storing the input in some PW fields of a page…

Should I use

->purify()

Thanks for help!


the following doesn't work…

  protected function bb($i) {
    $o = str_replace("<", "", str_replace(">", "", $i));
    return $o;
  }
$_SESSION['message'] = $this->bb($input->post->textarea('message'));

nor does the following work…

$_SESSION['message'] = wire('sanitizer')->purify($input->post->textarea('message'));

The < or > and anything after that is not stored.

How can that be?


OK turns out this is a different issue. The sanitizer API works fine, even with broken HTML.

The issue is rather, that the JS formData object sent via AJAX doesn't reach the server properly when it contains some < and/or > (special characters?).

So I guess it's another header-issue. I use:

XHR.setRequestHeader('X-Requested-With', 'XMLHttpRequest');
XHR.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded');

But they don't solve the issue. I never know which one is right because I never understood headers. 

Any ideas?


there's no URL involved so I think this header is unnecessary. It works without it in one scenario. But in another one I create a formData object which just doesn't make it through if it contains < or >

const formData = new FormData();

fillFormData(formData);

sendFormData(formData);

// Collect the value of every .formfield element into one object and
// store it as a JSON string under the "content" key.
function fillFormData(formData) {
    const formfields = document.getElementsByClassName('formfield');
    const content = {};
    for (let i = 0; i < formfields.length; i++) {
        content[formfields[i].title] = formfields[i].value;
    }
    formData.set('content', JSON.stringify(content));
}

function sendFormData(formData) {
    const XHR = new XMLHttpRequest();
    XHR.onreadystatechange = function () {
        if (XHR.readyState !== 4) return; // wait until the request has finished
        if (XHR.status >= 200 && XHR.status < 300) {
            const response = JSON.parse(XHR.responseText);
            console.log(response); // log the parsed response, not the JSON.parse function
        }
    };

    XHR.open('POST', '', true);
    XHR.setRequestHeader('X-Requested-With', 'XMLHttpRequest');
    XHR.send(formData);
}

 


it seems like when I do

formData.set('content', content);

the formData doesn't make it to the server, proper HTML or not.

And when I do 

formData.set('content', JSON.stringify(content)); 

proper HTML is passed but broken HTML is not.

The other scenario where it works, the form fields are directly stored in the formdata separately and not stored in an object (content). I guess that's why it works there and not here.
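One likely explanation for the first variant, sketched under the assumption of a runtime with the standard FormData API (a current browser, or Node 18+): FormData stores non-file values as strings, so a plain object is coerced to the text "[object Object]" unless it is serialised first. The field name content mirrors the snippet above; the sample data is made up.

```javascript
// Hypothetical sample data standing in for the collected form fields.
const content = { name: 'John Doe', message: 'two > one' };

const raw = new FormData();
raw.set('content', content); // a plain object is coerced with String(content)

const serialised = new FormData();
serialised.set('content', JSON.stringify(content)); // explicit JSON string

const rawValue = raw.get('content');               // "[object Object]"
const serialisedValue = serialised.get('content'); // JSON text, '>' intact
console.log(rawValue);
console.log(serialisedValue);
```

If that is what happens here, the server receives a string that cannot be decoded as JSON, which would look like the data never arrived.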


XHR.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded')

this header doesn't work either, need to omit it for it to work at all.

Any other suggestions for headers? Can't find a clear documentation on headers anywhere, so to me it remains a mystery.

I'm basically just passing simple strings and an email via AJAX, can't be that hard, can it?


13 hours ago, Jan Romero said:

Headers are fine. So is your request content actually urlencoded?

@fruid can you confirm that your data is urlencoded? You might want to have a read over at MDN https://developer.mozilla.org/en-US/docs/Learn/Forms/Sending_forms_through_JavaScript . Especially the part where they are using encodeURIComponent() to encode the data.

12 hours ago, fruid said:

proper HTML is passed but broken HTML is not.

What do you mean by broken HTML? Do you have an example?

5 hours ago, fruid said:

I'm basically just passing simple strings and an email via AJAX, can't be that hard, can it?

Passing an email, do you mean the HTML or text content of an email or an email address?

In your code you are taking all form fields, putting them into an object, converting that object into a JSON string, assigning that string to a property 'content' of a formData object and sending that object via XHR.
To me it looks like what you actually want to do is send JSON data to the server. If this is the case, you could omit the step of assigning your JSON string to a formData object and use request header content-type 'application/json' instead. No need for url encoding the data then. Simplified example:  

// create an object
const content = {
  formfield1: 'value',
  formfield2: 'value'
}

// create and open the request
const xhr = new XMLHttpRequest()
xhr.open('POST', 'yourURL')

// set Content-Type header
xhr.setRequestHeader('Content-Type', 'application/json')

// send request with JSON payload
xhr.send(JSON.stringify(content))

On the PHP side where you receive the request, you can get the JSON data like this:

// Takes raw data from the request
$json = file_get_contents('php://input');

// Converts it into a PHP object
$data = json_decode($json);

 


6 hours ago, gebeer said:

can you confirm that your data is urlencoded

I cannot confirm that. Reading the doc I understand that when I use formData I don't have to worry about url encoding, which explains why it actually works without the header

approveOrderXHR.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded');

(and doesn't work with it).

Anyways, after more investigation I have to circle back to my first suspicion, namely a problem with sanitizing. The formdata makes it to the server with or without broken HTML.

8 hours ago, gebeer said:

What do you mean by broken HTML? Do you have an example?

I mean stuff like John< Do><<e Main< street <>12>3 

which is exaggerated and unlikely to happen but even one < breaks my entire logic.

It's the further processing of the input that runs into issues and I can now confirm that the problem must have always been the sanitizer API.

$input->post->textarea('message');

strips anything past a > or <. Highly undesired behaviour. This however:

 $input->post->message;

works just fine. But proper html like <h1>John</h1><h1>Doe</h1> is stored as is and not what I want to see in my AJAX response which is rendered to markup.

How can I have the best of both worlds?


18 minutes ago, fruid said:

Reading the doc I understand that when I use formData I don't have to worry about url encoding, which explains why it actually works without the header

If you are building the formData object the way you do, you have to take care of url encoding yourself. Only if you pass in the form when building formData is encoding taken care of. 

20 minutes ago, fruid said:

How can I have the best of both worlds?

Have a look at https://processwire.com/api/ref/sanitizer/ and try different methods. But I'm afraid that the sanitizer methods that handle HTML tags will always run into problems when there is broken HTML. So you should take care in the first place that HTML is not broken if possible. 


8 minutes ago, gebeer said:

So you should take care in the first place that HTML is not broken if possible. 

so I should sanitise the input in JS before sending it to the server? with regex I suppose? 

I guess these are now two unrelated issues, one is the formData and the other is the sanitising, be it on client or server side.

Can I avoid the url encoding by building my own much simpler object that I would send to the server? I'm not sending files anyway.


so I tried with a couple of other sanitisers, all with the same issues.

The way it looks to me now is, I better sanitise the HTML in the frontend with some sort of REGEX before sending it to the server and then NOT sanitise anything on the server so that special characters like < and >, mostly submitted by mistake are still processed. There's no danger of SQL injection in PW anyway in my understanding…


35 minutes ago, fruid said:

so I should sanitise the input in JS before sending it to the server? with regex I suppose?

I don't know where your input comes from. I would take care at the source that there is no broken HTML in the first place.

 

38 minutes ago, fruid said:

Can I avoid the url encoding by building my own much simpler object that I would send to the server? I'm not sending files anyway.

I already provided some sample code how you can do this above 


I will use 

htmlspecialchars($input->post->message, ENT_SUBSTITUTE)

on the server before storing anything and chuck the sanitizer entirely. It's up to the user to not do typos, it's just up to me to make the logic not break.
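Since the AJAX response is rendered back into the page, the same idea can be applied on the client as well. A minimal sketch, assuming the text should be displayed literally rather than interpreted as markup; escapeHtml is a hypothetical helper, not part of ProcessWire or any browser API, and only similar in spirit to htmlspecialchars():

```javascript
// Hypothetical helper: replace the characters that are special in HTML
// with entities so that stray < or > cannot break the surrounding markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;') // must run first so the entities below are not escaped twice
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

const escaped = escapeHtml('John< Do><<e Main< street <>12>3');
console.log(escaped);
```

In the browser, assigning the response to element.textContent instead of innerHTML avoids the need for manual escaping altogether.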

Thanks for your input!


actually I'm not done.

I noticed that on some page where I use a simple contact form, all sanitizer APIs like

$input->post->textarea('message');
$input->post->text('name'); 

do exactly what I expect.

Proper HTML doesn't break the input and special HTML characters make it through no problem.

so some input like

<h1>two > one</h1>

becomes

two > one

So the questions are:

how do I replicate that behaviour with a formdata object sent over AJAX?
how can I store the values not directly in the formdata but in an array or object that is then stored in some formdata key or property of the formdata object?
what are the headers that I need?
and what else is there to consider?

 


Here is a minimal example that works for me:

<?php namespace ProcessWire;

    if (input()->post('message')) {
        header('content-type: text/plain'); //this makes the browser show the "unformatted" response body, i.e. it won’t render HTML
        die(input()->post->textarea('message'));
    }
?>

<form id="fruidform" method="POST" action="./">
    <textarea name="message">&lt;h1&gt;two &gt; one&lt;/h1&gt;</textarea>    
    <button id="urlencoded" type="submit">Send it urlencoded</button>
    <button id="formdata"   type="submit">Send it as form-data</button>
</form>

<script>
    document.getElementById('urlencoded').addEventListener('click', async function(event)
    {
        event.preventDefault();
        const response = await fetch('./', {
            method: 'POST',
            body: new URLSearchParams([['message', document.forms.fruidform.message.value]])
            //you can also make a URLSearchParams from a form automatically, so you don’t have to reassemble all fields yourself:
            //new URLSearchParams(new FormData(document.forms.fruidform))
        });
    });
  
    document.getElementById('formdata').addEventListener('click', async function(event)
    {
        event.preventDefault();
        const response = await fetch('./', {
            method: 'POST',
            body: new FormData(document.forms.fruidform)
        });
    });
</script>

Observe how you get back “two > one” from the server in both cases. What are you doing differently?

Also see how you don’t need to put the content-type header explicitly, because fetch() infers it from the body’s type automatically, but it is sent in both cases!

If you look at the unformatted request body in the browser console, you’ll see that the first one is:

message=%3Ch1%3Etwo+%3E+one%3C%2Fh1%3E

That mess of % symbols is “urlencoded” and the request has a header that announces this to the server, so the server will know how to decode it: “Content-Type: application/x-www-form-urlencoded;charset=UTF-8”. It’s called “urlencoded” because it specifically exists to encode GET parameters as part of URLs, but you can use it for POST as well, as you can see.
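The encoding step can be reproduced without sending anything. A small sketch using the standard URLSearchParams class (available in browsers and Node); the second sample value is the broken tag from earlier in the thread:

```javascript
// Encode a single field the same way the urlencoded request body is built.
const params = new URLSearchParams([['message', '<h1>two > one</h1>']]);
const body = params.toString();
console.log(body); // message=%3Ch1%3Etwo+%3E+one%3C%2Fh1%3E

// Decoding gives the original text back, including the angle brackets.
const decoded = new URLSearchParams(body).get('message');

// Broken HTML survives the round trip just as well.
const brokenBody = new URLSearchParams([['message', '<h1John Doe </h1']]).toString();
const brokenDecoded = new URLSearchParams(brokenBody).get('message');
console.log(decoded, brokenDecoded);
```

So the characters < and > are not a problem for the transport itself, as long as the body is really encoded in the format the Content-Type header announces.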

The form-data request body looks like this:

-----------------------------9162892302224017952318706005
Content-Disposition: form-data; name="message"

<h1>two > one</h1>
-----------------------------9162892302224017952318706005--

(Your boundary may vary. The browser generates it automatically.)

Again the request’s content-type header tells the server that it’s sending this format: “Content-Type: multipart/form-data; boundary=---------------------------9162892302224017952318706005”.
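The header inference can be checked without a server. A sketch assuming a runtime that provides the Fetch API's Request class (current browsers, Node 18+); the URL is a placeholder, and constructing a Request does not send anything:

```javascript
// Placeholder URL, never contacted.
const url = 'https://example.test/';

const urlencodedRequest = new Request(url, {
  method: 'POST',
  body: new URLSearchParams([['message', '<h1>two > one</h1>']]),
});

const formData = new FormData();
formData.set('message', '<h1>two > one</h1>'); // set manually, no <form> involved
const formDataRequest = new Request(url, { method: 'POST', body: formData });

// The Content-Type header is derived from the type of the body.
const urlencodedType = urlencodedRequest.headers.get('content-type');
const formDataType = formDataRequest.headers.get('content-type');
console.log(urlencodedType);
console.log(formDataType); // the boundary differs on every run
```

Note that the FormData here is filled with set() rather than built from a form element, and the multipart header is still generated.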

If you use XMLHttpRequest, you may need to set the content-type explicitly according to the format you’re sending, I’m not sure, but it can't hurt.

Another thing is that these are (I believe) the only two content-types that PHP will put into its $_POST and $_GET variables. That’s why @gebeer’s example had to use file_get_contents('php://input') to get the JSON. Of course you can also send JSON as urlencoded or form-data. Then you can use json_decode(input()->post('myjson')).


I think I read somewhere that fetch doesn't work with every browser. Also, changing everything to use fetch instead of XMLHttpRequest would mean a lot of work, so I'd rather get the latter working…

Next, in my setup I use formdata, but I don't just add the entire form to the formdata object but instead set the properties programmatically. So I guess that's why I cannot get it to send the required header along (which it would automatically when just adding the entire form).  

So that's what I have to work with in order to not have to change everything

- use formdata
- set formdata values like so formdata.set('foo', bar); (not using a <form></form> tag at all)
- assign arrays or json objects to some of the formdata's properties
- send formdata via XMLHttpRequest

Is that a lot to ask?

Anyways, from reading your suggestions, it shouldn't be a problem, all I need to do is

#1 set this header in my php code when receiving the data:

On 11/15/2022 at 12:58 PM, Jan Romero said:
        header('content-type: text/plain'); //this makes the browser show the "unformatted" response body, i.e. it won’t render HTML

#2 and since it's an XMLHttpRequest, like you said "it can't hurt", add this header before sending the request:

XHR.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded');

#3 and since most of the sent content is json, keep using this on the server:

On 11/15/2022 at 12:58 PM, Jan Romero said:

json_decode(input()->post('myjson')).

And if I got all of this right, all that was missing was #1

Will try that then.

 


I mean we still don’t know what exactly you’re doing, but you definitely don’t need #1, it’s just there to show the result of my test POST.

42 minutes ago, fruid said:

#2 and since it's an XMLHttpRequest, like you said "it can't hurt", add this header before sending the request:

XHR.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded');

You just said you’re using formdata, so DON’T send the urlencoded header. In fact I just tested it with XMLHttpRequest and it also automatically builds the Content-Type header for you, so you don’t need that part at all. Building on my example above, this is all you need:

document.getElementById('formdataxhr').addEventListener('click', async function(event)
{
    event.preventDefault();
    const request = new XMLHttpRequest();
    request.open('POST', './');
    request.send(new FormData(document.forms.fruidform)); //or manually use FormData.append() or FormData.set() as you said
});

Yes, you need to add the X-Requested-With header, but that’s ONLY there to set ProcessWire’s config()->ajax property to true. Everything else works fine without it.
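Put together for the setup described earlier (values set programmatically, an object stored as JSON under one key, sent via XMLHttpRequest), a sketch could look like this. The function name sendContent and the XhrClass parameter are hypothetical; the parameter exists only so the function can be exercised outside a browser:

```javascript
// Sketch: build FormData by hand and send it via XMLHttpRequest.
function sendContent(content, url, XhrClass = globalThis.XMLHttpRequest) {
  const formData = new FormData();
  formData.set('content', JSON.stringify(content)); // object goes in as a JSON string

  const xhr = new XhrClass();
  xhr.open('POST', url);
  // Only this header is set manually. Content-Type, including the multipart
  // boundary, is left for the browser to generate from the FormData body.
  xhr.setRequestHeader('X-Requested-With', 'XMLHttpRequest');
  xhr.send(formData);
  return xhr;
}
```

In a page this would be called as sendContent({ message: textarea.value }, './'), and on the server the field could then be read along the lines of json_decode(input()->post('content')) before each value is sanitised individually.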


The way it looks, again, is that it wasn't an issue with sending, rather interpreting the input on the server.

<textarea name="message"><h1>one < two, two > one</h1></textarea>    
$message = $input->post->textarea('message'); // one < two, two > one

<textarea name="message"><h1>one<two, two>one</h1></textarea>    
$message = $input->post->textarea('message'); // oneone

<textarea name="message"><h1>one<two, two>one</h1></textarea>    
$message = htmlspecialchars($input->post->message, ENT_SUBSTITUTE); // &lt;h1&gt;one&lt;two, two&gt;one&lt;/h1&gt;

regarding example 3, in one of my templates, I do

$mail = wireMail();
$mail->bodyHTML($message); // <h1>one < two, two > one</h1>

the HTML characters get transformed back into < and > in the email, no rich text which is fine.

but in another

$mail = wireMail();
$mail->body($message); // &lt;h1&gt;one&lt;two, two&gt;one&lt;/h1&gt;

So ->bodyHTML() is the better choice, the user is not supposed to add HTML tags anyway but should be able to add < or >

However, then the user cannot use breaks \r\n in their input 🤪

How can I have the best of both worlds?
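One common approach is to escape the text first and only then turn line breaks into <br> tags. The idea is sketched here in JavaScript to stay consistent with the client-side code in this thread; textToHtml is a hypothetical helper. On the PHP side the built-in functions htmlspecialchars() and nl2br() would be the natural counterparts.

```javascript
// Hypothetical helper: make plain user text safe for an HTML body while
// keeping its line breaks visible.
function textToHtml(text) {
  const escaped = String(text)
    .replace(/&/g, '&amp;') // escape & first to avoid double escaping
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
  // Handle Windows (\r\n), Unix (\n) and old Mac (\r) line endings.
  return escaped.replace(/\r\n|\r|\n/g, '<br>');
}

const html = textToHtml('one < two\r\ntwo > one');
console.log(html);
```

The order matters: converting line breaks before escaping would turn the inserted <br> tags into visible text.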

 

 

never mind this, it's not really related

Also, now having a new issue, maybe it's related?

https://processwire.com/talk/topic/27813-unknown-invalid-mailbox-list/#comment-228446

 


  • 3 weeks later...
On 11/16/2022 at 6:16 PM, Jan Romero said:

you need to add the X-Requested-With header, but that’s ONLY there to set ProcessWire’s config()->ajax property to true. Everything else works fine without it.

It has come to my attention that this is not strictly true. The XHR header makes modern browsers send the CORS preflight request, so requiring config()->ajax for your requests on the server side can save your users from CSRF shenanigans.

