Jump to content
HerTha

Secure search form processing - best practice?

Recommended Posts

I am setting up a front-end form to allow visitors to do a fulltext site search.

When external input arrives at the system to be processed in database queries, security is a main issue. Therefore, all the (text) input for this form goes through a $sanitizer->text(), as a first step.

Then, I thought using a $sanitizer->selectorValue() would be a good idea for building the selector:

$results = $pages->find("template=contentpage,title|headline|body%={$sanitizer->selectorValue($search_term)}");

Unfortunately, it turned out to be quite easy to crash the whole thing, for instance by typing a single ! in the search which gives a 500 server error for the visitor.

Obviously, there are a number of scenarios (including the ! char) which are not catched by the sanitizer, but need to be taken care of. Anyone who can share a list of forbidden chars or an algorithm to handle the filtering more securely in such a situation?

Of course, the whole filtering should not be overly strict, so the visitor can still find a few things...

Share this post


Link to post
Share on other sites

Does the error log tell you more about the 500 error?

Just wondering since I basically do the same on the pwflex.team-tofahrn.de search:

if($config->ajax
&& (($query = $input->get->q) != ''))
	{
	$query = $sanitizer->selectorValue($query);
	$search = "text_index~=$query";
	$selector = "{$search}, limit=".($Limit+1);
	$matches = $pages->find($selector); 
	

Only difference may be, that my script requires three chars before it sends the query (makes no difference, just tried).

  • Like 1

Share this post


Link to post
Share on other sites

You are using "~=" while I'm prefering "%=" in this place - that's the only difference I can see right now. Should not make a difference, anyway?

The error message, when admin, is like:

Error: Exception: Unknown Selector operator: '%=!' -- was your selector value properly escaped? (in /home/abcdefg/public_html/wire/core/Selectors.php line 419)

Other characters can also do such harm, for instance "?" or ","

That site is running on PW 3.0.123, BTW

Share this post


Link to post
Share on other sites

The search on that site can actually do both, depending if the search phrase starts with a ". No, it does not make a difference (just tried).

Just logged my resulting $selector and saw the ! is encapsulated in ":

'text_index%="!", limit=16'

Which version of PW are you running (I'm on 3.0.123)? Maybe there's a fix in $sanitizer->selectorValue(). (Edit: Ok, should read the last paragraph as well).

Share this post


Link to post
Share on other sites
31 minutes ago, HerTha said:

Other characters can also do such harm, for instance "?" or ","

Seems like your $sanitizer->selectorValue() does not work at all for some reason. Could you check with Tracy?

$search_term = '!';
$val = "template=contentpage,title|headline|body%={$sanitizer->selectorValue($search_term)}";
d($val);

Should output:

"template=contentpage,title|headline|body%="!"" (45)

 

  • Like 1

Share this post


Link to post
Share on other sites

Okay, as I have multiple calls to find() on that page, I tried to narrow it down a bit and removed all but one.

If the search is for, e.g., "hallo" then all is fine.

However, if searching for "!hallo", that particular error (see above) occurs.

The offending selector, as Tracy bd() reports:

"sort=name,limit=50,template=downfile,title|name|indexer%="!hallo""

I can't see anything wrong with it...

Share this post


Link to post
Share on other sites
39 minutes ago, HerTha said:

I can't see anything wrong with it...

neither me. Could you set $config->debug=true; in your site/config.php? Then there should be some more information in that error message.

Share this post


Link to post
Share on other sites

@HerTha

I use such code:

if (input('get', 'q')) {
	$query = input('get', 'q', 'text');
	input()->whitelist('q', $query);

	$selector = [
		['template', 'post'],
		['title|builder.post_wysiwyg|builder.blockquote_wysiwyg', '%=', $query],
		['limit', '10']
	];
}

and it also works with '!' without errors.

Consider using selectors as arrays as it more clear way and have on a small benefit that you don't have to use $sanitizer->selectorValue()

Quote

Using array syntax, it's no longer necessary to use the $sanitizer->selectorValue() method to sanitize text that goes into selectors. That makes it simpler and safer when building a selector that involves user input. That's because the separation fields and values is handled at the PHP language level (as an array declaration), rather than as a runtime construction. Of course, you should still sanitize user input, but you no longer need to sanitize a value to be selector-friendly.

 

  • Like 4

Share this post


Link to post
Share on other sites

@Autofahrn thanks for your input! Initially, I thought I am dealing with a general problem. Now, it seems more related to a special combination of details.

@Zeka array syntax looks good - worth trying, I'd say!

I'll report back once I find more...

  • Like 1

Share this post


Link to post
Share on other sites

I have rearranged the code to use array syntax for the selector:

$a_sel = [
    ['sort', 'name'],
    ['limit', $limit_per_section],
    ['template', 'downfile'],
    ['title|name|indexer', '%=', $a_terms]
];
$results = $pages->find($a_sel);

Now, if $a_terms contains search term "!hallo" then all is fine. Problem solved!?

Well, not really. Searching for "&hallo" still crashes:

Quote

Exception: Unknown Selector operator: '%=&' -- was your selector value properly escaped? field='title', value='hallo', selector: 'include=all, title%=&hallo' (in /home/abcdefg/public_html/wire/core/Selectors.php line 419)

#0 /home/abcdefg/public_html/wire/core/Selectors.php(460): ProcessWire\Selectors->create('title', '%=&', 'hallo')

#1 /home/abcdefg/public_html/wire/core/Selectors.php(142): ProcessWire\Selectors->extractString('')

...

I understand this should not happen, my debugging skills are a bit limited, though...

Share this post


Link to post
Share on other sites
13 minutes ago, HerTha said:

Searching for "&hallo"

Thanks for the pointer, searching for "&something" does not work with AjaxSearch either, since it sends the field unescaped through a get request, so "something" effectively gets a second parameter and the search string is empty (does not contain the ampersand). But that's another story.

Could you try this line (or better run $a_terms through $sanitizer->selectorValue()):

['title|name|indexer', '%=', "'$a_terms'" ]

 

Share this post


Link to post
Share on other sites

well, that mixed lot of quotes makes the error disappear 🌤️

<scratchinghead>how did you come up with that?</scratchinghead>

Share this post


Link to post
Share on other sites
6 minutes ago, HerTha said:

how did you come up with that?

It's just manually enforcing the string is escaped in quotes, but single quotes this time.

Since I guess that double quotes are removed somewhere, you may try with this one as well:

['title|name|indexer', '%=', '"'.$a_terms.'"' ]

In both cases you really need to ensure the search term does not contain quotes itself, searching for "it's" will fail now. That's normally something $sanitizer->selectorValue() takes care of.

  • Like 1

Share this post


Link to post
Share on other sites

@HerTha

Just curious is this code fails? 

$a_sel = [
    ['sort', 'name'],
    ['limit', $limit_per_section],
    ['template', 'downfile'],
    ['title|name|indexer', '%=', $sanitizer->selectorValue($a_terms)]
];
$results = $pages->find($a_sel);

 

  • Like 3

Share this post


Link to post
Share on other sites

 

25 minutes ago, Zeka said:

$sanitizer->selectorValue($a_terms)

that extra sanitizer removes the error, Tracy reports $a_sel to be:

image.png

So, for this case, the documentation seems to be wrong:

19 hours ago, Zeka said:

Using array syntax, it's no longer necessary to use the $sanitizer->selectorValue() method to sanitize text that goes into selectors.

 

 

  • Like 1

Share this post


Link to post
Share on other sites

further investigation -

I simplified the code to:

$a_sel = [
    ['template', 'downfile'],
    ['title|name|indexer', '%=', $a_terms]
];
$results = $pages->find($a_sel);

So, there are 3 fields to search: "title", "name" and "indexer" (a textarea field).

Still using $a_sel = "&hallo".

It turns out, that the error occurs only if:

  • "name" field is present and
  • at least one more field is present

The combination "title" with "indexer" works just fine.

I have never filed an issue report, but this one could be worth it. Anyone who can duplicate this error?

  • Like 1

Share this post


Link to post
Share on other sites
35 minutes ago, HerTha said:

Anyone who can duplicate this error?

yes, instantly! Probably because name is handled something special.

Maybe worth triggering @ryan...

Share this post


Link to post
Share on other sites

I found some other strange stuff (only tested in Tracy):

$q = "*";
$query = $sanitizer->selectorValue($q);
$pgs = $pages->find("parent=1041, include=all, vertec|title|name|remarks%=$query");
d($pgs);

This returns everything. Every single page inside parent id 1041.

! returns a seemingly random amount of results (74 out of 1061) - completely wrong.

? does the same, but returns 73 instead of 74.

So, I guess you'd have to manually sanitize the query yourself: Check if minimum string length is 3. Remove charactes such as ! & *. And of course, make it a habit to narrow results down (according to template, parent, not in trash, etc.). I would expect PW to throw some sort of error if a search string is less than 3 characters.

Maybe the results may differ if I actually used it in a template (outside Tracy) and not logged in as superuser, or with debug off. But still, this behavior truly puzzles me.

  • Like 1

Share this post


Link to post
Share on other sites

$sanitizer->selectorValue("*"); effectively returns an empty string. So searching for nothing obviously isn't intended, but probably correct to return everything... 😉

Share this post


Link to post
Share on other sites
6 minutes ago, Autofahrn said:

So searching for nothing obviously isn't intended, but probably correct to return everything

I strongly disagree. An empty string is useless. (maybe "correct" in a strictly SQL / theoretical / technical view, but not in the context discussed here)

It's funny that * returns empty string, but ? or ! return ""?"" and ""!"".

I'm really surprised (knowing how important security is for @ryan). Does that mean that a malicious attacker could potentially bring a server to its knees by searching with wildcard automatically x times per second?

Share this post


Link to post
Share on other sites

Using %= means your value will be escaped and wrapped in "%$value%". Using that with LIKE in sql means you'll get all results returned for an empty string value. Both – returning an empty or a full resultset – can be valid expected results for a filtering operation so it's best to align with the default sql behavior and not adding special cases, which is at least the less surprising option. If you don't want to allow searches for empty string is ultimately a business decision you have to take care for on your own.

29 minutes ago, dragan said:

Does that mean that a malicious attacker could potentially bring a server to its knees by searching with wildcard automatically x times per second?

If that's a concern I'm wondering if you don't set limit on your query. Limit (and in extension pagination) makes the problem of an empty string not filtering the results a non-issue.

  • Like 1

Share this post


Link to post
Share on other sites
Posted (edited)
2 hours ago, LostKobrakai said:

If that's a concern I'm wondering if you don't set limit on your query. Limit (and in extension pagination) makes the problem of an empty string not filtering the results a non-issue.

Oh, don't worry, I always do that, in addition to all sorts of user-input sanitation/validation. I was rather just speaking theoretically. Thankfully, PW code in the site profiles has plenty of useful examples with comments, which will hopefully prevent newbies to omit such checks. After all, we can't expect PW to handle everything for us; it's always up to the developer to take care of security and best practice.

Edited by dragan
re-phrasing

Share this post


Link to post
Share on other sites

Thank you guys for digging into this! This forum is just awesome!

I am trying to summarize:

We found out that array syntax is able to handle "!" properly.

We also found that there are other workarounds for accepting (or filtering) "&".

So, that are all interesting details, good to know. However, the question from my original post is still circulating my head:

On 5/18/2019 at 12:21 PM, HerTha said:

Obviously, there are a number of scenarios (including the ! char) which are not catched by the sanitizer, but need to be taken care of. 

Anyone who can share a list of forbidden chars or an algorithm to handle the filtering more securely in such a situation?

I thought this has been done a thousand times...

 

 

Share this post


Link to post
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.


  • Recently Browsing   0 members

    No registered users viewing this page.

×
×
  • Create New...