Jump to content

Search not working as intended?


Tom.
 Share

Recommended Posts

Hello,

I'm just having a little trouble with the search functionality with ProcessWire. I have a title "Disability Discrimination Act" and I have the search:

However, searching "Disability Act" doesn't bring a result, however searching "Disability" or "Act" does. According to the documentation, this shouldn't be the case?

$search = $input->get->text("search");
if($search) {
    $search = $sanitizer->selectorValue($search);
    $results = $pages->find("template=document, title%=$search");
} else {
    $error = "Please enter a search term in the search box above.";
}

I'm using ProcessWire 3.0.2 - If it's a bug?

EDIT:

Figured it out, but it's still pausing a problem I understand that %= is more like *= (exact phrase) rather than ~= (contain phrase). I would use ~= but it fall just short of the 4 letter minimum.

EDIT 3:

Using "~=" searching the Title exactly "Disability Discrimination Act" - Returns nothing, searching "Help with disability" returns the right result. However searching "Help with disability act" returns nothing. - Still confused

Would the use of a 3 letter word cancel the search? - Is there a way of searching like %= but acting like ~= 

Link to comment
Share on other sites

I have a similar problem, but I guess there isn't much to do here. The problem derives from MySQLs fulltext search.

If you have root access, you can always change the letter minimum (http://stackoverflow.com/questions/1585611/mysql-full-text-search-for-words-with-three-or-less-letters) and rebuild indexes (be aware of performance impacts).

SQL fulltext searches are very basic and imperfect. They are fine for most of us, but if you need a real reliable, tolerant/fuzzy site search you would need to implement something more professional.

There are many products on this topic, like elasticsearch (awesome), Lucene or Solr, but they all need a server with root access. The topic is comprehensive. Easy to learn, hard to master.

If you can't be bothered with that, there is still this Lucene implementation from Zend, which is just a buch of PHP files without a full blown server (http://framework.zend.com/manual/current/en/modules/zendsearch.lucene.searching.html), badly documented though.
It's a nice start and you can learn the very basics of search engines/indexers with it. But, naturally to PHP, this solution will not be able to handle big amounts of data. Af few thousand documents in the index and the fun is over already.

  • Like 1
Link to comment
Share on other sites

Tom, there's a setting in MySQL called ft_min_word_len and the default is 4. That means that if you have any word in your query that is less than 4 characters, MySQL won't be able to match it unless that setting has been adjusted (and you've forced it to rebuild the indexes on your db/table). That's why you aren't able to match "Disability Act", because "Act" likely falls under your ft_min_word_len. You should be able to get the result you are looking for by using %= instead. But since you want to match "disability" and "act" anywhere in your search, you'd need to do this:

template=document, title%=disability, title%=act

The following code should use %= for words less than 4 characters and ~= for the rest. 

$search = $input->get->text("search");
if($search) {
  $selector = ''; 
  $parts = explode(' ', $search); 
  foreach($parts as $key => $part) {
    if(strlen($part) > 3) continue;
    // use MySQL LIKE for words under 4 characters in length
    $selector .= ", title%=" . $sanitizer->selectorValue($part);
    unset($parts[$key]); 
  }
  if(count($parts)) {
    // use MySQL fulltext index for words 4 characters and higher
    $selector .= ", title~=" . $sanitizer->selectorValue(implode(' ', $parts)); 
  }
  $results = $pages->find("template=document" . $selector);
} else {
  $error = "Please enter a search term in the search box above.";
  $results = null;
}

Come to think of it, it might be nice if PW just did this for you automatically when it detected the condition... but we don't know what the ft_min_word_len setting is. 

  • Like 4
Link to comment
Share on other sites

Tom, there's a setting in MySQL called ft_min_word_len and the default is 4. That means that if you have any word in your query that is less than 4 characters, MySQL won't be able to match it unless that setting has been adjusted (and you've forced it to rebuild the indexes on your db/table). That's why you aren't able to match "Disability Act", because "Act" likely falls under your ft_min_word_len. You should be able to get the result you are looking for by using %= instead. But since you want to match "disability" and "act" anywhere in your search, you'd need to do this:

template=document, title%=disability, title%=act

The following code should use %= for words less than 4 characters and ~= for the rest. 

$search = $input->get->text("search");
if($search) {
  $selector = ''; 
  $parts = explode(' ', $search); 
  foreach($parts as $key => $part) {
    if(strlen($part) > 3) continue;
    // use MySQL LIKE for words under 4 characters in length
    $selector .= ", title%=" . $sanitizer->selectorValue($part);
    unset($parts[$key]); 
  }
  if(count($parts)) {
    // use MySQL fulltext index for words 4 characters and higher
    $selector .= ", title~=" . $sanitizer->selectorValue(implode(' ', $parts)); 
  }
  $results = $pages->find("template=document" . $selector);
} else {
  $error = "Please enter a search term in the search box above.";
  $results = null;
}

Come to think of it, it might be nice if PW just did this for you automatically when it detected the condition... but we don't know what the ft_min_word_len setting is. 

Thank you very much Ryan, this much appreciated :-) 

Link to comment
Share on other sites

@ryan

How about using this http://stackoverflow.com/questions/5716362/how-can-i-get-the-value-of-mysqls-ft-min-word-len-configuration-variable-usin to retrieve the value on install and maybe a setting somewhere, where it can be readjusted ($config maybe)? It's hard to imagine this value to change often.

  • Like 7
Link to comment
Share on other sites

@ryan

How about using this http://stackoverflow.com/questions/5716362/how-can-i-get-the-value-of-mysqls-ft-min-word-len-configuration-variable-usin to retrieve the value on install and maybe a setting somewhere, where it can be readjusted ($config maybe)? It's hard to imagine this value to change often.

 "Come to think of it, it might be nice if PW just did this for you automatically when it detected the condition... but we don't know what the ft_min_word_len setting is. " 

+1 This would be awesome.

Ryan, you'll have to let me buy you a drink :-) - (Do you have PayPal?)

Link to comment
Share on other sites

Hey guys, this upgrade now appears on PW 3.0 (devns) branch. If you are using the ~= operator in a search, like "title~=disability act" it now asks MySQL for the ft_min_word_len (per @LostKobrakai's suggestion above). If any word length falls under that threshold, it delegates that word to a separate REGEXP command in MySQL that matches whole words.

This is actually an improvement over the code I mentioned earlier because the previous code that converts it to use a "%=" for short words could match things like "disability fact" or "react to this disability" etc. Whereas the new core addition will only match whole words in the same way that the MySQL index would, meaning the field would only match if it contained the exact words "disability" AND "act" somewhere in the field, and not some word that happens to contain the partial word "act".

To take it further, it also now uses this technique to match stopwords that MySQL doesn't full-text index. Previously it would just exclude them from the search since the presence of a stop word would prevent a search from matching. Now you can perform searches that include those stop words and match them. 

  • Like 14
Link to comment
Share on other sites

  • 1 month later...
  • 4 years later...

@ryan: Reopening this old thread... Searching for a short word (three letters) via the search.php template (which uses the ~= operator) works as described in pw 3.0.148, but does not seem to work anymore in 3.0.165.

Attached the images Pre and Post upgrade on a clean installation of the Minimal Site Profile from v 3.0.133 to v 3.0.165

 

Pre-upgrades-A.png

Pre-upgrades-C.png

Pre-upgrades-B.png

Post-upgrades-A.png

Post-upgrades-B.png

Edited by Mike-it
Tested also with the classic Minimal Site Profile
Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...