Tom. Posted December 22, 2015 Share Posted December 22, 2015 Hello, I'm just having a little trouble with the search functionality with ProcessWire. I have a title "Disability Discrimination Act" and I have the search: However, searching "Disability Act" doesn't bring a result, however searching "Disability" or "Act" does. According to the documentation, this shouldn't be the case? $search = $input->get->text("search"); if($search) { $search = $sanitizer->selectorValue($search); $results = $pages->find("template=document, title%=$search"); } else { $error = "Please enter a search term in the search box above."; } I'm using ProcessWire 3.0.2 - If it's a bug? EDIT: Figured it out, but it's still pausing a problem I understand that %= is more like *= (exact phrase) rather than ~= (contain phrase). I would use ~= but it fall just short of the 4 letter minimum. EDIT 3: Using "~=" searching the Title exactly "Disability Discrimination Act" - Returns nothing, searching "Help with disability" returns the right result. However searching "Help with disability act" returns nothing. - Still confused Would the use of a 3 letter word cancel the search? - Is there a way of searching like %= but acting like ~= Link to comment Share on other sites More sharing options...
tsdtsdtsd Posted December 22, 2015 Share Posted December 22, 2015 I have a similar problem, but I guess there isn't much to do here. The problem derives from MySQLs fulltext search. If you have root access, you can always change the letter minimum (http://stackoverflow.com/questions/1585611/mysql-full-text-search-for-words-with-three-or-less-letters) and rebuild indexes (be aware of performance impacts). SQL fulltext searches are very basic and imperfect. They are fine for most of us, but if you need a real reliable, tolerant/fuzzy site search you would need to implement something more professional. There are many products on this topic, like elasticsearch (awesome), Lucene or Solr, but they all need a server with root access. The topic is comprehensive. Easy to learn, hard to master. If you can't be bothered with that, there is still this Lucene implementation from Zend, which is just a buch of PHP files without a full blown server (http://framework.zend.com/manual/current/en/modules/zendsearch.lucene.searching.html), badly documented though.It's a nice start and you can learn the very basics of search engines/indexers with it. But, naturally to PHP, this solution will not be able to handle big amounts of data. Af few thousand documents in the index and the fun is over already. 1 Link to comment Share on other sites More sharing options...
ryan Posted December 22, 2015 Share Posted December 22, 2015 Tom, there's a setting in MySQL called ft_min_word_len and the default is 4. That means that if you have any word in your query that is less than 4 characters, MySQL won't be able to match it unless that setting has been adjusted (and you've forced it to rebuild the indexes on your db/table). That's why you aren't able to match "Disability Act", because "Act" likely falls under your ft_min_word_len. You should be able to get the result you are looking for by using %= instead. But since you want to match "disability" and "act" anywhere in your search, you'd need to do this: template=document, title%=disability, title%=act The following code should use %= for words less than 4 characters and ~= for the rest. $search = $input->get->text("search"); if($search) { $selector = ''; $parts = explode(' ', $search); foreach($parts as $key => $part) { if(strlen($part) > 3) continue; // use MySQL LIKE for words under 4 characters in length $selector .= ", title%=" . $sanitizer->selectorValue($part); unset($parts[$key]); } if(count($parts)) { // use MySQL fulltext index for words 4 characters and higher $selector .= ", title~=" . $sanitizer->selectorValue(implode(' ', $parts)); } $results = $pages->find("template=document" . $selector); } else { $error = "Please enter a search term in the search box above."; $results = null; } Come to think of it, it might be nice if PW just did this for you automatically when it detected the condition... but we don't know what the ft_min_word_len setting is. 4 Link to comment Share on other sites More sharing options...
Tom. Posted December 22, 2015 Author Share Posted December 22, 2015 Tom, there's a setting in MySQL called ft_min_word_len and the default is 4. That means that if you have any word in your query that is less than 4 characters, MySQL won't be able to match it unless that setting has been adjusted (and you've forced it to rebuild the indexes on your db/table). That's why you aren't able to match "Disability Act", because "Act" likely falls under your ft_min_word_len. You should be able to get the result you are looking for by using %= instead. But since you want to match "disability" and "act" anywhere in your search, you'd need to do this: template=document, title%=disability, title%=act The following code should use %= for words less than 4 characters and ~= for the rest. $search = $input->get->text("search"); if($search) { $selector = ''; $parts = explode(' ', $search); foreach($parts as $key => $part) { if(strlen($part) > 3) continue; // use MySQL LIKE for words under 4 characters in length $selector .= ", title%=" . $sanitizer->selectorValue($part); unset($parts[$key]); } if(count($parts)) { // use MySQL fulltext index for words 4 characters and higher $selector .= ", title~=" . $sanitizer->selectorValue(implode(' ', $parts)); } $results = $pages->find("template=document" . $selector); } else { $error = "Please enter a search term in the search box above."; $results = null; } Come to think of it, it might be nice if PW just did this for you automatically when it detected the condition... but we don't know what the ft_min_word_len setting is. Thank you very much Ryan, this much appreciated Link to comment Share on other sites More sharing options...
LostKobrakai Posted December 22, 2015 Share Posted December 22, 2015 @ryan How about using this http://stackoverflow.com/questions/5716362/how-can-i-get-the-value-of-mysqls-ft-min-word-len-configuration-variable-usin to retrieve the value on install and maybe a setting somewhere, where it can be readjusted ($config maybe)? It's hard to imagine this value to change often. 7 Link to comment Share on other sites More sharing options...
Tom. Posted December 22, 2015 Author Share Posted December 22, 2015 @ryan How about using this http://stackoverflow.com/questions/5716362/how-can-i-get-the-value-of-mysqls-ft-min-word-len-configuration-variable-usin to retrieve the value on install and maybe a setting somewhere, where it can be readjusted ($config maybe)? It's hard to imagine this value to change often. "Come to think of it, it might be nice if PW just did this for you automatically when it detected the condition... but we don't know what the ft_min_word_len setting is. " +1 This would be awesome. Ryan, you'll have to let me buy you a drink - (Do you have PayPal?) Link to comment Share on other sites More sharing options...
ryan Posted December 23, 2015 Share Posted December 23, 2015 Hey guys, this upgrade now appears on PW 3.0 (devns) branch. If you are using the ~= operator in a search, like "title~=disability act" it now asks MySQL for the ft_min_word_len (per @LostKobrakai's suggestion above). If any word length falls under that threshold, it delegates that word to a separate REGEXP command in MySQL that matches whole words. This is actually an improvement over the code I mentioned earlier because the previous code that converts it to use a "%=" for short words could match things like "disability fact" or "react to this disability" etc. Whereas the new core addition will only match whole words in the same way that the MySQL index would, meaning the field would only match if it contained the exact words "disability" AND "act" somewhere in the field, and not some word that happens to contain the partial word "act". To take it further, it also now uses this technique to match stopwords that MySQL doesn't full-text index. Previously it would just exclude them from the search since the presence of a stop word would prevent a search from matching. Now you can perform searches that include those stop words and match them. 14 Link to comment Share on other sites More sharing options...
opalepatrick Posted February 3, 2016 Share Posted February 3, 2016 Sorry about this I should have started a new thread. Link to comment Share on other sites More sharing options...
Mike-it Posted October 28, 2020 Share Posted October 28, 2020 (edited) @ryan: Reopening this old thread... Searching for a short word (three letters) via the search.php template (which uses the ~= operator) works as described in pw 3.0.148, but does not seem to work anymore in 3.0.165. Attached the images Pre and Post upgrade on a clean installation of the Minimal Site Profile from v 3.0.133 to v 3.0.165 Edited October 30, 2020 by Mike-it Tested also with the classic Minimal Site Profile Link to comment Share on other sites More sharing options...
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now