
Pages not showing up in site search


Lance O.

I've set up search on a client site and discovered that the search results are a bit finicky.

Searching for meter returns a faculty page with the title of "Timothy L Van Meter". But searching for van meter returns no results.

My selector looks like this:

$pages->find("template=campus-view|campus-view-archive|campus-view-article|component-event|component-feature|component-post|component-showcase|component-video|faculty|faculty-directory|home|news-archive|news-publications|page|post|section|seminary-hill-farm|slide|staff|staff-directory|videos, title|headline|heading|subhead|lead|body|profile|summary|meta_keywords~=van Meter, limit=12");

Changing the selector operator to %= produces results, but the client doesn't want visitors to the site to have to enter an exact word or phrase.

Can someone explain why the more specific "van meter" returns no results, while "meter" does?


MySQL's fulltext search has a minimum word length, which is 4 by default. Shorter words such as "van" are not included in the index (and neither are stopwords, i.e. common words like 'the', 'have' or 'some'). You can read a bit about it in PW's selector documentation, where you'll also find a link to MySQL's docs on changing the minimum word length. Fulltext search is always a bit of a double-edged sword: the price for fast natural-language search is paid with the omission of short and very common words (words present in more than 50% of the examined rows aren't taken into consideration), no matter if you use MySQL, Lucene or any other fulltext engine.

Splitting search terms on whitespace and adding a % selector expression for each of them may be an option, though that puts more load on the server.


"Splitting search terms on whitespace and adding a % selector expression for each of them may be an option, though that puts more load on the server."

What exactly does this do?

Can you give a short example of how the search phrase has to be passed to PW?


Nothing exotic, just changing

title|headline|heading|subhead|lead|body|profile|summary|meta_keywords~=van Meter

to

title|headline|heading|subhead|lead|body|profile|summary|meta_keywords%=van,
title|headline|heading|subhead|lead|body|profile|summary|meta_keywords%=Meter
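If the phrase comes from a search form, the split can be done in code. Below is a minimal sketch, not tested against a real site: buildSearchSelector() is a hypothetical helper (not part of ProcessWire), and the character cleanup is a crude stand-in for what $sanitizer->selectorValue() would do inside PW.

```php
<?php
// Hypothetical helper, not part of ProcessWire: builds one "%=" clause per
// word, so a page only matches if every word is found in one of the fields.
function buildSearchSelector(string $query, string $fields, int $limit = 12): string
{
    // Split the phrase on any run of whitespace, dropping empty pieces.
    $terms = preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY);

    $clauses = [];
    foreach ($terms as $term) {
        // Crude cleanup for this sketch: keep only letters, digits and hyphens
        // so commas, quotes, pipes etc. cannot break the selector string.
        $clean = preg_replace('/[^\p{L}\p{N}\-]+/u', '', $term);
        if ($clean !== '' && $clean !== null) {
            $clauses[] = $fields . '%=' . $clean;
        }
    }

    // No usable words: return an empty string so the caller can skip the search.
    if (count($clauses) === 0) {
        return '';
    }

    $clauses[] = 'limit=' . $limit;
    return implode(', ', $clauses);
}
```

Inside a template, the result would be appended to the template part of the selector, for example $pages->find("template=faculty|staff, " . buildSearchSelector((string) $input->get->q, "title|body")), after checking that the returned string is not empty. Each extra word adds another LIKE condition, which is where the additional server load comes from.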

In our PW-based intranet, where I use OpenSearchServer to search pages, attachments and external files, I've had the same problem (with a few tweaks due to stemming and synonyms). After a lot of testing this and that, I compromised on setting the minimum word length to three and adding an "exact search" option for those cases where fulltext searching returned too many or too few results. Unfortunately, I haven't found the perfect, intuitive find-it-all solution yet :(


Thank you, BitPoet and horst.

It looks like some things have changed in ProcessWire 3.x. See "Improvements to the ~= operator in page finding operations" in the following post:

https://processwire.com/blog/posts/merry-christmas-heres-processwire-3.0.3-and-2.7.3-and-some-more/

I'm going to upgrade the site and see if this helps with the results. If not, I'll try splitting each search term as mentioned above.
