Jump to content

PW 3.0.161 – Core updates


ryan
 Share

Recommended Posts

Version 3.0.161 on the dev branch continues with the updates optimizing our support for the new selector operators introduced in last week's blog post for 3.0.160. Last week I was still kind of figuring it out and the code still needed some refactoring and optimization. This week several parts have been rewritten and it's been improved quite a bit. Though the end result is still very similar to what was demonstrated last week, but now it's a lot more performant and solid.

One thing new this week that's also kind of fun: you can now use more than one operator in any "field=value" selector expression, at least from the API side. And it works anywhere that you might use a selector, whether querying the database or something in memory. I think the best way to explain it is with an example. Let's say that you want to find all pages with a title containing the phrase "hello world" — you'd do this using the "contains text/phrase" operator: *= 

$pages->find("title*=hello world"); 

But let's also say that if you don't find any matches, you want to fallback to find any pages that contain the words "hello" and "world" anywhere in the title, in any order. We'd use the "contains all words" operator "~=" to do that. So now you can add that operator to the existing one, and it'll fallback to it if the first operator fails to match. So we'll append the "contains all words" operator to the previous one: ~=

$pages->find("title*=~=hello world"); 

Cool huh? But maybe we still aren't finding any matches, so we want to fallback to something even broader. So if the phrase match fails, and the words match fails, now we want to fallback to find any pages that contain the world "hello" OR "world", rather than requiring them both. For that we can use our new "contains any words" operator: ~|= 

$pages->find("title*=~=~|=hello world"); 

This example is getting a bit contrived now, but let's say that if we still haven't found a match, we want it to find any pages that have any words starting with "hello" or "world", so it would find pages with words like "helloes", "worlds", "worldwide", etc. That's a job for our new "contains any partial words" operator: ~|*=

$pages->find("title*=~=~|=~|*=hello world"); 

Okay last one I promise—you probably wouldn't stack this many in real life, but stay with me here. Let's say the query still didn't find anything, and as a final fallback, we want it to find any words LIKE "hello" or "world", so that those terms can match anywhere in words, enabling us to find pages with words like in the previous example, but also words like "phellogen", "othello", "underworld", "otherworldly", etc., and that's a job for our new "contains any words like" operator: ~|%=

$pages->find("title*=~=~|=~|*=~|%=hello world"); 

So that looks like a pretty complex operator there, but as you've seen by following the example, it's just these 5 appended operators to each other:

*=   Contains phrase
~=   Contains all whole words
~|=  Contains any whole words
~|*= Contains any partial words
~|%= Contains any words like

I think a more likely scenario in a site search is that you might stack two operators, such as the *= followed by the ~|*=, or whatever combination suits your need. By the way, you can do this with any operators, not just the text searching ones. But if you didn't read the blog post last week, also be sure to check out the other new operators  in addition to those above: *+=, **=, **+=, ~*=, ~+=, ~~=, ~%=, #=. 

I think these new operators help out quite a bit with ProcessWire's text searching abilities. But there's one thing that's kind of a common need in search engines that's not easily solved, and that's the handling of singular vs. plural. At least, that's a common issue when it comes to English (I'm assuming so for other languages, but not positive?) MySQL fulltext indexes can't differentiate between singular and plural versions of words, so they index and match them independently as completely different words. This can be unexpected as clients might type in "goose" and expect it's also going to match pages with "geese". I've already got something in the works for this, so stay tuned. Thanks for reading and hope you have a great weekend!

  • Like 16
  • Thanks 2
Link to comment
Share on other sites

11 hours ago, ryan said:

I think these new operators help out quite a bit with ProcessWire's text searching abilities. But there's one thing that's kind of a common need in search engines that's not easily solved, and that's the handling of singular vs. plural. At least, that's a common issue when it comes to English (I'm assuming so for other languages, but not positive?) MySQL fulltext indexes can't differentiate between singular and plural versions of words, so they index and match them independently as completely different words. This can be unexpected as clients might type in "goose" and expect it's also going to match pages with "geese". I've already got something in the works for this, so stay tuned. Thanks for reading and hope you have a great weekend!

First of all: I definitely agree with your first point. I need to do some testing and probably add additional configurability to the SearchEngine module after these updates. Some pretty interesting options we've got now!

In the past I've steered away from the full-text index because it has resulted in way too many "oddities", i.e. some queries just don't work either due to stopwords or something else, and it seems to be (in my experience) particularly problematic if the site is authored in some other language than English. You've given me a reason to revisit this, though ?

As for plurals, it sure is an issue with other languages as well. I have no idea what you've got planned in that regard, but one problem here is that it gets pretty complex if you want to support other languages than just English (which is, actually, one of the easiest languages in this regard). And then there's of course the bigger issue of stemming — considering your goose example, if a client was searching for "hanhi" (goose in Finnish), it's true that they would likely expect it to also match "hanhet" (plural), but if that works then why not other forms such as "hanhia", "hanhen", "hanhien", "hanhesta", "hanheen", etc.

Anyway, very much looking forward to what you've got planned! Just wanted to point out that this can be a pretty big issue to tackle, at least unless limited to English only, or perhaps made so modular that each language can have its own set of rules. Also it's a slippery slope, and once things like stemming come into play we're in really deep waters ?

  • Like 7
Link to comment
Share on other sites

  • 2 weeks later...

Thank you, these new search operators sound great. ?

What combination would you recommend for a classic website search form? I use until now %= (contains text like). Maybe I could improve my search form with a combination like ~|%= or so.

I use for my search form also this:

FieldtypeText.extends%=

But this is not supported by the new search operators yet?

Regards, Andreas

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...