This week I'm not bumping the version number just yet because I've got lots of work in progress. The biggest thing so far is something I hinted at last week. Basically, I like what the addition of the MySQL query expansion operators have brought (per posts last week and week before), but they also reveal what's lacking: something as simple as a search for "books" still can't directly match the word "book". But that's the most basic example. It's not a limitation of ProcessWire, but just the type of database indexes in general.
I think it'd be amazing if ProcessWire had the ability of being really smart about this stuff, able to interpolate not just plurals vs. singulars but related words. In a perfect world, this is what query expansion would do (in addition to what it already does). But the reality is that it involves all kinds of complicated logic, rules and dictionaries; well beyond the scope of even a database. And it can be vastly different depending on the language. So this isn't something we can just add to the core and have it work.
On the other hand, I figured maybe we should just put in a hookable method that just pretends the ability was there. Then people could hook it and make it respond with variations of words, according to their needs. The searches that use query expansion could then call this method and use whatever it returns... for when someday the ability is there.
So I went ahead and added that hook — WireTextTools::wordAlternates(). And our database-searching class (DatabaseQuerySelectFulltext) now calls upon it, just in case an implementation is available.
Well, after getting that hook added and having our class call it, naturally I wanted to test it out. So I got to work on it and came up with this module: WireWordTools.
The WireWordTools module provides an API for English word inflection and lemmatisation. And it hooks that new method mentioned above, so that you can install it and immediately have it bring your searches to the next level. While it only helps for English-language searches, maybe we'll be able to add more languages to it, or maybe it'll lead to other modules that do the same thing for other languages.
The expanded/alternate words are only used for searches that use the new query expansion operators, which are the ones that have a "+" in them: ~+=, ~|+=, *+=, **+=. They all can return similar results, but are weighted differently. Unlike most operators, where the logic is direct and you can expect them to always behave the same way, these query expansion operators are more subjective, and ones I think we should intend to keep tweaking and improving over time to continually improve the quality of the results they return. Basically, they are geared towards building site search engines, so I think it makes sense for us to pursue anything that makes them better at that, rather than aiming to always have them return the same thing. I am currently testing out the ~|+= operator ("contains any words expand") on our main site search engine here, along with the WireWordTools module. Finally, searching for "books" does match "book" too, and a lot more. More to be done here, but it's a good start hopefully.