Powerful new text-searching abilities in 3.0.160

In ProcessWire 3.0.160 we’ve got some major upgrades and additions to our text-matching selectors and operators. This brings a whole new level of power to $pages->find() and similar API calls, especially when it comes to search engine type queries.

In this post we'll briefly introduce each of the new operators. Following that is a demo search engine that lets you try all of them out with pages on this site. All of the new operators are available and ready to use in ProcessWire 3.0.160 API calls. Note, however, that they are not yet available in InputfieldSelector (Lister/ListerPro) or other places in the admin where you select operators interactively; they should be by 3.0.161. In case you missed last week's forum post, 3.0.160 also includes some nice two-factor authentication upgrades.

Each example below includes a sample search query. If you aren't sure what terms to try in the search engine further down this page, I recommend using the terms from the examples, since they were chosen to help demonstrate how each operator works.

Newly added search operators

Contains words partial ~*=

$pages->find('title~*=web image'); 

This is like the existing "match words" ~= operator, except that rather than matching only whole words, it can match partial words as well. So a search for "web image" on this site will match terms like "WebP" and "images" (plural), rather than only "web" and "image" (singular).
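To make the difference from ~= concrete, here is a rough, in-memory PHP approximation of the rule described above: every query word must match the beginning of some word in the title. It is only a sketch under simplifying assumptions (ASCII-only word splitting, no MySQL minimum word length or stopwords), and `matchesWordsPartial()` is a hypothetical helper, not part of the ProcessWire API; the real operator runs in MySQL against a fulltext index.

```php
<?php
// Hypothetical approximation of ~*= ("contains words partial").
// Every query word must match the BEGINNING of at least one title word.
// Not ProcessWire's implementation: the real operator queries a MySQL fulltext index.
function matchesWordsPartial(string $title, string $query): bool
{
    // Lowercase, then split on anything that is not an ASCII letter or digit.
    $titleWords = preg_split('/[^a-z0-9]+/', strtolower($title), -1, PREG_SPLIT_NO_EMPTY);
    $queryWords = preg_split('/[^a-z0-9]+/', strtolower($query), -1, PREG_SPLIT_NO_EMPTY);

    foreach ($queryWords as $q) {
        $found = false;
        foreach ($titleWords as $w) {
            if (strncmp($w, $q, strlen($q)) === 0) { // $q is a prefix of $w
                $found = true;
                break;
            }
        }
        if (!$found) {
            return false; // every query word is required
        }
    }
    return true;
}

var_dump(matchesWordsPartial('Working with WebP images', 'web image')); // bool(true)
var_dump(matchesWordsPartial('Image field upgrades', 'web image'));     // bool(false): no word starts with "web"
```

With whole-word matching, as in ~=, the first title would not match, because neither "web" nor "image" appears in it as a complete word.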

Contains words live ~~=

$pages->find('title~~=api pro'); 

This new operator works exactly like the existing "match words" ~= operator, except that the last word is treated as a partial match rather than a full-word match. That makes it useful in live-search situations where you return results as someone types.
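The live-search rule can be sketched the same way. In this hypothetical helper, `matchesWordsLive()`, every word except the last has to match a whole word in the title, while the last one, which the user may still be typing, only needs to match the start of a word. It illustrates the rule only; it is not how ProcessWire implements ~~=.

```php
<?php
// Hypothetical approximation of ~~= ("contains words live").
// All query words except the last must match whole title words;
// the last word (possibly still being typed) only has to match the start of a word.
function matchesWordsLive(string $title, string $query): bool
{
    $titleWords = preg_split('/[^a-z0-9]+/', strtolower($title), -1, PREG_SPLIT_NO_EMPTY);
    $queryWords = preg_split('/[^a-z0-9]+/', strtolower($query), -1, PREG_SPLIT_NO_EMPTY);
    $last = array_pop($queryWords); // null when the query is empty

    foreach ($queryWords as $q) {
        if (!in_array($q, $titleWords, true)) {
            return false; // complete words need an exact whole-word match
        }
    }
    if ($last === null) {
        return true;
    }
    foreach ($titleWords as $w) {
        if (strncmp($w, $last, strlen($last)) === 0) {
            return true; // the partial last word matched a word prefix
        }
    }
    return false;
}

var_dump(matchesWordsLive('API access for ProCache', 'api pro')); // bool(true)
var_dump(matchesWordsLive('APIs and profiles', 'api pro'));       // bool(false): "api" is not a whole word here
```

Calling a check like this on each keystroke ("a", "ap", "api", "api p", and so on) mirrors how a live search re-runs the query as the user types.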

Contains words like ~%=

$pages->find('title~%=build site');

This operator matches all words in the query, in full or in part. It can perform partial matches not just from the beginning of a word, but anywhere within it. That means a word like "build" can match words like "building" and "rebuild", and a word like "site" can match words like "website" and "sites".
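Here is a minimal sketch of this "match anywhere in the word" rule, assuming it can be approximated with a case-insensitive substring test per query word (roughly what a SQL `LIKE '%word%'` condition does). The function `matchesWordsLike()` is hypothetical and works on plain strings, not on ProcessWire pages.

```php
<?php
// Hypothetical approximation of ~%= ("contains words like").
// Every query word must occur somewhere in the title, even mid-word,
// similar to combining LIKE '%word%' conditions with AND.
function matchesWordsLike(string $title, string $query): bool
{
    $haystack = strtolower($title);
    $queryWords = preg_split('/\s+/', strtolower(trim($query)), -1, PREG_SPLIT_NO_EMPTY);

    foreach ($queryWords as $q) {
        if (strpos($haystack, $q) === false) {
            return false; // all words are required
        }
    }
    return true;
}

var_dump(matchesWordsLike('Rebuilding our website', 'build site')); // bool(true)
var_dump(matchesWordsLike('Rebuilding our blog', 'build site'));    // bool(false): no "site"
```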

Contains words and expand ~+=

$pages->find('title~+=books');

This operator works exactly like the regular "match words" ~= operator, except that it also adds "query expansion". This is a feature of MySQL fulltext indexes that, in the best case, seems to magically come up with related matches even when they don't contain the original search terms. As far as I can tell, it analyzes the matching results, looks for words in those matches that might be fairly unique, checks whether those words appear in any other page titles, and bundles those pages into the results when they do. As an example, use this operator in the search engine below to search for the term "books", and notice how it matches "Canongate Books" and a related blog post that doesn't even mention "books" in its title. Pretty cool, huh? Well, at least with short queries it can be. The longer the query, the more likely query expansion is to introduce noise into the results, but that's to be expected.

Contains any words ~|=

$pages->find('title~|=architecture engineering construction');

Previously, ProcessWire's word-matching operators required all of the given words to match, but not anymore. Now we have a few new implied OR operators like ~|= (and others, mentioned below) that specify that "any" of the words can match. As you might know, you can also do this by separating each word with a pipe, e.g. "this|that", but that's more work than we'd like when processing searches from user input. In addition, because the OR behavior is native to this operator, it can perform the match more quickly and efficiently than if we separated our search values with pipes.
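As a plain-PHP illustration of the "any word" logic, the hypothetical helper below treats a title as matching when at least one query word appears in it as a whole word. The titles are made up for the example, and the real ~|= operator does this inside MySQL using the fulltext index.

```php
<?php
// Hypothetical approximation of ~|= ("contains any words").
// A title matches when at least ONE query word appears in it as a whole word.
function matchesAnyWords(string $title, string $query): bool
{
    $titleWords = preg_split('/[^a-z0-9]+/', strtolower($title), -1, PREG_SPLIT_NO_EMPTY);
    $queryWords = preg_split('/[^a-z0-9]+/', strtolower($query), -1, PREG_SPLIT_NO_EMPTY);

    // array_intersect() keeps the query words that are also title words.
    return count(array_intersect($queryWords, $titleWords)) > 0;
}

$query = 'architecture engineering construction';
var_dump(matchesAnyWords('Smith Architecture and Design', $query)); // bool(true)
var_dump(matchesAnyWords('Construction Partners', $query));         // bool(true)
var_dump(matchesAnyWords('Architects Guild', $query));              // bool(false): whole words only
```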

Contains any partial words ~|*=

$pages->find('title~|*=auth edit');

This is just like the previous operator except that rather than just matching any whole word, it'll also match any partial word. That means that words like "auth" will match "author" and "authentication", and words like "edit" will match "editor" and "editing".
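Combining the OR logic with prefix matching gives something like the following sketch. `matchesAnyPartialWords()` is a hypothetical name, and the simple ASCII word splitting is an assumption for illustration only.

```php
<?php
// Hypothetical approximation of ~|*= ("contains any partial words").
// A title matches when at least one query word matches the start of a title word.
function matchesAnyPartialWords(string $title, string $query): bool
{
    $titleWords = preg_split('/[^a-z0-9]+/', strtolower($title), -1, PREG_SPLIT_NO_EMPTY);
    $queryWords = preg_split('/[^a-z0-9]+/', strtolower($query), -1, PREG_SPLIT_NO_EMPTY);

    foreach ($queryWords as $q) {
        foreach ($titleWords as $w) {
            if (strncmp($w, $q, strlen($q)) === 0) {
                return true; // one prefix hit is enough
            }
        }
    }
    return false;
}

var_dump(matchesAnyPartialWords('Two-factor authentication', 'auth edit')); // bool(true)
var_dump(matchesAnyPartialWords('Rich text editor updates', 'auth edit'));   // bool(true)
var_dump(matchesAnyPartialWords('Credited images', 'auth edit'));            // bool(false): "edit" is not at a word start
```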

Contains any words like ~|%=

$pages->find('title~|%=grade port');

This is just like the ~|*= operator above, except that it uses LIKE rather than the fulltext index to perform matches. As a result, it can be slower at scale, but it can match anywhere within words rather than only at the beginning. So a term like "grade" can also match "upgrade", and a term like "port" can match "import", "export", "portal", etc.
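The difference from ~|*= comes down to where a term may occur inside a word. This hypothetical sketch approximates ~|%= with a substring test, so a term can match in the middle or at the end of a word, not just at the start.

```php
<?php
// Hypothetical approximation of ~|%= ("contains any words like").
// A title matches when at least one query word occurs anywhere in it,
// similar to combining LIKE '%word%' conditions with OR.
function matchesAnyWordsLike(string $title, string $query): bool
{
    $haystack = strtolower($title);
    $queryWords = preg_split('/\s+/', strtolower(trim($query)), -1, PREG_SPLIT_NO_EMPTY);

    foreach ($queryWords as $q) {
        if (strpos($haystack, $q) !== false) {
            return true; // one hit is enough
        }
    }
    return false;
}

var_dump(matchesAnyWordsLike('Upgrades in 3.0.160', 'grade port'));     // bool(true): "upgrades"
var_dump(matchesAnyWordsLike('Export and import pages', 'grade port')); // bool(true)
var_dump(matchesAnyWordsLike('Session handling', 'grade port'));        // bool(false)
```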

Contains phrase and expand *+=

$pages->find('title*+=CDN');

This is a variation of our common "phrase match" *= operator that adds MySQL query expansion. That lets it bring other potentially relevant pages into the matches, even if they don't contain the target phrase. For instance, notice that a search for "CDN" also matches pages about ProCache, even when they don't mention CDNs in the title.

Contains match **=

$pages->find('title**=hook selectors');

Compared with most of the other operators, this one sticks more closely to MySQL's standard fulltext MATCH/AGAINST logic, for those who want a more traditional search. It matches words in an OR fashion, so any of the words can match. Since fulltext indexes consider "book" and "books" to be unrelated words (unless partial matching is used), ProcessWire also includes plural versions of singular words when possible.
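The OR behavior plus singular-to-plural expansion could look roughly like this. The pluralization rule in `naivePlurals()` is a deliberately naive assumption (words already ending in "s" are left alone; otherwise "es" or "s" is appended) and is not ProcessWire's actual logic; both function names are hypothetical.

```php
<?php
// Hypothetical approximation of **= ("contains match"): OR logic over whole words,
// with each singular query word also tried in a plural form.
// naivePlurals() is a deliberately simple assumption, not ProcessWire's pluralization.
function naivePlurals(string $word): array
{
    if (substr($word, -1) === 's') {
        return [$word]; // treat words ending in "s" as already plural
    }
    if (preg_match('/(x|z|ch|sh)$/', $word)) {
        return [$word, $word . 'es']; // box -> boxes
    }
    return [$word, $word . 's']; // hook -> hooks
}

function matchesContainsMatch(string $title, string $query): bool
{
    $titleWords = preg_split('/[^a-z0-9]+/', strtolower($title), -1, PREG_SPLIT_NO_EMPTY);
    $queryWords = preg_split('/[^a-z0-9]+/', strtolower($query), -1, PREG_SPLIT_NO_EMPTY);

    foreach ($queryWords as $q) {
        // Any variant of any query word is enough (OR logic).
        if (array_intersect(naivePlurals($q), $titleWords)) {
            return true;
        }
    }
    return false;
}

var_dump(matchesContainsMatch('Conditional hooks', 'hook selectors'));     // bool(true): "hook" also tried as "hooks"
var_dump(matchesContainsMatch('Selector engine notes', 'hook selectors')); // bool(false)
```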

Contains match and expand **+=

$pages->find('title**+=login register');

This is just like the "contains match" operator above, except that it also adds MySQL query expansion, as discussed with the earlier operators.

Advanced text search #=

$pages->find('title#=+image* -file*');

Searches using this operator recognize special command characters that designate what is and is not included. When "+" is prefixed to a word or quoted phrase, that term must be included. When "-" is prefixed to a word or quoted phrase, that term must NOT be included. When there is no prefix, the word or phrase is optional, but its presence increases the ranking. An asterisk "*" can be appended to any word to also match words that begin with that term. For example, "bar*" matches not just "bar", but also "barn", "barbell", "barge", etc.
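To show how the command characters combine, here is a sketch that parses a #= style query into required, excluded, and optional terms and then tests a title against them. The helpers `parseAdvancedQuery()` and `matchesAdvanced()` are hypothetical; phrases are approximated as substring tests, ranking is ignored, and the rule that a query without "+" terms needs at least one optional term to match is an assumption borrowed from MySQL boolean-mode behavior.

```php
<?php
// Hypothetical sketch of reading a #= query string:
//   +term required, -term excluded, bare term optional (ranking only),
//   term* prefix match, "quoted phrase" matched as a unit.
// This is not ProcessWire's or MySQL's actual parser.
function parseAdvancedQuery(string $query): array
{
    $terms = [];
    preg_match_all('/([+-]?)(?:"([^"]+)"|(\S+))/', $query, $m, PREG_SET_ORDER);
    foreach ($m as $match) {
        $isPhrase = isset($match[2]) && $match[2] !== '';
        $text = strtolower($isPhrase ? $match[2] : $match[3]);
        $prefix = !$isPhrase && substr($text, -1) === '*';
        $terms[] = [
            'mode'   => $match[1] === '+' ? 'required' : ($match[1] === '-' ? 'excluded' : 'optional'),
            'text'   => $prefix ? substr($text, 0, -1) : $text,
            'phrase' => $isPhrase,
            'prefix' => $prefix,
        ];
    }
    return $terms;
}

function matchesAdvanced(string $title, string $query): bool
{
    $lower = strtolower($title);
    $words = preg_split('/[^a-z0-9]+/', $lower, -1, PREG_SPLIT_NO_EMPTY);
    $hasRequired = false;
    $optionalHit = false;

    foreach (parseAdvancedQuery($query) as $t) {
        if ($t['phrase']) {
            $hit = strpos($lower, $t['text']) !== false; // phrase as a substring test
        } else {
            $hit = false;
            foreach ($words as $w) {
                // "bar*" matches word starts; plain terms need an exact word.
                if ($t['prefix'] ? strncmp($w, $t['text'], strlen($t['text'])) === 0 : $w === $t['text']) {
                    $hit = true;
                    break;
                }
            }
        }
        if ($t['mode'] === 'required') {
            $hasRequired = true;
            if (!$hit) {
                return false;
            }
        } elseif ($t['mode'] === 'excluded') {
            if ($hit) {
                return false;
            }
        } elseif ($hit) {
            $optionalHit = true;
        }
    }
    // Assumption (MySQL boolean mode): with no "+" terms, one optional term must be present.
    return $hasRequired || $optionalHit;
}

var_dump(matchesAdvanced('Image field upgrades', '+image* -file*'));  // bool(true)
var_dump(matchesAdvanced('Image and file fields', '+image* -file*')); // bool(false): "file" is excluded
var_dump(matchesAdvanced('File uploads', '+image* -file*'));          // bool(false): required term missing
```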

Test out any of these new operators below and/or compare their results with the existing operators. This engine searches mostly blog posts and sites directory pages on this site, plus a few others. Note that it is limited to 10 results per operator, per search. If you aren't sure what to search for, try the terms from any of the examples above; each was chosen as a good demonstration of the operator it appears with.

The search tests demo has expired since this post is more than 3 months old. Please use your copy of ProcessWire 3.0.164+ to test the new search operators.

Comments

  • HMCB

    • 5 months ago
    Mighty awesome. Thank you.

    Wondering if you can show an example of this in use: Contains match **=

    • ryan

      • 5 months ago
      Pretty much any search is going to be a good example to use with the **= operator. That's because it uses the standard match/against logic, which I think is pretty useful. It'll match any of the words in the query, and I think the default MySQL ranking/order tends to be pretty good here too. Try searching for "conditional hooks" in the search engine demo.
  • Pete

    • 5 months ago
    Is it possible for "contains any words" to return the results that match more words first?

    If I search for "processwire website" (without the quotes) there's a result with both words but it's halfway down the list.

    • ryan

      • 5 months ago
      That particular search might be problematic here because the term "processwire" likely appears in at least half of the blog posts being searched, so as far as the fulltext index goes, I think it might be treated as a noise word at that point, but I will look into it further. I do see that a match we'd like near the top at least has a score, so there's a good chance we can solve this by having PW explicitly specify the sort here. I'll experiment with it more.
  • MrSnoozles

    • 5 months ago
    Wow. This was unexpected and is incredibly helpful.
  • Ivan Gretsky

    • 5 months ago
    Great addition!