Powerful new text-searching abilities in 3.0.160

In ProcessWire 3.0.160 we’ve got some major upgrades and additions to our text-matching selectors and operators. This brings a whole new level of power to $pages->find() and similar API calls, especially when it comes to search engine type queries.

We will briefly walk through all the new operators in this post to introduce them. Following that is a demo search engine that lets you test all of them out with pages on this site. All of these new operators are available and ready to use in ProcessWire 3.0.160 API calls. Note, however, that they are not yet available in InputfieldSelector (Lister/ListerPro) or other places in the admin where you select operators interactively. They should be by 3.0.161 though. In case you missed the forum post last week, 3.0.160 also includes some nice two-factor authentication upgrades.

In the examples below, you'll see unique search queries included with each. If you don't know what terms to try in the search engine further down on this page, I recommend using the terms in the examples for results that help to demonstrate how the operator works.

Newly added search operators

Contains words partial ~*=

$pages->find('title~*=web image'); 

This is like the existing "match words" ~= operator except that rather than just matching whole words, it can match partial words as well. So that means that a search for "web image" on this site will match terms like "WebP" and "images" (plural), rather than only matching "web" and "image" (singular).

Contains words live ~~=

$pages->find('title~~=api pro'); 

This new operator is designed to work exactly like the existing "match words" ~= operator except that the last word is considered a "partial match" word rather than a full match word. That makes this particular operator useful in live-search situations where you are returning results as someone types.
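
In a live-search handler, the selector would typically be rebuilt from whatever the user has typed so far. Here is a minimal sketch of that step, assuming a hypothetical helper named liveSearchSelector() (it is not part of ProcessWire). It strips a few characters that carry meaning inside selector strings and quotes the value. In a real site, ProcessWire's own $sanitizer->selectorValue() is likely the better choice for the cleanup.

```php
<?php
// Hypothetical helper (not part of ProcessWire): builds a
// "contains words live" selector string from raw user input.
function liveSearchSelector(string $field, string $query): string {
    // Replace characters that have meaning in a selector string
    // (quotes, commas, pipes) with spaces; a simplification for this sketch.
    $clean = preg_replace('/["\',|]+/', ' ', $query);
    // Collapse runs of whitespace and trim the ends.
    $clean = trim(preg_replace('/\s+/', ' ', $clean));
    return $field . '~~="' . $clean . '"';
}

echo liveSearchSelector('title', '  api   pro'), "\n"; // title~~="api pro"
```

The returned string could then be passed to $pages->find(), ideally after checking that the cleaned query is not empty.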

Contains words like ~%=

$pages->find('title~%=build site');

This operator matches all words in the query in full or in part. It can perform partial matches not just from the beginning of the word, but anywhere within the word. That means that a word like "build" can match words like "building" and "rebuild", and words like "site" can match words like "website" and "sites".

Contains words and expand ~+=

$pages->find('title~+=books');

This operator works exactly like the regular "match words" ~= operator except that it also adds in "query expansion". This is a feature of MySQL fulltext indexes where, in the best case, it seems to magically come up with related matches, even if they don't contain the original search terms. As far as I can tell, it analyzes the matching results and looks for words in the match that might be fairly unique, checks if those words appear in any other page titles, and bundles them into the results when they do. As an example, use this operator in the search engine below to search for the term "books", and notice how it matches "Canongate Books" and a related blog post that doesn't even mention the term "books" in the title. Pretty cool, huh? Well, at least with short queries it can be. The longer the query, the more likely query expansion is to introduce noise into the results, but that's to be expected.

Contains any words ~|=

$pages->find('title~|=architecture engineering construction');

Until now, ProcessWire's word-matching operators have required that all of the given words match. Now we have a few new implied OR operators like ~|= (and others, mentioned below) that specify that "any" of the words can match. As you might know, you can also do this by separating each word with a pipe, i.e. "this|that", but that's a bit more work than we'd like when processing searches from input. In addition, because the feature is native to this operator, it can perform the OR match more quickly and efficiently than separating the search values with pipes.
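
To make the comparison concrete, the two spellings can be built from the same input. This is only a sketch: anyWordsPiped() and anyWordsNative() are hypothetical helper names, not ProcessWire functions, and they do no sanitizing of the input.

```php
<?php
// Hypothetical helpers (not part of ProcessWire) that build the two
// "any word" selector strings described above from one query string.

// Older style: "match words" operator with pipe-separated OR values.
function anyWordsPiped(string $field, string $query): string {
    $words = preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY);
    return $field . '~=' . implode('|', $words);
}

// New style: the ~|= operator takes the words as typed.
function anyWordsNative(string $field, string $query): string {
    $words = preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY);
    return $field . '~|=' . implode(' ', $words);
}

$q = 'architecture engineering construction';
echo anyWordsPiped('title', $q), "\n";  // title~=architecture|engineering|construction
echo anyWordsNative('title', $q), "\n"; // title~|=architecture engineering construction
```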

Contains any partial words ~|*=

$pages->find('title~|*=auth edit');

This is just like the previous operator except that rather than just matching any whole word, it'll also match any partial word. That means that words like "auth" will match "author" and "authentication", and words like "edit" will match "editor" and "editing".

Contains any words like ~|%=

$pages->find('title~|%=grade port');

This is just like the above-mentioned ~|*= operator except that it uses LIKE rather than the fulltext index to perform matches. It can therefore be slower (at scale), but it can match anywhere in words rather than just from the beginning of them. For example, a term like "grade" can also match "upgrade" and a term like "port" can match "import", "export", "portal", etc.
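
The difference between whole-word, beginning-of-word and anywhere-in-word matching, and between "all" and "any", can be modeled in a few lines of plain PHP. This is a simplified illustration of the behavior described above, not how ProcessWire or MySQL implement it. The function name wordsMatch() and its parameters are made up for this sketch, and it ignores fulltext details such as stopwords and minimum word length. Roughly: 'whole' with all corresponds to ~=, 'start' to ~*=, 'anywhere' to ~%=, and the same three with all set to false correspond to ~|=, ~|*= and ~|%=.

```php
<?php
// Simplified model of the word-matching operators; NOT ProcessWire's
// implementation. $where is 'whole', 'start' or 'anywhere'. When $all is
// true every query word must match, otherwise one matching word is enough.
function wordsMatch(string $value, string $query, string $where, bool $all): bool {
    $words = preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY);
    if (count($words) === 0) return false;
    $hits = 0;
    foreach ($words as $word) {
        $w = preg_quote($word, '/');
        if ($where === 'whole') {
            $pattern = '/\b' . $w . '\b/i'; // word boundary on both sides
        } elseif ($where === 'start') {
            $pattern = '/\b' . $w . '/i';   // word boundary before only
        } else {
            $pattern = '/' . $w . '/i';     // no boundary requirement
        }
        if (preg_match($pattern, $value) === 1) $hits++;
    }
    return $all ? $hits === count($words) : $hits > 0;
}

var_dump(wordsMatch('WebP images', 'web image', 'whole', true));       // bool(false)
var_dump(wordsMatch('WebP images', 'web image', 'start', true));       // bool(true)
var_dump(wordsMatch('Core upgrades', 'grade port', 'start', false));    // bool(false)
var_dump(wordsMatch('Core upgrades', 'grade port', 'anywhere', false)); // bool(true)
```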

Contains phrase and expand *+=

$pages->find('title*+=CDN');

This is a variation of our common "phrase match" *= operator that adds in MySQL query expansion. That enables it to bring other potentially relevant pages into the matches, even if they don't contain the target phrase. For instance, notice that a search for "CDN" also matches pages about ProCache, even if they don't mention CDNs in the title.

Contains match **=

$pages->find('title**=hook selectors');

More so than most of the other operators, this one uses something close to the standard fulltext MATCH/AGAINST logic included with MySQL. For those that want this more traditional search logic, this operator provides it. It treats the words in an OR fashion. Since fulltext indexes consider "book" and "books" to be unrelated words (unless partial matching), ProcessWire also includes plural versions of singular words, when possible.

Contains match and expand **+=

$pages->find('title**+=login register');

This is just like the "contains match" operator mentioned above, except that it also adds in MySQL query expansion, discussed in earlier operators.

Advanced text search #=

$pages->find('title#=+image* -file*');

Searches using this operator recognize special command characters that designate what is and is not included. When "+" is prefixed to a word or quoted phrase, it indicates that term must be included. When "-" is prefixed to a word or quoted phrase, it indicates that term must NOT be included. When there is no prefix, the term or phrase is optional, but its presence will increase ranking. Any word can be appended with an asterisk "*" to indicate that you also want to match any words that begin with the term. For example, "bar*" will match not just bar, but also barn, barbell, barge, etc.
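
To show how a query with these commands breaks down, here is a small sketch that sorts the terms of a query into "must", "must not" and "may" groups. The function parseAdvancedQuery() is hypothetical and exists only for illustration. ProcessWire does its own parsing when you use #=, so nothing like this is needed to run the search.

```php
<?php
// Hypothetical parser (not part of ProcessWire) that groups the terms of an
// advanced-search query by their command prefix, as described above.
function parseAdvancedQuery(string $query): array {
    $terms = ['must' => [], 'mustNot' => [], 'may' => []];
    // Each term: optional + or - prefix, then a quoted phrase or a bare word.
    preg_match_all('/([+-]?)(?:"([^"]*)"|(\S+))/', $query, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        // Group 2 holds a quoted phrase, group 3 a bare word.
        $text = ($m[2] !== '') ? $m[2] : ($m[3] ?? '');
        if ($text === '') continue; // skip empty quoted phrases
        $key = $m[1] === '+' ? 'must' : ($m[1] === '-' ? 'mustNot' : 'may');
        $terms[$key][] = $text;
    }
    return $terms;
}

print_r(parseAdvancedQuery('+image* -file*'));
// must: image*   mustNot: file*   may: (none)
```

A trailing asterisk is kept on the term, so "image*" still reads as "words beginning with image".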

Test out any of these new operators and/or compare the results with existing operators below. This engine searches mostly blog posts and sites directory pages on this site, and a few others. Note that it is limited to 10 results per operator, per search. If you aren't sure what to search for, try using the terms from any of the examples above, all of which were identified as a good demonstration of the operator they appear with.

Operator  Description
*=        Given phrase or word appears in value compared to.
%=        Given text appears in compared value, without regard to word boundaries.
~=        All given whole words appear in compared value, in any order.
~*=       All given partial and whole words appear in compared value, in any order. Partial matches from beginning of words.
~~=       All given whole words (and at least partial last word) appear in compared value, in any order.
~%=       All given partial or whole words appear in compared value (in any order) without regard to word boundaries.
~+=       All given whole words appear in compared value (in any order) and expand to match related values.
~|=       Any of the given whole words appear in compared value.
~|*=      Any of the given partial or whole words appear in compared value. Partial matches from beginning of words.
~|%=      Any of the given partial or whole words appear in compared value, without regard to word boundaries.
*+=       Given phrase, word or related terms appear in value compared to.
**=       Any or all of the given words match compared value using default database logic and score.
**+=      Any or all of the given words match compared value using default database logic and score. Plus, expands to include potentially related results.
#=        Match values with commands: +Word MUST appear, -Word MUST NOT appear, and unprefixed Word MAY appear (at least one matches). Add asterisk for partial match: Bar* or +Bar* matches bar, barn, barge; while -Bar* prevents matching them. Use quotes to match phrases: +"Must Match", -"Must Not Match", or "May Match".
^=        Given word or phrase appears at beginning of compared value.
%^=       Given text appears at beginning of compared value, without regard for word boundaries.
$=        Given word or phrase appears at end of compared value.
%$=       Given text appears at end of compared value, without regard for word boundaries.

Comments

  • HMCB (7 days ago)
    Mighty awesome. Thank you.

    Wondering if you can show an example of this in use: Contains match **=

    • ryan (5 days ago)
      Pretty much any search is going to be a good example to use with the **= operator. That's because it uses the standard match/against logic, which I think is pretty useful. It'll match any of the words in the query, and I think the default MySQL ranking/order tends to be pretty good here too. Try searching for "conditional hooks" in the search engine demo.
  • Pete (7 days ago)
    Is it possible for "contains any words" to return the results that match more words first?

    If I search for "processwire website" (without the quotes) there's a result with both words but it's halfway down the list.

    • ryan (5 days ago)
      That particular search might be problematic to use here because the term "processwire" likely appears on at least half the blog posts being searched, so as far as the fulltext index goes, I think it might be considered a noise word at that point, but I will look into it further. I do see we've at least got a score on a match we'd like near the top so there's a good chance we can solve this by PW explicitly specifying the sort here. I'll experiment with it more here.
  • MrSnoozles (6 days ago)
    Wow. This was unexpected and is incredibly helpful.

Blog / 19 June 2020