Jump to content

wildcard


peterb
 Share

Recommended Posts

The ~= and *= assume wildcard operations in selectors. They can be used for searching text, textarea and related fields. The ~= operator says to find all the words in the given copy (not necessary together),  and the *= says to find the given phrase in the copy. The ~= operator assumes wildcard matches with any of the words you provide, so a search for "apple" will also match "applesauce", "applet" and "apples", and likewise for any other words in the selector. The *= operator is more restrictive in that it's looking for your words in a specific order. But if searching for only one word, both operators would produce the same result and work with partial matches. Where they differ is when your search includes multiple words.

MySQL does not support wildcards with indexed content beyond this. But it does support more selective wildcards with non-indexed LIKE operations (with the "%" and "_" wildcards). We don't use those in ProcessWire at this time because they are very slow in MySQL. In fact, too slow to suggest they be used for everyday operations. LIKE operations require full table scans, so are not a good thing to hide the implementation details of in ProcessWire. However, you can still use them if your circumstance calls for it. Here's an example searching the title field using a LIKE operation:

<?php
$result = $this->db->query("SELECT pages_id FROM field_title WHERE data LIKE '%juice%"); 
$ids = array();
while($row = $result->fetch_assoc()) $ids[] = $row['pages_id']; 
$matches = $pages->getById($ids); 

That would return pages with a title field matching "juice", "applejuice", "orangejuice" and "juices". ProcessWire selectors can match "juices" from "juice", but can't match "applejuice" or "orangejuice" from "juice" because of the way MySQL fulltext indexes work ... the match has to start from the beginning. But performing a LIKE match is a slow process, so something I recommend avoiding unless the scale of usage is under your control. So while I agree with you that additional wildcard selectors would be nice, we can't go much further without abandoning MySQL for searches and using something like Lucene (which is written in Java).

Link to comment
Share on other sites

thanks ryan,

Any idea why this works:

$results = $page->children("report_color|report_style|report_file=$s, limit=15, sort=$sort ");

finds all entries that are just "white"

But this does not?:

$results = $page->children("report_color|report_style|report_file~=$s, limit=15, sort=$sort ");

finds nothing???

I am searching for "white", and want to find data that also contains "white/blue"

Link to comment
Share on other sites

I tried doing a test locally using "title|body~=white" and it worked for me on pages titled "white" as well as "white/blue". So the next thing to look at may be the fields you mentioned. What type of fields are report_color, report_style and report_file? All text fields? If report_file is a file field, you probably need to do report_file.description, otherwise it's searching the filename.

Another thing to look at is the word and how MySQL is treating it. Is it at least 4 characters? Is it a stopword? MySQL has some limitations with fulltext indexes that make them really useful for searching sentences, paragraphs and documents, but less useful in some other instances. Though all can be adjusted in MySQL's settings. Though the only adjustment I've ever had to make to MySQL regularly is to set it to index words down to 2 characters rather than just down to 4 characters.

Regarding stopwords: With fulltext indexes, MySQL doesn't index "stopwords", defined as words that are extremely common and not unique enough for search purposes. These are words like: and, get, thus, than, and a hundred or so others.  Though "white" does not appear to be a stopword, so I don't think that's the issue. But if one of the stopwords appears in your selector that uses a ~=  operator (or *= with a single word), the word is not used in the search .. unless custom stopwords are configured in MySQL and PW. The full list of stopwords can be found here:

http://dev.mysql.com/doc/refman/5.5/en/fulltext-stopwords.html

Link to comment
Share on other sites

thanks, that helped, and now this works

$results = $page->children("report_color~=$s, limit=15, sort=$sort ");

however is does not work when I am include a page field "report_file":

$results = $page->children("report_color|report_style|report_file~=$s, limit=15, sort=$sort ");

the page field does work however, without the "~"

Link to comment
Share on other sites

The file fieldtype doesn't carry a fulltext index on the filename, but it does on the description. See if this works:

report_color|report_style|report_file.description~=$s

Link to comment
Share on other sites

Just checking, it looks like I don't have the ~= operator implemented with the file fieldtype -- I have just added it to the latest commit. But the *= operator was already implemented, so that one should already work.

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...