Selector issue and differences


celfred

Hello,

I've tested some selectors on my site and I have a small question. My 'page tree' looks like this:

/guests

/guests/admin/

/guests/admin/Laurence

/guests/admin/Le p'tit

/guests/admin/Élodie

As you can see, I have names with special characters.

I'm trying to retrieve the pages for an autocomplete field on the front-end. So far so good, it worked OK, until I noticed I had no results for 'l'. That's when I added the 'Laurence' page, and then it worked. So I thought the problem was due to the apostrophe...

Anyway, I used the new 'Selector test' module (thanks to the creator!) and noticed something :

parent=/guests/41,title*=l >> returns 1 page found : Laurence (Why not 'Le p'tit' ?)
parent=/guests/41,title^=l >> returns 1 page found : Laurence
parent=/guests/41,title%=l >> returns 3 pages found : Laurence, Le p'tit, Élodie (no apostrophe problem?)

And :

parent=/guests/41,title$=ie >> returns 1 page found : Élodie
parent=/guests/41,title$=die >> returns 0 pages found ? Why not 'Élodie'?

So I don't really know what to think... Am I doing something wrong here?

Thanks if anyone has time to give me a clue ;-)


The problem with "Le" is that, by default, MySQL fulltext searches don't return words shorter than 3 letters (you can confirm it by searching for small words here in the forum). See here http://dev.mysql.com...l-language.html. You could change this behaviour in your MySQL install, but it would affect all your sites, and it's not that easy or recommended: http://dev.mysql.com...ine-tuning.html

I don't know about the "die" thing... still scratching my head ???


Diogo and netcarver both make valid points (although I thought fulltext indexes (indices?) didn't index words shorter than 4 characters). The other thing is that MySQL doesn't come with French stopwords by default, just English.

Try using the %= (MySQL LIKE, i.e. not fulltext) comparison operator (http://processwire.com/api/selectors/) to make sure that fulltext issues are not affecting your results.


Well, that was a lot more technical than I thought it would be!

Anyway, I understand a little better now. I actually didn't even notice my 'Le p'tit' search was not 1 word but 2 words...

Looking at the links above was quite helpful and enlightening. In the PW documentation, I didn't see the line concerning %= :

While slower, this operator has an advantage over the *= operator when you may need to match very short words or stopwords.

Still, I can't understand the $= issue : ie vs. die : 1 result vs. 0 result ? Anyway, this was just a test, I don't need it (now)...

Thanks a lot for your help and explanations!


It seems that the "$=ie/die" issue has to do with several things. Buggish behaviour, I'd say. I'll try to explain some of the things that are happening with the operators ^=, $= and *= inside the core and the database as well.

MySQL fulltext searches are able to match either whole words or word beginnings, so there is no way to match a word ending with a fulltext search alone. PW tries to anchor the search term to the beginning/end of the field value by adding an RLIKE with some regex magic. While this works nicely at the beginning of the value, it quite often fails when matching the end part. Actually, the only times it doesn't fail are when

  • searching for complete words (the very meaning in the first place I suppose) OR
  • the search term is a predefined stopword.

Yes, this does sound a bit backwards, but that's how it's implemented at the moment. Stopwords ("ie" being one!) are filtered out by PW, thus leaving only that RLIKE - which matches on its own (this is why $=ie matches "Élodie").

Additionally, PW forces the search term to exist (by adding a '+' operator to the beginning) and tries to find partial matches (by adding a '*' operator to the end). This is problematic because the wildcard operator only matches word beginnings, and a word ending that is shorter than the whole word is never a word beginning (this is why $=die does not match "Élodie").

And as MySQL doesn't include short words in the fulltext index at all, there's no way those searches will ever match even a whole word if it's short enough (using "^=Le" doesn't match "Le p'tit").

Well, I guess here's enough explanation for Ryan to get a hold of this when he has time. Hope I got all of the above right... But as I said before, when using fulltext searches, it's all about whole words (mostly at least). And words long enough. And not in the stopword list. :)
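To make the explanation above easier to follow, here is a small Python toy model of it. It is only a sketch of the behaviour described in this thread, not PW's or MySQL's actual code. The assumptions are mine: a minimum indexed word length of 4 (the MyISAM default for ft_min_word_len), a stopword set containing just "ie", and a simple tokenizer that keeps apostrophes inside words. The function names are made up for illustration.

```python
import re

MIN_WORD_LEN = 4      # assumption: MyISAM default for ft_min_word_len
STOPWORDS = {"ie"}    # assumption: tiny illustrative subset of the stopword list
TITLES = ["Laurence", "Le p'tit", "Élodie"]


def indexed_words(title):
    """Words the toy fulltext index would keep for a title."""
    words = re.findall(r"[\w']+", title.lower())
    return [w for w in words if len(w) >= MIN_WORD_LEN and w not in STOPWORDS]


def fulltext_match(title, term):
    """Model of '+term*': some indexed word must begin with the term."""
    return any(w.startswith(term.lower()) for w in indexed_words(title))


def matches(title, operator, term):
    """Toy version of the %=, *=, ^= and $= operators as described above."""
    t, s = title.lower(), term.lower()
    if operator == "%=":                      # LIKE: plain substring, no index
        return s in t
    if operator == "*=":                      # fulltext only
        return fulltext_match(title, term)
    if operator not in ("^=", "$="):
        raise ValueError(f"unsupported operator: {operator}")
    # The RLIKE part: anchor the term to the start or end of the value.
    anchored = t.startswith(s) if operator == "^=" else t.endswith(s)
    if s in STOPWORDS:                        # fulltext part dropped, anchor decides
        return anchored
    return anchored and fulltext_match(title, term)


def find(operator, term, titles=TITLES):
    return [t for t in titles if matches(t, operator, term)]


print(find("*=", "l"))     # ['Laurence']
print(find("%=", "l"))     # ['Laurence', "Le p'tit", 'Élodie']
print(find("$=", "ie"))    # ['Élodie']
print(find("$=", "die"))   # []
print(find("^=", "Le"))    # []
```

With these assumptions the model reproduces the results from the first post: "Le" never reaches the index because it is too short, "ie" only works because the fulltext part is skipped for stopwords, and "die" fails because no indexed word begins with it.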


Thanks for the great explanation nik!

From what I understand about fulltext searches, they're also not very useful with such a small number of pages, because MySQL ignores any word that appears in half or more of the rows (at least in natural-language mode).


For your beginning/ending matches, you might also try using the %^= and %$= operators. They are the same as ^= and $= except that they bypass the fulltext index and the limitations (and benefits) that go along with that.
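As a rough illustration of what bypassing the fulltext index means, the sketch below runs anchored LIKE patterns against the three titles from the first post. It assumes that %^= and %$= behave like `LIKE 'term%'` and `LIKE '%term'`; that mapping is an assumption on my part, not something confirmed here. It also uses Python's built-in sqlite3 as a stand-in for MySQL, and the table and function names are hypothetical.

```python
import sqlite3

TITLES = ["Laurence", "Le p'tit", "Élodie"]


def like_search(pattern):
    """Return the titles matching a SQL LIKE pattern, in insertion order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (title TEXT)")
    conn.executemany("INSERT INTO pages (title) VALUES (?)",
                     [(t,) for t in TITLES])
    rows = conn.execute(
        "SELECT title FROM pages WHERE title LIKE ? ORDER BY rowid",
        (pattern,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


def starts_with(term):
    # Assumed equivalent of %^= ; a real term containing % or _ would need escaping.
    return like_search(term + "%")


def ends_with(term):
    # Assumed equivalent of %$= ; same escaping caveat as above.
    return like_search("%" + term)


print(starts_with("le"))   # ["Le p'tit"]
print(ends_with("die"))    # ['Élodie']
```

Because no word index is involved, short words and word endings match fine. Note that SQLite's LIKE only ignores case for ASCII letters, so accented characters may behave differently than under a MySQL collation.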


See... Again something I would never have thought about by myself... Thanks !

This was recently contributed by another user via a pull request, so it's currently undocumented (aside from here). I'll add this when updating the documentation for stuff in 2.3. Though these operators were added in the 2.2 stable.
