Selectors - differences between use on $pages and PageArray due to character collations

I've just discovered something that appeared strange: selectors look the same wherever you use them, but they are actually context sensitive and can yield different results.

// Check for a 'hello' tag:
$selector = 'template=tag, title=hello';
$existingPages = $pages->find($selector);
d($existingPages->count, "✔ No pages, as expected");

// Create the hello tag (noting capital letter):
$aTagPage = $pages->newPage(['template' => 'tag', 'title' => 'Hello']);
// Add the newly created page to the empty $existingPages PageArray
$existingPages->add($aTagPage);

// Repeat the search for the hello tag:
$pageArray1 = $pages->find($selector);
d($pageArray1->count, '✔ Works ok');

// The following methods fail.
d($pageArray1->get($selector), 'Tag hello not found, despite same selector');
d($existingPages->get($selector), 'Tag hello also not found this way.');
d($pages->find('template=tag')->get('title=hello'), 'Not found like this, either');



I believe what's happening here is that when the pages are already loaded in memory (as a PageArray), the selector is evaluated by a PHP implementation, and when they are not loaded, it is evaluated by an SQL implementation. The results of the two likely differ, depending on the exact set-up of your database tables and collations.
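
To make the suspected difference concrete, here is a minimal sketch in plain PHP, with no ProcessWire involved. It assumes the in-memory match behaves like an exact string comparison, which is what the failing calls above suggest, and that the database column uses a case-insensitive (`_ci`) collation. Both function names are made up for illustration.

```php
<?php
// Hypothetical helpers for illustration only; these are not ProcessWire API.

// Exact comparison: roughly how the in-memory match appears to behave.
function matchesExactly(string $stored, string $wanted): bool {
    return $stored === $wanted;
}

// Case-folded comparison: roughly what a *_ci database collation does
// for plain ASCII text.
function matchesIgnoringCase(string $stored, string $wanted): bool {
    return strtolower($stored) === strtolower($wanted);
}

var_dump(matchesExactly('Hello', 'hello'));      // bool(false)
var_dump(matchesIgnoringCase('Hello', 'hello')); // bool(true)
```

The same title and the same search term give opposite answers depending on which comparison is applied, which matches the behaviour seen above.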

I can see that one solution might be to set collation of all text fields to binary/case sensitive, and just do without being able to search case insensitively; but this would create a lot of problems, since search, especially user-entered search, typically needs to be case insensitive to meet user expectations.

So the solution I'm going with for now is to try to programmatically avoid using find() etc. methods on a PageArray. This is quite inefficient.

I'd love to know if there is another solution. I read through the selector operators documentation but couldn't find any mention of case sensitivity.





The rabbit hole gets weirder: it seems `SelectorEnds` (`a$=b`) is case-insensitive, at least for plain unaccented Latin text (it uses `strcasecmp`, which does not handle UTF-8 multibyte characters). But that's the only one of all the selectors to use it. Others do use `preg_match` with the `i` flag (though not the `u` flag for Unicode, so they probably have the same limitation as `strcasecmp`), and others use `stripos`.
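
A quick way to see the multibyte limitation is to run the three PHP functions mentioned above against an accented pair. This is a standalone sketch, not code from the ProcessWire core; it assumes the file is saved as UTF-8 and that the mbstring extension is available for the last check.

```php
<?php
// Assumes this file is saved as UTF-8 and the mbstring extension is loaded.
$upper = 'É';
$lower = 'é';

// Byte-oriented case folding: the accented pair does not match.
$byStrcasecmp = strcasecmp($upper, $lower) === 0;
$byStripos    = stripos($upper, $lower) !== false;
$byPregI      = preg_match('/^é$/i', $upper) === 1;

// Unicode-aware alternatives: the pair matches.
$byPregIU  = preg_match('/^é$/iu', $upper) === 1;
$byMbLower = mb_strtolower($upper, 'UTF-8') === mb_strtolower($lower, 'UTF-8');

var_dump($byStrcasecmp, $byStripos, $byPregI); // false, false, false
var_dump($byPregIU, $byMbLower);               // true, true
```

So the `i` flag alone is not enough for accented characters; adding `u`, or lower-casing both sides with `mb_strtolower`, is what makes É match é.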

At this point in my digging, I thought: I'll consult the unit tests, then realised there aren't any - I've made a suggestion that tests would be a good thing (EDIT: and I see others have gone further and implemented some frameworks)


Art bot,
Databass selector  and.WireArray selector diffnt things and not do same thing .this is the way

one.is airplane / other is tractor

WireArray selector for any.things in memory : fields : templates : fieldgroupos : smogashsnags : ur own tipos : and if necessario - pages in memory { b.coz PageArray extendo WireArray } but u shlud use airplane for.pages
not tractor

WireArray selectors = genral purpoose for any things u can hold in WireArray in mem. ,, tractor

Databass selector == more power ful and optimizely justed for.pages ,, airplane

!!! if u using WireArray selector for |find|filter| pages 
  then you are loaded pages u should not / not necessario / not fficent / 
  donut push pages round in.tractor 



@WillyC yes, the use case was an import - a big migration from another CMS; I wanted to maintain a known list of tags in memory for efficiency, rather than sending an airplane to fetch all tags again and again - you know, for the climate's sake 😸 But anyway, it's fine. I suppose I could have maintained my own vanilla array index instead of a PageArray but I thought I was being cool using the new tools, until I found out it's a tractor. Anyway, tis done now.
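
For anyone in the same spot, the "vanilla array index" idea could look something like the sketch below. It is plain PHP and does not touch ProcessWire; the function names and the page IDs are invented for illustration, and it assumes UTF-8 titles and the mbstring extension.

```php
<?php
// Hypothetical helper: normalise a tag title into a lookup key.
// Assumes UTF-8 input and the mbstring extension.
function tagKey(string $title): string {
    return mb_strtolower(trim($title), 'UTF-8');
}

// Hypothetical helper: look up a tag's page ID regardless of letter case.
// Returns null when the tag is not in the index yet.
function findTagId(array $index, string $title): ?int {
    return $index[tagKey($title)] ?? null;
}

// Index of known tags: normalised title => page ID.
// The IDs are invented; in a real import they would come from the saved tag pages.
$knownTags = [];
$knownTags[tagKey('Hello')]  = 1001;
$knownTags[tagKey('Éclair')] = 1002;

var_dump(findTagId($knownTags, 'hello'));  // int(1001)
var_dump(findTagId($knownTags, 'éclair')); // int(1002)
var_dump(findTagId($knownTags, 'world'));  // NULL
```

Because every key goes through the same normalisation on the way in and on the way out, the lookup no longer depends on how either the database or the in-memory selector compares strings.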

@flydev thanks, but that requires me to know that there aren't two distinct items: "system change" and "change system".

At first I posted because I thought there might be something I was missing (like WireArray::setCaseInsensitiveMatching() or such). But then as I dug into it I found that the limitations are hard coded.

None of the help/docs pages I read carry any warning about this; there are many examples of chaining from a db query ($pages->find()) into some other in-memory filter with no mention of the difference.

  • I think there should be warnings in the docs about the different implementations, when suggesting filtering on a WireArray, as opposed to a $pages query.
  • I think there should be an equals operator that is explicitly case insensitive. (though perhaps we've exhausted every combination of symbol already...!)
  • I think the codebase should be updated to be UTF-8 compatible so that É matches é, for example on the case-insensitive matchers.

But I'm too much of a noob to do PRs yet (and I have to train my editor to use tabs not spaces 🤣)

Anyhoo, thanks for the suggestions, all.


On 4/19/2023 at 6:31 PM, artfulrobot said:

(and I have to train my editor to use tabs not spaces 🤣)

Maybe your editor is able to use / change configurations via .editorconfig? Then it would be easy to train it. 🙂 
(We use it in PW, like in many other projects too. https://editorconfig.org/)


On 4/26/2023 at 6:08 PM, artfulrobot said:

This We refers to you and your company, not the PW project?

Sorry to be so unclear! But this "we" referred to the PW project! 🤩
And yes, as @Jan Romero already said, there is only one, in the wire directory, so you can use your own in the site directories.
