lenno Posted November 14, 2016 Share Posted November 14, 2016 Hi, Our ProcessWire website has a simple site search, which is implemented following the ProcessWire search demo code. It uses the %= operator to get broader results. $queryResults = $this->pages->find('title%=应用'); // return everything The problem: It will return ALL pages. The reason is probably that %= uses SQL LIKE and SQL needs to be told that the search string contains unicode characters by adding 'N' as the prefix. At least this is what I think I learned from: http://stackoverflow.com/questions/22156413/how-to-select-rows-with-chinese-japanese-characters I tried this, but I still get everything: $queryResults = $this->pages->find("title%=N'应用'"); // return everything I also tried encoding my chinese text: $queryResults = $this->pages->find('title%=\u5e94\u7528'); // no results But now I get zero results. And by the way: The ~= operator also returns zero results: $queryResults = $this->pages->find('title~=\u5e94\u7528'); // no results $queryResults = $this->pages->find('title~=应用'); // also no results I guess I could build my own SQL queries, but we have already build a very nice customized search based on ProcessWire selectors and would really like to stick with them. Any idea how to solve this? Many thanks, René Link to comment Share on other sites More sharing options...
BitPoet Posted November 14, 2016 Share Posted November 14, 2016 The stackoverflow post relates to Microsoft SQL Server, not MySQL. Searching for Unicode characters works fine in PW. If your characters get stripped, my bet is that they aren't valid utf8, which means the script itself is in another encoding, and/or the webserver announces a different one. I just tested by creating a page titled "你好 World". Running <?php foreach($pages->find('title%=你好') as $p) { echo $p->id . ": " . $p->title . PHP_EOL; } returned exactly that one entry as it should. To check, save your test code as a script and set your editor to utf8-encode it. Don't trust the command line. If that works, check the encodign headers of the webserver using the developer console (F12). 3 Link to comment Share on other sites More sharing options...
lenno Posted November 14, 2016 Author Share Posted November 14, 2016 Thanks BitPoet I will try that tomorrow and report back! Edit: So that was unfortunate, for some reason I had some trailing spaces in the chinese search query that messed everything up. Sorry for the confusion. I am getting the expected results now when using %=. However, ~= still seems to return zero results. Could you try running your above code with the ~= selector? From my understanding, it should return the same results. But in my case, it returns nothing. Link to comment Share on other sites More sharing options...
lenno Posted November 15, 2016 Author Share Posted November 15, 2016 double post.. Link to comment Share on other sites More sharing options...
BitPoet Posted November 15, 2016 Share Posted November 15, 2016 18 hours ago, lenno said: However, ~= still seems to return zero results. Could you try running your above code with the ~= selector? From my understanding, it should return the same results. But in my case, it returns nothing. That's because ~= uses fulltext indexes for natural language search, which is limited by its ft_min_word_len setting that defaults to 4 characters (unicode code points). PW's selector operator docs explain that. You could lower that setting, but you have to be aware that MySQL doesn't support proper detection of word boundaries for Asian languages. Recent MySQL versions (>=5.7.6) have the option to let an ngram parser divide sequences of unicode characters into groups (ngrams) of a configured length, but that is also quite a fuzzy thing and may summarily not give the results you want. Usually, Oracle's recommendation is to use a third-party search engine that comes with a dictionary-based preprocessor, e.g. the Sphinx engine. 4 Link to comment Share on other sites More sharing options...
lenno Posted November 15, 2016 Author Share Posted November 15, 2016 Well, I guess I could have figured that out by myself. Thanks a lot BitPoet! The ProcessWire support here in the forum is really great. 1 Link to comment Share on other sites More sharing options...
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now