sujag Posted December 27, 2022

I have a strange problem finding a certain (German) text on my site. On kulturfeste.de, a search for "Festival für Freunde"
https://kulturfeste.de/suche/?q=festival+für+freunde&submit=
returns no results. Omitting "für"
https://kulturfeste.de/suche/?q=festival+freunde&submit=
returns, among others, the expected result
https://kulturfeste.de/feste/festival-fuer-freunde/

The selectors used for $pages->find are
'title|headline|Intro~*=festival für freunde, limit=50, template=fest'
and
'title|headline|body~*=festival für freunde, limit=150, template=veranstaltung,sort=eventTime'

The search text appears multiple times in the relevant fields. The cause does not seem to be "für" as a possible stopword, nor the umlaut "ü". Now I'm running out of ideas. What else could I check?
ottogal Posted December 27, 2022

1 hour ago, sujag said: a search for "Festival für Freunde" https://kulturfeste.de/suche/?q=festival+für+freunde&submit= doesn't get results,

On that page it says: Suche nach "festival für freunde" (search for "festival für freunde"). It might be a case issue. Usually a search term in double quotes is case-sensitive.
sujag (Author) Posted December 28, 2022

Thanks, but the quotes are added by the template; they are not part of the search term. Also, this wouldn't explain a difference between the search phrases "Festival für Freunde" and "Festival Freunde" (both submitted without quotes).
aagd Posted December 28, 2022

I'm just guessing, but I think "für" is simply too short to get indexed, while you expect to match the exact phrase by using the `*=` selector operator. You might try `%=` instead. See also @BitPoet's post in "Help understanding search results" (General Support, ProcessWire Support Forums).
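One way to compare the operators side by side is sketched below. `buildSearchSelector()` is a hypothetical helper, not part of ProcessWire; the field and template names come from the first post. The sketch assumes it runs in a template file where `$pages` is available and skips the queries otherwise. As far as the ProcessWire selector documentation describes them, `~*=` matches all partial words, `*=` matches a phrase via the fulltext index, and `%=` matches a phrase via SQL LIKE.

```php
<?php
// Hypothetical helper (not a ProcessWire API): builds a selector string
// for the given operator so several operators can be compared.
function buildSearchSelector(string $fields, string $operator, string $phrase, string $template, int $limit): string {
    // Commas separate selector parts, so this sketch strips them from the phrase.
    // On a real site, $sanitizer->selectorValue() would be the safer choice.
    $phrase = str_replace(',', ' ', $phrase);
    return "{$fields}{$operator}{$phrase}, template={$template}, limit={$limit}";
}

$phrase = 'festival für freunde';
$selectors = [];
foreach (['~*=', '*=', '%='] as $op) {
    $selectors[$op] = buildSearchSelector('title|headline|Intro', $op, $phrase, 'fest', 50);
}

// Only run the queries inside ProcessWire, where $pages exists.
if (isset($pages)) {
    foreach ($selectors as $op => $selector) {
        echo $op . ' => ' . $pages->find($selector)->count() . " pages\n";
    }
}
```

If only the `%=` variant returns the expected page, that would point toward the fulltext index rather than the content of the fields.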
sujag (Author) Posted December 30, 2022

It does indeed work with the *= operator, but I don't quite understand why, and I get more than the wanted results. I don't think "too short" can be the reason, because the search without "für" works, as does another search for "Gesänge der Mönche", where the article "der" doesn't interfere with the result.
aagd Posted January 2, 2023

Do you have a custom MySQL config? It seems to me that by default it shouldn't find "für" or "der":

Quote: The *= and ~= rely upon MySQL fulltext indexes, which only index words of at least a certain length (configurable, but typically 4 characters)

Source: Using selectors in ProcessWire CMS
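To see which words of a phrase fall under such a minimum length, a small standalone check can help. This is only a sketch: `wordsBelowMinLength()` is a hypothetical helper, and the default of 4 is an assumption taken from the quote above. The actual limit depends on the server: MySQL uses `ft_min_word_len` for MyISAM tables (default 4) and `innodb_ft_min_token_size` for InnoDB tables (default 3).

```php
<?php
// Hypothetical helper: returns the words of a phrase that are shorter than
// the minimum indexed word length ($minLen is an assumption; check the server).
function wordsBelowMinLength(string $phrase, int $minLen = 4): array {
    $words = preg_split('/\s+/u', trim($phrase), -1, PREG_SPLIT_NO_EMPTY);
    $short = [];
    foreach ($words as $word) {
        // mb_strlen counts characters, so "für" is 3 characters (it is 4 bytes in UTF-8).
        if (mb_strlen($word, 'UTF-8') < $minLen) {
            $short[] = $word;
        }
    }
    return $short;
}

print_r(wordsBelowMinLength('Festival für Freunde'));
print_r(wordsBelowMinLength('Gesang der Mönche'));
```

Both phrases contain exactly one three-character word, so a length limit alone would be expected to affect them in the same way.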
sujag (Author) Posted January 3, 2023

I'm on shared hosting and don't know about a custom config. But anyway, shouldn't stopwords just be ignored, both in the search phrase and in the result? Then searching for "Festival Freunde" and "Festival für Freunde" shouldn't give different results.
aagd Posted January 3, 2023

Is "für" really a stopword? Try the $database->getStopwords() method (ProcessWire API) to find out.
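A minimal sketch of that check follows. `isStopword()` is a hypothetical helper, and the short `$stopwords` array is only a placeholder so the snippet also runs outside ProcessWire; it is not the real MySQL list. Inside a template file, where `$database` is available, the list reported by `$database->getStopwords()` replaces it.

```php
<?php
// Hypothetical helper: case-insensitive check of a word against a stopword list.
function isStopword(string $word, array $stopwords): bool {
    $needle = mb_strtolower($word, 'UTF-8');
    foreach ($stopwords as $stopword) {
        if (mb_strtolower((string) $stopword, 'UTF-8') === $needle) {
            return true;
        }
    }
    return false;
}

// Placeholder list for running outside ProcessWire; not the real MySQL list.
$stopwords = ['the', 'and', 'for'];

// Inside a ProcessWire template file, use the list reported by the database.
if (isset($database) && method_exists($database, 'getStopwords')) {
    $stopwords = $database->getStopwords();
}

foreach (['für', 'der'] as $word) {
    echo $word . ': ' . (isStopword($word, $stopwords) ? 'stopword' : 'not a stopword') . "\n";
}
```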
sujag (Author) Posted January 4, 2023

To be honest, I don't think that stopwords are the problem. I checked this a few days ago and got only an English stopword list. Then I switched to testing different phrases and decided to ask others.

Another attempt to explain the problem: I get the same results when searching for "Gesang der Mönche" and "Gesang Mönche", but different results when searching for "Festival für Freunde" and "Festival Freunde". Why?