I've implemented a straightforward site search, but alongside the correct results it is returning pages that don't exist, with strange URLs.
If, for instance, the search is for "quercus" (it's a site about trees), some valid results are returned, e.g.:
[domain]/publications/general-articles/quercus-tungmaiensis/
But I also get pages that do not exist with URLs I can't explain, e.g.:
[domain]/site/en/tree-info/tree-info/-404--i-quercus-tungmaiensis-i/
[domain]/site/en/tree-info/-407--i--quercus-rubra-i/
[domain]/site/en/tree-info/tree-info/-407--i-quercus-rubra-i/
Note that there are fields in the system that contain italicised versions of the page title (e.g. "<i>Quercus rubra</i>"), and the tags may have got into the page title and name before the data was cleaned up. At one level that explains the "-i-" in the invalid URLs, but it doesn't get me much further!
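To illustrate that last point, here is a minimal sketch of how italic tags left in a title could turn into "-i-" in a URL segment. The naiveSlug() function is a hypothetical stand-in written purely for illustration; it is not ProcessWire's actual page-name sanitizer, and it says nothing about where the "-404-" and "-407-" prefixes come from.

```php
<?php
// Hypothetical helper for illustration only; NOT ProcessWire's real sanitizer.
// Lowercases the text, then replaces every run of characters outside
// a-z and 0-9 with a single hyphen.
function naiveSlug(string $text): string {
    return preg_replace('/[^a-z0-9]+/', '-', strtolower($text));
}

// A title that still contains its italic markup
echo naiveSlug('<i>Quercus rubra</i>'), "\n"; // prints "-i-quercus-rubra-i-"
```

Under that assumption, the "<i>" and "</i>" tags each collapse to a hyphen plus a stray "i", which matches the shape of the invalid URLs above.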
The search is based closely on that in the PW default site, and the relevant code is as follows:
$q = $sanitizer->text($input->get->q);
if($q) {
    // Set up the search term
    $input->whitelist('q', $q);
    $q = $sanitizer->selectorValue($q);
    // Build the selector
    $selector = "title|main_text|item_description~=$q, has_parent!=2, limit=50";
    // Find the pages
    $matches = $pages->find($selector);
    if($matches->count) {
        // ...
        // Render the results
        // ...
    }
}
I may well be missing something obvious, but at the moment I'm completely puzzled.
Anyone know what's happening?