Jump to content

SearchEngine


teppo

Recommended Posts

Thanks @teppo - I am seeing the tabbed results now.

The only thing that's weird is that on the "All" tab, it shows all the results, then it shows the results for the different templates below separated by template, so each result shows up twice.

Viewing specific results (by clicking on a template tab/button), they are showing correctly.

Hopefully this shows what I mean. Can you see this at your end as well?

screencapture-pwtest-test-2021-03-29-12_53_55.thumb.png.32e305bdc26d32c27ea0d2690f6bd404.png

Link to comment
Share on other sites

6 minutes ago, adrian said:

The only thing that's weird is that on the "All" tab, it shows all the results, then it shows the results for the different templates below separated by template, so each result shows up twice.

Not seeing that on my end yet. Could you post the code you're using to generate this list, and perhaps a screenshot or list of settings you're using?

Link to comment
Share on other sites

<?php

$searchEngine = $modules->get('SearchEngine');

echo $searchEngine->renderStyles();
echo $searchEngine->renderScripts();

$findOptions = [
    'group_by' => 'template'
];

$renderOptions = [
    'results_grouped' => 'template.label'
];

$query = $searchEngine->find($input->get->q, $findOptions);

echo $searchEngine->renderResults($renderOptions, $query);
{"SearchEngine":{"version":"0.29.2","settings":{"indexed_fields":["title","summary","body"],"indexed_templates":["basic-page","home","license","licenses"],"find_args__sort":"sort","find_args__operator":"~=","index_pages_now_selector":"has_parent=1","index_field":"search_index","override_compatible_fieldtypes":"","compatible_fieldtypes":["FieldtypeEmail","FieldtypeFieldsetPage","FieldtypeDatetime","FieldtypeText","FieldtypeTextLanguage","FieldtypeTextarea","FieldtypeTextareaLanguage","FieldtypePageTitle","FieldtypePageTitleLanguage","FieldtypeCheckbox","FieldtypeInteger","FieldtypeFloat","FieldtypeURL","FieldtypeModule","FieldtypeFile","FieldtypeImage","FieldtypeSelector","FieldtypeOptions","FieldtypeRepeater","FieldtypeRepeaterMatrix","FieldtypePageTable","FieldtypePage","FieldtypeTable","FieldtypeTextareas"],"indexer_actions":[],"render_args__themes_directory":"","debugger_page":0,"debugger_query":"","debugger_query_args":"{}","uninstall":"","submit_save_module":"Submit"}}}

 

  • Like 1
Link to comment
Share on other sites

Thanks, @adrian! I was able to reproduce the issue and it should now be fixed in 0.29.4. Meanwhile 0.29.3 fixed those Debugger issues you mentioned earlier, and also a minor layout glitch related to grouping results list items ?

  • Like 1
Link to comment
Share on other sites

Thanks @teppo - everything looks great now.

Now I guess I need to figure out if there is any advantage to using the inbuilt grouping vs my approach. Obviously it handles all indexed templates automatically, but then mine gives the flexibility to include the results from some templates under "All", but not have a dedicated tab for them which can be handy in some cases.

My question for you then, is there any performance gain in using the inbuilt grouping? I haven't explored the code, but my instinct is that you're not really doing anything different to me, except for the big convenience factor of having this work so easily.

  • Like 1
Link to comment
Share on other sites

9 hours ago, adrian said:

mine gives the flexibility to include the results from some templates under "All", but not have a dedicated tab for them which can be handy in some cases.

+1 ? in case it is a feature request ? 

  • Like 1
Link to comment
Share on other sites

12 hours ago, adrian said:

Now I guess I need to figure out if there is any advantage to using the inbuilt grouping vs my approach. Obviously it handles all indexed templates automatically, but then mine gives the flexibility to include the results from some templates under "All", but not have a dedicated tab for them which can be handy in some cases.

My question for you then, is there any performance gain in using the inbuilt grouping? I haven't explored the code, but my instinct is that you're not really doing anything different to me, except for the big convenience factor of having this work so easily.

To be honest I don't see a big reason to switch if your solution works for you already ?

Pros for built-in approach:

  • Can be enabled or disabled at any point via arguments passed to the module.
  • SE is "aware" of the built-in grouping feature, so it's something that will likely keep working consistently and may also benefit from future updates. Not saying that yours won't keep working, though.
  • I've gone through some extra hoops to only include tabs / buttons for the templates that actually have matches. Personally I dislike it when I click that "Articles" button, wait for the page to load... and then there are no results. I don't want to waste users' time.
  • The "group_by" setting is automatically considered when returning results as a JSON feed. Returned results are split into groups. (At least they should, it's been a while since I last checked this one...)
  • Native feature may eventually get proper "autoload" support, so that switching between categories / templates doesn't cause additional delay. Again, this can be a bit tricky, and whether it's really "worth it" depends on the case.
  • It's themeable and I've tried to use markup that is somewhat accessible. To be honest there's likely more to be done in this regard ?

Whether or not any of this matters for your use case is debatable. Again, I don't see any notable reason to switch if your solution works fine already. Actually I'm pretty sure that your solution is more straightforward and results in slightly better performance.

12 hours ago, adrian said:

but then mine gives the flexibility to include the results from some templates under "All", but not have a dedicated tab for them which can be handy in some cases.

Not sure if I got this right, but SE has a couple of options that may be related. Both are under find_args:

            // Optional: values allowed for grouping.
            'group_by_allow' => [],
            // Optional: values not allowed for grouping.
            'group_by_disallow' => [],

Basically if you add a list of templates to "group_by_allow", these are the only templates that will get their own tabs. Other templates are still included in the search, and thus under the "all" tab. The "group_by_disallow" option works the other way around, templates included here will always be excluded from grouping (they won't get their own tabs, but again will be found from the "all" section.)

  • Like 2
Link to comment
Share on other sites

Hey @Erik. Could you check which PHP version you're running?

That error is likely related to nullable returns types ("?Field"), and support for those was added in PHP 7.1. This module will not work on older PHP versions.

  • Like 1
Link to comment
Share on other sites

4 hours ago, teppo said:

I've gone through some extra hoops to only include tabs / buttons for the templates that actually have matches. Personally I dislike it when I click that "Articles" button, wait for the page to load... and then there are no results. I don't want to waste users' time.

I have the same in mine, although perhaps your approach is more efficient. I went with this when outputting the tab buttons, using a ->has() to see if there are any results for the template/tab. I thought given the performance of PW's has() that this would be the best approach, but I didn't dive into it too much. I should go see what you did.

foreach($types as $name => $label):
    if($name != 'all' && $pages->has('search_index'.$findOptions['operator'].$query->query.', template='.$name) === 0) continue;
    ?>
    <a class="button small<?=((!isset($type) && $name == 'all') || (isset($type) && $name === $type) ? ' buttonactive' : '')?>" href="<?=$page->url?>?q=<?=$input->get->q?>&type=<?=$name?>"><?=$label?></a>
<?php endforeach;

 

4 hours ago, teppo said:

Basically if you add a list of templates to "group_by_allow", these are the only templates that will get their own tabs. Other templates are still included in the search, and thus under the "all" tab. The "group_by_disallow" option works the other way around, templates included here will always be excluded from grouping (they won't get their own tabs, but again will be found from the "all" section.)

Those do sound like they would do the same thing.

I think I'll set up a test with the module's core approach and see what happens.

Thanks again.

  • Like 1
Link to comment
Share on other sites

@teppo - on another note, I just got this logged error overnight:

PHP Warning: preg_match(): Compilation failed: missing closing parenthesis at offset 62 in /site/modules/SearchEngine/lib/Renderer.php:388

It came from this query: 

https://ian.umces.edu/search/?q=Ferocactus+wislizeni+(Fishhook+Barrel+Cactus

It looks like someone was trying to find that exact cactus symbol by name, but left off the trailing close parenthesis.

Currently I am using:

$query = $searchEngine->find($input->get->q, $findOptions);

as my search query. I assumed that SearchEngine would sanitize the input, but even using selectorValue or text sanitizers don't work in this case, because they both allow parentheses. Do you think this is a situation that we should manage, or SearchEngine should take care of?

  • Like 1
Link to comment
Share on other sites

53 minutes ago, adrian said:

I thought given the performance of PW's has() that this would be the best approach, but I didn't dive into it too much. I should go see what you did.

Your approach seems fine to me. If there were a lot of templates then it would result in multiple queries, but for most use cases I'd assume the performance impact to be extremely small. On the other hand the approach I've taken (modifying the DatabaseQuerySelect object on the fly) should result in just a single query, but is also potentially more fragile... ?

43 minutes ago, adrian said:

PHP Warning: preg_match(): Compilation failed: missing closing parenthesis at offset 62 in /site/modules/SearchEngine/lib/Renderer.php:388

Thanks — this is now fixed in the latest release.

  • Like 1
Link to comment
Share on other sites

Thanks for the quick fix @teppo

I set up a test search page on that site and everything works great. I am not seeing a noticeable speed difference between the tab grouping of the module vs my custom ->has() checks to exclude tabs with zero results. Still, I would like to switch over to core method, except that it doesn't seem to be possible to specify the label for tab buttons as different to the template label. In my case, it seems to be more correct to use the label for the template's parent template label - does that make sense? For example, I would prefer to see "Blog Posts", rather than "Blog Post" or "Publications" vs "Publication".

The only other issue is that I like having the "results_heading" string below the tabs, rather than above and I also like having it match the currently selected template, eg:
https://ian.umces.edu/search/?q=seagrass&type=media - this of course might be totally personal - not sure what others think of it, but I can't see an easy way to insert a heading between the tabs and the "results_summary_xxx", strings. 

Link to comment
Share on other sites

On 3/30/2021 at 11:55 AM, teppo said:

Hey @Erik. Could you check which PHP version you're running?

That error is likely related to nullable returns types ("?Field"), and support for those was added in PHP 7.1. This module will not work on older PHP versions.

That was the solution. Thanks.

  • Like 1
Link to comment
Share on other sites

On 3/30/2021 at 8:26 PM, adrian said:

... except that it doesn't seem to be possible to specify the label for tab buttons as different to the template label. In my case, it seems to be more correct to use the label for the template's parent template label - does that make sense? For example, I would prefer to see "Blog Posts", rather than "Blog Post" or "Publications" vs "Publication".

At the moment tab labels can be customized by hooking into Renderer::renderTabLabel — does that seem feasible to you? Could also have a setting for this, but I'm not yet sure about that; it would get a bit complicated, particularly if I end up adding more group_by options.

On 3/30/2021 at 8:26 PM, adrian said:

The only other issue is that I like having the "results_heading" string below the tabs, rather than above and I also like having it match the currently selected template, eg:
https://ian.umces.edu/search/?q=seagrass&type=media - this of course might be totally personal - not sure what others think of it, but I can't see an easy way to insert a heading between the tabs and the "results_summary_xxx", strings. 

My initial impression is that current approach might be better, especially considering accessibility: users should get some kind of "summary" before the tabs, so that they know what's going on. Solving this with a "tabs first layout" could mean a) visible order that is not the same as real order, or b) visually hidden heading above tabs and then a visible heading hidden from screen readers below them.

(Loosely related note: accessibility is not "perfectly" handled right now, I'll likely make some tweaks in this area soon.)

I'll have to think about this a bit more.

At the moment you have a couple of options:

  • The module currently outputs a heading above the tabs and a summary below them. You could modify that first heading so that it's visually hidden and then hook into results rendering and inject your own "heading" above the summary element. This is a bit tricky right now, but I can make the heading rendering part hookable separately, I think that'd make sense anyway.
  • Modify the order with JS. (Yes, this is a bit ugly... ?)

So no real answer here, just some ideas, but I think that this level of control will likely require some sort of hook based solution anyway.

  • Like 1
Link to comment
Share on other sites

Thanks for the thoughtful feedback @teppo

I used the Renderer::renderTabLabel hook which works great for modifying those tab labels.

Regarding the heading stuff, I went with:

'results_heading' => ''

and then modifying the group labels and adding those to:

'results_list_group_heading'

Now I've got everything looking the way it did before but with the benefit of your core group_by code!

Thanks again - this is brilliant.

  • Like 1
Link to comment
Share on other sites

Actually @teppo - I might have found a bug. If I specify a 'results_list_group_heading' then I find that when there are more than one page of results, that heading sometimes appears twice on a page. I've changed to appending the heading I want to 'tabs_tablist' and that avoids the issue, but obviously it would be nice to 'results_list_group_heading' if possible. Let me know if you can't reproduce and I'll set up an example for you.

  • Like 1
Link to comment
Share on other sites

$query = $modules->get('SearchEngine')->find($input->get->q);

Is it possible to get a PageArray from this? So i can render my own markup... I know i can use "search_index%=q" to find pages, but i would like to take advantage of the module settings, so dont have to hardcode the operator...

I'm trying:

$results = $query->results;

I get "Method Query::get does not exist or is not callable in this context"

When i try "renderResults()" method, it works as expected.

Never mind, solved it after looking into Query class.

$results = $query->getResults();

 

Edited by lokomotivan
solved
  • Like 1
Link to comment
Share on other sites

On 4/1/2021 at 10:47 PM, adrian said:

Actually @teppo - I might have found a bug. If I specify a 'results_list_group_heading' then I find that when there are more than one page of results, that heading sometimes appears twice on a page. I've changed to appending the heading I want to 'tabs_tablist' and that avoids the issue, but obviously it would be nice to 'results_list_group_heading' if possible. Let me know if you can't reproduce and I'll set up an example for you.

I've tried, but so far have been unable to reproduce this. Any additional pointers would be welcome ?

21 hours ago, lokomotivan said:

$query = $modules->get('SearchEngine')->find($input->get->q);

Is it possible to get a PageArray from this? So i can render my own markup... I know i can use "search_index%=q" to find pages, but i would like to take advantage of the module settings, so dont have to hardcode the operator...

I'm trying:


$results = $query->results;

I get "Method Query::get does not exist or is not callable in this context"

When i try "renderResults()" method, it works as expected.

Never mind, solved it after looking into Query class.


$results = $query->getResults();

 

Glad to hear you got it solved. Just for the record, with latest release 0.29.6 $query->results is also available. It's otherwise the same as getResults(), but adds additional check in case there were no results (in which case it will return null instead).

I believe my initial idea was to avoid multiple ways to access the same data, but this is more in line with other features ?

  • Like 2
Link to comment
Share on other sites

@teppo - Not sure if you can do anything about this easily, but take a look at this:

https://ian.umces.edu/search/?q=Incised+stream+banks&t=blog_post

Notice how there is no autodesc shown, but if you click on the second link, you'll see that there is "incised streambanks" in the main text. I know this is because I am using the "~=" operator, but it would be a great improvement if this could be improved. I feel like a relatively simple preg_match, looking for each separate word in the search term should do it.

The other related thing that I think would be nice is if separate words were highlighted separately. This is something I have done previously like this.

image.png.78395145051a8b63a2bbc73e966e9afd.png

image.png.da34bd171869ebac0294d6367184341f.png

image.png.503491f6e21c7c40c633e1577f3a6f47.png

 

 

Link to comment
Share on other sites

On 4/9/2021 at 9:30 PM, adrian said:

@teppo - Not sure if you can do anything about this easily, but take a look at this:

https://ian.umces.edu/search/?q=Incised+stream+banks&t=blog_post

Notice how there is no autodesc shown, but if you click on the second link, you'll see that there is "incised streambanks" in the main text. I know this is because I am using the "~=" operator, but it would be a great improvement if this could be improved. I feel like a relatively simple preg_match, looking for each separate word in the search term should do it.

Sounds like a neat feature, but this will get somewhat complex ?

At the moment automatic description is generated by looking for the (first instance of the) query string (as a whole) within the index, and then converting that + a sensible amount of "padding content" to an excerpt. So far so good. Now, in order to generate an excerpt based on multiple words...

  • Search query needs to be split by whitespace, and perhaps other characters as well.
  • Each word should be matched individually.
  • Preferably words should be matched in any order, just like they are when the query is performed.
  • Here we should probably include more than just the first match, or at least that's my initial gut feeling.
  • Since there may be many individual matches, we need a limit to how much content gets displayed, and we should also have some sort of algorithm in place to figure out which matches to use.

Of course some of this may not be strictly speaking necessary: just finding a match with any single word from the query would be pretty much as simple as the logic we now have in place. But in this scenario that would result in somewhat suboptimal results.

Anyway, just thinking out loud here. I'll give this a shot and see what I can come up with ?

On 4/9/2021 at 9:30 PM, adrian said:

there is "incised streambanks" in the main text. I know this is because I am using the "~=" operator, but

By the way, this is somewhat off-topic but this is one of the reasons I almost never use this operator: using the same content you have on the page, I'm unable to get any matches with "incised streambanks", "stream", or "banks". Might have something to do with MySQL settings (I'm using mostly default settings) but anyway, in my experience this behaviour is so flaky that I really can't be bothered with it.

"%=" may not find results in "wrong" order etc. but at least it won't miss direct hits. And as for performance... well, I've never run into a use case where I would've observed any noticeable difference ?

(Enough with the off-topic!)

On 4/9/2021 at 9:30 PM, adrian said:

The other related thing that I think would be nice is if separate words were highlighted separately.

I'm not entirely sure about this one yet. Let's see if this makes sense once I grasp the "non-consecutive autodesc" thing.

  • Like 1
Link to comment
Share on other sites

Thanks for looking into improving the autodesc - much appreciated.

10 hours ago, teppo said:

By the way, this is somewhat off-topic but this is one of the reasons I almost never use this operator: using the same content you have on the page, I'm unable to get any matches with "incised streambanks", "stream", or "banks". Might have something to do with MySQL settings (I'm using mostly default settings) but anyway, in my experience this behaviour is so flaky that I really can't be bothered with it.

Interesting - I haven't changed the default MySQL settings, so not sure what has changed there. The reason I am using it is that if I change to %= then a search for "stream banks" won't find instances of "streambanks" which I think is pretty important - sometimes it's hard to know whether these sorts of things are one word or two.

 

10 hours ago, teppo said:

I'm not entirely sure about this one yet. Let's see if this makes sense once I grasp the "non-consecutive autodesc" thing.

Fair enough - I am not too worried about this at all - it's just what I thought made sense when I built that example 18 years ago, which BTW was still in production until a couple of weeks ago ?

  • Like 1
Link to comment
Share on other sites

hi @teppo, thanks for the module. it's great!

i have pages which are selected through a page reference field from inside a table which is inside a repeater which is inside a repeater matrix item
(matrix_repeater) -> (matrix_repeater_item)  -> (repeater) -> (table) -> (table column [page reference field]) -> title or text_field

the search_index has already the title of those pages in it. is there an easy way to also get the other fields of the reference page inside the search_index. those selected pages have also a search_index.just to be sure you know where the pages are, here's the route i go on the frontend:

	foreach($page->matrix_repeater as $matrix_repeater_item) {
		if($matrix_repeater_item->type == 'xxxxx') {
			foreach($matrix_repeater_item->repeater as $repeaterItem) {
				foreach($repeaterItem->table as $tableRow) {
					$selectedPage = $tableRow->pageReference;
			        $selectedPage->title; // this is automatically in the search_index
					$selectedPage->body	// this should be added to the search_index
				}
			}
		}
	}
Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...