
SearchEngine


teppo


33 minutes ago, teppo said:

Sorry @snck and @xportde, looks like I completely missed this question. Just to be clear, you mean the ___savedPageIndex() method?

Running this when the entire index is being rebuilt should be doable, but it looks like I'll have to move a few bits and pieces to another class. I'll take a closer look at this — hopefully later today, or perhaps tomorrow.

No problem at all! Excuse me, yes, I meant the ___savedPageIndex() method. Thanks in advance!


On 6/9/2020 at 10:32 AM, xportde said:

I have a question concerning the savedPageIndex hook: it works perfectly when saving a single page, but not when I use the "Index pages now" function on the module settings page. How could I achieve this?

This is fixed in the latest release (SearchEngine 0.25.2).


  • 2 weeks later...

how about "exact matches"?

When I use the searchengine-module's demo-site, it seems to check for the exact match of each word of a search query in that very order.

e.g. search queries (ignore the quotes):
"scalability issues" returns 1 result
"issues scalability" returns 0 results

Sometimes you want exact matches and sometimes you don't. With Google, you soon figure out that putting a query in quotes gives you an exact match. Needless to say, that won't work here, since a query without quotes already returns exact matches only.

How to specify that on the frontend? Would there be a way to specify that by using/not using quotes? I want to avoid including yet another set of radio-buttons. 

thanks!


Hey @fruid,

SearchEngine finds content via the API and the operator used for text searches is configurable (see module config screen), so there's that. Depending on the operator you use you'll get a different set of results (obviously). You can read more about available selectors from the docs.

As for Google-style search queries, there's nothing exactly like that built-in, but if you've got a newish version of ProcessWire (3.0.160+) and a recent version of SearchEngine installed, the module supports the advanced text search operator. The downside is that it's currently a little unclear how such queries should be sanitized so this feature is not fully functional. If you don't mind writing a bit of code yourself, there's also an example in the README about how you could provide your own search query and frontend (which, in turn, would allow you to sanitize the query string and then perform a find operation using the "#=" operator).

... and, as for the last point, it's actually also possible to use markup generated by SearchEngine yet still provide a custom results object. It involves a couple of extra steps, but something like this should work (requires SearchEngine 0.26.0 and ProcessWire 3.0.160+):

// load and init SearchEngine
$searchEngine = $modules->get('SearchEngine');
$searchEngine->initOnce(); // (for autoloader)

// construct a Query object ($q stands for the _sanitized_ query string)
$query = new \SearchEngine\Query($q);
$query->results = $pages->find('search_index#=' . $q . ', limit=25');

// render results
echo $searchEngine->renderResults([], $query);

That doesn't remove the need to sanitize the query, though. That's something you'd (currently) have to do yourself if you want to use the advanced text search operator to the full extent.
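One possible shape for that sanitization step, sketched as plain PHP. The helper name sanitizeAdvancedQuery() and the character whitelist are assumptions made for illustration; neither is part of SearchEngine or ProcessWire, and the whitelist should be checked against the documented syntax of the advanced text search operator before relying on it.

```php
<?php
/**
 * Hypothetical helper (not part of SearchEngine or ProcessWire): reduce a raw
 * query string to letters, digits, spaces and a few operator characters
 * (" + - *), and drop double quotes when they are unbalanced.
 */
function sanitizeAdvancedQuery(string $raw): string {
    // replace every run of characters outside the whitelist with a space;
    // preg_replace() returns null on invalid UTF-8, hence the fallback
    $q = preg_replace('/[^\p{L}\p{N}\s"+\-*]+/u', ' ', $raw) ?? '';

    // an odd number of quotes means one of them has no partner
    if (substr_count($q, '"') % 2 !== 0) {
        $q = str_replace('"', ' ', $q);
    }

    // collapse whitespace and trim the ends
    return trim(preg_replace('/\s+/u', ' ', $q) ?? '');
}
```

The return value would take the place of $q in the example above. Characters such as commas and equals signs, which carry meaning in a selector string, are removed; whether quoted phrases survive selector parsing as intended is something to test on your own site.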


  • 1 month later...

Hi all,

This is likely a really stupid question, but how do I go about enabling the result descriptions and text highlighting?

I have seen @teppo mention it and have seen there are methods in the codebase relating to it, but for the life of me I can't figure out how to enable it.

My search results are only returning a page title and a URL at present, which I figured was the out-of-the-box default, but I'm not 100% sure.

 


Hey @cosmicsafari — not a bad question at all.

The description field comes from a module config setting. By default the module is set up to look for a field called "summary", but you can change this to something else:

$config->SearchEngine = [
    'render_args' => [
        'result_summary_field' => 'summary',
    ],
];

My guess is that your pages don't have the summary field?

You can use some other field instead (if there's a suitable field), or you could let the module auto-generate the description by setting the summary field as "_auto_desc"... though please note that the support for auto-generated descriptions is experimental, and comes with one major gotcha: SearchEngine doesn't know which parts of your search index are "public knowledge", so it may end up displaying anything stored there. If you end up using this option, be sure to test it and make sure that you haven't indexed anything you don't want to show up in the search results 🙂

 


On 5/10/2020 at 10:30 PM, teppo said:

$wire->addHookAfter('SearchEngine::savedPageIndex', function(HookEvent $event) {
    $page = $event->arguments[0];
    if ($page->template == 'ContainerPage' && $page->children->count()) {
        $searchEngine = $event->modules->get('SearchEngine');
        foreach ($page->children as $child) {
            $child_index = $searchEngine->indexPage($child, false, [
                'return' => 'index',
            ]);
            $page->search_index .= "\n" . $child_index[0];
        }
        $page->save('search_index', [
            'quiet' => true,
            'noHooks' => true,
        ]);
    }
});

This really helped us @teppo thank you :) We're now displaying a link to the parent page when it finds the search term within the child block (sub page rendered on the parent page as a block).

We set the summary field to _auto_desc so that the search term is shown highlighted within its surrounding text.

'result_summary_field' => '_auto_desc', 

This works perfectly when the search term is found on the parent page:
Result = Parent page title, parent page url and auto description found within the parent page

However, we're not seeing the auto description appear when it finds the search term within the child page block that is rendered on the parent page:
Expected Result  = Parent page title, parent page url and auto description found within the child page.

So the search finds the right content and returns the correct page, but when the match is in a child page's content the result has no contextual auto description. As a result, some results have a summary and others don't.

Apologies for the awful description here. Are you able to give us any pointers?

Thanks again

 

 


Hi @teppo is it possible to order the results by the number of times the search query is found so that more important pages are listed first?

For example, a page containing the search term 5 times would appear higher than a page with the search query found only once or twice?
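
Nothing in this thread points to a built-in option for this, so the following is only a sketch of re-ordering results yourself after the search has run. It assumes each matching page's index text is already available as a string; the function name rankByOccurrences() and the array shape are hypothetical.

```php
<?php
/**
 * Hypothetical helper: count how often a term occurs in each page's index
 * text and order the pages by that count, highest first.
 *
 * @param array $indexes page URL (or ID) => index text
 * @param string $term   search term
 * @return array page URL (or ID) => occurrence count, sorted descending
 */
function rankByOccurrences(array $indexes, string $term): array {
    $counts = [];
    $needle = strtolower($term); // ASCII case folding only
    foreach ($indexes as $key => $text) {
        // substr_count() rejects an empty needle, so treat that as zero hits
        $counts[$key] = $needle === ''
            ? 0
            : substr_count(strtolower($text), $needle);
    }
    arsort($counts); // sort by count, descending, keeping the keys
    return $counts;
}
```

A raw occurrence count is a crude relevance measure: it favours long pages and ignores where the term appears, so treat it as a starting point only.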


  • 1 month later...

Hey @Ivan Gretsky — thanks 🙂

With the latest version (just released, 0.27.0) you can make the module store the page ID as an indexable field. This allows the page to be found by its ID (since that is now included in the index), OR you can use syntax such as page.id:1234 to specifically search for a page with this ID (though this would likely also match page.id:12345 etc.). Of course id:1234 will also work, but may result in even more false positives.

If you update to the latest version of the module, note that you also need to add id as an indexable field and rebuild your index before this will work.
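
To illustrate the false-positive caveat with ordinary PHP string functions: a plain substring match for page.id:1234 also hits an index that contains page.id:12345. The index strings below are made up and the real index format may differ; the stricter regex check is just one way you might filter results after the fact, and both function names are hypothetical.

```php
<?php
// made-up index contents; the format actually stored by SearchEngine may differ
$indexA = 'Some page content page.id:1234 more content';
$indexB = 'Other page content page.id:12345 more content';

/** Substring match, roughly what a "contains" style search does. */
function containsId(string $index, int $id): bool {
    return strpos($index, 'page.id:' . $id) !== false;
}

/** Stricter check: the ID must not be followed by another digit. */
function containsExactId(string $index, int $id): bool {
    return preg_match('/page\.id:' . $id . '(?!\d)/', $index) === 1;
}
```

Here containsId($indexB, 1234) is true (the false positive), while containsExactId($indexB, 1234) is false and containsExactId($indexA, 1234) is true.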


  • 4 weeks later...

@teppo

just some basic questions, no issues (yet) 😄

I have different pro-tables throughout the site and their content is distinct, thus should be indexed separately, is that at all possible? If not, we can stop right here.

I set it up like this:
Selected indexed fields: the table-field
Indexed templates: the table-field's parent-template
Select index field: search_index (already auto-added to the parent-template) 

sort and operator is unclear to me. I will need to look into adding custom operators using the aforementioned PW-doc on selectors.
However, how do I make the sort direction and the operator optional depending on the user input?

How can I select the subfields of the table (i.e. columns) to be indexed? Only when I check the "Index pages now?" option I can select those subfields. 

I indexed the table and/or the template, did a search and got a result. But since it is not pages that it's finding but say 6 matching rows on 1 page, it only shows 1 result instead of 6 and links to the parent page. Can I adjust the results to list the rows? I have the entire markup already (you may or may not have noticed, I've been trying to accomplish a search function without SearchEngine for some time now), any chance I can just use that markup somewhere?

That's all the questions I have so far.

Thanks for help, have a nice weekend.


17 hours ago, fruid said:

I have different pro-tables throughout the site and their content is distinct, thus should be indexed separately, is that at all possible? If not, we can stop right here.

I'm not sure I fully understand what you're trying to achieve here, but no — this is not something SearchEngine does. It creates a single index for the entire site.

I have considered adding support for multiple indexes, but a solid need for that never surfaced. Would be interesting to hear if there's one now, though admittedly this seems more like a need for a custom search engine, and thus likely has little to do with this module.

The TL;DR seems to be that "this is not the module you're looking for", but I'll go through your questions just to clarify things:

17 hours ago, fruid said:

sort and operator is unclear to me. I will need to look into adding custom operators using the aforementioned PW-doc on selectors.
However, how do I make the sort direction and the operator optional depending on the user input?

I'm not entirely sure what you mean by adding custom operators. If you're referring to selecting some operator that isn't available via SearchEngine settings, that's possible, but to be honest most of the operators that make sense within site search queries are already there.

As always it's good to read the docs, though; understanding selectors and selector operators is a key factor when working with ProcessWire 🙂

Anyway, the settings you see in this screen are defaults. Most site search tools don't let visitors select operators or even alter the sort order in any way, so setting them once here is enough. If you're working on something more complex — perhaps something like library databases often provide — these settings have little to do with that. In this case you're probably not going to have much use for SearchEngine, although it's of course possible to create a custom search feature and use the search_index field as a regular field in your queries.
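
For the custom route, a rough sketch of letting visitors pick the operator and sort order while still building the selector string safely. The function name, the whitelists and the limit are all assumptions; in a real template you would more likely pass the query through ProcessWire's $sanitizer->selectorValue() than through the crude character filter used here.

```php
<?php
/**
 * Hypothetical helper: build a selector string for the search_index field
 * from user input, allowing only whitelisted operators and sort values.
 */
function buildSearchSelector(string $q, string $operator = '*=', string $sort = '-modified'): string {
    // operators the visitor may choose from; anything else falls back to *=
    $operators = ['*=', '~=', '%=', '^=', '$='];
    if (!in_array($operator, $operators, true)) {
        $operator = '*=';
    }

    // sort values the visitor may choose from; a leading "-" means descending
    $sorts = ['title', '-title', 'modified', '-modified'];
    if (!in_array($sort, $sorts, true)) {
        $sort = '-modified';
    }

    // crude stand-in for real sanitization: keep letters, digits and spaces
    $q = preg_replace('/[^\p{L}\p{N} ]+/u', ' ', $q) ?? '';
    $q = trim(preg_replace('/ +/', ' ', $q) ?? '');

    return 'search_index' . $operator . '"' . $q . '", sort=' . $sort . ', limit=25';
}
```

The resulting string could then be handed to $pages->find(), with search_index used like any other field.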

17 hours ago, fruid said:

How can I select the subfields of the table (i.e. columns) to be indexed? Only when I check the "Index pages now?" option I can select those subfields. 

At the moment you can't. These are not fields, they are indeed (literal) table columns. SearchEngine doesn't work on that level. It'd be possible to add this feature, but at the moment this seems like an extremely rare need, and as such it's not particularly high priority.

The selector inputfield you see when you use the "Index pages now" feature is only a tool for finding the pages you want to index, and it uses the core Inputfield Selector for selecting applicable pages.

17 hours ago, fruid said:

I indexed the table and/or the template, did a search and got a result. But since it is not pages that it's finding but say 6 matching rows on 1 page, it only shows 1 result instead of 6 and links to the parent page. Can I adjust the results to list the rows? I have the entire markup already (you may or may not have noticed, I've been trying to accomplish a search function without SearchEngine for some time now), any chance I can just use that markup somewhere?

The first part is not really in the scope of SearchEngine: it's a tool for providing a site search, and since sites consist of pages, that's what it will list. It does indeed sound like you're looking for something more customized.

... although you can in fact a) completely skip any front-end markup generated by SearchEngine and provide your own instead (basically just use the search_index as a regular field in queries), or b) modify markup generated by SearchEngine using hooks, or even c) hook into the indexing process to fill in other custom fields in addition to the default search_index field. All of these would require plenty of custom work, and I'm honestly not sure if you'd find SearchEngine useful for your needs; you may find it easier to build the entire thing from scratch.


  • 1 month later...
17 minutes ago, adrian said:

Hi @teppo - just playing around with this for the first time and got this error when using the Debug Query feature. Not sure if it's an issue with the way you are calling the PageFinder class?


PHP Notice: Undefined index: returnAllCols in .../public_html/wire/core/PageFinder.php:1460

Hey Adrian!

First things first: which version of ProcessWire is this on? The line number doesn't seem to match either the current master or the dev branch.

The error you're seeing looks easy to fix, but at least in the current master and dev branches of the core, PageFinder takes care of setting a default value for this option. My guess would be that there's a version of the core that doesn't set the default value, but I'd like to make sure before applying a "fix", just in case 🙂

 


Thanks @adrian!

Came to the same realization here in the meantime: PageFinder::find() sets the default value, but PageFinder::getQuery() doesn't, which means that in this case I do indeed need to specifically define returnAllCols. I'll commit a fixed version soon.

I'm accessing this method directly for debug purposes (in order to display the generated SQL query in the Debugger class), and that's likely the only situation where one might run into this.
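
The general shape of such a fix, sketched outside ProcessWire: merge the caller's options over a set of defaults before reading any key, so that a missing key never triggers an "Undefined index" notice. The function name is hypothetical and the default value of false is an assumption; the real default lives in the core PageFinder class.

```php
<?php
/**
 * Hypothetical stand-in for option handling: fill in defaults so that code
 * reading $options['returnAllCols'] later never hits a missing key.
 */
function withDefaults(array $options): array {
    // assumed default value, for illustration only
    $defaults = ['returnAllCols' => false];
    // values passed by the caller win over the defaults
    return array_merge($defaults, $options);
}

// the caller only sets an unrelated option; returnAllCols is filled in
$options = withDefaults(['returnVerbose' => false]);
```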


Sorry if I'm being dumb, but when I went to set the 'form_action' argument, I assumed it needed to be an element within the 'render_args' array, but if I do that, it doesn't work. It does work if I set it at the top level of the args/options array that I am passing to the renderForm() method though.

If I dump $args without setting this, I see it is at the top level. Did you change something intentionally since writing the docs, or am I just confused ?


20 minutes ago, adrian said:

Weird thing is that I don't think I have seen those errors in Tracy which uses getQuery() and getDebugQuery()

https://github.com/adrianbj/TracyDebugger/blob/116324264c3d566c1cece8c53cc3a1075ca82862/panels/DebugModePanel.php#L232-L239

That's a different context: it seems to me that Tracy hooks after PageFinder::getQuery() and uses the getQuery() method of the returned DatabaseQuerySelect object. In my case I'm calling PageFinder::getQuery() instead. It's potentially a bit confusing that PageFinder::getQuery() returns an object that has a getQuery() method, but these are actually two unrelated things... 🙂

Technically I could do the same thing in SearchEngine, but it feels "less straightforward" 🙂


Sorry @teppo - another question / request for you - what do you think about being able to store theme folders in /site/templates/SearchEngine/themes to avoid any issues with overwriting during updates, and also so it's cleaner for version control?


29 minutes ago, adrian said:

Sorry if I'm being dumb, but when I went to set the 'form_action' argument, I assumed it needed to be an element within the 'render_args' array, but if I do that, it doesn't work. It does work if I set it at the top level of the args/options array that I am passing to the renderForm() method though.

"form_action" is indeed part of the "render_args" array. The $args array that SearchEngine::renderForm() (actually implemented by SearchEngine\Renderer::renderForm()) takes in is also the "render_args" array from the parent scope, so this depends (or at least it should depend) on the context in which you're defining this argument:

  • If you're defining it via site config or by manually calling SearchEngine::setOptions(), you should put it within the render_args array item, which itself is an array ($config->SearchEngine = ['render_args' => ['form_action' => '...']]).
  • If you're calling SearchEngine::renderForm() with an args array, you should pass it as a top level item ($searchEngine->renderForm(['form_action' => '...'])).

Please let me know if this description doesn't match actual behaviour though. That is how it should work.
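
A toy model of the two contexts, in plain PHP. This is not the module's actual implementation; resolveRenderArgs() and the '/search/' and '/find/' values are made up purely to show why an argument nested under "render_args" is ignored when passed directly to a render method.

```php
<?php
// site config style: render args are nested under the "render_args" key
$moduleOptions = [
    'render_args' => [
        'form_action' => '/search/', // made-up value
    ],
];

/**
 * Hypothetical model of a render method: $args is layered on top of the
 * "render_args" part of the module options, so $args itself is expected to
 * be flat and has no "render_args" level of its own.
 */
function resolveRenderArgs(array $moduleOptions, array $args = []): array {
    return array_replace($moduleOptions['render_args'] ?? [], $args);
}

// top-level key overrides the configured value
$ok = resolveRenderArgs($moduleOptions, ['form_action' => '/find/']);

// nested key is carried along as an unrelated item; form_action is unchanged
$ignored = resolveRenderArgs($moduleOptions, ['render_args' => ['form_action' => '/find/']]);
```

In this model $ok['form_action'] is '/find/', while $ignored['form_action'] is still '/search/'.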

I must admit that I manage to confuse myself with these arguments every now and then. While I thought this would be the best approach while building the module, right now I'm not so sure anymore — the basic idea here was that when you're specifically calling render* methods, you shouldn't have to worry about anything other than the render args.

Argument handling is one of the things I might still change a bit once I decide that the module is ready for a 1.0 release... 🙂

27 minutes ago, adrian said:

Sorry @teppo - another question / request for you - what do you think about being able to store theme folders in /site/templates/SearchEngine/themes to avoid any issues with overwriting during updates, and also so it's cleaner for version control?

Makes sense to me. Since theming support was initially added I've been using either the default theme as-is, or completely custom markup and styles, so the whole theme concept is somewhat "underdeveloped" at the moment.

I've added this on my backlog for now.

