SearchEngine


teppo

Sorry @teppo - just noticed a character encoding issue. I have an excerpt (plain textarea) field which is being indexed and I am seeing this in the results:

[Screenshot: search result excerpt containing a � character]

https://ian.umces.edu/search/?q=dennison

That � is a normal single quote character which you can see here: 

[Screenshot: the same excerpt on the blog listing, showing a normal single quote]

https://ian.umces.edu/blog/page2/

Let me know if there is any other info I can provide.


@teppo - just took a better look at this and it is related to the HTML Entity Decoder textformatter that I have applied to the excerpt field. If I dump the content of $index in the Renderer::getResultAutoDesc method, I see this:

'Tony Larkum's 80th Birthday Party ... After attending a former colleague's birthday party, Bill Dennison recounts fond memories of time he spent with Tony Larkum in Australia and the United States that spanned from the 80's through early 2000's.

What is interesting to me is that the ' in "former colleague's birthday" looks as expected, but the one in "the 80's through" is converted to that � icon.

If I add this line: 

$index = $this->wire('sanitizer')->unentities($index);

then it works as expected, although I don't think we want to do that. I feel like this is somehow related to one of the regexes and an "'s" right at the end of the autodesc. Maybe it's truncating the &#039;s to something like &#03, which it doesn't know how to display?


I just changed the $desc_max_length from 255 to 265 and that fixed it, so I think my last suggestion about it truncating in the middle of the &#039;s is correct.

Here is the output of $value from ___renderResultDesc() which confirms it!

<div class="{classes.result_desc}">Tony Larkum's 80th Birthday Party ... After attending a former colleague's birthday party, Bill <strong class="{classes.result_highlight}">Dennison</strong> recounts fond memories of time he spent with Tony Larkum in Australia and the United States that spanned from the 80&#0...</div>
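For anyone wanting to reproduce the effect outside ProcessWire, here is a minimal sketch in plain PHP. It assumes the textformatter encodes single quotes as &#039; (which is what htmlspecialchars() does with ENT_QUOTES); the sample text and the cut position are made up for illustration and are not taken from the module.

```php
<?php
// Hypothetical sample text; the real excerpt is longer.
$text = "from the 80's through early 2000's";

// Assumption: the entity encoder behaves like htmlspecialchars() with ENT_QUOTES,
// so each single quote becomes the six-character entity &#039;
$encoded = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// Pick a length limit that happens to fall three characters into the first entity.
$cut = mb_strpos($encoded, '&#039;') + 3;

// A length-based cut knows nothing about entities, so it leaves a fragment behind.
$truncated = mb_substr($encoded, 0, $cut) . '...';

echo $truncated, PHP_EOL;
```

The fragment at the end is no longer a valid entity, so the browser cannot turn it back into a quote character.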

 


2 hours ago, adrian said:

I just changed the $desc_max_length from 255 to 265 and that fixed it, so I think my last suggestion about it truncating in the middle of the &#039;s is correct.

Seems reasonable. In fact I had similar issues earlier, but those were fixed by switching from substr() to mb_substr(). This case seems a bit different, though. I'll try to set up a test case and see what I can do about that; so far I've only managed to end up with a partial encoded entity, 80& instead of 80' and so on 🙂

Meanwhile I noticed that in some cases the new auto desc feature generates odd looking "......" character sequences and other weird results, so there's more to tweak anyway. I'll try to dedicate some time to this in the next few days.

Quote

HTML Entity Decoder textformatter

Just to make sure: did you mean HTML Entity Encoder?


On 4/13/2021 at 12:49 PM, ngrmm said:

Hi @teppo, thanks for the module. It's great!

I have pages which are selected through a page reference field inside a table, which is inside a repeater, which is inside a repeater matrix item:
(matrix_repeater) -> (matrix_repeater_item) -> (repeater) -> (table) -> (table column [page reference field]) -> title or text_field

The search_index already contains the titles of those pages. Is there an easy way to also get the other fields of the referenced pages into the search_index?

Thanks!

Currently "id", "name", and "title" are indexed automatically for page reference fields. If a programmatic approach is easy enough, you can modify this behaviour and add more fields by hooking before Indexer::getPageReferenceIndexValue() and modifying the $indexed_fields argument. Alternatively you could hook into this method and in specific cases replace the behaviour completely, for example by returning the output of the indexPage() method for each individual page instead (here's a loosely related example).

Let me know if you need additional pointers and I can provide some sample code.


14 minutes ago, teppo said:

Thanks!

Currently "id", "name", and "title" are indexed automatically for page reference fields. If a programmatic approach is easy enough, you can modify this behaviour and add more fields by hooking before Indexer::getPageReferenceIndexValue() and modifying the $indexed_fields argument. Alternatively you could hook into this method and in specific cases replace the behaviour completely, for example by returning the output of the indexPage() method for each individual page instead (here's a loosely related example).

Let me know if you need additional pointers and I can provide some sample code.

ok, I'm a noob but I'll give it a try. I will use my loops and add the needed content to the searchindex.
But where should i place this hook to run it after a page is saved in the backend?


36 minutes ago, teppo said:

I'll try to dedicate some time to this in the next few days.

Thanks @teppo

This exact text in a textarea with a HTML Entity Encoder textformatter should allow you to replicate the issue.

After attending a former colleague's birthday party, Bill Dennison recounts fond memories of time he spent with Tony Larkum in Australia and the United States that spanned from the 80's through early 2000's.

 

37 minutes ago, teppo said:

Just to make sure: did you mean HTML Entity Encoder?

Yes, sorry about that 🙂


26 minutes ago, ngrmm said:

ok, I'm a noob but I'll give it a try. I will use my loops and add the needed content to the searchindex.

Great — let me know if you run into any trouble and I'll be happy to help!

26 minutes ago, ngrmm said:

But where should i place this hook to run it after a page is saved in the backend?

If you mean which file, for this purpose (saving pages in the admin) you could put the hook into site/templates/admin.php, but site/init.php is likely just as good (and also catches page saves that occur outside the admin).


5 minutes ago, teppo said:

Great — let me know if you run into any trouble and I'll be happy to help!

I found another solution and it works for me. First I loop through my pages and save the needed content into a helper-textarea-field. I added this helper-field into the indexed fields and this works. I guess this is not the best for the performance but it works for me.

Thanks again


@teppo - I looked into the issue with truncation of entities a little more and it seems like there are a couple of different approaches to fix this. The simplest seems to be to remove any trailing fragment of an entity that is left over, e.g.:

  $value = rtrim(preg_replace('/(?:<(?!.+>)|&(?!.+;)).*$/us', '', $value));

as mentioned here: https://www.drupal.org/project/drupal/issues/2279655#comment-8843201 - there is some other useful discussion there about this issue.

Another approach I've come across is to calculate the length by treating the entities as a single character like they detail here: https://gist.github.com/andykirk/b304a3c84594515677e6 and https://alanwhipple.com/2011/05/25/php-truncate-string-preserving-html-tags-words/
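As a rough illustration of that second idea (counting an entity as one character), here is a simplified sketch. It is not the code from either link: it decodes the entities, truncates the plain text, then encodes again. The function name truncate_entity_safe() is hypothetical, and it assumes the value contains encoded text only, with no HTML tags such as highlight markup.

```php
<?php
/**
 * Truncate an entity-encoded string without splitting an entity.
 * Hypothetical helper for illustration; assumes there are no HTML tags in $encoded.
 */
function truncate_entity_safe(string $encoded, int $max_length): string {
    // Decode first so that every entity counts as a single character.
    $plain = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');

    // Short enough already: return the original value untouched.
    if (mb_strlen($plain) <= $max_length) {
        return $encoded;
    }

    // Cut the plain text, then encode again so the output is safe to print.
    return htmlspecialchars(mb_substr($plain, 0, $max_length), ENT_QUOTES, 'UTF-8') . '...';
}

echo truncate_entity_safe('the 80&#039;s through', 7), PHP_EOL;
```

The trade-off is an extra decode and encode pass, and it would need more work to cope with markup that has to survive the truncation.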

I tried the first approach and it seems to work fine. Hopefully this might save you a bit of time. This is where I've put it for now:

    protected function formatResultAutoDesc(string $match, string $index, string $desc): string {
        if ($match !== '') {
            $match_length = mb_strlen($match);
            $add_prefix = (empty($desc) || mb_substr($desc, -3) !== '...') && (mb_strpos($match, '...') === 0 || mb_substr($index, 0, $match_length) !== $match);
            $add_suffix = mb_substr($index, -$match_length) !== $match || mb_strrpos($match, '.') !== $match_length;
            // Remove scraps of HTML entities from the end of the string
            $match = rtrim(preg_replace('/(?:<(?!.+>)|&(?!.+;)).*$/us', '', $match));
            $match = ($add_prefix ? '...' . $match : $match) . ($add_suffix ? '...' : '');
        }
        return $match;
    }
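To check what the regex does in isolation, here is a small standalone sketch using the same pattern. The wrapper name strip_partial_markup() and the sample strings are made up for illustration; only the pattern itself comes from the post above.

```php
<?php
/**
 * Remove an unfinished entity or tag from the end of a string.
 * Hypothetical wrapper around the pattern quoted in this thread.
 */
function strip_partial_markup(string $value): string {
    // Match a "<" with no ">" after it, or a "&" with no ";" after it,
    // and drop everything from that point to the end of the string.
    $clean = preg_replace('/(?:<(?!.+>)|&(?!.+;)).*$/us', '', $value);

    // preg_replace() returns null on failure (e.g. invalid UTF-8); fall back to the input.
    return rtrim($clean ?? $value);
}

echo strip_partial_markup('colleague&#039;s party, the 80&#0'), PHP_EOL;
echo strip_partial_markup('Bill <strong class'), PHP_EOL;
```

Complete entities earlier in the string are left alone because a ";" still follows them; only the dangling fragment at the end is removed.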

 


11 hours ago, adrian said:

I tried the first approach and it seems to work fine. Hopefully this might save you a bit of time.

Thanks, @adrian. I agree that this is the simplest solution, and probably the most sensible one right now. It's now included in the dev branch of the module 🙂


Hi @teppo - I've been playing around with multiple operators (https://processwire.com/docs/selectors/operators/#using-more-than-one-operator-at-a-time). I think you mentioned that these aren't supported, so please feel free to ignore this request, but what I discovered when using this operator combination: *=~= is that the grouping feature no longer works if PW fails to find any results via *= and moves on to ~=

All results end up being listed under the All tab with no other tabs shown. For now I've gone back to my old approach to adding the tabs which works with this operator. 

I have also played around with the #= operator, which I think will need some special handling to support because of the need for additional quotes and escaping.

Just rambling out loud here - I also played around with adding basic support for swapping between *= and ~= depending on whether the user's search term includes double quotes. I almost stuck with that approach but I am honestly not sure how many non-tech users know about quoting search terms, so I thought the *=~= was probably better for most people.
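A minimal sketch of that quote-based swap could look something like this. It assumes a query wrapped in double quotes should use *= (phrase match) and anything else should use ~= (all words); that mapping, and the function name choose_operator(), are assumptions for illustration rather than anything SearchEngine does.

```php
<?php
/**
 * Pick a selector operator based on whether the query is wrapped in double quotes.
 * Hypothetical helper; returns [operator, query without the surrounding quotes].
 */
function choose_operator(string $query): array {
    $query = trim($query);

    // "some phrase" => phrase match on the text between the quotes
    if (preg_match('/^"(.+)"$/us', $query, $m)) {
        return ['*=', $m[1]];
    }

    // Anything else => match all words in any order
    return ['~=', $query];
}

[$operator, $terms] = choose_operator('"bird head"');
echo $operator, ' ', $terms, PHP_EOL;
```

The returned terms would still need to be sanitized (for example with $sanitizer->selectorValue()) before being placed into a selector string.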


@teppo - sorry for the barrage. Again, don't worry about this if it's not important to you, but I noticed an issue with highlighting in the auto description if there is punctuation in the search, eg:
https://ian.umces.edu/search/?q=bird's+head

Note the last result: "Science communication training for West Papua, Indonesia" doesn't highlight "Bird’s Head" because it's not a regular single quote character used for the apostrophe. I wonder if highlight matching can / should ignore punctuation characters?
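One possible way to make highlighting tolerant of apostrophe variants is sketched below. It only treats the straight quote (') and the typographic one (’) as interchangeable; the function name highlight_loose() and the plain strong tag are assumptions for illustration and not how the module builds its highlights.

```php
<?php
/**
 * Highlight $query in $text, treating ' and ’ as the same character.
 * Hypothetical helper; ignores other punctuation differences.
 */
function highlight_loose(string $text, string $query): string {
    if ($query === '') {
        return $text;
    }

    // Escape regex metacharacters in the user's query first.
    $pattern = preg_quote($query, '/');

    // Let either apostrophe form in the query match either form in the text.
    $pattern = preg_replace('/[\'\x{2019}]/u', "['\u{2019}]", $pattern);

    // Case-insensitive, UTF-8 aware replacement; fall back to the input on failure.
    return preg_replace('/' . $pattern . '/iu', '<strong>$0</strong>', $text) ?? $text;
}

echo highlight_loose("Bird\u{2019}s Head Seascape", "bird's head"), PHP_EOL;
```

Ignoring all punctuation would need a broader normalization step, and that could also produce matches the search itself did not find.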


  • 4 weeks later...

I had this monster on my search result page:

$selector = "template=produkt, check2|check_temp=1, (title|text|text_editor_minimal|text_editor|text_editor2|text_editor3|search_index~=$q), (produkt_hersteller.title%=$q), (produkt_bereiche.title%=$q), (produkt_anwendungen.title%=$q), (produkt_verwendungen.title%=$q), (produkt_geratetypen.title%=$q), (produkt_gruppen.title%=$q), (produkt_verdichter.title%=$q), (produkt_kaltemittel.title%=$q), (produkt_gwp.title%=$q)";

It worked so far. After I updated PW to 3.0.165 I got an Internal Server error. So far so good. I updated SearchEngine too. No change. I removed piece by piece parts of the selector. Finally I realised that the subfield selectors (like produkt_hersteller.title) caused the error.

Now I ended with this selector:

$selector = "template=produkt, check2|check_temp=1, (title|text|text_editor_minimal|text_editor|text_editor2|text_editor3|search_index%=$q)";

Has anybody an idea why subfield selectors don't work anymore?


Hey @teppo,

i got a quick question - maybe it is even trivial:

I just upgraded from SE 0.26 to 0.30.1 and i am running into some issues with the JSON output. Up until now it was working flawlessly, when i was querying my search endpoint via AJAX. The output was just dead simple:

$searchEngine = \ProcessWire\modules()->get('SearchEngine');
header('Content-Type: application/json');
return $searchEngine->renderResultsJSON();

But somehow this endpoint returns no results on 0.30.1, when using the same query as on 0.26:

{
  "query": "xyz"
}

I also did a quick debug query via the SE admin panel, which returns all the expected results.

Am i overlooking some changes?


  • 5 weeks later...
On 5/21/2021 at 2:48 PM, 2hoch11 said:

I had this monster on my search result page:

$selector = "template=produkt, check2|check_temp=1, (title|text|text_editor_minimal|text_editor|text_editor2|text_editor3|search_index~=$q), (produkt_hersteller.title%=$q), (produkt_bereiche.title%=$q), (produkt_anwendungen.title%=$q), (produkt_verwendungen.title%=$q), (produkt_geratetypen.title%=$q), (produkt_gruppen.title%=$q), (produkt_verdichter.title%=$q), (produkt_kaltemittel.title%=$q), (produkt_gwp.title%=$q)";

It worked so far. After I updated PW to 3.0.165 I got an Internal Server error. So far so good. I updated SearchEngine too. No change. I removed piece by piece parts of the selector. Finally I realised that the subfield selectors (like produkt_hersteller.title) caused the error.

Now I ended with this selector:

$selector = "template=produkt, check2|check_temp=1, (title|text|text_editor_minimal|text_editor|text_editor2|text_editor3|search_index%=$q)";

Has anybody an idea why subfield selectors don't work anymore?

This seems like an issue with ProcessWire itself, not SearchEngine. If you have a chance you might want to give the latest dev version of ProcessWire a try, though if it's a live site then I'd recommend doing it locally / in a development environment first, or alternatively backing your site up before the update (just in case).

If that doesn't solve it, you may want to open an issue for this at https://github.com/processwire/processwire-issues. "Internal Server error" alone isn't very informative, so you may also want to dig into your log files etc. to see if there's anything more specific there.


On 5/27/2021 at 5:51 PM, mlfct said:

I just upgraded from SE 0.26 to 0.30.1 and i am running into some issues with the JSON output. Up until now it was working flawlessly, when i was querying my search endpoint via AJAX. The output was just dead simple:

$searchEngine = \ProcessWire\modules()->get('SearchEngine');
header('Content-Type: application/json');
return $searchEngine->renderResultsJSON();

But somehow this endpoint returns no results on 0.30.1, when using the same query as on 0.26:

{
  "query": "xyz"
}

I also did a quick debug query via the SE admin panel, which returns all the expected results.

Am i overlooking some changes?

This seems like a bug. So far I've traced it back to version 0.29.0, but will have to dig in further to see what's causing it. This version introduced major changes behind the scenes, so I'm not entirely surprised that something went wrong.

I'll get back to you once I figure this out.


@mlfct, the issue mentioned above should now be fixed in SearchEngine version 0.30.2. Thanks for letting me know about this and sorry for taking so long to solve it 🙂


  • 2 weeks later...

@adrian How are you able to extract image information along with the rest of the rendered results? I've tried adding item.image_field to the templates in a number of configurations and I can get the raw filename but not any of the other page image fields, e.g. httpUrl, height/width, etc.


19 minutes ago, gornycreative said:

@adrian How are you able to extract image information along with the rest of the rendered results? I've tried adding item.image_field to the templates in a number of configurations and I can get the raw filename but not any of the other page image fields, e.g. httpUrl, height/width, etc.

I added this to my search.php template file. Hope that helps.

$searchEngine->addHookAfter('Renderer::renderResult', function($event) {
    // The first argument is the page being rendered as a search result
    $p = $event->arguments[0];
    // Take the first image from whichever of these fields has a value
    $image = $p->getUnformatted('pdf_images|image|images|video_images')->first();
    if($image) {
        // Resize to 260px high, keeping proportions and never upscaling
        $thumb = $image->size(0, 260, array('upscaling' => false));
        // Wrap the default result markup in a two-column row with the thumbnail
        $event->return = '
        <div class="row">
            <div class="small-12 medium-4 large-5 xlarge-4 xxlarge-3 columns teaser__image">
                <a href="'.$p->url.'">
                    <img style="max-height: 300px" src="'.$thumb->url.'" />
                </a>
            </div>
            <div class="small-12 medium-8 large-7 xlarge-8 xxlarge-9 columns teaser__content">
            ' . $event->return . '
            </div>
        </div>';
    }
});

 


@adrian Thanks, it does. I wasn't sure if I'd need to dig into hooks on this or not.

@teppo Just to confirm, looking at Renderer it also seems like there isn't a method off the bat to get the paginationString from the query... just count and total.

Looks like I'd need to hook into renderResultsListSummary with a pagination check for that? I think I could get it from the magic get method?


18 hours ago, gornycreative said:

@teppo Just to confirm, looking at Renderer it also seems like there isn't a method off the bat to get the paginationString from the query... just count and total.

Looks like I'd need to hook into renderResultsListSummary with a pagination check for that? I think I could get it from the magic get method?

Depends on where and how you want to use this 🙂

The query object has a "pager" property that returns a rendered pager, and Renderer has a public method (Renderer::renderPager(array $args, Query $query)) that does the same. Whether or not these will be useful depends, again, largely on your context.


  • 2 weeks later...

This has been great. I just wanted to confirm one thing I noticed which is that _auto_desc doesn't seem to highlight/load a summary on partial word matches, although the search itself picks them up.

So for example if I search for 'business' using a partial match operator, any article search_index that includes the whole word 'business' will create a summary with the word 'business' marked... and results will show up where businesses and businessmen are in the index, but no summary with a highlighted partial word appears. The summary for these entries is blank.

Is this the intended behavior?

 

