SearchEngine


teppo

Hey folks!

Took a couple of late nights, but managed to turn this old gist of mine into a proper module. The name is SearchEngine, and currently it provides support for indexing page contents (into a hidden textarea field created automatically), and also includes a helper feature ("Finder") for querying said contents. No fancy features like stemming here yet, but something along those lines might be added later if it seems useful (and if I find a decent implementation to integrate).

Though the API and selector engine make it really easy to create site search pages, I pretty much always end up duplicating the same features from site to site. Also – since it takes a bit of extra time – it's tempting to skip over some accessibility-related things and leave out features like text highlighting. Overall I think it makes sense to bundle all that into a module, which can then be reused over and over again.

Note: markup generation is not yet built into the module, which is why the examples below use the PageArray::render() method to produce a simple list of results. This will be added later on, either as part of the same module or as a separate Markup module. There's also no fancy JS API or anything like that (yet).

This is an early release, so be kind – I got the find feature working last night (or perhaps this morning), and some final tweaks and updates were made just an hour ago.

Usage

  1. Install the SearchEngine module.

Note: the module will automatically create an index field at install time, so be sure to define a custom field name (via site config) before installation if you don't want it to be called "search_index". You can also rename the field later, but you'll then have to update the "index_field" option in site config or module settings (in Admin).

  2. Add the site search index field to templates you want to make searchable.
  3. Use selectors to query values in the site search index.

Note: you can use any operator in your selectors, although you will likely find the '*=' and '%=' operators most useful here. You can read more about selector operators in ProcessWire's documentation.

Options

By default the module will create a search index field called 'search_index' and store values from the Page fields title, headline, summary, and body in said index field when a page is saved. You can modify this behaviour (field name and/or indexed page fields) either via the Module config screen in the ProcessWire Admin, or by defining the $config->SearchEngine array in your site config file or another applicable location:

$config->SearchEngine = [
    'index_field' => 'search_index',
    'indexed_fields' => [
        'title',
        'headline',
        'summary',
        'body',
    ],
    'prefixes' => [
        'link' => 'link:',
    ],
    'find_args' => [
        'limit' => 25,
        'sort' => 'sort',
        'operator' => '%=',
        'query_param' => null,
        'selector_extra' => '',
    ],
];

You can access the search index field just like any other ProcessWire field with selectors:

if ($q = $sanitizer->selectorValue($input->get->q)) {
    // $q has been sanitized for use as a selector value
    $results = $pages->find('search_index%=' . $q . ', limit=25');
    echo $results->render();
    echo $results->renderPager();
}

Alternatively you can delegate the find operation to the SearchEngine module:

$query = $modules->get('SearchEngine')->find($input->get->q);
echo $query->resultsString; // alias for $query->results->render()
echo $query->pager; // alias for $query->results->renderPager()

Requirements

  • ProcessWire >= 3.0.112
  • PHP >= 7.1.0

Note: later versions of the module may require Composer, or alternatively some additional features may require installing via Composer. This is still under consideration – so far there's nothing here that would really depend on it, but advanced features like stemming most likely would.

Installing

It's the usual thing: download or clone the SearchEngine directory into your /site/modules/ directory and install via Admin. Alternatively you can install SearchEngine with Composer by executing composer require teppokoivula/search-engine in your site directory.

2 hours ago, teppo said:

Overall I think it makes sense to bundle all that into a module, which can then be reused over and over again.

Absolutely true! Thx for sharing! Are there any live examples of your module?

12 minutes ago, bernhard said:

Are there any live examples of your module?

Was afraid someone would ask that.

No, not really a good example. I've been testing the module at wireframe-framework.com and just now added a couple of lines of code to get a very crude results list up: https://wireframe-framework.com/?q=composer. The thing is that I don't have a clean implementation for a results list (or a search form etc.) built into the module yet – that's something I'm going to add next.

I've got some sites on the way that all need this feature.

On 7/15/2019 at 4:04 PM, Ivan Gretsky said:

Is there an easy way to include all the necessary fields (by name/type) from Repeater/RepeaterMatrix fields?

Each indexed field should be declared (named) in the config setting mentioned above, or via the "Indexed fields" AsmSelect field in module config. If you have a lot of fields to add this may of course take a while, but unless you have a whole lot of them it shouldn't be a major issue. Also, SearchEngine doesn't (currently) distinguish between Repeater / PageTable / RepeaterMatrix content: if they contain indexable fields, the values from those fields are indexed as part of the parent page's search index.

Though now that I've said that last point out loud, I think that content from repeatable fields should only ever be included in the (parent page's) index if those fields are set to be indexable as well. I'm going to make this change in the next release.

Edit: done now (0.3.2). Repeatable fields need to be included in the indexed_fields array before their values can be stored in the index. This is really how it should've been from the start.

--

Note that if you use the config setting instead of module settings for "indexed_fields", it's possible to generate that list of fields programmatically. In other words, you can define some hard-coded field names and then merge that array with another one you generate with code. I'm not sure how efficient that would be, but it's doable at least.
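A minimal sketch of that idea for /site/config.php, assuming the hypothetical "block_*" field naming scheme below (substitute your own field names). As far as I know, config.php is read before API variables such as $fields exist, so the generated part here comes from plain PHP rather than from the fields API:

```php
// In /site/config.php (fragment; $config is provided by ProcessWire here)
$searchEngineFields = ['title', 'headline', 'summary', 'body'];

// Hypothetical naming scheme: one field per content block type
$blockTypes = ['text', 'quote', 'faq'];
$blockFields = array_map(function ($type) {
    return 'block_' . $type;
}, $blockTypes);

$config->SearchEngine = [
    // array_unique() guards against listing the same field twice
    'indexed_fields' => array_values(array_unique(array_merge($searchEngineFields, $blockFields))),
];
```

This replaces any earlier $config->SearchEngine definition in the same file, so other options (such as index_field) would need to go into the same array.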

Additionally there are various hookable methods in the Indexer class included with SearchEngine, so if you need something more specific, you can always hook into those and change what gets indexed.

Quick update on this module: got a bunch of rendering features almost ready to ship. Still need to add paging support and may add a "fast mode" that always returns default markup (not sure how important that would really be, it's just an idea I've been tinkering with), but that's just about it.

I'll probably bundle the (optional) styles – and later scripts – with the module, just to provide a decent-looking default state right out of the box (for actual use or testing). Also, I'm thinking of adding a "load more" alternative to paging, but that'll probably be a later addition.

Somewhat off-topic, but I've been going back and forth with BEM: sometimes I love it, but often it just complicates things and makes both CSS and markup hard to maintain. Nevertheless, in a project like this it's just brilliant!

By declaring the search form, results list, and a single result as separate components, I can quite easily style them individually without worrying too much about messing with site markup in general, and it's also really easy to override just a specific part of the default styles in site-specific styles.

Anyway, currently the ("full", i.e. a form and a results list) rendered state looks like this:

[Screenshot: the rendered search form and results list, 2019-07-16]

Thx Teppo, that module will be a great contribution!

5 hours ago, teppo said:

I'll probably bundle the (optional) styles – and later scripts – with the module, just to provide a decent-looking default state right out of the box (for actual use or testing). Also, I'm thinking of adding a "load more" alternative to paging, but that'll probably be a later addition.

Maybe you could already think of different frameworks so that others can add default markup for different frameworks easily in separate folders? Maybe just define a path where the module should take files to render, then we could have some examples in the module itself, like /site/modules/SearchEngine/tpl/uikit/head|entry|foot.php or /site/templates/search/head|entry|foot.php

4 hours ago, bernhard said:

Maybe you could already think of different frameworks so that others can add default markup for different frameworks easily in separate folders? Maybe just define a path where the module should take files to render, then we could have some examples in the module itself, like /site/modules/SearchEngine/tpl/uikit/head|entry|foot.php or /site/templates/search/head|entry|foot.php

The way markup generation is currently implemented can be seen in the dev branch at the GitHub repository – it's pretty easy to grasp by taking a look at the render settings in the SearchEngine.module.php file: https://github.com/teppokoivula/SearchEngine/blob/dev/SearchEngine.module.php#L70:L124.

Instead of defining big chunks of markup in one go, I've tried to make the rendering "atomic" enough to allow for pretty much any kind of modification. Technically a framework-specific version could mean just alternative 'classes' and 'templates' arrays, and of course whatever custom styles might be needed.

Not sure if I got your point right, though, so please let me know if this seems very different to what you were thinking.

Another quick update: the rendered search feature is now visible at https://wireframe-framework.com/search/. The entire content area (between the main menu and the footer) is rendered by the SearchEngine module. The rendered version (if the bundled styles are used, which is optional) should look more or less like that on most sites. Obviously site styles come into play, so it may be a bit off and require tweaking, but that's to be expected.

Pager is now included as well, but on this site you'd have to search for something silly like "co" to see it in action. The default limit is set to 20 results, and the site doesn't have that many pages to begin with.

The styles are organised into "themes", which currently means any number of custom CSS and/or JS files, but I'll probably expand these to include custom config files as well sometime soon. This way it should be easy to create multiple built-in themes to select from (for different frameworks or whatever).

The render feature is currently only available via the dev branch of the module. I'll test it a bit more before merging to master, probably tomorrow.

20 hours ago, teppo said:

Not sure if I got your point right, though, so please let me know if this seems very different to what you were thinking.

No, thanks, that's similar to what I was talking about. I have to try your module before I can make any further suggestions.

13 hours ago, teppo said:

but I'll probably expand these to include custom config files as well sometime soon. This way it should be easy to create multiple built-in themes to select from (for different frameworks or whatever).

That was exactly what I was thinking of.

Thx for sharing the search link. I tried it using "test" as a keyword. The results page did not show "test" anywhere. I then clicked on https://wireframe-framework.com/docs/directory-structure/ and found "latest" there. That's exactly the kind of thing that's tedious when implementing site searches, and that's why I'm really thankful that you built a module for it that we can improve step by step (and hopefully as a community). I'll need your module on a project soon. Maybe I can contribute something then.

3 hours ago, bernhard said:

Thx for sharing the search link. I tried it using "test" as a keyword. The results page did not show "test" anywhere. I then clicked on https://wireframe-framework.com/docs/directory-structure/ and found "latest" there. That's exactly the kind of thing that's tedious when implementing site searches, and that's why I'm really thankful that you built a module for it that we can improve step by step (and hopefully as a community).

It's true that this is a bit tedious – to say the least.

The indexer doesn't currently have stemming support or anything like that built-in, which means that it can only find exact matches. I'd love to include something more elegant, but I'd first need to find a solution that works reliably across multiple languages – at least German, Swedish, Finnish, and English, since those are the ones I personally need most, and that would probably cover a large part of the audience here. If such a feature does get added, it might also make sense to distribute it as a separate module (which wouldn't be particularly hard, really).

By default the module uses the "%=" selector operator, which I've personally preferred over "*=". This is partly because I actually want partial matches, but also because back in the day the "*=" operator didn't seem to work too well with Finnish content. This way I'm not constantly worrying about MySQL quirks, full-text stopwords, etc. Anyhow, I just experimented by changing the selector operator to "*=", and now the only hit is the "Why use Wireframe?" page, which has the word "battle-tested" on it.

The default selector operator is configurable via site config, or you can pass in a custom selector if you query results yourself, and that might actually be enough to resolve some issues already. Currently there are a lot of options that are only configurable via site config (code), but I'll probably add (some of) these to the Admin as well. It's just a lot easier to add them in code first.

(Edit: on second thought I'm probably going to switch the default selector to "*=". It seems like it will be better for most users, and those who don't like it can always change it.)
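Whichever default ships, the operator can be overridden in site config. A minimal sketch using the find_args / operator key from the Options section, assuming (as in the render_args example later in this thread) that options you leave out keep their default values:

```php
// In /site/config.php (fragment)
$config->SearchEngine = [
    'find_args' => [
        // '*=' uses MySQL full-text matching (so stopwords and minimum word length apply),
        // '%=' uses LIKE and also matches partial words, e.g. "latest" for "test"
        'operator' => '*=',
    ],
];
```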

Finally, the description text shown on the search results page is pretty "dumb", currently defaulting to the value of the summary field. Obviously that won't do for sites that don't have this field, but it's there for most default profiles, so that seemed like a reasonable starting point. In my own projects I'd probably make that "meta_description|summary", since those fields will always be there, and they are pretty much exactly what a view like this needs.

I'd love to create the summary from the index so that it can always (or nearly always, at least) show a piece of text that matched (kind of like Google does), but since the index is essentially a whole lot of mismatched text stuck together, that would often look pretty awful. Google does these "custom excerpts" well, obviously, but even Relevanssi (which is the de facto search plugin for WP) struggles with this part, and in many projects it produces an unreadable mess when the custom excerpt option is enabled.

(Mainly because such a feature is a nightmare from a logical point of view.)

3 hours ago, bernhard said:

I'll need your module on a project soon. Maybe I can contribute something then. 

Awesome!

Edited by teppo
Added note about future default selector change.

One last update for the day: master branch of the module is now at 0.5.0. This includes both the rendering features mentioned above, and a slightly more polished theming feature, where each theme can declare custom markup, styles, scripts, strings, etc. More details in the README file.

@teppo First I would like to thank you for creating this module!

However, I'm having problems with it. When I try to index my fields I get two warnings: Warning: Declaration of SearchEngine\Renderer::__get(string $name) should be compatible with ProcessWire\Wire::__get($name) in xxx/modules/SearchEngine/lib/Renderer.php on line 581

Warning: Declaration of SearchEngine\Query::__get(string $name) should be compatible with ProcessWire\Wire::__get($name) in xxx/modules/SearchEngine/lib/Query.php on line 226

Then it says "indexed 0 pages in 0 seconds".

Why does it do that?

Thank you.

Hey @VeiJari, thanks for your report!

Those warnings were fixed in version 0.5.1 – I keep forgetting that core classes don't use strong typing for non-object parameter values. Lesson learned: always develop with debug mode on (or use something else to display all warnings).
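To illustrate the warnings quoted above: ProcessWire's Wire::__get($name) declares an untyped parameter, so a subclass that declares __get(string $name) narrows it, and PHP reports that as an incompatible declaration. A minimal standalone sketch with hypothetical stand-in classes (not the module's actual code):

```php
<?php
// Hypothetical stand-in for ProcessWire\Wire, whose magic getter is declared as __get($name).
class WireLikeBase
{
    public function __get($name)
    {
        return null;
    }
}

// Declaring __get(string $name) here would narrow the parent's untyped parameter,
// which PHP rejects as an incompatible declaration (a warning in PHP 7, a fatal error in PHP 8).
// Keeping the parameter untyped and checking the value inside the method stays compatible.
class QueryLike extends WireLikeBase
{
    private $data = ['resultsString' => '<ul></ul>'];

    public function __get($name)
    {
        if (is_string($name) && array_key_exists($name, $this->data)) {
            return $this->data[$name];
        }
        return parent::__get($name);
    }
}

$query = new QueryLike();
echo $query->resultsString, PHP_EOL; // prints: <ul></ul>
```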

"Indexed 0 pages in 0 seconds" isn't related to the warnings. This just means that the module couldn't find pages with your configured search index field. I've added a warning message for this situation, explaining what happened and what to check first (that your index field is actually added to at least one template with existing pages), but other than that I'd have to know a bit more about your specific setup to answer why this is happening.

Additionally (with latest release 0.6.0) I've changed the defaults so that any non-trashed page with the index field is now automatically included. Previously the default selector for this feature only included public pages, which meant that hidden and/or unpublished pages were not indexed (via the "Index pages now?" feature, that is – saving the page itself indexed it as expected).

--

If you're still getting a message that indexable pages couldn't be found, please let me know how you've configured the module and which version of ProcessWire you're using. Also please make sure that you actually have indexable pages to begin with: unless you've specified a selector string for the "Index pages now?" feature, the default selector described above is what finds those pages for you.
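To check whether an "Indexed 0 pages" message is simply accurate, a quick diagnostic along these lines could be run once as a superuser (from a template file or a console). It roughly approximates the 0.6.0 default described above (any non-trashed page whose template has the index field) and assumes the default field name search_index; it is not the module's own selector:

```php
<?php namespace ProcessWire;

// Diagnostic sketch: count non-trashed pages whose template includes the index field
$indexField = $fields->get('search_index');
if (!$indexField) {
    echo "Field 'search_index' does not exist.";
} else {
    // Names of all templates using the field, e.g. "basic-page|blog-post"
    $templateNames = $indexField->getTemplates()->implode('|', 'name');
    if ($templateNames === '') {
        echo "The index field hasn't been added to any template yet.";
    } else {
        $count = $pages->count("template=$templateNames, include=all, has_parent!={$config->trashPageID}");
        echo "Pages that could be indexed: $count";
    }
}
```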

18 hours ago, teppo said:

Hey @VeiJari, thanks for your report!

Those warnings were fixed in version 0.5.1 – I keep forgetting that core classes don't use strong typing for non-object parameter values. Lesson learned: always develop with debug mode on (or use something else to display all warnings).

"Indexed 0 pages in 0 seconds" isn't related to the warnings. This just means that the module couldn't find pages with your configured search index field. I've added a warning message for this situation, explaining what happened and what to check first (that your index field is actually added to at least one template with existing pages), but other than that I'd have to know a bit more about your specific setup to answer why this is happening.

Additionally (with latest release 0.6.0) I've changed the defaults so that any non-trashed page with the index field is now automatically included. Previously the default selector for this feature only included public pages, which meant that hidden and/or unpublished pages were not indexed (via the "Index pages now?" feature, that is – saving the page itself indexed it as expected).

--

If you're still getting a message that indexable pages couldn't be found, please let me know how you've configured the module and which version of ProcessWire you're using. Also please make sure that you actually have indexable pages to begin with: unless you've specified a selector string for the "Index pages now?" feature, the default selector described above is what finds those pages for you.

Upgraded to the latest version and the warnings disappeared.

I also noticed that you have to manually add the search_index field to the templates you want to index. I assumed it would be added to those templates automatically. That could be handy, but it's not a deal breaker, to be honest. Now I've got it working, thanks for your help!

The next step is to translate the texts. I tried to do it via language, but nothing changed? So there's no translate support yet? 

1 hour ago, VeiJari said:

Upgraded to the latest version and the warnings disappeared.

I also noticed that you have to manually add the search_index field to the templates you want to index. I assumed it would be added to those templates automatically. That could be handy, but it's not a deal breaker, to be honest. Now I've got it working, thanks for your help!

Great to hear that it works now. The problem with automating this is that I don't want to assume which templates should become searchable. That being said, I could add an install-time setting to pick templates to add this field to, or alternatively a module config setting that fetches this info real-time from templates. I think that'd be a pretty nice solution overall, so I've added that to my to-do list.
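Until something like that exists, the field can also be added to templates via the API. A one-off sketch, run once as a superuser, where the template names are placeholders and search_index is assumed to be the index field's name:

```php
<?php namespace ProcessWire;

// Sketch: add the index field to selected templates (template names are placeholders)
$indexField = $fields->get('search_index');
foreach (['basic-page', 'blog-post'] as $templateName) {
    $template = $templates->get($templateName);
    // Skip missing templates, a missing field, and templates that already have it
    if (!$template || !$indexField || $template->hasField($indexField)) {
        continue;
    }
    $template->fields->add($indexField);
    $template->fields->save();
}
```

Existing pages using those templates would then still need to be indexed, e.g. via the "Index pages now?" option.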

1 hour ago, VeiJari said:

The next step is to translate the texts. I tried to do it via language, but nothing changed? So there's no translate support yet? 

There was – it just wasn't working quite as expected.

I hadn't tested this properly: translatable strings were defined and cached too early during the module's bootup sequence, with the result that the user's language wasn't actually set yet, and thus you always ended up with the default (English) versions. This should be fixed in the latest release, 0.6.2. There's a basic demo here with most public-facing search-related terms translated: https://wireframe-framework.com/haku/.
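The underlying pitfall is a general one: anything translated while a module boots is frozen in whatever language is active at that moment. A generic, hypothetical sketch of deferring that work until first use (plain PHP with a stand-in translator instead of ProcessWire's __() function, not SearchEngine's actual code):

```php
<?php
// Hypothetical sketch: build translated strings on first use rather than at boot time.
class LazyStrings
{
    /** @var callable Translator, e.g. a wrapper around the real translation function */
    private $translate;

    /** @var array|null Cached strings; null until first requested */
    private $strings = null;

    public function __construct(callable $translate)
    {
        $this->translate = $translate;
    }

    public function get(string $key): string
    {
        if ($this->strings === null) {
            // Translated now, when the current language is known, then cached
            $t = $this->translate;
            $this->strings = [
                'search' => $t('Search'),
                'no_results' => $t('Sorry, no results were found.'),
            ];
        }
        return $this->strings[$key] ?? $key;
    }
}

// Stand-in translator whose output depends on the language active when it is called
$language = 'default'; // language while the module boots
$strings = new LazyStrings(function (string $text) use (&$language): string {
    return ($language === 'fi' && $text === 'Search') ? 'Hae' : $text;
});

$language = 'fi'; // the user's language is resolved later in the request
echo $strings->get('search'), PHP_EOL; // prints: Hae
```

Had the strings been built in the constructor instead, get('search') would have returned the untranslated "Search".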

Sorry for the inconvenience, and thanks for digging out these issues!

1 hour ago, ceberlin said:

The PW "Upgrades" module does not recognize those updates... it still shows version 0.3.1. as current (in my setup), unfortunately

Thanks for reporting this. I'm not quite sure why the Modules Directory won't recognise the module version properly, but I've now updated it manually and so far it seems to stick.

The info JSON examples at the API ref show a format that isn't actually supported by ProcessWire itself, so it's possible that this is a bug in the Modules Directory parser.

On 7/25/2019 at 4:07 PM, teppo said:

Great to hear that it works now. The problem with automating this is that I don't want to assume which templates should become searchable. That being said, I could add an install-time setting to pick templates to add this field to, or alternatively a module config setting that fetches this info real-time from templates. I think that'd be a pretty nice solution overall, so I've added that to my to-do list.

There was – it just wasn't working quite as expected.

I hadn't tested this properly: translatable strings were defined and cached too early during the module's bootup sequence, with the result that the user's language wasn't actually set yet, and thus you always ended up with the default (English) versions. This should be fixed in the latest release, 0.6.2. There's a basic demo here with most public-facing search-related terms translated: https://wireframe-framework.com/haku/.

Sorry for the inconvenience, and thanks for digging out these issues!

Updated to the latest version, and I noticed you also fixed the result URLs not including the root URL. Cheers for that!

Now I can also translate without problems. Thank you for correcting errors at such short notice!

+1 recommendation for this mod.

On 7/13/2019 at 10:43 AM, teppo said:

By default the module will create a search index field called 'search_index' and store values from Page fields title, headline, summary, and body to said index field when a page is saved.

Is there a way to index all pages without editing and saving each page again, e.g. if I install SearchEngine on an existing site?

Can I just do:

$modules->get('SearchEngine')->indexPages()

?

2 hours ago, dragan said:

Is there a way to index all pages without editing and saving each page again, e.g. if I install SearchEngine on an existing site?

Can I just do:

$modules->get('SearchEngine')->indexPages()

?

If you update to module version 0.9.0 (latest release), you can.

More details here: https://github.com/teppokoivula/SearchEngine#rebuilding-the-search-index.

Note that indexing the pages is also possible via module config screen. There's an option for "Manual indexing", where you can either index all indexable pages, or those matching a specific selector.
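For reference, a minimal sketch of the code-based route (module version 0.9.0 or later), e.g. in a one-off script run after installing SearchEngine on an existing site. The selector variant at the end is an assumption based on the "Manual indexing" option, so check the README linked above before relying on it:

```php
<?php namespace ProcessWire;

// Rebuild the search index for all indexable pages
$searchEngine = $modules->get('SearchEngine');
$searchEngine->indexPages();

// Assumption: indexPages() also accepts a selector to limit which pages get indexed,
// mirroring the "Manual indexing" option in module config.
// $searchEngine->indexPages('template=basic-page');
```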

@teppo

Hey, is there a way to exclude the path from the summary results?

I've added a custom config array, and I've managed to change the "summary" field to our corresponding one.

But I couldn't find a way to exclude 'result_path' (used in Renderer.php on line 204) other than just commenting it out.

Is this something that needs to be added? Or is there a trick to it?

Hey @VeiJari,

This should be doable either by hooking into the Renderer::renderResults() method and replacing it with a custom method of your own (one that doesn't output the path), or simply by setting the result_path template string to an empty string or null via config options:

$config->SearchEngine = [
    'render_args' => [
        'templates' => [
            'result_path' => '',
        ],
    ],
];

Let me know if this doesn't work, though.

11 hours ago, teppo said:

Hey @VeiJari,

This should be doable either by hooking into the Renderer::renderResults() method and replacing it with a custom method of your own (one that doesn't output the path), or simply by setting the result_path template string to an empty string or null via config options:


$config->SearchEngine = [
    'render_args' => [
        'templates' => [
            'result_path' => '',
        ],
    ],
];

Let me know if this doesn't work, though.

Works, thank you for the quick response yet again! :)
