SearchEngine


teppo

Hi Clarity,

Can you clarify what you mean by, "modify the query during search," perhaps with an example of what you are looking for?

SearchEngine lets you bypass its default search handling: it can simply prepare the index (and, if you want, the form and results markup) while you build the query yourself to suit your needs, as shown in the documentation on the module page. Is that what you mean? If not, an example would be great. Some of adrian's examples in this topic may be of use too.


Hello, @BrendonKoz!

I need to correct a wrong keyboard layout after the user performs a search. For example, I typed 'ghbdtn', which was supposed to be 'привет' ("hello" in Russian). The code shown at that link doesn't resemble what I have. I'm using the following code ($searchEngine is a variable that holds the module instance):

 
<head>
    ...
    <?= $searchEngine->renderStyles() ?>
    <?= $searchEngine->renderScripts() ?>
    ...
</head>
<body>
    ...
    <?= $searchEngine->renderForm() ?>
    ...
    <?= $searchEngine->renderResults() ?>
    ...
</body>

It seems like you could take advantage of the part of the module's documentation that I linked to above, something like the following:

<?php namespace ProcessWire;
$searchEngine = $modules->get('SearchEngine');
// ...
?>
<head>
    <?= $searchEngine->renderStyles() ?>
    <?= $searchEngine->renderScripts() ?>
</head>
<body>
<?php
    echo $searchEngine->renderForm();

    // This assumes your search form uses GET and a field named 'q'
    if ($q = $sanitizer->selectorValue($input->get->q)) {
        // Modify the value of the search query ($q) here...
        // (run the result through $sanitizer->selectorValue() again if you change it)
        // ...

        // Find pages matching the query string; this returns a PageArray:
        $results = $pages->find('search_index%=' . $q . ', limit=25');

        // Render results and pager with PageArray::render() and PageArray::renderPager():
        echo $results->render(); // PageArray::render()
        echo $results->renderPager(); // PageArray::renderPager()

        // ... or iterate over the results and render them manually:
        // echo "<ul>";
        // foreach ($results as $result) {
        //     echo "<li><a href='{$result->url}'>{$result->title}</a></li>";
        // }
        // echo "</ul>";
    }
?>
</body>

The biggest benefit to the SearchEngine module is its creation of the (default) `search_index` field. You can query against that field just like any other field within ProcessWire. Rendering the form and search results are just additional benefits.


On 12/12/2022 at 7:52 PM, BrendonKoz said:

It seems like you could take advantage of the part of the module's documentation that I linked to above [...]

Thank you! I already did it. My code was the following (in search view):

<?php if($q = $sanitizer->selectorValue($input->get('q'))): ?>
    <?php if($pages->find('search_index%=' . $q)->count): ?>
        <?= $searchEngine->renderResults() ?>
    <?php else: ?>
        <?= $searchEngine->renderResults([], $searchEngine->find(Utils::handleSearch($q))) ?>
    <?php endif; ?>
<?php endif; ?>

I first check whether the query returns any results as typed. If it doesn't, I convert the keyboard layout with my Utils::handleSearch() method and search again with the converted string.
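For anyone who needs something similar: the implementation of Utils::handleSearch() isn't shown in this thread, so the following is only a minimal sketch of what such a layout conversion could look like. The function name swapLayoutEnToRu() is made up for illustration, and it only handles lowercase keys of a US QWERTY layout mapped to the standard Russian (ЙЦУКЕН) layout.

```php
<?php
/**
 * Hypothetical helper (not part of SearchEngine or ProcessWire): convert text
 * typed on a US QWERTY layout into the characters the same keys produce on
 * the standard Russian layout. Only lowercase keys are mapped; every other
 * character is left untouched.
 */
function swapLayoutEnToRu(string $text): string {
    static $map = [
        'q' => 'й', 'w' => 'ц', 'e' => 'у', 'r' => 'к', 't' => 'е', 'y' => 'н',
        'u' => 'г', 'i' => 'ш', 'o' => 'щ', 'p' => 'з', '[' => 'х', ']' => 'ъ',
        'a' => 'ф', 's' => 'ы', 'd' => 'в', 'f' => 'а', 'g' => 'п', 'h' => 'р',
        'j' => 'о', 'k' => 'л', 'l' => 'д', ';' => 'ж', "'" => 'э',
        'z' => 'я', 'x' => 'ч', 'c' => 'с', 'v' => 'м', 'b' => 'и', 'n' => 'т',
        'm' => 'ь', ',' => 'б', '.' => 'ю', '`' => 'ё',
    ];
    // strtr() with an array replaces every key in a single pass, so characters
    // that were already converted are never converted a second time.
    return strtr($text, $map);
}

echo swapLayoutEnToRu('ghbdtn'), "\n"; // привет
```

If the converted string ends up in a selector, it is probably wise to pass it through $sanitizer->selectorValue() first, just like the original query.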


  • 2 months later...

Hi @teppo, and thank you for this module. I am evaluating whether we can use it for a bigger project. The project makes heavy use of Repeater Matrix fields, and those don't seem to be supported at the moment. Could you make the Indexer::indexPage method hookable? That way we could implement our own code for unsupported fieldtypes. That would be awesome.


Hi @gebeer! I'm currently using RepeaterMatrix with SearchEngine successfully, though I'm not using any non-native PW FieldTypes. I did run into an issue where a RepeaterMatrix-based field, if customized by a template so that it does not include one of its available fields, would cause an error and prevent the index from completing. I have a pull request pending for the repository which, in my testing, seems to have fixed the problem.

Although I suspect this might not be your issue, I wanted to share on the small chance that it does help.


Oh great, thank you. So does this mean that Repeater Matrix fields are supported out of the box, or do you need to tell the module how to process them? If there is custom code or hooks involved, would you mind sharing a snippet?

EDIT: I just saw that Indexer::getFieldIndex has logic for repeater fieldtypes. So no need for a snippet.


Hi @teppo, after checking your module code more carefully I found the already hookable methods ___getPageIndex and ___getFieldIndex. So please ignore my post above.


  • 2 months later...
On 10/5/2020 at 2:29 PM, sambadave said:

Hi @teppo is it possible to order the results by the number of times the search query is found so that more important pages are listed first?

For example, a page containing the search term 5 times would appear higher than a page with the search query found only once or twice?

Hi @teppo, as I could not find an answer to this old question: is there a way to sort the results by number of matches / occurrences? That would be awesome!


Hi @snck! Although teppo might have a different answer, I suspect it'll be similar to this.

The SearchEngine module simply makes it dead simple to add standard search functionality into ProcessWire without handling it all manually yourself (i.e.: properly parsing/escaping fields, extrapolating searchable text from files [with the SearchEngine FileIndexer add-on module], and figuring out how to generate a search result list). Beyond that, it still uses ProcessWire's own search functionality; it doesn't expand upon it. ProcessWire can do some pretty significant things in search, but overall it still relies on MySQL's fulltext search to handle everything. MySQL can offer some level of relevancy (depending on the PW selector search you choose), but it can't, as far as I know, order by number of matches found.

Relevancy is not necessarily (or even typically) the same as the number of matches per matched database record. For anything beyond MySQL's built-in capabilities, something external would likely need to be integrated, such as Apache Lucene or Elasticsearch.
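That said, for a small result set you could count occurrences yourself in PHP after the query has run. The sketch below is not something SearchEngine does; countMatches() and sortByMatchCount() are hypothetical helpers, and the plain arrays stand in for pages (in ProcessWire the 'text' value would presumably come from each page's index field). Note that this only reorders the pages that were already loaded, so with a limit or pagination it sorts just the current batch.

```php
<?php
/**
 * Hypothetical helper: count case-insensitive, non-overlapping occurrences
 * of $query in $text.
 */
function countMatches(string $text, string $query): int {
    if ($query === '') {
        return 0; // an empty pattern would match at every position
    }
    // preg_quote() escapes regex metacharacters; "i" ignores case, "u" enables UTF-8
    return (int) preg_match_all('/' . preg_quote($query, '/') . '/iu', $text);
}

/**
 * Hypothetical helper: sort rows (each with a 'text' key standing in for the
 * indexed content of a page) by descending number of matches.
 */
function sortByMatchCount(array $rows, string $query): array {
    usort($rows, function (array $a, array $b) use ($query): int {
        return countMatches($b['text'], $query) <=> countMatches($a['text'], $query);
    });
    return $rows;
}

// Made-up sample data
$rows = [
    ['title' => 'Once',   'text' => 'A page about search.'],
    ['title' => 'Thrice', 'text' => 'Search, search and more Search.'],
    ['title' => 'Twice',  'text' => 'Search here, search there.'],
];

foreach (sortByMatchCount($rows, 'search') as $row) {
    echo $row['title'], "\n"; // Thrice, Twice, Once
}
```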


2 hours ago, BrendonKoz said:

Relevancy is not necessarily (or even typically) the same as the number of matches per matched database record. For anything beyond MySQL's built-in capabilities, something external would likely need to be integrated, such as Apache Lucene or Elasticsearch.

Thanks for your reply! You're right that the number of matches is not necessarily an indicator of relevancy. When I stumbled upon the quoted question, I was just curious whether there was a selector or option that would let me try it out quickly, because I have a project that could benefit from it. SearchEngine is a great module nonetheless.


On 5/22/2023 at 6:57 PM, BrendonKoz said:

Beyond that, it still uses ProcessWire's own search functionality; it doesn't expand upon it. ProcessWire can do some pretty significant things in search, but overall it still relies on MySQL's fulltext search to handle everything. MySQL can offer some level of relevancy (depending on the PW selector search you choose), but it can't, as far as I know, order by number of matches found.

This is true, with a couple of small twists:

  • SearchEngine supports "pinning" specific template(s) to the top of the list, or alternatively grouping results by template. These require making slight modifications (adding extra rules) to the query (DatabaseQuerySelect object) generated by ProcessWire.
  • In the dev branch of the module there is a work in progress "sort by relevance" feature, which also modifies the query. This is based on MySQL natural language full-text search, so it's still up to the database to decide how relevant each result really is.

Sorting results by number of matches, giving some fields more "weight" than others, etc. are not currently something that this module does, though I have occasionally considered if they should be. The main issue here is that it would require different storage and search mechanisms, so it's a lot of work, and additionally it would raise a few rather complicated issues (e.g. handling permissions, which is something that we currently get "for free" by relying on selectors.)

Not sure how sensible that would be, all things considered. It might make more sense to use SE to feed data to a separate search tool, or ditch SE altogether for that sort of use case.


  • 2 weeks later...

Thanks, @teppo!

I am facing a different problem now. I have a site in debug mode and have not been able to upload images reliably since installing SearchEngine. I found that the server's JSON response to the image upload contained a PHP warning related to the Indexer:

Quote

Warning:  foreach() argument must be of type array|object, null given in /site/modules/SearchEngine/lib/Indexer.php on line 315

I am using a RepeaterMatrix field on the page (and Repeaters nested in RepeaterMatrix fields on other pages).

Do you have a solution for this problem? Obviously disabling debug mode is a temporary fix, but not ideal.


Hey @snck,

This should be fixed in the latest version of the module, 0.35.4. I'm not entirely sure of the circumstances causing this issue, but the warning was pretty clear, so it should be fine now.
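For readers hitting the same kind of warning in their own code: the snippet below is not the actual patch from the module, just a generic sketch of guarding a foreach against a null value. toIterable() is a hypothetical helper name.

```php
<?php
/**
 * Hypothetical helper: return something that is always safe to loop over.
 * Arrays and Traversable objects pass through; anything else (for example
 * null) becomes an empty array.
 */
function toIterable($value): iterable {
    return is_iterable($value) ? $value : [];
}

$fieldValue = null; // stand-in for a value that unexpectedly turned out to be null
$count = 0;
foreach (toIterable($fieldValue) as $item) {
    $count++;
}

echo $count, "\n"; // 0, and no warning is raised
```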


  • 2 weeks later...

Hi @teppo!

I had a warning pop up today due to my indexing of a repeater that contains a combo field. lib/Processor.php attempted to run `implode()` on its values, but the values were a multidimensional array, so PHP raised a warning ("Array to string conversion"). I solved it by flattening the multidimensional array prior to the call to `implode()`. I'd have submitted a pull request via GitHub, but I thought there might be other places where this type of solution would be warranted.

Adjusted method (the two lines after "// Flatten any multidimensional [...]" comment):

<?php
    public function processIndex(array $index, array $args = []): string {
        $processed_index = '';
        $index = array_filter($index);
        if (!empty($index)) {
            $args = array_merge([
                'withMeta' => true,
                'withTags' => false,
            ], $args);
            $meta_index = $args['withMeta'] ? $this->getMetaIndex($index) : null;
            $processed_index = array_filter($index, function($index_key) {
                return strpos($index_key, Indexer::META_PREFIX) !== 0;
            }, ARRAY_FILTER_USE_KEY);
            // Flatten any multidimensional arrays to a single dimension, then convert to string for indexing
            $processed_index = new \RecursiveIteratorIterator(new \RecursiveArrayIterator($processed_index));
            $processed_index = iterator_to_array($processed_index, false);
            $processed_index = implode(' ... ', $processed_index);
            $processed_index = str_replace('<', ' <', $processed_index);
            if (!$args['withTags']) {
                $processed_index = strip_tags($processed_index);
            }
            // Note: "u" flag fixes a potential macOS PCRE UTF-8 issue, https://github.com/silverstripe/silverstripe-framework/issues/7132
            $processed_index = preg_replace('/\s+/u', ' ', $processed_index);
            if ($args['withMeta']) {
                $processed_index .= "\n" . (empty($meta_index) ? '{}' : json_encode($meta_index, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES));
            }
        }
        return $processed_index;
    }

I also wasn't sure if this was the best solution for a use case that, I believe, is not explicitly supported by the module.
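To show what the two added lines do in isolation, here is a small standalone sketch. flattenIndex() is a hypothetical wrapper around the same SPL iterators, and the sample data is made up; it is only loosely shaped like a repeater holding a combo field.

```php
<?php
/**
 * Flatten a nested array into a single-level list, using the same SPL
 * iterators as the adjusted method above.
 */
function flattenIndex(array $index): array {
    // RecursiveIteratorIterator visits only the leaves by default
    $iterator = new RecursiveIteratorIterator(new RecursiveArrayIterator($index));
    // false = discard keys, so equal keys from different levels cannot overwrite each other
    return iterator_to_array($iterator, false);
}

// Made-up sample data
$index = [
    'title' => 'Contact',
    'repeater.combo' => [
        ['name' => 'Jane', 'email' => 'jane@example.com'],
        ['name' => 'John', 'email' => 'john@example.com'],
    ],
];

// implode() now receives strings only, so no "Array to string conversion" warning
echo implode(' ... ', flattenIndex($index)), "\n";
// Contact ... Jane ... jane@example.com ... John ... john@example.com
```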


  • 3 months later...

Hi,

First of all, thanks a lot for this module, which I really enjoy playing with.

Just a word to say that, in your module file, I've added

'render_args' => [
    'theme' => 'default',
    'themes_directory' => null,
    'minified_resources' => true,
    'form_method' => 'get',
    'form_action' => './',

and then

'templates' => [
    'form' => '<form id="{form_id}" class="{classes.form}" action="{form_action}" method="{form_method}" role...

This way, in my PW config.php, I can choose to render the form with the POST method when needed, just as if I had coded the form myself (which also works great with your module).
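For illustration, the override in site/config.php could then look roughly like this. It is only a sketch: it assumes the module file has been patched as described, that options are overridden through the $config->SearchEngine array as I understand the module's README to describe, and 'form_method' is my local addition rather than an option the stock module is known to read.

```php
<?php
// site/config.php (fragment; $config is provided by ProcessWire here)
$config->SearchEngine = [
    'render_args' => [
        // Local addition from the patched module file, not a stock option
        'form_method' => 'post',
    ],
];
```

With POST, the query would of course have to be read from $input->post rather than $input->get.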

thanks again and have a nice day


  • 2 months later...
  • 1 month later...

@teppo First of all, thanks for this great module! I've used it several times already.

For a project I'm working on right now, I'd like to sort the search results like this:
First based on indexed templates. This is possible with the "_indexed_templates" option, so no problem.
After that, pages where the search term appears in the page title should be placed above pages where the search term only appears in the page content and NOT in the page title. Is this possible when using SearchEngine::find()?


Hi @Didjee

Of course this is possible.
Once you've got the results

$results = $pages->find('search_index%=' . $q . ', limit=25');

what you do with it is totally up to you, you could for example use two foreach(es) like the one Teppo shows

foreach ($results as $result) {
    echo "<li><a href='{$result->url}'>{$result->title}</a></li>";
}

and test, inside each loop, whether the query is contained in the title: the first loop outputs the results whose title contains it, and the second outputs the rest (since they are part of the results, the query was found somewhere else in the fields you've set as indexable)
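The two loops can also be folded into one pass that fills two buckets. This is only a sketch with made-up data: titleMatchesFirst() is a hypothetical helper, and the plain arrays stand in for pages (with a PageArray you would read $result->title instead of $result['title']).

```php
<?php
/**
 * Hypothetical helper: put results whose title contains the query before
 * all other results, keeping the original order inside each group.
 */
function titleMatchesFirst(array $results, string $query): array {
    $inTitle = [];
    $elsewhere = [];
    // Case-insensitive, UTF-8 aware test for the query inside a title
    $pattern = '/' . preg_quote($query, '/') . '/iu';
    foreach ($results as $result) {
        if (preg_match($pattern, $result['title']) === 1) {
            $inTitle[] = $result;
        } else {
            $elsewhere[] = $result; // matched somewhere else in the index
        }
    }
    return array_merge($inTitle, $elsewhere);
}

// Made-up sample data standing in for the found pages
$results = [
    ['title' => 'About us',        'url' => '/about/'],
    ['title' => 'Search tips',     'url' => '/tips/'],
    ['title' => 'Contact',         'url' => '/contact/'],
    ['title' => 'Advanced SEARCH', 'url' => '/advanced/'],
];

echo "<ul>";
foreach (titleMatchesFirst($results, 'search') as $result) {
    echo "<li><a href='{$result['url']}'>{$result['title']}</a></li>";
}
echo "</ul>\n";
```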

have a nice day
 


Hi @virtualgadjo, thanks for your quick response!

I was using SearchEngine::find() for the results and not a regular ProcessWire selector, mainly because with SearchEngine::find() you can use "_index_template" to sort results by template (see the SearchEngine module settings). AFAIK, sorting by template is not a built-in feature of page selectors. But I found a way to accomplish that:

Now I'm using the code below instead of SearchEngine::find()
Results are first sorted by the given template order. After that, pages with $query in title or headline will be shown before pages with $query in search_index but NOT in title or headline.

$templates = ["news", "blog", "archive", "plain"];
$results = $pages->find("title|headline|search_index~*={$query},template=".implode('|', $templates));
$sortedResults = new PageArray();
foreach ($templates as $t) {
   $sortedResults->import($results->find("template={$t}"));
}
$results = $sortedResults;

hi @Didjee
glad to hear you've found your solution and a good way to sort pages by template.
now i just wonder why

 

$pages->find("title|headline|search_index~*={$query}

since, if you've included title and headline in the template's search_index, this searches the same content twice. But maybe it helps with the ordering you wanted: first the results matching in the title, then the headline, and so on.
I've done many weird things with this search engine, but I haven't tried this one yet...

have a nice day


  • 1 month later...

Hello all,

I was running into an issue where a particular repeater matrix structure failed to index properly.

In a structure like this:

Template
    Repeater Matrix Field
        Matrix Type
            Page Field
                Page
                    Id
                    Name
                    Title
                    Headline
                    Body
                    Summary

That is, where you have a page template with 'zones' that let you place a matrix type acting as a block, and that block contains a page selection field used to insert a piece of content, the deeper piece of content was not indexed properly, no matter which indexed fields were configured.

In SearchEngine/lib/Indexer.php, within the method ___getFieldIndex, there is a section that passes a hard-coded array of fields to getPageReferenceIndexValue() instead of the $indexed_fields array. In the version below I have commented that call out and passed $indexed_fields instead, and the referenced page content in Repeater Matrix types now indexes properly:

    /**
     * Get index for a single field
     *
     * @param \ProcessWire\Field $field
     * @param \ProcessWire\WireData $object
     * @param array $indexed_fields
     * @param string $prefix
     * @param array $args
     * @return array
     */
    protected function ___getFieldIndex(\ProcessWire\Field $field, \ProcessWire\WireData $object, array $indexed_fields = [], string $prefix = '', array $args = []): array {
        $index = [];
        if ($this->isRepeatableField($field)) {
            $index = $this->getRepeatableIndexValue($object, $field, $indexed_fields, $prefix);
        } else if ($field->type->className() == 'FieldtypeFieldsetPage') {
            $index = $this->getPageIndex(
                $this->getUnformattedFieldValue($object, $field->name),
                $indexed_fields,
                $prefix . $field->name . '.',
                $args
            );
        } else if ($field->type instanceof \ProcessWire\FieldtypePage) {
            // Note: unlike with FieldtypeFieldsetPage above, here we want to check for both FieldtypePage
            // AND any class that might potentially extend it, which is why we're using instanceof.
            /**
            $index = $this->getPageReferenceIndexValue($object, $field, [
                'id',
                'name',
                'title',
            ], $prefix);
            */
            $index = $this->getPageReferenceIndexValue($object, $field, $indexed_fields, $prefix);

        } else {
            $index_value = $this->getIndexValue($object, $field, $indexed_fields);
            $index[$prefix . $field->name] = $index_value->getValue();
            foreach ($index_value->getMeta(true) as $meta_key => $meta_value) {
                $meta_value = explode(':', $meta_value);
                $index[self::META_PREFIX . $meta_key . '.' . $field->name . '.' . array_shift($meta_value) . ':'] = implode(':', $meta_value);
            }
        }
        return array_filter($index);
    }

I don't know if this was put in place for performance reasons, but if you are using a page template structure with page fields, you may have run into this problem, where in some cases only the titles of the referenced content are included in the search_index output.

This change does make indexing take longer, but pages are indexed fully.
