SearchEngine


teppo

Hello,

First off: thank you for this great module!

I have implemented it more or less successfully, but I've hit a wall. I've tried rendering the results as JSON, manually, and with the default setup where the module handles everything. No matter what I try, I can't get a summary to work. I receive the title and URL, but no summary. I am using multi-language support (with de and en), and most fields are RepeaterMatrix.

German and English work fine, and multi-language works for the titles and URLs.

But I don't receive any kind of summary in the code.

I've checked and rechecked the documentation, but I didn't find a solution for it.


  On 5/14/2024 at 8:55 PM, GedankenSchmiede said:

But I don't receive any kind of summary in the code.

I've checked and rechecked the documentation, but I didn't find a solution for it.


The "summary", just in case this was overlooked, isn't (entirely) a built-in feature of the module. The module needs to be told which field in your template(s) to use for the search result summary when rendering. In the documentation on the Modules page, under the "Options" heading, check the render_args property of the module's config and look for the following:

// Summary of each result (in the search results list) is the value of this field.
'result_summary_field' => 'summary',

In the config, "result_summary_field" names the field in your ProcessWire installation whose value is rendered as the summary of each search result. So if your templates don't have a field named "summary", or the field you use for summaries has a different name, set this option to the name you actually use. If you used something like "short_description" as a page summary field, go with that. If you don't have a summary field at all, you could use a "body" or "content" field and, in the render template, apply some form of string truncation, such as sanitizer()->truncate($your_summary_field, 80).

If you're using JSON, slightly further down is a different section for that:

// These settings define the fields used when search results are rendered as JSON.
'results_json_fields' => [
    'title' => 'title',
    'desc' => 'summary',
    'url' => 'url',
],

Does that help at all?
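For anyone consuming that JSON on the front end, here is a minimal sketch of turning result objects into list markup. It assumes each result carries the title, desc and url keys from the results_json_fields mapping above, and that the values arrive as plain text. The exact envelope around the results in the module's JSON output isn't shown in this thread, so check the real response before relying on it. The function names escapeHtml and renderResultItems are made up for this example.

```javascript
// Escape text before placing it into HTML, so result data cannot inject markup.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Build list markup from result objects shaped like the 'results_json_fields'
// mapping: { title, desc, url }. A result without a desc gets no summary.
function renderResultItems(results) {
  const items = results.map((result) => {
    const summary = result.desc ? `<p>${escapeHtml(result.desc)}</p>` : '';
    return `<li><a href="${escapeHtml(result.url)}">${escapeHtml(result.title)}</a>${summary}</li>`;
  });
  return `<ul>${items.join('')}</ul>`;
}
```

If the summary field is empty for a page, desc will be empty too, and the sketch simply leaves the paragraph out rather than printing a blank one.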


  • 3 months later...

This is really great and I appreciate the module and the approach!

Question: Is it possible to manually add content that should be indexed? My use case is fields that exist within RockPageBuilder and I'd like to update the index with field content using a hook.

Thanks!


Hi @FireWire,

Before @Teppo gives you the right answer and trick, I can already say: yes, it is possible.

I have a website that uses a lot of data coming from a totally different db for some pages, and I wanted those contents and pages to be indexed as well. Here is the trick:
- I've created a field (textarea) named extcont (for external content, but of course name it as you want) and added this field to the indexable fields for the templates that have it
- then I used a hook in the ready.php file to fill that field, which the module then indexes, this way:

$this->addHookAfter('Pages::saveReady', function(HookEvent $event) {
    $page = $event->arguments(0);
    $template = $page->get('template');
    require_once('templates/_dbc.php'); // my connection to the external db, needed by the class method i use below
    require_once('templates/_func.php'); // same thing for some functions i need in that same method
    if ( $template == 'an_edition' )
    {
        require_once('classes/myVictimPage.php');
        $id_ed = $page->id_ed; // a "normal" field in the page to get the... edition id :)
        $ext = $page->get_ext($id_ed); // a method in the template class that returns all the content i need to index in raw form
        $page->extcont = $ext[0]; // for the default language
        $page->extcont->setLanguageValue('en', $ext[1]); // guess, it's a multilingual website :) and here i add data to the field
    }
    //... and some more for the other templates that need it
});

And it works like a charm for many different templates (a program, history, etc.). In your case, I think you may just have to add the content returned by RockPageBuilder, but this is the kind of hook you can use to add some extra content to the field indexed by the module.
A little piece of advice: add only raw text content, without any HTML.

hope it may help

have a nice day


@virtualgadjo I was able to find a similar conversation on a RPB module support thread and put something together. I did take an idea from you using the Pages::saveReady event rather than the Pages::saved event in that example which I like better. Same setup with the extra field as you have done. In case this helps anyone else, here's what I came up with. It may be useful outside of the context of RockPageBuilder where field types matter.

/**
 * Adds any content in RockPageBuilder field blocks to the dedicated indexable search_index_blocks
 * field, which is then added to the search index
 */
$wire->addHookBefore('Pages::saveReady', function($event) {
    $page = $event->arguments(0);

    // Get only RPB fields if they exist
    $rpbFields = array_filter($page->fields->getArray(), function(Field $field) {
        return $field->type instanceof FieldTypeRockPageBuilder;
    });

    if (!$rpbFields) {
        return;
    }

    // Map RPB fields with values from getSearchIndexValues method if it exists on the child block
    $indexableContent = array_map(function(Field $rpbField) use ($page) {
        $blocks = $page->getFormatted($rpbField->name)->getArray();

        // Merge content for each block within a field into a single array
        return array_reduce($blocks, function($values, $block) {
            if (!method_exists($block, 'getSearchIndexValues')) {
                return $values;
            }

            return [...$values, ...$block->getSearchIndexValues()];
        }, []);

    }, $rpbFields);

    // Flatten array of arrays containing index-prepared content
    $indexableContent = array_merge(...$indexableContent);

    if ($indexableContent) {
        // This is where it may be improved to make use of a SearchEngine method
        $page->search_index_blocks = implode(' ... ', $indexableContent);
    }
});

The last comment is right above where I think it would be useful to see if there's a way to make use of the SearchEngine object to index content. My implode() method is mimicking the format of the search_index field with ' ... ' but deferring that rather than mimicking it would be great. It's not a dealbreaker but it would help keep my code from knowing too much about how SearchEngine works internally.

Thanks @virtualgadjo for sharing!


  • 4 months later...

I think there may be an error in the try/catch block when the FileIndexer attempts to index a file, and fails.

  Quote

Fatal Error: Uncaught Error: Cannot use object of type SearchEngine\FileIndexer\FileIndexerPdfParser as array in /site/modules/SearchEngineFileIndexer/SearchEngineFileIndexer.module.php:112 Stack trace:


That line points to $file_indexer['method'] as shown below:

<?php
// Attempt to read file data using file indexer
try {
    $text = $file_indexer->getText($file);
} catch (\Exception $e) {
    $this->log->error(sprintf(
        'SearchEngineFileIndexer::%s error for file at %s: %s',
        $file_indexer['method'],
        $file_info['filename'],
        $e->getMessage()
    ));
    return null;
}

In a cursory search, I was unable to find a method property or name that reported the current indexing method type. I don't want to suggest a fix without knowing what the intended error log value was. In this instance, the file also failed to properly upload/save. I don't know if it's because of a server configuration or if the search indexing ended up being a blocking behavior. I suspect it's the fault of the server, but identifying the error is difficult since this was the only error reported.


  • 3 weeks later...

Hi.
I just upgraded my dedicated server installation to PHP 8.4 and the latest PW, and Search Engine is showing these errors:

[Screenshot of the error messages; image not included]

It seems there is some problem with a selector and null?

I found a fix for this:

https://dev.to/gromnan/fix-php-84-deprecation-implicitly-marking-parameter-as-nullable-is-deprecated-the-explicit-nullable-type-must-be-used-instead-5gp3

So do I have to set some default value, or does this need to be fixed in the module's code?

Your module is working right anyway and is amazing for searching - so I just need to know if I could just ignore this for now...

Thanks for any help.

 


  • 1 month later...

Looks like I've got some catching up to do here 🙊

  On 1/26/2025 at 11:14 AM, Pavel Radvan said:

I just upgraded my installation of dedicated server to PHP 8.4 and latest PW and Search Engine is showing these errors:


Thanks for reporting this. It should now be fixed in the latest version of the module!


  On 5/14/2024 at 3:40 PM, Neue Rituale said:

Hi Teppo,

Would it be possible to make the 'isRepeatableField' method in Indexer.php hookable? I would like to attach our FieldType 'FieldtypePageTableNext'.
That would be great.


Sorry for the delay — this method is now hookable 🙂


  • 2 weeks later...

Has someone put together a tutorial for displaying search results via AJAX requests? I've been finding this pattern a bit challenging, not having done much with AJAX. I see that the module returns JSON, and that JavaScript can be used to return what I see rendered at the static page mysite.com/?q=mySearchTerm … I'm not exactly sure how to do it. My working code is simply:

<div class="uk-visible@m">
  {var $searchengine = $modules->get('SearchEngine')}
  {$searchengine->render()|noescape}
</div>

I see that RockFrontend also provides AJAX endpoints. Would some combination of these approaches work? Any help would be appreciated.


@protro  I use this module with htmx and it works nicely. It's pretty easy to do and it can all be done in HTML without handwriting JavaScript or parsing JSON.

Here's a simple example. It should work, though it may need tweaking; the concept is accurate.

<!-- Your search form -->
<style>
  .search-box {
    position: relative;
  }

  .search-indicator {
    align-items: center;
    display: flex;
    inset: 0;
    justify-content: center;
    opacity: 0;
    pointer-events: none;
    position: absolute;
    transition: opacity .3s;
  }

  /* Style your AJAX indicator as you please */
  .htmx-request .search-indicator,
  .htmx-request.search-indicator {
    opacity: 1;
    pointer-events: auto;
  }
</style>
<div class="search-box">
  <form hx-get="the search URL here" hx-target="#search-results-container" hx-disabled-elt="input[type=submit]" hx-indicator=".search-indicator">
    <input type="search" inputmode="search" name="q" placeholder="What would you like to find?">
    <input type="submit" value="Search">
    <div id="search-results-container">
      <!-- Search results will load inside this div -->
    </div>
  </form>
  <div class="search-indicator">
    <span>Searching...</span>
  </div>
</div>
  • hx-get tells htmx to request the page from the URL you provide.
  • hx-target tells htmx where to put the markup it receives back from the AJAX request. It accepts any CSS selector.
  • hx-disabled-elt directs htmx to disable the submit button during the request, which prevents people from clicking the button multiple times. It accepts any CSS selector.
  • hx-indicator isn't required, but tells htmx which element should get the .htmx-request class while the request is in flight. The class is removed once the results are loaded, which makes for a nice "loading" display. It accepts any CSS selector.

On the search page all you have to do is render the results that will appear inside #search-results-container, just markup. This is a rough example, but just illustrates how simple it can be. You'll have to tailor this to your needs. You can probably use markup that the SearchEngine module generates, but I haven't used that before myself.

<div class="search-results">
  <?php if ($searchResults->count()): ?>
    <ul>
      <?php foreach ($searchResults as $result): ?>
        <li>
          <p><?=$result->title?></p>
          <a href="<?=$result->url?>">View Page</a>
        </li>
      <?php endforeach ?>
    </ul>
  <?php else: ?>
    <p>There were no search results for "<?=$sanitizer->entities1($input->get->q)?>"</p> <!-- Thanks to teppo for reminder to sanitize -->
  <?php endif ?>
</div>

That's the only markup you need. You'll probably need to do some additional work to make things like paging happen, but it's all doable.

I cobbled this together based on some code I've already written so it's just a starting point and probably needs some tinkering. Here's a handy reference of htmx attributes that will let you do anything you need.

Edited by FireWire
Edited example for safety

HTMX is definitely a good solution for this!

Just a minor note, even though I do get that this was meant as a rough example: be sure to always sanitize user provided input before output 🙂

    <p>There were no search results for "<?=$sanitizer->entities1($input->get->q)?>"</p>

As an alternative to rendering HTML in the backend, SearchEngine provides an option to output JSON, but that way you'll have to write a bit more JS. Querying data with fetch is easy, but you'll also need to render results, handle debouncing, etc.

I've been thinking of adding a prebuilt solution for that into the module itself. In the meantime here's a gist that shows one ajax search approach. I just hacked it together quickly, so there are likely things that I've forgotten to account for, but it seems to work based on a quick test on one of my sites 🙂
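For readers who haven't met debouncing before, here is a minimal, generic sketch of the idea: wait until the user has stopped typing for a moment before sending a request. This is not the code from the gist, just an illustration. The helper name debounce and the injectable timers argument are assumptions made for this example (the latter only exists so the helper can be exercised without real timers).

```javascript
// Generic debounce helper: delays calling `fn` until `delayMs` milliseconds
// have passed without another call. Only the arguments of the latest call are used.
// `timers` is a hypothetical hook for swapping in fake timers while testing.
function debounce(fn, delayMs, timers = {
  set: (callback, ms) => setTimeout(callback, ms),
  clear: (handle) => clearTimeout(handle),
}) {
  let handle = null;
  return (...args) => {
    if (handle !== null) {
      timers.clear(handle); // a newer call cancels the pending one
    }
    handle = timers.set(() => {
      handle = null;
      fn(...args);
    }, delayMs);
  };
}
```

Usage could look like searchInput.addEventListener('input', debounce(() => findResults(), 300)), where searchInput and findResults stand for whatever your own script calls the search field and the function that fetches results.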

 


  On 3/29/2025 at 8:15 AM, teppo said:

be sure to always sanitize user provided input before output


I like to live dangerously*

 

*please don't live dangerously

 

Thank you for the correction of my oversight @teppo. I've edited my example to include that in case someone doesn't make it further down the comments 👍


Thank you @FireWire and @teppo for your help.

Before I attempt an htmx approach, I wanted to see if I could get @teppo's JavaScript working. It seems I am almost there, but I'm curious why

$modules->get('SearchEngine')->renderResultsJSON()

is returning the full HTML page (<!DOCTYPE …) in the Response (verified in Inspector -> Network Tab). Status code is 200 and X-Requested-With shows XMLHttpRequest.

I incorporated the JavaScript code from the gist that was referenced, and verified with console.log messages that it is identifying the searchForm and other elements.
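A small guard on the JavaScript side can make this kind of problem easier to spot: check the Content-Type of the response before parsing it, so a full HTML page produces a clear error instead of a JSON parse failure. This is only a sketch. The function names isJsonContentType and parseJsonResponse are made up for this example, and it assumes the server sends an application/json Content-Type header when it really returns JSON.

```javascript
// Returns true when a Content-Type header value describes JSON,
// e.g. "application/json; charset=utf-8" or "application/ld+json".
function isJsonContentType(contentType) {
  return typeof contentType === 'string' &&
    /^application\/([\w.-]+\+)?json\b/i.test(contentType.trim());
}

// Parse a fetch() Response as JSON, failing with a descriptive error when the
// server answered with something else, such as a complete HTML page.
async function parseJsonResponse(response) {
  const contentType = response.headers.get('content-type');
  if (!isJsonContentType(contentType)) {
    throw new Error(`Expected JSON but received "${contentType}"`);
  }
  return response.json();
}
```

In a fetch chain this would replace a bare response.json() call, for example .then((response) => parseJsonResponse(response)).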

This is my template code where SearchEngine is invoked (I am also calling the styles and scripts in the <head> as you reference in the module documentation).

<div class="uk-visible@m">
  {var $searchengine = $modules->get('SearchEngine')}

  {$searchengine->renderForm()|noescape}

  {if $config->ajax}
  {$searchengine->renderResultsJSON()|noescape}
  {/if}
</div>

I am using MAMP serving at http://localhost:8888/mysite/
Do I need to pass additional formatting to renderResultsJSON(), as you have in the example here?

Thank you for looking this over,


@protro, this is an area where it is very difficult to give exact answers, but I'll try.

Now, the reason I can't give a 100% solid answer is that I don't know what kind of output strategy you are using, etc. And from your example I can see that you are using some templating language, which potentially complicates things a bit.

The gist of it is that $searchengine->renderResultsJSON() won't remove page markup; it will just render a JSON blob. If you were using a plain "direct output" output strategy, sample code could look something like this:

<?php namespace ProcessWire;

// /site/templates/search.php

if ($config->ajax) {
  // AJAX request, send JSON header, output JSON and exit
  header('Content-Type: application/json; charset=utf-8');
  echo $modules->get('SearchEngine')->renderResultsJSON();
  exit;
}
?>
<html>
  <body>
    
    <!-- This is your normal HTML output -->
    <?= $modules->get('SearchEngine')->renderResults() ?>
    
  </body>
</html>

How to apply this to your particular site / output structure depends 🙂

In some cases it could be easier to implement this as a separate endpoint, e.g. using URL hooks. Perhaps something like this:

<?php namespace ProcessWire;

// /site/init.php or /site/ready.php

$wire->addHook('/json/search/', function($event) {
  header('Content-Type: application/json; charset=utf-8');
  return wire()->modules->get('SearchEngine')->renderResultsJSON();
});

In this case you would modify the code so that it sends JS fetch requests to this URL instead.

Note: not tested, written in browser, may not work. But that's the general idea.
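On the JavaScript side, pointing the requests at such an endpoint could look roughly like the sketch below. The path json/search/ mirrors the URL hook example above, and the q parameter is the one used elsewhere in this thread. The function names and the rootUrl parameter (for sites installed in a subdirectory) are assumptions made for this example, not part of the module or the gist.

```javascript
// Build the URL and fetch options for a JSON search request.
// `rootUrl` is a hypothetical parameter for sites living in a subdirectory.
function buildSearchRequest(query, rootUrl = '/') {
  const params = new URLSearchParams({ q: query }); // handles URL encoding
  const base = rootUrl.endsWith('/') ? rootUrl : `${rootUrl}/`;
  return {
    url: `${base}json/search/?${params}`,
    options: { headers: { Accept: 'application/json' } },
  };
}

// Send the request and resolve with the parsed JSON.
async function fetchSearchResults(query, rootUrl = '/') {
  const { url, options } = buildSearchRequest(query, rootUrl);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Search request failed with status ${response.status}`);
  }
  return response.json();
}
```

Because a dedicated endpoint always answers with JSON, this sketch doesn't need the X-Requested-With header that a $config->ajax check in a template file relies on.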


  On 3/29/2025 at 1:01 AM, protro said:

I see that Rockfrontend also provides AJAX endpoints. Some combinations of these approaches? Any help would be appreciated.


In my last post I said that I don't know what output strategy you are using, but from this I would assume that you are actually using RockFrontend. And thus that templating language you are using is probably Latte.

To be honest I have no idea how you would do this in a RockFrontend-native way, but "AJAX endpoints" sound like a potential answer.

Personally I would probably just go with a URL hook in this case, as it is pretty much "universal" for any ProcessWire site, regardless of what output strategy it is using. I gave the example in my post above a quick try and it seems to work fine.

URL hooks are awesome 🙂


Thank you @teppo this is so helpful. I am trying your hook code above, but it says undefined method returnResultsJSON() after placing it in /site/ready.php
Strange. I see that the method is referenced at the top @method in the module code. What am I not understanding ?

Edit: I ended up getting this to work by creating /site/templates/search.php with the following code:

<?php namespace ProcessWire;

if ($config->ajax) {
  // AJAX request, send JSON header, output JSON and exit
  header('Content-Type: application/json; charset=utf-8');
  echo $modules->get('SearchEngine')->renderResultsJSON();
  exit;
}

Crucially, I forgot to create a new template 'search' and a new page using this template in the PW Admin Panel, (the page is titled 'Search', and set as a Hidden Page). After that, I had to modify my markup to use the search page endpoint as the url.

header.latte (where the search bar markup lives), passing options to the /search/ endpoint url

<div class="uk-visible@m">
  {var $searchengine = $modules->get('SearchEngine')}

  {* Pass the form_action parameter to point to your search.php file *}
  {var $formOptions = ['form_action' => $config->urls->root . 'search/']}

  {$searchengine->renderForm($formOptions)|noescape} 

  {$searchengine->renderResults()|noescape}
</div>

The JavaScript @teppo provided, with a modified fetch query:

const findResults = () => {
  window.clearTimeout(searchTimeout)
  searchTimeout = window.setTimeout(() => {
    if (searchResults) {
      searchResults.setAttribute('hidden', 'true')
    }
    if (searchInput.value.length > 2) {
      if (searchCache[searchInput.value]) {
        renderResults(searchForm, searchCache[searchInput.value])
        return
      }
      if (searchInput.hasAttribute('data-request')) {
        return
      }
      searchInput.setAttribute('data-request', 'true')
      searchInput.setAttribute('disabled', 'true')
      const searchParams = new URLSearchParams()
      searchParams.append('q', searchInput.value)
      fetch(`${pwConfig.rootUrl}search/?${searchParams}`, {
        headers: {
          // set the request header to indicate to ProcessWire that this is an AJAX request; this
          // way we can check $config->ajax in the template file and return JSON instead of HTML
          // by calling $modules->get('SearchEngine')->renderResultsJSON()
          'X-Requested-With': 'XMLHttpRequest',
        },
      })
        .then((response) => {
          if (!response.ok) {
            throw new Error('Network response was not ok')
          }
          console.log(response)
          return response.json()
        })
        .then((data) => {
          searchCache[searchInput.value] = data
          renderResults(searchForm, data)
          searchInput.removeAttribute('data-request')
          searchInput.removeAttribute('disabled')
          searchInput.focus()
        })
        .catch((error) => {
          console.error('Error fetching search results:', error)
          searchInput.removeAttribute('data-request')
          searchInput.removeAttribute('disabled')
          searchInput.focus()
        })
    }
  }, 300)
}

and an additional <script> tag in _main.php to allow the above JavaScript to reference the ProcessWire root properly within my MAMP local dev environment

<!-- Create a global object to store ProcessWire paths for ajax search hook -->
<script>
  var pwConfig = {
    rootUrl: '<?= $config->urls->root ?>'
  };
</script>

Maybe this is not the most elegant way to do this, but it seems to be working nicely now. I spent too many hours trying other ways. Maybe someone could point out redundancies or a better methodology. Sharing this here in case anyone else wants to add some nice AJAX functionality to @teppo's fantastic module.
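On the question of redundancies, one small candidate is the cleanup that appears in both the .then() and the .catch() branch of the fetch code above (removing data-request and disabled, then refocusing). A possible way to write it once is a try/finally wrapper, sketched below. The name withRequestLock is made up for this example, and the sketch only assumes that the input is an element with the usual hasAttribute, setAttribute, removeAttribute and focus methods.

```javascript
// Run `task` while the input is marked as busy, and always undo the marking
// afterwards, whether the task succeeds or throws.
// Resolves with undefined without running `task` if a request is already in flight.
async function withRequestLock(input, task) {
  if (input.hasAttribute('data-request')) {
    return undefined;
  }
  input.setAttribute('data-request', 'true');
  input.setAttribute('disabled', 'true');
  try {
    return await task();
  } finally {
    // Single cleanup path for success and failure
    input.removeAttribute('data-request');
    input.removeAttribute('disabled');
    input.focus();
  }
}
```

The fetch call and the rendering would then go inside the task function, and errors can still be handled with a single .catch() on the promise that withRequestLock returns.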

Thanks!

