
Dynamic search with htmx, hyperscript and ProcessWire


Spiria

Hello, I would really appreciate your comments on this blog post I am preparing for our site.

===========

Creating a dynamic search in ProcessWire takes very little code. It cannot compete with engines such as Elasticsearch or Solr, but it is suitable for most "showcase" sites. Here is how we did it on the Spiria site using the small htmx library and its companion, hyperscript.

The goal

[Image: htmx01.png]

The recipe

  1. The htmx and hyperscript libraries (the latter is optional).
  2. A textarea field added to the templates that we want to index.
  3. Code for indexing the existing content, placed in the file `ready.php`.
  4. A search controller, which we name `api.php` here. This controller will also be a page with the `api` template.
  5. A form placed in the pages that need the search.

Content indexing

Before we can write the search itself, we need to index the content we want to search. In my proof of concept, I developed two strategies. This is probably overkill, because I am not sure it brings any gain in speed.

  1. An index for single-term searches
  2. An index for multiple-term searches

To do this, we add two fields to each template that we want to index.

  1. The `search_text` field, which contains only one occurrence of each word on a page.
  2. The `search_text_long` field, which preserves all sentences without HTML tags.

We place a hook in the `ready.php` file in this way:

<?php namespace ProcessWire;

// Rebuild both search fields each time a blog article is about to be saved.
pages()->addHookAfter("saveReady", function (HookEvent $event) {
    $p = $event->arguments[0];
    switch ($p->template->name) {
        case "blog_article":
            $french = languages()->get('fr');
            $english = languages()->get('default');
            // Body and summary are indexed together, one string per language.
            $txt_en = $p->page_content->getLanguageValue($english) . ' ' . $p->blog_summary->getLanguageValue($english);
            $txt_fr = $p->page_content->getLanguageValue($french) . ' ' . $p->blog_summary->getLanguageValue($french);
            $title_en = $p->title->getLanguageValue($english);
            $title_fr = $p->title->getLanguageValue($french);
            // stripText() returns [unique words, full cleaned text].
            $resultEn = stripText($txt_en, $title_en);
            $resultFr = stripText($txt_fr, $title_fr);
            $p->setLanguageValue($english, "search_text", $resultEn[0]);
            $p->setLanguageValue($english, "search_text_long", $resultEn[1]);
            $p->setLanguageValue($french, "search_text", $resultFr[0]);
            $p->setLanguageValue($french, "search_text_long", $resultFr[1]);
            break;
    }
});

The hook relies on this helper function:

function stripText($t, $s)
{
    // $t is the page text, $s the title.
    // Returns [unique words longer than three characters, full cleaned text].
    $resultText = [];
    $t = strip_tags($t);
    $t .= " " . $s;
    $t = str_replace("\n", " ", $t);
    $t = str_replace(",", "", $t);
    $t = str_replace("“", "", $t);
    $t = str_replace("”", "", $t);
    $t = str_replace("'", "", $t);
    $t = str_replace("?", "", $t);
    $t = str_replace("!", "", $t);
    $t = str_replace(":", "", $t);
    $t = str_replace("«", "", $t);
    $t = str_replace("»", "", $t);
    $t = str_replace(".", "", $t);
    $t = str_replace("l’", "", $t);
    $t = str_replace("d’", "", $t);
    // Entities survive strip_tags(); use a space so adjacent words stay separate.
    $t = str_replace("&nbsp;", " ", $t);
    // Remove [[...]] tags; non-greedy, so text between two tags is kept.
    $t = preg_replace('/\[\[.*?\]\]/', '', $t);
    //$t = preg_replace('/\?|\[\[.*?\]\]|“|”|«|»|\.|!|\&nbsp;|l’|d’|s’/','',$t);
    $arrayText = explode(" ", $t);
    foreach ($arrayText as $item) {
        // Note: strlen() counts bytes, so accented words count as longer.
        if (strlen(trim($item)) > 3 && !in_array($item, $resultText)) {
            $resultText[] = $item;
        }
    }
    return [implode(" ", $resultText), $t];
}

If you have the ListerPro module, it is easy to re-save all the existing pages in one batch so that they get indexed. Any new page will then be indexed as it is created.

The `stripText()` function cleans up the text the way we want. Note that, in my example, I distinguish between French and English. This little algorithm can certainly be improved! I have left a shorter way to clean up the text in a comment, but it comes at the expense of readability.
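As a middle ground between the long list of calls and the single regular expression, here is a minimal sketch that passes arrays to `str_replace()`. The function name `stripTextCompact()` is hypothetical and this is not the production code. It only assumes the same inputs and the same two return values as `stripText()`.

```php
<?php
// Hypothetical compact variant of stripText(); a sketch, not the site's code.
function stripTextCompact(string $text, string $title): array
{
    $text = strip_tags($text) . ' ' . $title;
    // Remove [[...]] tags first (non-greedy, so text between two tags is kept).
    $text = preg_replace('/\[\[.*?\]\]/', '', $text);
    // Punctuation and French elisions that are simply dropped.
    $remove = [',', '“', '”', "'", '?', '!', ':', '«', '»', '.', 'l’', 'd’'];
    $text = str_replace($remove, '', $text);
    // Line breaks and non-breaking-space entities become plain spaces.
    $text = str_replace(["\n", '&nbsp;'], ' ', $text);

    $words = [];
    foreach (explode(' ', $text) as $word) {
        $word = trim($word);
        // strlen() counts bytes, as in the original function.
        if (strlen($word) > 3 && !in_array($word, $words, true)) {
            $words[] = $word;
        }
    }
    // [unique words for search_text, full cleaned text for search_text_long]
    return [implode(' ', $words), $text];
}

[$unique, $full] = stripTextCompact(
    '<p>dynamic search with htmx, simple search with htmx!</p>',
    'ProcessWire'
);
echo $unique, PHP_EOL; // dynamic search with htmx simple ProcessWire
```

Keeping the characters in one array makes it easier to see, and to extend, what is removed for each language.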

As mentioned before, it is probably unnecessary to create two search fields. The most important thing is to optimize the text as much as possible, since many small words are useless. The current code keeps only words longer than three characters, which is tricky on a technical site like ours, where names like `C#`, `C++` and `PHP` are filtered out along with `the`, `for`, `not`, etc. That said, this optimization may be superfluous when the content is simple and limited in volume.
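One possible way around the short-word problem, sketched below, is to combine the length rule with an allowlist of short technical terms and a small stop-word list. The function `filterIndexWords()` and both lists are hypothetical. They only illustrate the idea and are not part of the code running on the Spiria site.

```php
<?php
// Hypothetical helper: the allowlist and stop-word list are illustrative only.
function filterIndexWords(array $words, array $allowShort, array $stopWords): array
{
    $kept = [];
    foreach ($words as $word) {
        $word = trim($word);
        if ($word === '') {
            continue;
        }
        $lower = strtolower($word);
        // Stop words are dropped whatever their length.
        if (in_array($lower, $stopWords, true)) {
            continue;
        }
        // Short words are dropped unless they are explicitly allowed.
        if (strlen($word) <= 3 && !in_array($lower, $allowShort, true)) {
            continue;
        }
        // Keep one occurrence of each word.
        if (!in_array($word, $kept, true)) {
            $kept[] = $word;
        }
    }
    return $kept;
}

$words = explode(' ', 'PHP and C# are not the same for this site');
$kept = filterIndexWords($words, ['php', 'c#', 'c++'], ['this', 'with', 'that']);
echo implode(' ', $kept), PHP_EOL; // PHP C# same site
```

The allowlist has to be maintained by hand, which is acceptable for a handful of technology names but would not scale to a large vocabulary.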

Now let's look at the process and the search code.

The structure

[Figure: structure.svg]

This structure is conventional and needs little explanation. The `htmx` library makes a simple Ajax call.

The form

[Figure: code01.svg, the form markup]

  1. The form uses the `get` method, which leads to a conventional search page when the user presses the `enter` key.
  2. A hidden field with a secret key generated on the fly enhances security.
  3. The third field is the `input` of the dynamic search. It carries the `htmx` attributes. The first one, `hx-post`, indicates how the data is sent to the API. Here, it is a `post`. `htmx` can handle events on any DOM element, so we could have several calls on different elements of a form.
  4. The second line indicates where the API response will be placed when it is received, which is `div#searchResult` below the form.
  5. The `hx-trigger` attribute describes when the request is sent to the API. Here, it is when the user releases a key, with a delay of 200 ms before the request goes out.
  6. The `hx-indicator` attribute is optional. It tells the user that something is happening. In the example, the `#indexsearch` image is displayed. This is handled automatically by htmx.
  7. We can pass other parameters to the search with the `hx-vals` attribute. The given example is simplified: we send the search language.
  8. The last attribute uses the `hyperscript` syntax. It indicates that a click anywhere outside the search field removes `#found`, which is described below.

It is clear from this example that we write no JavaScript of our own; the only scripts loaded are the htmx and hyperscript libraries. It is worth visiting the websites of these two libraries to understand their philosophy and possibilities.
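Since the form is only shown as an image, here is a hedged reconstruction of what such markup could look like, written as a small PHP function. The function name `renderSearchForm()`, the two URLs, the indicator image path and the `type`/`autocomplete` attributes are assumptions. The attribute names, the `CSRFTokenBlog` and `search` field names, and the `#searchResult`, `#indexsearch` and `#found` ids come from the description above and from the API code.

```php
<?php
// Hypothetical reconstruction of the search form; URLs and image path are assumptions.
function renderSearchForm(
    string $token,
    string $lang,
    string $apiUrl = '/api/',
    string $searchUrl = '/search/'
): string {
    $esc = fn(string $v): string => htmlspecialchars($v, ENT_QUOTES, 'UTF-8');
    $token = $esc($token);
    $apiUrl = $esc($apiUrl);
    $searchUrl = $esc($searchUrl);
    // hx-vals expects JSON; it is escaped so it can sit inside an attribute.
    $vals = $esc(json_encode(['lang' => $lang]));

    return <<<HTML
<form method="get" action="{$searchUrl}">
  <input type="hidden" name="CSRFTokenBlog" value="{$token}">
  <input type="search" name="search" autocomplete="off"
         hx-post="{$apiUrl}"
         hx-target="#searchResult"
         hx-trigger="keyup changed delay:200ms"
         hx-indicator="#indexsearch"
         hx-vals="{$vals}"
         _="on click from elsewhere remove #found">
  <img id="indexsearch" class="htmx-indicator" src="/path/to/loader.svg" alt="">
</form>
<div id="searchResult"></div>
HTML;
}

echo renderSearchForm('example-token', 'en'), PHP_EOL;
```

Because the `input` sits inside the form and the request is a `post`, htmx sends the other form fields along with it, so the hidden token reaches the API without extra attributes.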

The Search API

The API resides in a normal ProcessWire page. Although it is published, it remains "hidden" from CMS searches. This page answers requests and calls the appropriate functions. Several kinds of requests to the CMS can be grouped in a page of this type.

<?php namespace ProcessWire;

// Token stored in the session by the page that renders the search form.
$secretsearch = (string) session()->get('secretToken');
$request = input()->post();
$lang = sanitizer()->text($request["lang"]);

if (isset($request['CSRFTokenBlog'])) {
    // Reject an empty session token, then compare in constant time.
    if ($secretsearch !== '' && hash_equals($secretsearch, (string) $request['CSRFTokenBlog'])) {
        if (!empty($request["search"])) {
            echo page()->querySite(sanitizer()->text($request["search"]), $lang);
        }
    } else {
        echo __("A problem occurred. We are sorry for the inconvenience.");
    }
}
exit;

In this case:

  1. We read the secret token from the session; it was created by the page that displays the search form.
  2. We then process everything that is in the `post` query. Note that this is a simplified example.
  3. We compare the token with the one received in the query. If they match, we run the search query. Our example uses a class residing in `site/classes/ApiPage.php`, so its method can be called directly through `page()`. Any other strategy is valid.
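The token handling in steps 1 and 3 can be sketched in isolation as follows. The helper names `issueSearchToken()` and `isValidSearchToken()` are hypothetical, and a plain array stands in for the ProcessWire session so that the snippet runs on its own. Only the `secretToken` and `CSRFTokenBlog` keys come from the code above.

```php
<?php
// Hypothetical helpers; a plain array stands in for the ProcessWire session.
function issueSearchToken(array &$session): string
{
    // 32 random bytes, hex-encoded: called by the page that renders the form.
    $token = bin2hex(random_bytes(32));
    $session['secretToken'] = $token;
    return $token;
}

function isValidSearchToken(array $session, array $post): bool
{
    $expected = $session['secretToken'] ?? '';
    $received = $post['CSRFTokenBlog'] ?? '';
    // An empty or missing token never validates; hash_equals() is constant time.
    return is_string($expected) && is_string($received)
        && $expected !== '' && hash_equals($expected, $received);
}

$session = [];
$token = issueSearchToken($session);

var_dump(isValidSearchToken($session, ['CSRFTokenBlog' => $token]));   // bool(true)
var_dump(isValidSearchToken($session, ['CSRFTokenBlog' => 'forged'])); // bool(false)
var_dump(isValidSearchToken([], ['CSRFTokenBlog' => '']));             // bool(false)
```

The explicit check for an empty token matters: without it, a request made without any session would compare two empty strings and pass.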

The following code represents the core of the process.

<?php namespace ProcessWire;

// Excerpt from site/classes/ApiPage.php, the page class for the `api` template.
class ApiPage extends Page
{
    public function querySite($q, $l)
    {
        $this->search = "";
        $this->lang = $l == 'en' ? 'default' : 'fr';
        user()->setLanguage($this->lang);
        // Make the user input safe to embed in a selector string.
        $value = sanitizer()->selectorValue($q);
        $whatQuery = explode(" ", $q);
        $this->count = count($whatQuery);
        if ($this->count > 1) {
            // Several terms: search the full text.
            $this->search = 'template=blog_article,has_parent!=1099,search_text_long~|*=' . $value . ',sort=-created';
        } elseif (strlen($q) > 1) {
            // Single term: search the list of unique words.
            $this->search = 'template=blog_article,has_parent!=1099,search_text*=' . $value . ',sort=-created';
        }
        if ($this->search !== "") {
            $this->result = pages()->find($this->search);
            return $this->formatResult();
        }
        return "";
    }

    protected function formatResult()
    {
        $html = '<ul id="found">';
        if (count($this->result) > 0) {
            foreach ($this->result as $result) {
                $html .= '<li><a href="' . $result->url . '">' . $result->title . '</a></li>';
            }
        } else {
            $html .= '<li>' . __('Nothing found') . '</li>';
        }
        // Only the list is returned; the #searchResult container already exists in the page.
        $html .= '</ul>';
        return $html;
    }
}

The `formatResult()` function is simple to understand. It is here that the `ul#found` tag appears which, remember, is removed by the _hyperscript_ line of the form.

_="on click from elsewhere remove #found"


Actually, I've commented out this line in the production code so far. Instead of reacting to a click anywhere, it would be better to add a backdrop behind the form during a search in order to capture the `click` event. But again, all strategies are good!

In the current code, no CSS is needed to show or hide the result. It is placed in an empty `#searchResult` tag, which is invisible at first. As soon as it is filled with the search result, everything becomes visible, since the CSS targets the `ul#found` list and not its parent.

Conclusion

The purpose of this article was to demonstrate htmx and the possibilities that this library allows. There is already an excellent search module for ProcessWire, SearchEngine, which can coexist very well with the code described here.


9 minutes ago, Spiria said:

Hello, I would really appreciate your comments on this blog post I am preparing for our site.

Very nice!

In the context of this post in the PW forums I find myself looking for a link to your site so I can see the search in action. Maybe you could add a link?

A tip: str_replace() accepts an array for the $search argument, so your stripText() function could be made more compact where it is removing multiple different characters.
