
ElasticSearch for ProcessWire


Jonathan Dart

ElasticSearch does a lot, but the part that interests me most is that it does an amazing job of fulltext search. It's also crazy fast. It can be a bit scary at first, so hopefully this module will make it more accessible.

I threw this module together pretty quickly; it's more of a proof of concept than anything else at this point. I tried it on a site with 400 bilingual pages, and the search results are much better than what you get from LIKE queries or fulltext queries in MySQL.
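For readers new to Elasticsearch: instead of a SQL `LIKE '%term%'` match, a full-text query is sent as a JSON body to an index's `_search` endpoint. The sketch below only builds such a body in PHP. It assumes a `multi_match` query over hypothetical `title` and `body` fields and is not necessarily what this module actually sends.

```php
<?php
// Build an Elasticsearch full-text search request body.
// Field names are hypothetical; adjust them to whatever your index stores.
function buildSearchBody(string $query, int $from = 0, int $size = 25): string
{
    $body = [
        'from'  => $from,  // offset of the first hit (used for pagination)
        'size'  => $size,  // number of hits to return
        'query' => [
            'multi_match' => [
                'query'  => $query,
                'fields' => ['title^2', 'body'], // ^2 boosts matches in the title
            ],
        ],
    ];
    return json_encode($body);
}

echo buildSearchBody('processwire search'), "\n";
```

In a real setup this body would be POSTed to something like `http://localhost:9200/<index>/_search`; the index name depends on how the module is configured.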

Github page: https://github.com/jdart/ElasticSearchProcessWire

I'd love to hear some feedback on how it works for you. 

It's very new so expect bugs, in particular the mechanism that turns pages into data to be indexed by ES might have some surprises.

Edited by Nico Knoll
Added the "module" tag.

Wow. "As usual" this post was made at just the right time. We're currently building two projects that need some advanced search mechanisms, and we were considering Solr or Elasticsearch. I would probably have gone with Solr, as we've used it in other projects before. This module will make our decision a lot easier :D


  • 2 weeks later...

Hi Jonathan,

I just installed the module and tried to run the initial indexing, but I got the following "maximum nesting level" error (Windows, PHP 5.4.6):

( ! ) Fatal error: Maximum function nesting level of '400' reached, aborting! in ...\wire\core\Template.php on line 206
Call Stack
#    Time    Memory    Function    Location
1    0.0013    164624    {main}( )    ..\index.php:0
2    0.2451    12483568    ProcessPageView->execute( )    ..\index.php:195
3    0.2451    12483680    Wire->__call( )    ..\index.php:195
4    0.2451    12483680    Wire->runHooks( )    ..\Wire.php:317
5    0.2452    12485280    call_user_func_array ( )    ..\Wire.php:359
6    0.2452    12485376    ProcessPageView->___execute( )    ..\Wire.php:359
7    0.2555    12579304    Page->render( )    ..\ProcessPageView.module:167
8    0.2555    12579416    Wire->__call( )    ..\ProcessPageView.module:167
9    0.2555    12579416    Wire->runHooks( )    ..\Wire.php:317
10    0.3390    13117808    ElasticSearch->checkForRebuildSearchData( )    ..\Wire.php:381
11    0.4550    13572320    ElasticSearch->updatePageContentInElasticSearch( )    ..\ElasticSearch.module:127
12    0.4574    13586960    ElasticSearch->getAllContentForPage( )    ..\ElasticSearch.module:209
13    0.4684    13640000    ElasticSearch->getRepeaterTypeAsContent( )    ..\ElasticSearch.module:149
14    0.4684    13640880    ElasticSearch->getAllContentForPage( )    ..\ElasticSearch.module:198
15    0.4759    13942032    ElasticSearch->getPageTypeAsContent( )    ..\ElasticSearch.module:147
16    0.4759    13942048    ElasticSearch->getAllContentForPage( )    ..\ElasticSearch.module:190
17    0.4760    13943400    ElasticSearch->getRepeaterTypeAsContent( )    ..\ElasticSearch.module:149
18    0.4760    13944240    ElasticSearch->getAllContentForPage( )    ..\ElasticSearch.module:198
19    0.4761    13945496    ElasticSearch->getPageTypeAsContent( )    ..\ElasticSearch.module:147
20    0.4761    13945496    ElasticSearch->getAllContentForPage( )    ..\ElasticSearch.module:190
21    0.4762    13946848    ElasticSearch->getRepeaterTypeAsContent( )    ..\ElasticSearch.module:149
22    0.4762    13947688    ElasticSearch->getAllContentForPage( )    ..\ElasticSearch.module:198
23    0.4764    13948944    ElasticSearch->getPageTypeAsContent( )    ..\ElasticSearch.module:147
24    0.4764    13948944    ElasticSearch->getAllContentForPage( )    ..\ElasticSearch.module:190
25    0.4764    13950296    ElasticSearch->getRepeaterTypeAsContent( )    ..\ElasticSearch.module:149
26    0.4764    13951136    ElasticSearch->getAllContentForPage( )    ..\ElasticSearch.module:198
27    0.4766    13952400    ElasticSearch->getPageTypeAsContent( )    ..\ElasticSearch.module:147

....

I think there must be a problem with recursion through page and/or repeater fields. Have you experienced anything like this? Is there a patch for the module that prevents this kind of recursion?

regards,

Marco


Yes, I'm running Xdebug (and have already increased the nesting level to 400). But as the call stack shows, there is an endless loop of function calls, so increasing the nesting level won't help.


Nesting level of 1000 didn't help. Deactivating the Xdebug extension led to a memory exhaustion error (as expected).

I think this is an endless recursion problem, and therefore it can't be solved by any kind of PHP configuration.

A possible solution could be to limit indexing to the actual text fields, in particular ignoring fields that reference other pages, to prevent circular references.
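The circular-reference idea can be made concrete with a visited set: record each page ID the first time it is indexed and skip it whenever a page or repeater field leads back to it. The sketch below uses a plain PHP array as a hypothetical stand-in for ProcessWire pages (it does not use the module's or ProcessWire's API) and also caps the depth, since even without cycles a long chain of references can pull a large part of the site into a single document.

```php
<?php
// Hypothetical stand-in for pages: id => title + ids of referenced pages.
// Pages 1 and 2 reference each other, the kind of cycle that makes a naive
// "also index the referenced page" approach recurse forever.
$pageGraph = [
    1 => ['title' => 'Products', 'refs' => [2]],
    2 => ['title' => 'Widgets',  'refs' => [1, 3]],
    3 => ['title' => 'Gadgets',  'refs' => []],
];

// Collect titles of a page and everything it references, visiting each
// page at most once and never going deeper than $maxDepth.
function collectTitles(array $graph, int $id, array &$visited = [], int $depth = 0, int $maxDepth = 3): array
{
    if (isset($visited[$id]) || $depth > $maxDepth || !isset($graph[$id])) {
        return []; // already visited, too deep, or unknown page
    }
    $visited[$id] = true;

    $titles = [$graph[$id]['title']];
    foreach ($graph[$id]['refs'] as $refId) {
        $titles = array_merge($titles, collectTitles($graph, $refId, $visited, $depth + 1, $maxDepth));
    }
    return $titles;
}

print_r(collectTitles($pageGraph, 1)); // Products, Widgets, Gadgets
```

The visited set alone is enough to stop the infinite loop; the depth cap is a separate, optional choice to keep indexed documents small.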


Hi Marco, I'm not sure what the issue might be; I'll check it out ASAP.


Hi Marco,

In ElasticSearch.module, can you try changing the following function (around line 190):

protected function getPageTypeAsContent($value) {
    return $this->getAllContentForPage($value);
}

to:

protected function getPageTypeAsContent($value) {
    return $value->title;
}

Let me know if that gets rid of the nesting issue, and if search results are affected.
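One caveat with returning `$value->title`: a Page reference field can be configured to hold either a single page or several pages, so `$value` is not always one page. A hedged sketch of reducing both cases to plain text without recursing, using `stdClass` objects and arrays as hypothetical stand-ins for pages (inside ProcessWire you would check for a `PageArray` instead of `is_iterable()`, and an empty single-page field gives a `NullPage` rather than `null`):

```php
<?php
// Reduce a page-reference value to indexable text without recursing into it.
// $value may be null (nothing selected), one page-like object, or an
// iterable of them; the objects here are stand-ins exposing ->title.
function referencedTitlesAsContent($value): string
{
    if ($value === null) {
        return '';
    }
    if (is_iterable($value)) {
        $titles = [];
        foreach ($value as $page) {
            $titles[] = $page->title;
        }
        return implode(' ', $titles);
    }
    return (string) $value->title;
}

$one  = (object) ['title' => 'Widgets'];
$many = [(object) ['title' => 'Widgets'], (object) ['title' => 'Gadgets']];

echo referencedTitlesAsContent($one), "\n";  // Widgets
echo referencedTitlesAsContent($many), "\n"; // Widgets Gadgets
```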

Thanks


  • 1 month later...

Hi Jonathan,

No doubt, it's a good module; I was looking for something like this. But I'm having an issue with pagination when using the results from the ElasticSearch module: the pager always highlights the first page, even though the records themselves are displayed correctly. For example, if I go to page 3 using the pagination, the search results for page 3 appear, but "1" is still highlighted in the pager. This is how I render the pager:

echo $search_results->renderPager();

Any help in this regard will be much appreciated.

Thanks.


Okay, I figured it out: the start offset needed to be set on the PageArray, and the module wasn't doing that. I added the following line

$pages->setStart($from);

right after

$pages->setLimit($size);

at line 372 of ElasticSearch.module, and that fixed the issue.
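Why this works, presumably: the pager works out which page to highlight from the PageArray's start offset and its limit, so with the start left at 0 it always highlights page 1. A small sketch of the arithmetic (the function names are hypothetical, not part of ProcessWire or the module):

```php
<?php
// Offset of the first result for a 1-based page number.
function startForPage(int $pageNum, int $limit): int
{
    return max(0, $pageNum - 1) * $limit;
}

// 1-based page number implied by a start offset and a limit.
function pageForStart(int $start, int $limit): int
{
    return intdiv($start, $limit) + 1;
}

$limit = 25;
$start = startForPage(3, $limit);                             // 50
echo $start, ' => page ', pageForStart($start, $limit), "\n"; // 50 => page 3
echo '0 => page ', pageForStart(0, $limit), "\n";             // 0 => page 1
```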


  • 3 months later...
  • 4 months later...

I'm using it in ProcessWire 2.5, and so is my co-worker.

What tripped us up when setting up the config: enter the IP and port, click Submit, and only after the page reloads click "index all pages".

We've since found some bugs, including hidden pages showing up in results, but it's working fine with some alterations.

Basic use: create a search page, e.g.

/search/?q=test

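<?php
// Only search when ?q= is present; sanitize it first
if ($q = $sanitizer->selectorValue($input->get->q)) {
    // Whitelist q so pagination links keep the query
    $input->whitelist('q', $q);
    // Fetch up to 25 results from the ElasticSearch module
    $matches = $modules->get("ElasticSearch")->search($q, 25);
    // Drop hidden pages from the results
    foreach ($matches as $key => $match) {
        if ($match->isHidden()) {
            $matches->remove($key);
        }
    }
}
?>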

<?php if ( ! $q): ?>
Type something.
<?php elseif ($matches->count()): ?>

 <?php foreach ($matches as $m): ?>
  <a href='<?php echo $m->url ?>'><?php echo $m->title ?></a>
 <?php endforeach ?>

<?php else: ?>
  no results found
<?php endif ?>
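The hidden-page filter above can also be written as "keep what passes" rather than removing items during the loop. A minimal sketch with plain arrays as hypothetical stand-ins for the search hits (in the template above they are Page objects and the check is `$match->isHidden()`):

```php
<?php
// Hypothetical search hits standing in for Page objects.
$hits = [
    ['title' => 'About',   'hidden' => false],
    ['title' => 'Archive', 'hidden' => true],
    ['title' => 'Contact', 'hidden' => false],
];

// Keep only visible hits and reindex so keys run 0..n-1 again.
$visible = array_values(array_filter($hits, function (array $hit): bool {
    return !$hit['hidden'];
}));

foreach ($visible as $hit) {
    echo $hit['title'], "\n";
}
```

Either way, filtering after the 25 results come back means a results page can show fewer than 25 items, and a total used by the pager may still count the hidden ones; keeping hidden pages out of the index in the first place avoids both effects.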


  • 1 month later...
  • 4 weeks later...

I'm trying this module out and could use some troubleshooting tips.

Java and Elastic Search are installed. I'm forwarding port 9200 through to the virtual machine. I ran "sudo /etc/init.d/elasticsearch start", and if I access the site's domain on port 9200 I do get a response:

{"status": 200,
"name": "Conquest",
"cluster_name": "elasticsearch",
"version":
{"number": "1.5.1",
"build_hash": "5e38401bc4e4388537a615569ac60925788e1cf4",
"build_timestamp": "2015-04-09T13:41:35Z",
"build_snapshot": false,
"lucene_version": "4.10.4"},
"tagline": "You Know, for Search"}

I went with the default module settings for host and port and chose a template which has just 10 pages. When I click to index all pages I get this error:

Error: Maximum execution time of 30 seconds exceeded (line 617 of /web/elastic/wire/core/Page.php)

I'd think 30 seconds would be quite adequate for 10 pages so I'm wondering what I can do to diagnose the problem.
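One way to rule out the connection as the cause is to run the same check from PHP that the browser request above does. A minimal sketch, assuming the root-endpoint response shown above (abridged here); fetching it, e.g. with `file_get_contents("http://localhost:9200/")`, is left out so the example runs offline. The top-level `status` field belongs to this 1.x-era response and may be absent in newer Elasticsearch versions.

```php
<?php
// Abridged body returned by GET http://<host>:9200/ (from the response above).
$body = '{"status": 200, "name": "Conquest", "cluster_name": "elasticsearch",
  "version": {"number": "1.5.1", "lucene_version": "4.10.4"},
  "tagline": "You Know, for Search"}';

// Returns the Elasticsearch version if the response looks healthy, else null.
function elasticsearchVersion(string $body): ?string
{
    $data = json_decode($body, true);
    if (!is_array($data) || ($data['status'] ?? null) !== 200) {
        return null; // not JSON, or not a 200 status
    }
    return $data['version']['number'] ?? null;
}

var_dump(elasticsearchVersion($body)); // string(5) "1.5.1"
```

If this succeeds from the web server's PHP but indexing still times out, the problem is more likely in how many pages are being processed than in the connection.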

I tried it with the max execution time at 60 seconds, and it timed out again.

Error: Maximum execution time of 60 seconds exceeded (line 622 of /web/elastic/wire/core/Page.php)

FYI: I'm using the dev branch (2.5.26) running Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-39-generic x86_64) in a virtual machine on my PC.

Thanks!


Elastic Search itself was okay. Here's what I found.

Timeout while indexing:

The module's code for indexing all pages does a find, and I'd assumed it would use the template whitelist from the module configuration, but it doesn't. It finds lots of pages, then skips the ones that should not be indexed. I have thousands of simple pages (containers for images) that don't need to be found by this selector, so now I'm using the whitelist to build a more specific selector. I may have to break this up into multiple finds when I have more content.

In checkForRebuildSearchData():

		$arr = $this->getAllowedTemplates();
		$str = (count($arr)) ? ' template='.implode('|', $arr).',' : '';
		$pages = $this->pages->find("id!=2, id!=7, has_parent!=2, has_parent!=7, template!=admin,$str include=all");
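The same selector can be built by a small helper that also supports splitting the work into batches, since ProcessWire selectors accept `start=` and `limit=`. A hedged sketch: the function name and template names are hypothetical, `sort=id` is added only in batch mode so consecutive batches neither overlap nor skip pages, and IDs 2 and 7 are, in a default install, the admin and trash pages. Each batch would then be passed to `$this->pages->find()` in turn, optionally calling `$this->pages->uncacheAll()` between batches to release memory.

```php
<?php
// Build the indexing selector from the template whitelist, mirroring the
// snippet above, with optional start/limit so the work can run in batches.
function indexSelector(array $templates, ?int $start = null, ?int $limit = null): string
{
    $parts = ['id!=2', 'id!=7', 'has_parent!=2', 'has_parent!=7', 'template!=admin'];
    if (count($templates)) {
        $parts[] = 'template=' . implode('|', $templates);
    }
    if ($start !== null && $limit !== null) {
        $parts[] = 'sort=id'; // stable order so batches don't overlap
        $parts[] = "start=$start";
        $parts[] = "limit=$limit";
    }
    $parts[] = 'include=all';
    return implode(', ', $parts);
}

// Hypothetical whitelist; print the selectors for three batches of 100 pages.
foreach ([0, 100, 200] as $start) {
    echo indexSelector(['basic-page', 'article'], $start, 100), "\n";
}
```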

The other thing that became obvious pretty quickly is that the Textareas (with an s) fieldtype was not handled. Adding a function and a line to use it in getAllContentForPage() took care of that.

    protected function getTextareasTypeAsContent($value) {
        $values = array();
        // Textareas values iterate as name => text pairs; copy them into a plain array
        foreach ($value as $name => $text) {
            $values[$name] = $text;
        }
        return $values;
    }

...

			elseif ($type instanceof FieldtypeTextareas)
				$value = $this->getTextareasTypeAsContent($value);

I've confirmed that it is picking up changes when I edit pages. Too early for opinions on effectiveness of Elastic Search itself.
