
ElasticSearch for ProcessWire


Jonathan Dart

ElasticSearch does a lot, but the part that interests me most is that it does an amazing job of full-text search. It's also crazy fast. It can be a bit intimidating at first, so hopefully this module will make it more accessible.

I threw this module together pretty quickly; at this point it's more of a proof of concept than anything else. I tried it out on a site with 400 bilingual pages, and the search results are much better than what you get from the usual LIKE or FULLTEXT queries in MySQL.

Github page: https://github.com/jdart/ElasticSearchProcessWire

I'd love to hear some feedback on how it works for you. 

It's very new so expect bugs, in particular the mechanism that turns pages into data to be indexed by ES might have some surprises.

Edited by Nico Knoll
Added the "module" tag.

Wow. "As usual", this post came at just the right time. We're currently building two projects that need some advanced search mechanisms, and we were considering Solr or Elasticsearch. I would probably have gone with Solr, as we've used it in other projects before. This module will make our decision a lot easier :D


  • 2 weeks later...

Hi Jonathan,

I just installed the module and tried to run the initial indexing, but got the following "maximum nesting level" error (Windows, PHP 5.4.6):

( ! ) Fatal error: Maximum function nesting level of '400' reached, aborting! in ...\wire\core\Template.php on line 206
Call Stack
#    Time    Memory    Function    Location
1    0.0013    164624    {main}( )    ..\index.php:0
2    0.2451    12483568    ProcessPageView->execute( )    ..\index.php:195
3    0.2451    12483680    Wire->__call( )    ..\index.php:195
4    0.2451    12483680    Wire->runHooks( )    ..\Wire.php:317
5    0.2452    12485280    call_user_func_array ( )    ..\Wire.php:359
6    0.2452    12485376    ProcessPageView->___execute( )    ..\Wire.php:359
7    0.2555    12579304    Page->render( )    ..\ProcessPageView.module:167
8    0.2555    12579416    Wire->__call( )    ..\ProcessPageView.module:167
9    0.2555    12579416    Wire->runHooks( )    ..\Wire.php:317
10    0.3390    13117808    ElasticSearch->checkForRebuildSearchData( )    ..\Wire.php:381
11    0.4550    13572320    ElasticSearch->updatePageContentInElasticSearch( )    ..\ElasticSearch.module:127
12    0.4574    13586960    ElasticSearch->getAllContentForPage( )    ..\ElasticSearch.module:209
13    0.4684    13640000    ElasticSearch->getRepeaterTypeAsContent( )    ..\ElasticSearch.module:149
14    0.4684    13640880    ElasticSearch->getAllContentForPage( )    ..\ElasticSearch.module:198
15    0.4759    13942032    ElasticSearch->getPageTypeAsContent( )    ..\ElasticSearch.module:147
16    0.4759    13942048    ElasticSearch->getAllContentForPage( )    ..\ElasticSearch.module:190
17    0.4760    13943400    ElasticSearch->getRepeaterTypeAsContent( )    ..\ElasticSearch.module:149
18    0.4760    13944240    ElasticSearch->getAllContentForPage( )    ..\ElasticSearch.module:198
19    0.4761    13945496    ElasticSearch->getPageTypeAsContent( )    ..\ElasticSearch.module:147
20    0.4761    13945496    ElasticSearch->getAllContentForPage( )    ..\ElasticSearch.module:190
21    0.4762    13946848    ElasticSearch->getRepeaterTypeAsContent( )    ..\ElasticSearch.module:149
22    0.4762    13947688    ElasticSearch->getAllContentForPage( )    ..\ElasticSearch.module:198
23    0.4764    13948944    ElasticSearch->getPageTypeAsContent( )    ..\ElasticSearch.module:147
24    0.4764    13948944    ElasticSearch->getAllContentForPage( )    ..\ElasticSearch.module:190
25    0.4764    13950296    ElasticSearch->getRepeaterTypeAsContent( )    ..\ElasticSearch.module:149
26    0.4764    13951136    ElasticSearch->getAllContentForPage( )    ..\ElasticSearch.module:198
27    0.4766    13952400    ElasticSearch->getPageTypeAsContent( )    ..\ElasticSearch.module:147

....

I think the problem is recursion through Page and/or Repeater fields. Have you run into anything like this? Is there a patch for the module that prevents this kind of recursion?

regards,

Marco


Yes, I'm running Xdebug (and have already raised the nesting level to 400). But as the call stack shows, the function calls loop endlessly, so increasing the nesting level won't help.


A nesting level of 1000 didn't help either, and deactivating the Xdebug extension led to a memory-exhaustion error (as expected).

This is an endless recursion problem, I think, so no PHP configuration setting can fix it.

A possible solution could be to limit indexing to the actual text fields, in particular ignoring fields that reference other pages, to prevent circular references.
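Marco's idea can be sketched outside ProcessWire. The `Node` class below is a hypothetical stand-in for a Page (an id, a title, and a field map whose values are text, another node, or a list of nodes), and `collectContent()` is an illustrative helper, not the module's code. It keeps a set of already-expanded ids, so an A → B → A reference chain stops at the second visit and contributes only a title instead of recursing forever.

```php
<?php
// Hypothetical stand-in for a ProcessWire Page: an id, a title and a map of
// field name => value, where a value is a string, another Node, or a list of Nodes.
final class Node
{
    public array $fields = [];

    public function __construct(public int $id, public string $title) {}
}

/**
 * Flatten a node's text content, expanding each referenced node at most once.
 * $visited holds the ids already expanded, so circular references terminate.
 */
function collectContent(Node $node, array &$visited = []): array
{
    if (isset($visited[$node->id])) {
        // Already expanded earlier in this walk: contribute the title only.
        return [$node->title];
    }
    $visited[$node->id] = true;

    $out = [$node->title];
    foreach ($node->fields as $value) {
        // Treat single values and lists (e.g. multi-page references) uniformly.
        $items = is_array($value) ? $value : [$value];
        foreach ($items as $item) {
            if ($item instanceof Node) {
                array_push($out, ...collectContent($item, $visited));
            } elseif (is_string($item) && $item !== '') {
                $out[] = $item;
            }
        }
    }
    return $out;
}

$a = new Node(1, 'Page A');
$b = new Node(2, 'Page B');
$a->fields = ['body' => 'Alpha text', 'related' => $b];
$b->fields = ['body' => 'Beta text', 'related' => $a]; // circular reference

echo implode(' | ', collectContent($a)), PHP_EOL;
// Page A | Alpha text | Page B | Beta text | Page A
```

Because `$visited` is shared across the whole walk, a page referenced from two different places is only expanded once. If referenced pages shouldn't contribute their content at all, the title-only change suggested further down the thread is the simpler option.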


Hi Marco, I'm not sure what the issue might be; I'll check it out ASAP.


Hi Marco,

In ElasticSearch.module, can you try changing the following function (around line 190):

protected function getPageTypeAsContent($value) {
    return $this->getAllContentForPage($value);
}

to:

protected function getPageTypeAsContent($value) {
    // Index only the referenced page's title instead of recursing into its content
    return $value->title;
}

Let me know if that gets rid of the nesting issue, and if search results are affected.

Thanks


  • 1 month later...

Hi Jonathan,

No doubt it's a good module; it's just what I was looking for. But I'm having an issue with pagination when using the results from the ElasticSearch module: the pager always highlights the first page, even though the records themselves display correctly. For example, if I go to page 3 using the pager, the search results for page 3 appear, but "Page 1" is still highlighted. This is how I render the pager:

echo $search_results->renderPager();

Any help in this regard will be much appreciated.

Thanks.


Okay, I figured it out: the start value needs to be set on the PageArray, and the module wasn't doing that. I added the following line

$pages->setStart($from);

right after

$pages->setLimit($size);

at line 372 of ElasticSearch.module, and that fixed my issue.
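To illustrate why the missing `setStart()` showed up as "Page 1" always being highlighted: the pager appears to derive the current page from the PageArray's start offset, so a start left at 0 always maps to page 1. The helpers below are hypothetical and only show the arithmetic. The request body uses Elasticsearch's standard `from`/`size` pagination parameters; the field name `content` is an assumption, not necessarily what the module indexes.

```php
<?php
// Hypothetical helpers showing the offset arithmetic only.

// 1-based page number -> offset for ES `from` and PageArray::setStart().
function pageToOffset(int $pageNum, int $limit): int
{
    return max(0, $pageNum - 1) * $limit;
}

// Offset -> 1-based page number, i.e. the page a pager would highlight.
function offsetToPage(int $start, int $limit): int
{
    return intdiv($start, $limit) + 1;
}

$limit = 25;
$from  = pageToOffset(3, $limit);

// Standard Query DSL pagination; the field name "content" is assumed.
$body = json_encode([
    'from'  => $from,
    'size'  => $limit,
    'query' => ['match' => ['content' => 'test']],
]);

echo $body, PHP_EOL;                       // {"from":50,"size":25,"query":{"match":{"content":"test"}}}
echo offsetToPage($from, $limit), PHP_EOL; // 3
echo offsetToPage(0, $limit), PHP_EOL;     // 1: what the pager sees if start is never set
```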


  • 3 months later...
  • 4 months later...

I'm using it with ProcessWire 2.5, and so is my co-worker.

One thing that tripped us up when setting up the config: enter the IP and port and click Submit first; only once the page has reloaded, click to index all pages.

That said, we recently found some bugs, including hidden pages showing up in the results, but it's working fine with some alterations.

Basic usage: create a search page and call it with a query string, e.g.

/search/?q=test
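
<?php
if ($q = $sanitizer->selectorValue($input->get->q)) {
    // Remember the query so pager links keep it
    $input->whitelist('q', $q);
    // Query the ElasticSearch module (second argument: number of results)
    $matches = $modules->get("ElasticSearch")->search($q, 25);
    // Drop hidden pages, which the module returns along with visible ones
    foreach ($matches as $key => $match) {
        if ($match->isHidden()) $matches->remove($key);
    }
}
?>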

<?php if ( ! $q): ?>
Type something.
<?php elseif ($matches->count()): ?>

 <?php foreach ($matches as $m): ?>
  <a href='<?php echo $m->url ?>'><?php echo $m->title ?></a>
 <?php endforeach ?>

<?php else: ?>
  no results found
<?php endif ?>

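Filtering hidden pages after the query has one side effect worth knowing: if the search is capped at 25 results and some of them are hidden, the page shows fewer than 25. The sketch below uses plain arrays as hypothetical stand-ins for Page objects (the `hidden` key plays the role of `isHidden()`) and builds a filtered copy instead of removing entries mid-loop.

```php
<?php
// Hypothetical result rows standing in for Page objects; 'hidden' mirrors
// what Page::isHidden() would report in ProcessWire.
$results = [
    ['title' => 'Home',    'hidden' => false],
    ['title' => 'Archive', 'hidden' => true],
    ['title' => 'Contact', 'hidden' => false],
];

// Build a new list rather than removing entries while iterating,
// then renumber the keys so later code can rely on 0..n-1.
function visibleOnly(array $rows): array
{
    return array_values(array_filter($rows, fn(array $r): bool => !$r['hidden']));
}

$visible = visibleOnly($results);
echo count($visible), PHP_EOL; // 2
```

Excluding hidden pages at indexing time, or requesting a few extra results before filtering, would avoid the shortfall, at the cost of changing the indexing or query code.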

  • 1 month later...
  • 4 weeks later...

I'm trying this module out and could use some troubleshooting tips.

Java and Elastic Search are installed. I'm forwarding port 9200 through to the virtual machine. I ran "sudo /etc/init.d/elasticsearch start" and if I try to access the site's domain using port 9200 I do get a response:

{
  "status": 200,
  "name": "Conquest",
  "cluster_name": "elasticsearch",
  "version": {
    "number": "1.5.1",
    "build_hash": "5e38401bc4e4388537a615569ac60925788e1cf4",
    "build_timestamp": "2015-04-09T13:41:35Z",
    "build_snapshot": false,
    "lucene_version": "4.10.4"
  },
  "tagline": "You Know, for Search"
}

I went with the default module settings for host and port and chose a template which has just 10 pages. When I click to index all pages I get this error:

Error: Maximum execution time of 30 seconds exceeded (line 617 of /web/elastic/wire/core/Page.php)

I'd think 30 seconds would be quite adequate for 10 pages so I'm wondering what I can do to diagnose the problem.

I tried again with the max execution time set to 60 seconds, and it timed out again.

Error: Maximum execution time of 60 seconds exceeded (line 622 of /web/elastic/wire/core/Page.php)

FYI: I'm using the dev branch (2.5.26) running Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-39-generic x86_64) in a virtual machine on my PC.

Thanks!
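Since the browser request goes through the port forward, it can also help to make the same request from PHP inside the VM, which is presumably closer to how the module connects. The sketch below is hypothetical: `describeNode()` decodes a response like the JSON shown above and reports the node name and Elasticsearch version, and `fetchNodeInfo()` makes the live request (it needs `allow_url_fopen`). The URL is an assumption; use the host and port from the module settings.

```php
<?php
// Hypothetical diagnostic helpers, not part of the module.

// Live request to the node; returns null if nothing answers.
function fetchNodeInfo(string $url): ?string
{
    $body = @file_get_contents($url);
    return $body === false ? null : $body;
}

// Summarise the root-endpoint response of an Elasticsearch node.
function describeNode(string $json): string
{
    $info = json_decode($json, true);
    if (!is_array($info) || !isset($info['version']['number'])) {
        return 'unexpected response';
    }
    return sprintf(
        '%s (cluster "%s") runs Elasticsearch %s',
        $info['name'] ?? '?',
        $info['cluster_name'] ?? '?',
        $info['version']['number']
    );
}

// Offline example using a trimmed copy of the response above.
$sample = '{"status":200,"name":"Conquest","cluster_name":"elasticsearch",'
        . '"version":{"number":"1.5.1","lucene_version":"4.10.4"}}';
echo describeNode($sample), PHP_EOL;
// Conquest (cluster "elasticsearch") runs Elasticsearch 1.5.1
```

On the VM itself, `describeNode(fetchNodeInfo('http://127.0.0.1:9200/') ?? '')` prints either the summary or "unexpected response" if PHP can't reach the node.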


Elastic Search itself was okay. Here's what I found.

Timeout while indexing:

The module's code for indexing all pages does a find(), and I'd assumed it would use the template whitelist from the module configuration, but it doesn't. It finds lots of pages and then skips the ones that shouldn't be indexed. I have thousands of simple pages (containers for images) that don't need to be matched by this selector, so now I'm using the whitelist to build a more specific selector. I may have to break this up into multiple finds once I have more content.

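In checkForRebuildSearchData():

		// Restrict the find() to the whitelisted templates, if any are configured
		$arr = $this->getAllowedTemplates();
		$str = (count($arr)) ? ' template='.implode('|', $arr).',' : '';
		// Skip the admin (id 2) and trash (id 7) branches
		$pages = $this->pages->find("id!=2, id!=7, has_parent!=2, has_parent!=7, template!=admin,$str include=all");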
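The "multiple finds" idea can be sketched as batching with ProcessWire's `start` and `limit` selector keywords. `buildIndexSelector()` and `indexInBatches()` are hypothetical helpers, the template names and batch size are placeholders, and the `$find`/`$index` callbacks stand in for `$this->pages->find()` and the module's per-page indexing.

```php
<?php
// Hypothetical helper: the selector from the post, plus start/limit for batching.
function buildIndexSelector(array $templates, int $start, int $limit): string
{
    $parts = ['id!=2', 'id!=7', 'has_parent!=2', 'has_parent!=7', 'template!=admin'];
    if ($templates) {
        $parts[] = 'template=' . implode('|', $templates);
    }
    $parts[] = 'include=all';
    $parts[] = "start=$start";
    $parts[] = "limit=$limit";
    return implode(', ', $parts);
}

// Hypothetical driver: fetch one batch at a time until a find returns nothing.
// $find receives a selector string and returns something countable/iterable.
function indexInBatches(callable $find, callable $index, array $templates, int $limit = 100): int
{
    $done = 0;
    for ($start = 0; ; $start += $limit) {
        $batch = $find(buildIndexSelector($templates, $start, $limit));
        if (count($batch) === 0) {
            break;
        }
        foreach ($batch as $page) {
            $index($page);
            $done++;
        }
    }
    return $done;
}

// id!=2, id!=7, has_parent!=2, has_parent!=7, template!=admin, template=basic-page|article, include=all, start=100, limit=50
echo buildIndexSelector(['basic-page', 'article'], 100, 50), PHP_EOL;
```

Inside the module, `$find` could be `fn($s) => $this->pages->find($s)` (untested here); calling `$this->pages->uncacheAll()` between batches may also help keep memory in check on larger sites.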

The other thing that became obvious pretty quickly is that the Textareas (with an s) fieldtype was not handled. Adding a function and a line to use it in getAllContentForPage() took care of that.

    protected function getTextareasTypeAsContent($value) {
        // Copy each sub-field of the Textareas value into a name => text array
        $values = array();
        foreach ($value as $name => $text) {
            $values[$name] = $text;
        }
        return $values;
    }

...

			elseif ($type instanceof FieldtypeTextareas)
				$value = $this->getTextareasTypeAsContent($value);

I've confirmed that it is picking up changes when I edit pages. Too early for opinions on effectiveness of Elastic Search itself.
