Jonathan Dart, posted March 27, 2014 (edited)
ElasticSearch does a lot, but the part that interests me most is that it does an amazing job of full-text search. It's also crazy fast. It can be a bit scary at first, so hopefully this module will make it more accessible.
I threw this module together pretty quickly; at this point it's more of a proof of concept than anything else. I tried it out on a site with 400 bilingual pages, and the search results are much improved over the normal search you would get from LIKE queries or fulltext queries in MySQL.
GitHub page: https://github.com/jdart/ElasticSearchProcessWire
I'd love to hear some feedback on how it works for you. It's very new, so expect bugs. In particular, the mechanism that turns pages into data to be indexed by ES might have some surprises.
Edited April 18, 2015 by Nico Knoll: added the "module" tag.
adrian, posted March 27, 2014
I obviously need to read up on ElasticSearch some more, but this sounds pretty cool - thanks!
MindFull, posted March 29, 2014
Awesome! I'm going to give this a try as I was just thinking about incorporating ElasticSearch into an app I'm putting together. I'll let you know how it works out.
Jonathan Dart (author), posted March 29, 2014
I've updated the module to support typical PW-style pagination:
    $search_results = $modules->get('ElasticSearch')->search('foo bar', $results_per_page);
    echo "Total results: " . $search_results->getTotal();
    echo $search_results->renderPagination();
felix, posted April 1, 2014
Wow. "As usual", this post was made at the right time. We're currently building two projects that need some advanced search mechanisms. We thought about using Solr or Elasticsearch. I would probably have gone with Solr as we've used it in other projects before. This module will make our decision a lot easier.
Jonathan Dart (author), posted April 2, 2014
FWIW ElasticSearch is a layer on top of Apache Solr. I've tried using Solr and it's much harder to use. ElasticSearch is like Solr + magic.
Edit: Both Solr and ElasticSearch are built on the Lucene search engine.
marco, posted April 11, 2014
Hi Jonathan,
I just installed the module and tried to perform the initial indexing, but I got the following nesting-level-exceeded error (using win/php5.4.6):
( ! ) Fatal error: Maximum function nesting level of '400' reached, aborting! in ...\wire\core\Template.php on line 206
Call Stack
    #   Time    Memory    Function                                              Location
    1   0.0013  164624    {main}( )                                             ..\index.php:0
    2   0.2451  12483568  ProcessPageView->execute( )                           ..\index.php:195
    3   0.2451  12483680  Wire->__call( )                                       ..\index.php:195
    4   0.2451  12483680  Wire->runHooks( )                                     ..\Wire.php:317
    5   0.2452  12485280  call_user_func_array ( )                              ..\Wire.php:359
    6   0.2452  12485376  ProcessPageView->___execute( )                        ..\Wire.php:359
    7   0.2555  12579304  Page->render( )                                       ..\ProcessPageView.module:167
    8   0.2555  12579416  Wire->__call( )                                       ..\ProcessPageView.module:167
    9   0.2555  12579416  Wire->runHooks( )                                     ..\Wire.php:317
    10  0.3390  13117808  ElasticSearch->checkForRebuildSearchData( )           ..\Wire.php:381
    11  0.4550  13572320  ElasticSearch->updatePageContentInElasticSearch( )    ..\ElasticSearch.module:127
    12  0.4574  13586960  ElasticSearch->getAllContentForPage( )                ..\ElasticSearch.module:209
    13  0.4684  13640000  ElasticSearch->getRepeaterTypeAsContent( )            ..\ElasticSearch.module:149
    14  0.4684  13640880  ElasticSearch->getAllContentForPage( )                ..\ElasticSearch.module:198
    15  0.4759  13942032  ElasticSearch->getPageTypeAsContent( )                ..\ElasticSearch.module:147
    16  0.4759  13942048  ElasticSearch->getAllContentForPage( )                ..\ElasticSearch.module:190
    17  0.4760  13943400  ElasticSearch->getRepeaterTypeAsContent( )            ..\ElasticSearch.module:149
    18  0.4760  13944240  ElasticSearch->getAllContentForPage( )                ..\ElasticSearch.module:198
    19  0.4761  13945496  ElasticSearch->getPageTypeAsContent( )                ..\ElasticSearch.module:147
    20  0.4761  13945496  ElasticSearch->getAllContentForPage( )                ..\ElasticSearch.module:190
    21  0.4762  13946848  ElasticSearch->getRepeaterTypeAsContent( )            ..\ElasticSearch.module:149
    22  0.4762  13947688  ElasticSearch->getAllContentForPage( )                ..\ElasticSearch.module:198
    23  0.4764  13948944  ElasticSearch->getPageTypeAsContent( )                ..\ElasticSearch.module:147
    24  0.4764  13948944  ElasticSearch->getAllContentForPage( )                ..\ElasticSearch.module:190
    25  0.4764  13950296  ElasticSearch->getRepeaterTypeAsContent( )            ..\ElasticSearch.module:149
    26  0.4764  13951136  ElasticSearch->getAllContentForPage( )                ..\ElasticSearch.module:198
    27  0.4766  13952400  ElasticSearch->getPageTypeAsContent( )                ..\ElasticSearch.module:147
    ....
I think there must be a problem with recursion through page and/or repeater fields. Did you experience something like this? Is there a patch for the module that prevents this kind of recursion?
regards, Marco
netcarver, posted April 11, 2014
@marco Do you have xdebug installed on that machine? If so, check out the documentation on how to increase the max nesting level.
marco, posted April 11, 2014
Yes, I'm running xdebug (and already increased the nesting level to 400). But as the error message shows, there is an endless loop in function calls. So increasing the nesting level won't help.
SiNNuT, posted April 11, 2014
You could be right, but just for fun, have you tried setting it to, let's say, 1000, or even disabling xdebug to see if it runs?
marco, posted April 11, 2014
A nesting level of 1000 didn't help. Deactivating the xdebug extension led to a memory exhaustion error (as expected). This is an endless recursion problem (I think) and therefore cannot be solved by any kind of PHP configuration. A possible solution could be to limit indexing to the actual text fields, in particular ignoring fields that reference other pages, to prevent circular references.
Jonathan Dart (author), posted April 15, 2014
Hi Marco, I'm not sure what the issue might be. I'll check it out ASAP.
Jonathan Dart (author), posted April 15, 2014
Hi Marco,
In ElasticSearch.module, can you try changing the function below (around line 190):
    protected function getPageTypeAsContent($value)
    {
        return $this->getAllContentForPage($value);
    }
to:
    protected function getPageTypeAsContent($value)
    {
        return $value->title;
    }
Let me know if that gets rid of the nesting issue, and if search results are affected.
Thanks
marco, posted April 15, 2014
Hi Jonathan, I added your little patch, and it prevented the recursion problem. The site content has been indexed.
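For anyone who wants referenced pages indexed beyond their title, one possible alternative to the patch above is to keep following page references but remember which pages have already been visited. The following is only a minimal standalone sketch of that idea, not the module's code: `DemoPage`, `collectContent()` and the `$seen` array are hypothetical stand-ins, and real ProcessWire pages and fieldtypes would need their own handling.

```php
<?php
// Hypothetical stand-in for a ProcessWire page: an id, a title,
// and references to other pages (as a Page field would hold).
final class DemoPage
{
    public $id;
    public $title;
    /** @var DemoPage[] */
    public $references = [];

    public function __construct(int $id, string $title)
    {
        $this->id = $id;
        $this->title = $title;
    }
}

/**
 * Collect indexable content for a page, recursing into each page at most once.
 * $seen maps the ids of pages already visited in this traversal to true.
 */
function collectContent(DemoPage $page, array &$seen = []): array
{
    if (isset($seen[$page->id])) {
        // Already visited: fall back to the title only and stop recursing.
        return ['title' => $page->title];
    }
    $seen[$page->id] = true;

    $content = ['title' => $page->title, 'references' => []];
    foreach ($page->references as $ref) {
        $content['references'][] = collectContent($ref, $seen);
    }
    return $content;
}

// Two pages that reference each other: the circular case from the stack trace.
$a = new DemoPage(1, 'Page A');
$b = new DemoPage(2, 'Page B');
$a->references[] = $b;
$b->references[] = $a;

$result = collectContent($a);
echo json_encode($result), PHP_EOL;
```

The traversal terminates because every page id is entered into `$seen` before its references are followed, so a circular reference degrades to a title-only entry instead of recursing again.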
Aamir Mughal, posted May 21, 2014
Hi Jonathan,
No doubt it's a good module, and I was looking for something like it. But I am facing an issue with pagination when using the results from the ElasticSearch module: the pager always highlights the first page, even though the records themselves are displayed correctly. For example, if I go to page 3 using the pagination, the search results shown are those for page 3, but "Page 1" is still highlighted in the pager.
This is how I have rendered the pager:
    echo $search_results->renderPager();
Any help in this regard will be much appreciated. Thanks.
Aamir Mughal, posted May 21, 2014
Okay, so I figured it out. The "start" value needs to be set on the PageArray, and that was missing in this module. I added the following code
    $pages->setStart($from);
right after
    $pages->setLimit($size);
at line 372 of the ElasticSearch.module file, and this fixed my issue.
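To illustrate why a missing start value would leave "Page 1" highlighted, here is a small standalone sketch of the arithmetic involved. It assumes the pager derives the current page from the start offset and the per-page limit; the helper names `startOffset()` and `currentPage()` are hypothetical and are not part of ProcessWire or the module.

```php
<?php
/**
 * Zero-based offset of the first result on a given 1-based page number.
 * Hypothetical helper; the module computes its own $from value internally.
 */
function startOffset(int $pageNum, int $perPage): int
{
    return (max(1, $pageNum) - 1) * max(1, $perPage);
}

/**
 * 1-based page number that corresponds to a given start offset.
 */
function currentPage(int $start, int $perPage): int
{
    return intdiv($start, max(1, $perPage)) + 1;
}

$perPage = 25;

// Offset for page 3, and the page number recovered from that offset.
$start = startOffset(3, $perPage);
echo $start, ' ', currentPage($start, $perPage), PHP_EOL;

// If the start is never set and stays at 0, the derived page is always 1.
echo currentPage(0, $perPage), PHP_EOL;
```

Under that assumption, calling `setStart($from)` with the same offset that was used for the ElasticSearch query keeps the result list and the pager in agreement.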
MuchDev, posted September 19, 2014
Out of curiosity, has anyone tested this with 2.5? I am wondering if there is an issue with the module or my configuration; I don't seem to get any results when I index.
nghi, posted January 30, 2015
I'm using it in 2.5 and so is my co-worker.
What got me when setting up the config: enter the IP and port, click Submit, and once the page reloads, click "index all pages".
We recently found some bugs with it, including hidden pages turning up in results, but it's working fine with some alterations.
Basic use: create a search page, e.g. /search/?q=test
    <?php
    if ($q = $sanitizer->selectorValue($input->get->q)) {
        // remember the query so pagination links keep it
        $input->whitelist('q', $q);
        $matches = $modules->get("ElasticSearch")->search($q, 25);
        // drop hidden pages from the results
        foreach ($matches as $key => $match) {
            if ($match->isHidden()) $matches->remove($key);
        }
    }
    ?>
    <?php if (!$q): ?>
        Type something.
    <?php elseif ($matches->count()): ?>
        <?php foreach ($matches as $m): ?>
            <a href='<?php echo $m->url ?>'><?php echo $m->title ?></a>
        <?php endforeach ?>
    <?php else: ?>
        no results found
    <?php endif ?>
sakkoulas, posted March 17, 2015
Hello Jonathan, is it possible to search text inside attachments? Thank you.
adrian, posted March 17, 2015
Not sure about elastic search, but this module (http://modules.processwire.com/modules/indexer/) allows you to index content inside doc and pdf files. Does that help at all?
sakkoulas, posted March 21, 2015
Hello adrian, I was looking for this, but I couldn't find it. Thank you.
SteveB, posted April 17, 2015
I'm trying this module out and could use some troubleshooting tips.
Java and Elastic Search are installed. I'm forwarding port 9200 through to the virtual machine. I ran "sudo /etc/init.d/elasticsearch start", and if I try to access the site's domain using port 9200 I do get a response:
    {"status": 200,"name": "Conquest","cluster_name": "elasticsearch","version":{"number": "1.5.1","build_hash": "5e38401bc4e4388537a615569ac60925788e1cf4","build_timestamp": "2015-04-09T13:41:35Z","build_snapshot": false,"lucene_version": "4.10.4"},"tagline": "You Know, for Search"}
I went with the default module settings for host and port and chose a template which has just 10 pages. When I click to index all pages I get this error:
    Error: Maximum execution time of 30 seconds exceeded (line 617 of /web/elastic/wire/core/Page.php)
I'd think 30 seconds would be quite adequate for 10 pages, so I'm wondering what I can do to diagnose the problem. I tried it with the max execution time at 60 seconds and it timed out again:
    Error: Maximum execution time of 60 seconds exceeded (line 622 of /web/elastic/wire/core/Page.php)
FYI: I'm using the dev branch (2.5.26) running Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-39-generic x86_64) in a virtual machine on my PC.
Thanks!
SteveB, posted April 19, 2015
Elastic Search itself was okay. Here's what I found.
Timeout while indexing: The module's code for indexing all pages does a find, and I'd assumed it would make use of the template whitelist value from the module configuration, but it didn't. It finds lots of pages, then skips the ones which should not be indexed. I have thousands of simple pages (containers for images) which don't need to be found by this selector. Now I'm using the whitelist to build a more specific selector. I may have to break this up into multiple finds when I have more content.
In checkForRebuildSearchData():
    $arr = $this->getAllowedTemplates();
    $str = (count($arr)) ? ' template='.implode('|', $arr).',' : '';
    $pages = $this->pages->find("id!=2, id!=7, has_parent!=2, has_parent!=7, template!=admin,$str include=all");
The other thing that became obvious pretty quickly is that the Textareas (with an s) fieldtype was not handled. Adding a function, and a line in getAllContentForPage() to use it, took care of that.
    protected function getTextareasTypeAsContent($value)
    {
        $values = array();
        // copy each named textarea into a plain array for indexing
        foreach ($value as $name => $text) {
            $values[$name] = $text;
        }
        return $values;
    }
    ...
    elseif ($type instanceof FieldtypeTextareas)
        $value = $this->getTextareasTypeAsContent($value);
I've confirmed that it is picking up changes when I edit pages. It's too early for opinions on the effectiveness of Elastic Search itself.
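The selector change described above can be tried outside ProcessWire as plain string building. This is a minimal sketch under assumptions: `buildIndexSelector()` is a hypothetical helper, and the template names `article` and `product` are made up for illustration. In the module, the resulting string would be passed to `$this->pages->find()`.

```php
<?php
/**
 * Build a ProcessWire-style selector string that only matches the allowed templates.
 * Hypothetical helper for illustration; it only assembles the string and runs no query.
 */
function buildIndexSelector(array $allowedTemplates): string
{
    // Exclusions taken from the selector quoted in the post above.
    $parts = ['id!=2', 'id!=7', 'has_parent!=2', 'has_parent!=7', 'template!=admin'];

    // Only add a template filter when a whitelist is configured.
    if (count($allowedTemplates) > 0) {
        $parts[] = 'template=' . implode('|', $allowedTemplates);
    }

    $parts[] = 'include=all';
    return implode(', ', $parts);
}

// With a whitelist (template names are made up for this example).
echo buildIndexSelector(['article', 'product']), PHP_EOL;

// Without a whitelist, the selector falls back to the original exclusions only.
echo buildIndexSelector([]), PHP_EOL;
```

Collecting the parts in an array and joining them once avoids the stray leading space and trailing comma that the inline string concatenation has to manage by hand.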
mn-martin, posted August 3, 2017
Hello @Jonathan Dart, is this plugin still maintained? I don't see support for ProcessWire 3.x.
asbjorn, posted May 20, 2020
I was looking at this plugin today, and I have the same question @mn-martin asked three years ago: is it still maintained, @Jonathan Dart?