Alessio Dal Bianco (Author), posted July 22, 2013:
Quote: This is perhaps an obvious feature creep suggestion but you might want to add the option to remove "noise words" like a, the, of, etc. There are lists on the net. These are words that are too common to be meaningful keywords.
You are right. But there is this option: It is not exactly what you described, but usually a word with more than 3-4 characters can be considered a real keyword.
teppo, posted July 22, 2013:
Quote: This is perhaps an obvious feature creep suggestion but you might want to add the option to remove "noise words" like a, the, of, etc. There are lists on the net. These are words that are too common to be meaningful keywords.
You probably mean stop words? Seems like a reasonable feature to me, especially if made configurable (as English stop words make very little sense, and are sometimes even harmful, for a site written in Finnish etc.) Just saying.
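A configurable stop-word filter of the kind discussed above could be sketched roughly as follows. This is not the Indexer module's code: the function name filterKeywords, its parameters, and the sample word lists are assumptions made purely for illustration, and the sketch assumes PHP's mbstring extension is available.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the Indexer module): returns the unique
 * keywords of $text, dropping words shorter than $minLength and words that
 * appear in the caller-supplied $stopWords list.
 */
function filterKeywords(string $text, array $stopWords, int $minLength = 4): array
{
    // Split on any run of characters that are neither letters nor digits.
    $words = preg_split('/[^\p{L}\p{N}]+/u', mb_strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    if ($words === false) {
        return []; // e.g. the input was not valid UTF-8
    }

    // Flip the list so each stop-word check is a cheap key lookup.
    $stopLookup = array_flip(array_map('mb_strtolower', $stopWords));

    $keywords = [];
    foreach ($words as $word) {
        if (mb_strlen($word) < $minLength) {
            continue; // too short to be a meaningful keyword
        }
        if (isset($stopLookup[$word])) {
            continue; // configured stop word
        }
        $keywords[] = $word;
    }

    // Remove duplicates while keeping first-seen order.
    return array_values(array_unique($keywords));
}

// The stop-word list is passed in, so each site (or language) can supply its own.
$english = ['the', 'over'];
print_r(filterKeywords('The quick brown fox jumps over the lazy dog', $english));
```

Passing the list as an argument rather than hardcoding it is what makes the filter usable for a Finnish site: the caller simply supplies a Finnish list instead of the English one.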
ryan, posted July 23, 2013:
_('text') is for gettext, but ProcessWire doesn't use gettext. I think that _() was actually meant to be __('text') or $this->_('text')? Those are the ProcessWire translation functions, among others.
Alessio Dal Bianco (Author), posted July 23, 2013:
Quote: _('text') is for gettext, but ProcessWire doesn't use gettext. I think that _() was actually meant to be a __('text') or a $this->_('text') ? Those are the ProcessWire translation functions, among others.
OK, I've switched from _() to __()!
ryan, posted July 25, 2013:
Quote: Ok, i've switch from _() to __() !
If the call is within a class that extends one of ProcessWire's (like Wire or WireData), it's actually better to use $this->_('your text'); as there is a little less overhead with that call than with a __('your text'); call.
marco, posted September 20, 2013:
I had to comment out a line in the Indexer.module file (v0.5.1, line 246, getKeywords()) that strips numbers from indexed text. I had to do this because my site contains product names with numbers in them (e.g. "serviceFLAT360") that weren't being found. Perhaps you could introduce a config option for the module to enable/disable number stripping.
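The requested on/off switch for number stripping could look something like the sketch below. How the module actually strips numbers is not shown in this thread, so the function name tokenizeForIndex, the $stripNumbers flag, and the regular expressions are all assumptions for illustration only (mbstring is assumed to be available).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical tokenizer (not the module's getKeywords()): splits $text into
 * lowercase tokens and removes digits only when $stripNumbers is true.
 */
function tokenizeForIndex(string $text, bool $stripNumbers = true): array
{
    $text = mb_strtolower($text);

    if ($stripNumbers) {
        // Replace every run of digits with a space; fall back to the
        // unchanged text if the regex fails (e.g. invalid UTF-8).
        $text = preg_replace('/\p{N}+/u', ' ', $text) ?? $text;
    }

    // Split on anything that is neither a letter nor a digit.
    $tokens = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    return $tokens === false ? [] : $tokens;
}

// With stripping enabled the product name loses its numeric suffix.
print_r(tokenizeForIndex('serviceFLAT360 plan', true));  // serviceflat, plan
// With stripping disabled the full product name stays searchable.
print_r(tokenizeForIndex('serviceFLAT360 plan', false)); // serviceflat360, plan
```

In a real module the flag would presumably come from the module's configuration rather than from a function argument.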
Alessio Dal Bianco (Author), posted September 20, 2013:
Hi marco, I'm working on a new version because I'm facing the same problem. It will be released soon!
marco, posted September 20, 2013:
Great! I'm looking forward to it.
Alessio Dal Bianco (Author), posted September 26, 2013:
Hi all, new changes and features under the hood, check it out! http://modules.processwire.com/modules/indexer/
Jeroen Diderik, posted November 24, 2013:
Hey Alessio, love the module, it works great. I missed one thing though: Page fields. I regularly use Page fields to reference things like Genres, Categories, Countries, etc., so I added some code to also index the page names of the pages in Page fields. I created a pull request on GitHub to add this change to your code. For those wanting to try this out, replace the extractTextFromField function in Indexer.module with this one, or just add the elseif() part (starting at line 372) to it:

    public function extractTextFromField($f, $p){
        if( preg_match('/text|title|url/i', $f->type) && $p->editable($f->name) && $f->name != self::fieldName ):
            // Text-like fields: index the field value with HTML tags removed.
            $stripped = strip_tags($p->get($f->name));
            return ' '.$stripped;
        elseif( preg_match('/page/i', $f->type) && $p->editable($f->name) && $f->name != self::fieldName ):
            // Page reference fields: index the names of the referenced pages.
            $stripped = "";
            $f_ref = $p->get($f->name);
            if($f_ref instanceof PageArray){
                // Multi-page reference: collect every page name.
                foreach($f_ref as $fp){
                    $stripped .= ' '.strip_tags($fp->name);
                }
            }elseif($f_ref instanceof Page){
                // Single-page reference; the check guards against an empty value.
                $stripped .= ' '.strip_tags($f_ref->name);
            }
            return $stripped;
        endif;
        // Any other field type contributes no text.
        return '';
    }
Alessio Dal Bianco (Author), posted November 26, 2013:
Hi jdiderick, thank you for the addition! I will release a new version soon with your code and some other improvements. ADB
Alessio Dal Bianco (Author), posted December 17, 2013:
Hi all, I have updated the module. It includes some fixes for repeater fields plus Diderik's addition (thank you!). ADB
ceberlin, posted March 30, 2014:
Hi Alessio, I am currently evaluating the module for a multilingual project. My questions: Right now the stopwords list is hardcoded in the module's folder, so anything I add or change there would be lost after a module update. Correct? Wouldn't assets, or the database, be a better place for the stopwords? And what do I do in a multilingual environment: how can I set stopwords per language? Can I at all?
thetuningspoon, posted November 7, 2014:
Hi Alessio, thank you for your work on this module. It looks like it's just what I need on an upcoming project, and accomplishes it in the same way I was contemplating doing it. One question: Will this index Excel files as well as PDFs and Word files?