Jump to content

Module: Site indexer

Recommended Posts

This is perhaps an obvious feature creep suggestion but you might want to add the option to remove "noise words" like a, the, of, etc. There are lists on the net. These are words that are too common to be meaningful keywords.

You are right. But there is this option:


It is not exactly what you explained, but usually a word with more of 3-4 chars can be considerated a real keyword.

Link to comment
Share on other sites

This is perhaps an obvious feature creep suggestion but you might want to add the option to remove "noise words" like a, the, of, etc. There are lists on the net. These are words that are too common to be meaningful keywords.

You probably mean stop words? Seems like a reasonable feature to me, especially if made configurable (as English stop words make very little sense / are sometimes even harmful for a site written in Finnish etc.)

Just saying.

  • Like 2
Link to comment
Share on other sites

_('text') is for gettext, but ProcessWire doesn't use gettext. I think that _() was actually meant to be a __('text') or a $this->_('text') ? Those are the ProcessWire translation functions, among others

Link to comment
Share on other sites

Ok, i've switch from _() to __() ! 

If the call is within a class that extends one of ProcessWire's (like Wire or WireData), it's actually better to use $this->_('your text'); as there is a little bit less overhead with that call than with a __('your text'); call. 
Link to comment
Share on other sites

  • 1 month later...

I had to commend out a line in the Indexer.module file (v0.5.1, line 246, getKeywords()) that strips numbers from indexed text. I had to do this, because my site contains product names using numbers (e.g. "serviceFLAT360") that weren't be found.

Perhaps you could introduce a config option for the module to enable/disable number stripping.

Link to comment
Share on other sites

  • 1 month later...

Hey Alessio,

Love the module, works great.

I missed one thing though... Page fields.

I use Page fields regularly to make for example references to Genres, Categories, Countries etc.

So I added some code to also add the pagenames of the pages in Page fields.

I created a pull reguest on Github to add this change to your code.

For those wanting to try this out, replace the extractTextFromField function in Indexer.module with this one or just add the elseif() part (start line 372) to it:

     public function extractTextFromField($f, $p){
        if( preg_match('/text|title|url/i', $f->type) && $p->editable($f->name) && $f->name != self::fieldName ):
            $stripped = strip_tags($p->get($f->name));
            return ' '.$stripped;
        elseif( preg_match('/page/i', $f->type) && $p->editable($f->name) && $f->name != self::fieldName ):
            $stripped = "";
            $f_ref = $p->get($f->name);
            if($f_ref instanceOf PageArray){
                foreach($f_ref as $fp){
                    $stripped .= ' '.strip_tags($fp->name);
                $stripped .= ' '.strip_tags($f_ref->name);
            return $stripped;

  • Like 1
Link to comment
Share on other sites

  • 3 weeks later...
  • 3 months later...

HI Allesio,

I am right now evaluating the module for a project. This project has languages.

My questions:

RIght now the stopwords list is hardcoded into the modules folder. In case I add or change anything there, it would be lost after an update of the module. Correct?

Isn't assets a better place for the stopwords? - Or the database?

What do I do if I am in a multilingual environment. How can I set stopwords per language? Can I at all?

Link to comment
Share on other sites

  • 7 months later...

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Recently Browsing   0 members

    • No registered users viewing this page.
  • Similar Content

    • By monollonom
      (once again I was surprised to see a work of mine pop up in the newsletter, this time without even listing the module on PW modules website 😅. Thx @teppo !)
      Github: https://github.com/romaincazier/FieldtypeQRCode
      Modules directory: https://processwire.com/modules/fieldtype-qrcode/
      A simple fieldtype generating a QR Code from the public URL of the page, and more.
      Using the PHP library QR Code Generator by Kazuhiko Arase.

      In the field’s Details tab you can change between .gif or .svg formats. If you select .svg you will have the option to directly output the markup instead of a base64 image. SVG is the default.
      You can also change what is used to generate the QR code and even have several sources. The accepted sources (separated by a comma) are: httpUrl, editUrl, or the name of any text/URL/file/image field.
      If LanguageSupport is installed the compatible sources (httpUrl, text field, ...) will return as many QR codes as there are languages. Note however that when outputting on the front-end, only the languages visible to the user will be generated.
      Unformatted value
      When using $page->getUnformatted("qrcode_field") it returns an array with the following structure:
      [ [ "label" => string, // label used in the admin "qr" => string, // the qrcode image "source" => string, // the source, as defined in the configuration "text" => string // and the text used to generate the qrcode ], ... ] Formatted value
      The formatted value is an <img>/<svg> (or several right next to each other). There is no other markup.
      Should you need the same markup as in the admin you could use:
      $field = $fields->get("qrcode_field"); $field->type->markupValue($page, $field, $page->getUnformatted("qrcode_field")); But it’s a bit cumbersome, plus you need to import the FieldtypeQRCode's css/js. Best is to make your own markup using the unformatted value.
      Static QR code generator
      You can call FieldtypeQRCode::generateQRCode to generate any QR code you want. Its arguments are:
      string $text bool $svg Generate the QR code as svg instead of gif ? (default=true) bool $markup If svg, output its markup instead of a base64 ? (default=false) Hooks
      Please have a look at the source code for more details about the hookable functions.
      $wire->addHookAfter("FieldtypeQRCode::getQRText", function($event) { $page = $event->arguments("page"); $event->return = $page->title; // or could be: $event->return = "Your custom text"; }) $wire->addHookAfter("FieldtypeQRCode::generateQRCodes", function($event) { $qrcodes = $event->return; // keep everything except the QR codes generated from editUrl foreach($qrcodes as $key => &$qrcode) { if($qrcode["source"] === "editUrl") { unset($qrcodes[$key]); } } unset($qrcode); $event->return = $qrcodes; })
    • By Sebi
      AppApiFile adds the /file endpoint to the AppApi routes definition. Makes it possible to query files via the api. 
      This module relies on the base module AppApi, which must be installed before AppApiFile can do its work.
      You can access all files that are uploaded at any ProcessWire page. Call api/file/route/in/pagetree?file=test.jpg to access a page via its route in the page tree. Alternatively you can call api/file/4242?file=test.jpg (e.g.,) to access a page by its id. The module will make sure that the page is accessible by the active user.
      The GET-param "file" defines the basename of the file which you want to get.
      The following GET-params (optional) can be used to manipulate an image:
      width height maxwidth maxheight cropX cropY Use GET-Param format=base64 to receive the file in base64 format.
    • By MarkE
      This fieldtype and inputfield bundle was built for storing measurement values within a field, rendering them in a variety of formats and converting them to other units or otherwise modifying them via the API.
      The API consists of a number of predefined functions, some of which include...
      render() for rendering the measurement object, valueAs() for converting the value to another unit value, convertTo() for converting the whole measurement object to different units, and add() and subtract() for for modifying the stored value by the value (converted as required) in another measurement. In the admin the inputfield includes a checkbox (which can be optionally disabled) for converting values on page save. For an example if a value was typed in as centimeters, the unit was changed to metres, and the page saved with this checkbox selected, said value would be automatically converted so that e.g. 170 cm becomes 1.7 m.

      A simple length field using Fieldtype Measurement and Inputfield Measurement.
      Combination units (e.g. feet and inches) are also supported.
      Please note that this module is 'proof of concept' at the moment - there are limited units available and quite a lot of code tidying to do. More units will be added shortly.
      See the GitHub at https://github.com/MetaTunes/FieldtypeMeasurement for full details and updates.
    • By tcnet
      File Manager for ProcessWire is a module to manager files and folders from the CMS backend. It supports creating, deleting, renaming, packing, unpacking, uploading, downloading and editing of files and folders. The integrated code editor ACE supports highlighting of all common programming languages.

      This module is probably the most powerful module. You might destroy your processwire installation if you don't exactly know what you doing. Be careful and use it at your own risk!
      ACE code editor
      This module uses ACE code editor available from: https://github.com/ajaxorg/ace

      This module uses the JavaScript dragscroll available from: http://github.com/asvd/dragscroll. Dragscroll adds the ability to drag the table horizontally with the mouse pointer.
      PHP File Manager
      This module uses a modified version of PHP File Manager available from: https://github.com/alexantr/filemanager
    • By tcnet
      This module implements the website live chat service from tawk.to. Actually the module doesn't have to do much. It just need to inserted a few lines of JavaScript just before the closing body tag </body> on each side. However, the module offers additional options to display the widget only on certain pages.
      Create an account
      Visit https://www.tawk.to and create an account. It's free! At some point you will reach a page where you can copy the required JavaScript-code.

      Open the module settings and paste the JavaScript-code into the field as shown below. Click "Submit" and that's all.

      Open the module settings
      The settings for this module are located int the menu Modules=>Configure=>LiveChatTawkTo.

  • Create New...