SearchEngine File Indexer add-on


teppo

This module is an optional (and still somewhat experimental) add-on for SearchEngine. It adds support for indexing file contents, replacing the earlier SearchEngine PDF indexer module.

Features

SearchEngine by itself will only store the name, description, tags, and custom field values for file/image fields. This module, on the other hand, attempts to extract human-readable text from the file itself.

As for file types, at least in theory this module supports any file type that can reasonably be converted to text. It has built-in support (mostly via third-party libraries) for...

  • office documents (.doc, .docx, .rtf, .odf),
  • PDF documents (.pdf),
  • spreadsheets (.xls, .xlsx, .ods, .csv) and
  • plain text (.txt).

The module also ships with a FileIndexer base class and exposes the SearchEngineFileIndexer::addFileIndexer() method for introducing indexers for file types that are not yet supported.

Getting started

  1. install and configure SearchEngine (version 0.34.0 or later),
  2. install SearchEngine File Indexer,
  3. install third-party dependencies: if you installed SearchEngineFileIndexer via Composer, you should already have these available; otherwise you'll need to run "composer install" in the SearchEngineFileIndexer module directory,
  4. choose which file indexers you'd like to enable.

The rest should happen automagically behind the scenes.
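
For a manual (non-Composer) install, step 3 might look roughly like the sketch below. The path is an assumption based on ProcessWire's conventional site/modules/ directory and is not stated in this post, so adjust MODULE_DIR to match your own setup.

```shell
# Assumption: the module was placed in ProcessWire's conventional site/modules/ directory.
# Change MODULE_DIR if your installation uses a different location.
MODULE_DIR="site/modules/SearchEngineFileIndexer"

if [ -d "$MODULE_DIR" ]; then
    # Run Composer inside the module directory so it installs that module's dependencies.
    (cd "$MODULE_DIR" && composer install)
else
    echo "Module directory not found: $MODULE_DIR" >&2
fi
```

Run this from the site root. If the module was installed via Composer in the first place, this step should not be needed.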

Additional notes

The important thing to note here is that we're going to rely on third-party libraries to handle parsing (most) files, and things can still go wrong, so please consider this a beta release. It did work in my early tests, but there's little guarantee that it will work in real-life use cases. Just to be safe, back up your site before installing and enabling this module.

Another thing to keep in mind is that indexing files can be resource intensive and take plenty of time. As such, this module provides some settings for limiting which files get indexed, for example by size. Regardless, this is something that likely needs further consideration; a future version of this module, or an additional add-on module, may for example add support for indexing pages/files "lazily" in the background.

@teppo, it looks like this is precisely the module I was going to begin searching for on Monday. I’m wildly excited that you’re doing this, though I understand your warnings and cautions. My fingers are crossed!
