teppo Posted July 27, 2022 Share Posted July 27, 2022 This module is an optional (and still somewhat experimental) add-on for SearchEngine. It adds support for indexing file contents, replacing earlier SearchEngine PDF indexer module. Features SearchEngine by itself will only store the name, description, tags, and custom field values for file/image fields. This module, on the other hand, attempts to extract human-readable text from the file itself. As for file types, at least in theory this module supports any filetype that can be reasonably converted to text. It has built-in support (mostly via third party libraries) for... office documents (.doc, .docx, .rtf, .odf), pdf documents (.pdf), spreadsheets (.xls, .xlsx, .ods, .csv) and plain text (.txt). The module also ships with a FileIndexer base class and exposes the SearchEngineFileIndexer::addFileIndexer() method for introducing indexers for file types that are not yet supported. Links GitHub: https://github.com/teppokoivula/SearchEngineFileIndexer Composer: composer require teppokoivula/search-engine-file-indexer Modules directory: https://processwire.com/modules/search-engine-file-indexer/ Getting started install and configure SearchEngine (version 0.34.0 or later), install SearchEngine File Indexer, install third party dependencies — if you installed SearchEngineFileIndexer via Composer you should already have these available, otherwise you'll need to run "composer install" in the SearchEngineFileIndexer module directory, choose which file indexers you'd like to enable. The rest should happen automagically behind the scenes. Additional notes The important thing to note here is that we're going to rely on third party libraries to handle parsing (most) files, and things can still go wrong, so please consider this a beta release. It did work in my early tests, but there's little guarantee that it will work in real life use cases. Just to be safe it is recommended to back up your site before installing and enabling this module. Another thing to keep in mind is that indexing files can be resource intensive and take plenty of time. As such, this module provides some settings for limiting files by size etc. Regardless, this is something that likely needs further consideration in the future; some future version of this module, or an additional add-on module, may e.g. add support for indexing pages/files "lazily" in the background. 3 1 Link to comment Share on other sites More sharing options...
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!Register a new account
Already have an account? Sign in here.Sign In Now