This module is an optional (and still somewhat experimental) add-on for SearchEngine. It adds support for indexing file contents, replacing the earlier SearchEngine PDF indexer module.
Features
SearchEngine by itself will only store the name, description, tags, and custom field values for file/image fields. This module, on the other hand, attempts to extract human-readable text from the file itself.
As for file types, at least in theory this module supports any file type that can be reasonably converted to text. It has built-in support (mostly via third party libraries) for...
office documents (.doc, .docx, .rtf, .odf),
PDF documents (.pdf),
spreadsheets (.xls, .xlsx, .ods, .csv) and
plain text (.txt).
The module also ships with a FileIndexer base class and exposes the SearchEngineFileIndexer::addFileIndexer() method for introducing indexers for file types that are not yet supported.
Links
GitHub: https://github.com/teppokoivula/SearchEngineFileIndexer
Composer: composer require teppokoivula/search-engine-file-indexer
Modules directory: https://processwire.com/modules/search-engine-file-indexer/
Getting started
install and configure SearchEngine (version 0.34.0 or later),
install SearchEngine File Indexer,
install third party dependencies: if you installed SearchEngineFileIndexer via Composer, these should already be available; otherwise you'll need to run "composer install" in the SearchEngineFileIndexer module directory,
choose which file indexers you'd like to enable.
The rest should happen automagically behind the scenes.
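The dependency step above can be sketched as a small shell helper. This is only an illustration, not part of the module: the function name install_dependencies and the MODULE_DIR variable are hypothetical, and the default path assumes the module lives under site/modules/ in your ProcessWire installation, so adjust it to match your setup.

```shell
#!/bin/sh
# Assumed location of the module directory; override MODULE_DIR if yours differs.
MODULE_DIR="${MODULE_DIR:-site/modules/SearchEngineFileIndexer}"

# Hypothetical helper: runs "composer install" inside the given module directory.
# Not needed if the module itself was installed with
# "composer require teppokoivula/search-engine-file-indexer".
install_dependencies() {
    if [ ! -f "$1/composer.json" ]; then
        echo "No composer.json found in $1" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is left untouched.
    (cd "$1" && composer install)
}
```

To use it, run install_dependencies "$MODULE_DIR" from the root of your site. The helper returns a non-zero status if the directory does not contain a composer.json file, which usually means the path is wrong.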
Additional notes
The important thing to note here is that this module relies on third party libraries to parse (most) files, and things can still go wrong, so please consider this a beta release. It worked in my early tests, but there's little guarantee that it will work in real-life use cases. Just to be safe, it is recommended to back up your site before installing and enabling this module.
Another thing to keep in mind is that indexing files can be resource-intensive and take plenty of time. As such, this module provides some settings for limiting files by size etc. Regardless, this likely needs further consideration; a future version of this module, or an additional add-on module, may e.g. add support for indexing pages/files "lazily" in the background.