I was thinking about writing a module that enables PW to index and search the contents of uploaded files.

I know the easiest way to add search-inside-files capability is to use Google CSE. Developers with a dedicated server can of course use one of the "big boys" of search, such as Lucene, Elasticsearch or Solr. Those need a Java runtime and, in the case of Solr and Elasticsearch, a separately running server (Solr, for example, has traditionally been deployed in a servlet container such as Apache Tomcat). Most people don't have that at their disposal.

So my first question is: would such a feature be used at all? I know you can build a kind of meta-search over file descriptions, or use the "one page per file" approach.

After some brainstorming, I came up with this:

Idea: Make it possible to search the contents of uploaded files (PDF, Word, Excel).

Approach: Build a module (d'oh).

Config settings:
- Select the templates / file fields to index (list all inputfields of type "file")
- An "index now" button, "index each time a file is added", or cron? Performance?

Where / how to store the indexes?
- As a separate, new field on each page?
- On the file system, e.g. one JSON file per indexed file in the module folder (similar to language files)?
- In a new, separate DB table?

What we need to store:
- filename / path / URL
- filetype (to customize search results with file icons)
- page id
- extracted content
- timestamp of the last index build?
- MD5/SHA hash or similar?

How to handle user roles in the actual search?
- Inherit from the page?
- Inherit from the file field?

So, what do you think? Not worth it? Does something like this already exist (I searched but found nothing)?
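The storage, change-detection and role questions in this post can be sketched roughly as follows. This is a minimal Python sketch under several assumptions, not a ProcessWire API: the table name `file_index` and the functions `file_hash`, `index_file` and `search` are hypothetical; text extraction is passed in as a callable `extract` (real PDF/Word/Excel extraction would need an external tool or library); and the set of page ids the current user may view is assumed to be supplied by the CMS, which corresponds to the "inherit from the page" option. SQLite stands in for the "separate DB table" option, and a SHA-256 hash is used to skip files whose bytes haven't changed since the last index build.

```python
import hashlib
import sqlite3
import time
from pathlib import Path
from typing import Callable

# Hypothetical index table: one row per indexed file (the "separate DB table" option).
SCHEMA = """
CREATE TABLE IF NOT EXISTS file_index (
    path       TEXT PRIMARY KEY,  -- filename/path; a URL could be derived from it
    filetype   TEXT NOT NULL,     -- e.g. 'pdf', used to pick a result icon
    page_id    INTEGER NOT NULL,  -- page that owns the file field
    content    TEXT NOT NULL,     -- extracted text
    sha256     TEXT NOT NULL,     -- hash of the raw bytes, for change detection
    indexed_at REAL NOT NULL      -- timestamp of the last index build
)
"""


def file_hash(path: Path) -> str:
    """SHA-256 of the file's bytes, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def index_file(db: sqlite3.Connection, path: Path, page_id: int,
               extract: Callable[[Path], str]) -> bool:
    """(Re)index one file; return False if its hash is unchanged.

    `extract` is an assumed text extractor (e.g. a wrapper around a PDF-to-text tool).
    """
    digest = file_hash(path)
    row = db.execute("SELECT sha256 FROM file_index WHERE path = ?",
                     (str(path),)).fetchone()
    if row is not None and row[0] == digest:
        return False  # unchanged: skip the expensive extraction step
    db.execute(
        "INSERT OR REPLACE INTO file_index VALUES (?, ?, ?, ?, ?, ?)",
        (str(path), path.suffix.lstrip(".").lower(), page_id,
         extract(path), digest, time.time()),
    )
    db.commit()
    return True


def search(db: sqlite3.Connection, term: str,
           viewable_page_ids: set[int]) -> list[tuple[str, int]]:
    """Substring search whose results inherit access from their page.

    `viewable_page_ids` is assumed to come from the CMS's own permission check.
    Note: '%' and '_' in `term` act as LIKE wildcards in this sketch.
    """
    rows = db.execute(
        "SELECT path, page_id FROM file_index WHERE content LIKE ? ORDER BY path",
        (f"%{term}%",),
    ).fetchall()
    return [(p, pid) for p, pid in rows if pid in viewable_page_ids]
```

Storing the hash next to the extracted text means a cron job or an "index now" button can walk every configured file field cheaply and only re-extract files that actually changed. A `LIKE` scan is fine for a sketch but won't scale; on a typical PW install (MySQL/MariaDB), a `FULLTEXT` index on the content column would be the natural next step.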