GaryW Posted May 29 Can ProcessWire do full-text searches of the entire site, plus all document files (docx, pdf, pptx, etc.)?
elabx Posted May 29 Do you mean the files' content or just their names? I use https://github.com/teppokoivula/SearchEngine for this sort of thing.
Gideon So Posted May 30 Hi @GaryW, I don't think it is possible to search all file contents with the ProcessWire selector function. Gideon
Tiberium Posted May 30 16 hours ago, GaryW said: Users should be able to do a search of all file contents. ProcessWire cannot read the content of the files by itself: the default PHP installation on a typical web hosting service has no tool installed to open and read office files. You have to read out the content of the files with another piece of software or a service and save it to a text field on the page that holds the file. Then the site search can process it. If this is a VPS or some kind of dedicated server where you can run, for example, Python scripts (and import Python modules), you can process the files beforehand, save the content as, for example, TXT files or JSON, and then use a PHP import script to transfer it into a hidden text field on that page in ProcessWire.
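To make the pre-processing step above a bit more concrete, here is a minimal sketch in Python. It only handles .docx files (which are ZIP archives containing `word/document.xml`) and uses nothing but the standard library; PDF, PPTX and the rest would need other tools. The function names `docx_to_text` and `export_folder` and the JSON layout (filename mapped to extracted text) are assumptions for illustration, not something ProcessWire prescribes. The PHP import script that reads the JSON into a hidden text field is not shown.

```python
import json
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# WordprocessingML namespace used by the elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path):
    """Return the plain text of a .docx file, one line per non-empty paragraph.

    Hypothetical helper: it ignores headers, footers, footnotes and comments,
    and text boxes nested inside a paragraph may show up more than once.
    """
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's visible text is spread over several <w:t> runs
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)


def export_folder(folder, out_file):
    """Extract every .docx in `folder` and write {filename: text} as JSON."""
    result = {p.name: docx_to_text(p) for p in sorted(Path(folder).glob("*.docx"))}
    Path(out_file).write_text(
        json.dumps(result, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return result
```

Running `export_folder("uploads", "extracted.json")` (both paths are placeholders) would leave you with one JSON file that an import script can loop over, matching each filename to the page that holds the file.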
BillH Posted May 30 Following on from @Tiberium's suggestion, which is how I've done such things myself in the past: if you want to extract text for search out of just about anything, there's Apache Tika. It's written in Java so, again as suggested, you'd need some sort of VPS or dedicated server, or perhaps you could make it part of a pipeline (e.g. on a local machine) that delivers content to your site. Edited May 30 by BillH
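For anyone wanting to try the Tika route as part of such a pipeline, a rough sketch follows. It assumes a Tika server is already running and reachable (by default it listens on port 9998) and uses its `PUT /tika` endpoint with `Accept: text/plain` to get plain text back. The function name `extract_text_with_tika` and the default URL are assumptions for illustration; check the Tika server documentation for your version before relying on this.

```python
import urllib.request


def extract_text_with_tika(path, server="http://localhost:9998"):
    """Send a file to a running Tika server and return the extracted text.

    Hypothetical helper: `server` is assumed to point at a Tika server
    instance you started yourself; nothing here starts one for you.
    """
    with open(path, "rb") as fh:
        data = fh.read()
    request = urllib.request.Request(
        server.rstrip("/") + "/tika",
        data=data,
        method="PUT",
        headers={
            # Ask for plain text rather than the default XHTML output
            "Accept": "text/plain",
            # Generic type as a precaution, so the server detects the
            # real format from the file content itself
            "Content-Type": "application/octet-stream",
        },
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")
```

The returned string could then be written to TXT or JSON and imported into a hidden text field, as described earlier in the thread.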
teppo Posted May 31 It's somewhat experimental (never got to use it much), but for SearchEngine there is also an add-on for indexing file contents (plain text, PDF, and most "office formats"): https://github.com/teppokoivula/SearchEngineFileIndexer. Honestly not sure how well it works at the moment 🙂