Posted

Can ProcessWire do full-text searches of the entire site, plus all document files (docx, pdf, pptx, etc.)?

Posted
16 hours ago, GaryW said:

Users should be able to do a search of all file contents.

ProcessWire cannot read the contents of these files by itself. A default PHP installation on a typical web hosting service has no tool installed to open or read Office files. You have to use other software or a service to extract the contents of each file and save it to a text field on the page that holds the file. The site search can then process that field.

Is this a VPS or some kind of dedicated server where you can run, for example, Python scripts (and import Python modules)? If so, you can process the files beforehand, save the contents as TXT files, JSON or whatever, and then have a PHP import script transfer that text into a hidden text field on the relevant page in ProcessWire.
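To make the pre-processing step a bit more concrete, here is a minimal sketch in Python. It only handles .docx files, relies on nothing beyond the standard library, and writes a JSON file keyed by file name that a PHP import script could read afterwards. The function names `docx_to_text` and `export_folder` and the JSON layout are assumptions for illustration, not anything ProcessWire prescribes; PDF and PPTX would need other tools.

```python
import json
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path):
    """Return the plain text of a .docx file, one line per paragraph."""
    # A .docx file is a zip archive; the body text lives in word/document.xml
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Join all text runs (w:t elements) that belong to this paragraph
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)


def export_folder(src_dir, out_file):
    """Extract every .docx in src_dir and write {file name: text} as JSON."""
    # Hypothetical layout: the import script matches records by file name
    records = {p.name: docx_to_text(p) for p in sorted(Path(src_dir).glob("*.docx"))}
    Path(out_file).write_text(
        json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return records
```

Calling `export_folder("uploads", "extracted.json")` (both paths are placeholders) would produce one JSON file that the PHP side can loop over to fill the hidden text field on each page.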

Posted (edited)

Following on from @Tiberium's suggestion, which is how I've done such things myself in the past: if you want to extract text for search out of just about anything, there's Apache Tika. It's written in Java so, again as suggested, you'd need some sort of VPS or dedicated server, or perhaps you could make it part of a pipeline (e.g. on a local machine) that delivers the extracted content to your site.
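As a rough illustration of the pipeline idea, here is a sketch in Python that sends a file to a running Tika Server and gets plain text back. It assumes Tika Server is already running locally on its default port 9998 and that its `/tika` endpoint accepts a PUT with the raw file as the body, which is how I understand its REST interface to work; check the Tika docs for your version. The helper names `build_tika_request` and `extract_text` are made up for this example.

```python
import urllib.request

# Assumption: Tika Server running locally on its default port
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(data, url=TIKA_URL):
    """Build a PUT request asking Tika to return the document as plain text."""
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Accept": "text/plain"},
    )


def extract_text(path, url=TIKA_URL, timeout=60):
    """Send one file to Tika Server and return the extracted text."""
    with open(path, "rb") as fh:
        request = build_tika_request(fh.read(), url)
    # Needs a reachable Tika Server; raises URLError otherwise
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

The returned text could then be written to TXT or JSON and imported into the hidden search field, exactly as with any other extractor.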

 

Edited by BillH