
Indexer Module Fork


DaveP

Following on from my discoveries mentioned here, I have forked the Indexer Module referred to.

A couple of reasons for this:

  • Contrary to my thinking at the time of writing the post linked above, there was a deprecated/removed function in one of the files used. (Absolutely no idea why that wasn't apparent originally - must have been some code that isn't always called.)
  • The module was set up to prefer executable parsers on the server, and it only included a PHP parser for PDFs, not for word processor files such as .doc and .docx. That is probably not optimal for shared hosting, where installing your own executables is rarely, if ever, an option.
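To illustrate the second point, here is a minimal sketch of the "executable first, PHP parser as fallback" idea. It is not taken from the module: the function name `chooseParser`, the array layout, the path and the class name `HypotheticalPdfParser` are all assumptions made up for this example.

```php
<?php
// Hypothetical helper, not the module's code: decide how to extract text
// from a file, preferring a server executable and falling back to pure PHP.
function chooseParser(string $extension, array $executables, array $phpParsers): ?array
{
    $ext = strtolower($extension);

    // Use the external tool only if a path is configured and it is runnable.
    $path = $executables[$ext] ?? '';
    if ($path !== '' && is_executable($path)) {
        return ['type' => 'executable', 'target' => $path];
    }

    // Otherwise fall back to a PHP parser, if one is registered for this type.
    if (isset($phpParsers[$ext])) {
        return ['type' => 'php', 'target' => $phpParsers[$ext]];
    }

    return null; // nothing available for this file type
}

// On shared hosting the executable is typically missing, so the PHP parser wins.
$choice = chooseParser(
    'pdf',
    ['pdf' => '/nonexistent/pdftotext'],  // assumed path, deliberately not runnable
    ['pdf' => 'HypotheticalPdfParser']    // placeholder class name
);
```

With a check like this, a shared host without the external tools would still get text extraction for any file type that has a PHP parser registered.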

The fork's repository is at https://github.com/DBPreston/Processwire-site-indexer.

I have fixed the error caused by the use of split(), which was removed in PHP 7.0, and added a .doc parser, so the module will now parse word processor files as well as PDFs. I have also updated the module config page to reflect that.
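For anyone hitting the same error in other old code, the usual replacements for split() are preg_split() (for patterns) and explode() (for a fixed delimiter). The sketch below uses a made-up input string and pattern purely for illustration; it is not the actual line that was changed in the module.

```php
<?php
$line = "alpha, beta,gamma";

// Before (fatal error on PHP 7+, because split() no longer exists):
// $parts = split(", ?", $line);

// After: preg_split() takes a PCRE pattern, so the pattern needs delimiters.
$parts = preg_split('/, ?/', $line);

// For a fixed literal delimiter, explode() is the simpler choice.
$words = explode(' ', 'site indexer module');
```

Note that split() used POSIX regex syntax, so a pattern containing characters such as `/` needs escaping, or a different delimiter, when moved to preg_split().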

In the small amount of testing I have done so far, it seems to be working. Neither parser is perfect and the whole mode of operation of the module (adding a 'keywords' field to every page) may not be perfect for every use case, but it is certainly very useful in some instances.

I should be very grateful if anyone has the time to give it some further testing, forking, pull requests etc.


Wonderful! I asked myself if I should tackle this when I wrote that thread, then forgot about it, and now this.

I'll definitely test it out this week, because the current project (or rather, the client) uses so many PDFs, it drives me crazy.


  • 2 weeks later...

For some strange reason, after installing your module today, nothing happens when I click on "settings" / configure.

I cleared module cache, the rest of the site is running fine.

PW 3.0.92, PHP 7 (or 7.1, not sure). Tracy Debugger says nothing, I also don't see any related things in error logs.

Any idea?


            'pdfpath' => "/usr/bin/pdftotext",
            'wordpath' => "/usr/bin/wvText",

What is this? Do I have to install some Unix tools as root? I'm on a shared host.

Or should they refer to the files in the import folder? o_O

 
