
Search engine considerations and headache



Sorry if this is a bit of a rant.

I'm in the last steps of getting our intranet "reborn" on PW, and one question is still unanswered: which search backend to use. As with most intranets, it holds a large number of Office and PDF files. I've already tried a few of the major open source search engines (ElasticSearch, Solr, Sphinx and OpenSearchServer) and found all of them lacking in two regards. For one, the documentation is all over the place, so you never know whether the piece you're reading even applies to the version you're using. For another, their APIs are horribly awkward and extremely picky about the slightest deviation from their "standard" syntax, which isn't concisely documented anywhere.

None of them comes with a basic permission system, which is an absolute must-have. I can probably work around that with facets or the like, but still... you'd think everybody in the world but me only indexes public webpages. Design decisions like using JSON objects with multiple identically named properties make me doubt the sanity of those maintaining the software.
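The facet/filter workaround could look roughly like this. It is a minimal sketch, assuming each indexed document carries a hypothetical `allowed_groups` keyword field and the extracted text sits in a hypothetical `content` field. The `bool`, `filter` and `terms` constructs belong to the ElasticSearch query DSL; the function and field names are made up for illustration.

```python
def build_permission_filtered_query(search_text, user_groups):
    """Build an ElasticSearch-style request body that only returns
    documents visible to at least one of the user's groups.

    `content` and `allowed_groups` are hypothetical field names.
    """
    return {
        "query": {
            "bool": {
                # Scored full-text part of the search.
                "must": [{"match": {"content": search_text}}],
                # Unscored restriction: the document must list at least
                # one of the user's groups in `allowed_groups`.
                "filter": [{"terms": {"allowed_groups": sorted(user_groups)}}],
            }
        }
    }


body = build_permission_filtered_query("travel expenses", {"staff", "hr"})
```

The filter clause restricts the hits without influencing their scores. It would have to be added server-side on every request, never taken from the client.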

Looking at all these points, I'm now also considering rolling my own on top of an InnoDB fulltext index: re-using the text extractors I already have running in the old system (which so far only feed the extracted plaintext into a MySQL table for literal searches), adding a fulltext index, setting up a lean API module for the few search variants I need, and being done with it. That, of course, still leaves the topic of extracting relevant snippets open. Should I write my own UDF for that, or are there functional and maintained third-party extensions available to do just that? That question warrants some more digging before I can make an educated decision.
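To make the roll-your-own option a bit more concrete, here is a rough sketch in Python (the real thing would live in the API module). It assumes a hypothetical table `documents(id, title, content)` with a FULLTEXT index on `content` and a hypothetical table `document_groups(document_id, group_id)` for visibility. Snippets are cut in application code instead of a UDF, which is a swapped-in alternative and not an answer to the UDF question. The `%s` placeholders assume a DB-API driver with format-style parameters, such as PyMySQL.

```python
import re


def build_fulltext_query(search_text, group_ids, limit=20):
    """Return (sql, params) for a group-restricted MySQL fulltext search.

    All table and column names are hypothetical.
    """
    if not group_ids:
        raise ValueError("user must belong to at least one group")
    placeholders = ", ".join(["%s"] * len(group_ids))
    sql = (
        "SELECT d.id, d.title, d.content, "
        "MATCH(d.content) AGAINST (%s IN NATURAL LANGUAGE MODE) AS score "
        "FROM documents d "
        "WHERE MATCH(d.content) AGAINST (%s IN NATURAL LANGUAGE MODE) "
        "AND EXISTS (SELECT 1 FROM document_groups g "
        "WHERE g.document_id = d.id AND g.group_id IN (" + placeholders + ")) "
        "ORDER BY score DESC LIMIT %s"
    )
    return sql, [search_text, search_text, *group_ids, limit]


def extract_snippet(text, terms, width=60, marker="..."):
    """Cut a window of up to `width` characters around the first term hit."""
    terms = [t for t in terms if t]
    match = None
    if terms:
        pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
        match = pattern.search(text)
    if match is None:
        # No literal hit: fall back to the beginning of the document.
        start, end = 0, min(len(text), 2 * width)
    else:
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
    prefix = marker if start > 0 else ""
    suffix = marker if end < len(text) else ""
    return prefix + text[start:end].strip() + suffix
```

The returned `sql` and `params` would be passed to the driver's `cursor.execute(sql, params)`, and `extract_snippet(row_content, search_text.split())` run over each returned row. If none of the terms occurs literally in the text, the snippet simply starts at the top of the document.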

I'm still a bit torn, but there's also the time factor to consider. If any of you have experience with search (especially with restricting the visibility of search content through a group-based permission concept) and could share a few pointers, I'd be glad.
