Jump to content

Module: Site indexer


Recommended Posts

Hi all!

I have created this new module that improve the current search engine on PW:

https://github.com/USSliberty/Processwire-site-indexer (Beta version)

The main idea is to create an hidden field that store keywords (separated by space). The keywords are generated automatically from all text fields in the page, plus PDFs and DOCs files.

So if you create new text fields you can forget to add they on Search Page Module.

The only thing to do after install, is to change the list of fields in Search Page (see attachment). In fact you need to search only in "indexer" field.

post-834-0-54191500-1368028283_thumb.png

NOTE 1: At this time the module index only when you save the page. In the next week maybe i will add the complete-site re-index.

NOTE 2: The files are indexed with 2 Unix packages (poppler-utils & wv). I have tried without success with pure PHP classes, but if know a class that works fine i can add it into module.

ADB

  • Like 15
Link to comment
Share on other sites

Looking forward to checking this out - thanks!

I have been using the attached set of functions for a long time to extract text from PDFs. Probably not as powerful as poppler, but might do what you need. I made some poorly documented changes to the original. Anyway, maybe you'll find something useful in there.

pdf2txt.php

  • Like 1
Link to comment
Share on other sites

Sorry you had no luck with that class. It has been working well for me - several hundred PDFs with no failures so far.

I have attached one that definitely works so maybe you can figure out what the issue might be.

EDIT: Notice that in the main function I changed it so it always uses the handleV2 function. The V3 one wasn't working for me, but you might want to look into that some more.

ian_newsletter_405 (2).pdf

Link to comment
Share on other sites

That is weird. I looked back at my php source downloads and I was running 5.3.3 at some point back in 2010 and that script was still working then. It is strange that none of those options are working. Let me know if there is any testing I can do at my end for you that might help.

When I get a minute or two I might try implementing that class into your module and see how it works for me.

Link to comment
Share on other sites

Ok, I just tried integrating that original pdf2txt script I sent you into your module and it always returns nothing. However if I place the attached version in a web accessible location, edit the last line to point to a PDF file, it works perfectly (albeit with lots of non-fatal php errors that should be dealt with at some point).

Can you try this and see if it works for you at least?

pdf2txt.php

Link to comment
Share on other sites

 Ok, I just tried integrating that original pdf2txt script I sent you into your module and it always returns nothing. However if I place the attached version in a web accessible location, edit the last line to point to a PDF file, it works perfectly (albeit with lots of non-fatal php errors that should be dealt with at some point).

 

Can you try this and see if it works for you at least?

It's true, it works alone but if i integrate into my module doesn't return nothing. Very weird...

Edited by Alessio Dal Bianco
Link to comment
Share on other sites

Awesome, I'll be using this with a couple of projects that are just winding up. I'll let you know if I come across any issues. If I find time I might try to go through and clean up that class too - there is a fair bit of unneeded code in there and lots of undefined variables.

Thanks for your hard work on this.

  • Like 1
Link to comment
Share on other sites

  • 1 month later...
  • 2 weeks later...

Hi Alessio,

when I try to access the modules configuration page I get the following error:

Error: Call to undefined function _() (line 84 of C:\Work\sgh\website\htdocs\site\modules\Processwire-site-indexer-master\Indexer.module) 

Am I missing some dependencies here?

Link to comment
Share on other sites

Hi Alessio,

when I try to access the modules configuration page I get the following error:

Error: Call to undefined function _() (line 84 of C:\Work\sgh\website\htdocs\site\modules\Processwire-site-indexer-master\Indexer.module) 

Am I missing some dependencies here?

Hi Timo,

the _() function stands for gettext() function (see here: http://php.net/manual/en/function.gettext.php).

Maybe you have not enable the gettext module ?

  • Like 1
Link to comment
Share on other sites

This is perhaps an obvious feature creep suggestion but you might want to add the option to remove "noise words" like a, the, of, etc. There are lists on the net. These are words that are too common to be meaningful keywords.

  • Like 1
Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
  • Similar Content

    • By monollonom
      PageMjmlToHtml
      Github: https://github.com/romaincazier/PageMjmlToHtml
      Modules directory: https://processwire.com/modules/page-mjml-to-html/
      A module allowing you to write your Processwire template using MJML and get a converted HTML output using MJML API.
      This is considered to be in alpha and as such needs some testing before being used in production!

      About
      Created by Mailjet, MJML is a markup language making it a breeze to create newsletters displayed consistently across all email clients.
      Write your template using MJML combined with Processwire’s API and this module will automatically convert your code into a working newsletter thanks to their free-to-use Rest API.
      Prerequisite
      For this module to work you will need to get an API key and paste it in the module’s configuration.
      Usage
      Once your credentials are validated, select the template(s) in which you’re using the MJML syntax, save and go visualize your page(s) to see if everything’s good. You will either get error/warning messages or your email properly formatted and ready-to-go.
      From there you can copy/paste the raw generated code in an external mailing service or distribute your newsletter using ProMailer.
      Features
      The MJML output is cached to avoid repetitive API calls Not cached if there are errors/warnings Cleared if the page is saved Cleared if the template file has been modified A simple (dumb?) code viewer highlights lines with errors/warnings A button is added to quickly copy the raw code of the generated newsletter Not added if the page is rendered outside of a PageView Only visible to users with the page’s edit permission A shortcut is also added under “View” in the edit page to open the raw code in a new tab Multi-languages support
      Notes
      The code viewer is only shown to superusers. If there’s an error the page will display:
      Only its title for guests Its title and a message inviting to contact the administrator for editors If you are using the markup regions output strategy, it might be best to not append files to preserve your MJML markup before calling the MJML API. This option is available in the module’s settings.
    • By Marco Ro
      Hi guys!
      I'm a bit anxious because this is the first module I present! (beta modulo) But I will finally be able to share something with the community too! :)
      This is a BETA version of the PayPal payment system called: PayPal Commerce Platform.
      It is an advanced system (Business Pro account is needed) that brings various benefits in terms of fees and above all integrates direct payment with credit/debit cards. 
      The module integrates with Padloper 0.0.2, which is the current installation I'm using.
      This system integrates the classic PayPal buy button, the alternative or local payment method and the new payment system: credit/debit cards that doesn't go through the PayPal account. It is a Stripe-style payment, it connects directly with the bank and integrates 3D security validation.
      I say that it is a BETA because this module currently only works with Sandbox account, to put it live you need to change API url manually (manually for the moment).
      Because this module is not ready for live:
      I would like to have your opinion on how I built the module (is the first one I do). I don't want to share something that is not fish but I need a comparison with someone more experienced than me, for be sure that this is the best way to code the module.
      If you want to try this I created a git, you will find all the instructions for installation and correct operation. (Git has a MIT licensed)
      https://github.com/MarcooRo/processwire-PayPal-Commerce-Platform I hope I did something that you guys can like :)
    • By monollonom
      (once again I was surprised to see a work of mine pop up in the newsletter, this time without even listing the module on PW modules website 😅. Thx @teppo !)
      FieldtypeQRCode
      Github: https://github.com/romaincazier/FieldtypeQRCode
      Modules directory: https://processwire.com/modules/fieldtype-qrcode/
      A simple fieldtype generating a QR Code from the public URL of the page, and more.
      Using the PHP library QR Code Generator by Kazuhiko Arase.

      Options
      In the field’s Details tab you can change between .gif or .svg formats. If you select .svg you will have the option to directly output the markup instead of a base64 image. SVG is the default.
      You can also change what is used to generate the QR code and even have several sources. The accepted sources (separated by a comma) are: httpUrl, editUrl, or the name of any text/URL/file/image field.
      If LanguageSupport is installed the compatible sources (httpUrl, text field, ...) will return as many QR codes as there are languages. Note however that when outputting on the front-end, only the languages visible to the user will be generated.
      Formatting
      Unformatted value
      When using $page->getUnformatted("qrcode_field") it returns an array with the following structure:
      [ [ "label" => string, // label used in the admin "qr" => string, // the qrcode image "source" => string, // the source, as defined in the configuration "text" => string // and the text used to generate the qrcode ], ... ] Formatted value
      The formatted value is an <img>/<svg> (or several right next to each other). There is no other markup.
      Should you need the same markup as in the admin you could use:
      $field = $fields->get("qrcode_field"); $field->type->markupValue($page, $field, $page->getUnformatted("qrcode_field")); But it’s a bit cumbersome, plus you need to import the FieldtypeQRCode's css/js. Best is to make your own markup using the unformatted value.
      Static QR code generator
      You can call FieldtypeQRCode::generateQRCode to generate any QR code you want. Its arguments are:
      string $text bool $svg Generate the QR code as svg instead of gif ? (default=true) bool $markup If svg, output its markup instead of a base64 ? (default=false) Hooks
      Please have a look at the source code for more details about the hookable functions.
      Examples
      $wire->addHookAfter("FieldtypeQRCode::getQRText", function($event) { $page = $event->arguments("page"); $event->return = $page->title; // or could be: $event->return = "Your custom text"; }) $wire->addHookAfter("FieldtypeQRCode::generateQRCodes", function($event) { $qrcodes = $event->return; // keep everything except the QR codes generated from editUrl foreach($qrcodes as $key => &$qrcode) { if($qrcode["source"] === "editUrl") { unset($qrcodes[$key]); } } unset($qrcode); $event->return = $qrcodes; })
    • By Sebi
      AppApiFile adds the /file endpoint to the AppApi routes definition. Makes it possible to query files via the api. 
      This module relies on the base module AppApi, which must be installed before AppApiFile can do its work.
      Features
      You can access all files that are uploaded at any ProcessWire page. Call api/file/route/in/pagetree?file=test.jpg to access a page via its route in the page tree. Alternatively you can call api/file/4242?file=test.jpg (e.g.,) to access a page by its id. The module will make sure that the page is accessible by the active user.
      The GET-param "file" defines the basename of the file which you want to get.
      The following GET-params (optional) can be used to manipulate an image:
      width height maxwidth maxheight cropX cropY Use GET-Param format=base64 to receive the file in base64 format.
    • By MarkE
      This fieldtype and inputfield bundle was built for storing measurement values within a field, rendering them in a variety of formats and converting them to other units or otherwise modifying them via the API.
      The API consists of a number of predefined functions, some of which include...
      render() for rendering the measurement object, valueAs() for converting the value to another unit value, convertTo() for converting the whole measurement object to different units, and add() and subtract() for for modifying the stored value by the value (converted as required) in another measurement. In the admin the inputfield includes a checkbox (which can be optionally disabled) for converting values on page save. For an example if a value was typed in as centimeters, the unit was changed to metres, and the page saved with this checkbox selected, said value would be automatically converted so that e.g. 170 cm becomes 1.7 m.

      A simple length field using Fieldtype Measurement and Inputfield Measurement.
      Combination units (e.g. feet and inches) are also supported.
      Please note that this module is 'proof of concept' at the moment - there are limited units available and quite a lot of code tidying to do. More units will be added shortly.
      See the GitHub at https://github.com/MetaTunes/FieldtypeMeasurement for full details and updates.
×
×
  • Create New...