Jump to content
bernhard

RockFinder - Highly Efficient and Flexible SQL Finder Module

Recommended Posts

@bernhard I've finally had a chance to try out your module tonight for a project where we're loading pages into a large data table (200+ rows) and were hitting a wall.

Using RockFinder I now have the initial page load down to ~2 seconds, down from ~7+ seconds! This is a fantastic module, Bernhard. It looks like it's really well thought out and has some really powerful features for building queries. I love how it extends PW's native selectors and allows you to return standard objects, making it easier to substitute this in for a regular $pages->find. Thank you for making this!

I think I can answer my own question now... The main issue with creating Page objects is that page instantiation requires a trip back to the database. The initial $pages->find converts a selector into a SQL query which returns an array of matching page IDs. Then those IDs are used to go back to the database and get the pages (or pull them from cache if they're already loaded). Then for any page field requested that isn't auto-join, an additional database query is required. If you're looping through a lot of pages, that's a lot of DB queries!

It seems like there might be a way to provide the functionality of RockFinder in the native PW core, as an option when loading pages. You would still end up with Page objects in the end (which in my case would be a huge boon since I like to extend the Page class with a custom class and methods for each template), but we could skip that second trip to the database (getting pages by IDs) if we could just tell PW which fields we wanted it to get up front. After that, any additional fields we didn't specify could be loaded with another trip to the DB, as they are now.

That being said, I'm sure @ryan has a good reason for that second trip to the DB. But it seems like there must be a way that we could improve the speed of native pages, even if it is a hidden/advanced option with some caveats.

 

One minor complaint: I noticed is that the module seems to fail silently and return nothing when it can't find one of the fields. It would be good to throw an exception to make this easier to debug.

Edit: Another thought... Is there a reason not to use WireData and WireArray for the objects returned from RockFinder, in place of an StdObject? This would allow you to use WireArray's built in sorting and selecting features on the result set:

$results = $this->database->query($this->getSql());
if($array) {
	$objects = $results->fetchAll(\PDO::FETCH_ASSOC);
}
else {
	$objects = $results->fetchAll(\PDO::FETCH_CLASS, '\ProcessWire\WireData');
	$objects = (new WireArray())->import($objects);
}

 

  • Like 4
  • Thanks 1

Share this post


Link to post
Share on other sites
6 hours ago, thetuningspoon said:

Using RockFinder I now have the initial page load down to ~2 seconds, down from ~7+ seconds! This is a fantastic module, Bernhard. It looks like it's really well thought out and has some really powerful features for building queries. I love how it extends PW's native selectors and allows you to return standard objects, making it easier to substitute this in for a regular $pages->find. Thank you for making this!

Great to hear that. I wouldn't call it well thought out, though. When I started building it I had far too less knowledge about what's going on on the regular page finder. I think the module could be improved a lot on several sides. But it get's all the jobs done for me and I can build RockGrids with thousands of pages that load in some ms compared to loading times above 30s that I had before.

6 hours ago, thetuningspoon said:

It seems like there might be a way to provide the functionality of RockFinder in the native PW core, as an option when loading pages.

I'd love to have a better version of RockFinder in the core 🙂 

6 hours ago, thetuningspoon said:

Another thought... Is there a reason not to use WireData and WireArray for the objects returned from RockFinder, in place of an StdObject? This would allow you to use WireArray's built in sorting and selecting features on the result set:

Sounds like a good idea, I'll have a look 🙂 

 

PS: 2 seconds still sounds very slow for 200 rows. May I see your finder setup?

Share this post


Link to post
Share on other sites
4 hours ago, bernhard said:

PS: 2 seconds still sounds very slow for 200 rows. May I see your finder setup?

You're right... I was not counting some of the pages involved. There are at least 2 to 3 times that many.

Also, I am counting the entire time from request to first response (not using a debug timer)

  • Like 1

Share this post


Link to post
Share on other sites

A bit of an update after some more experimentation tonight. 

I added pages.templates_id as a field always selected in the RockFinder results, and then attempted to use the $pages->newPage() method to take the results of the SQL query and convert them into a PageArray of Page objects. This worked, thus eliminating that second trip to the database I mentioned (and also bypassing the page cache and probably some other nice features of the normal page creation process).

Unfortunately, this slowed the whole thing way down again. So I'm thinking now that it is something else about constructing Pages that is slow. Maybe generating the page path or some other properties are the problem. Perhaps I need to load up some additional fields up front. Will have to test more.

WireData/WireArray works great, though.

  • Like 3
  • Thanks 1

Share this post


Link to post
Share on other sites

Hey @thetuningspoon I just tried your suggestion about the WireArray/WireData... It's awesome 🙂 

I did it a little differently though, because my RockGrids don't work the way you implemented it. v1.0.9 introduces a new method: $finder->getWireArray(); This will return a WireArray containing WireData objects just as you suggested:

LNNrbFq.png

For anybody wondering what this update could be used for: You can use all the pw api magic on that objects:

QTuCyqf.png

 

New version is on GitHub - I'll change my public projects to github to have the PW modules directory automatically in sync: https://github.com/BernhardBaumrock/RockFinder/commit/46377ba6ea399f8557893684da3c3dd486b2c44e

  • Like 7

Share this post


Link to post
Share on other sites

Just for reference, I think the easiest way to do aggregations at the moment is creating a regular finder and taking the resulting SQL as subquery and modifying it to your needs:

euOeXd2.png

0lbHxb8.png 

 

And if you need it in your code (not in the RockFinder Tester) you need to set the SQL via $finder->sql:

rqeNQv9.png

  • Like 5

Share this post


Link to post
Share on other sites

Here's a little Tutorial of how to get pages and their parent's titles requested by @mel47

The page structure:

Yq3bOyg.png

The initial basic Finder:

sgUdo58.png

The easy and inefficient way:

8W17HzH.png

This will load all pages in memory and be slow when you have lots of pages!

The a little more complicated but far more efficient way. First, we prepare the finder to join:

4UaeZpm.png

Then we join that finder to the initial finder:

yB2UNpT.png

Then just hide those two unnecessary columns in your final grid:

document.addEventListener('RockGridItemBeforeInit', function(e) {
  if(e.target.id != 'RockGridItem_yourgrid') return;
  var grid = RockGrid.getGrid(e.target.id);
  var colDefs = grid.gridOptions.columnDefs;
  var col;
  
  // all your grid's frontend settings

  grid.getColDef('parent_id').hide = true;
  grid.getColDef('cat_id').hide = true;
});

 

  • Like 3

Share this post


Link to post
Share on other sites

Thanks!

Was able to successfully display children/parent pages. Looks like this :

$contact = new RockFinder('template=contact',  ['title', 'parent_id']);

$org = new RockFinder('template=organisme', ['title']);
$contact->join($org, 'contact', ['id' => 'parent_id']);
return $contact;

However, I can't add any other fields like I normally do:

$contact = new RockFinder('template=contact, include=all',  ['title', 'parent_id', 'nom', 'prenom']);

I guess it have to be done after the join or have to be written in a different way? I couldn't find any example on your documentation similar to that.

Thanks again!

Mel

Share this post


Link to post
Share on other sites

Thanks. Effectively working, I messed up titles of fields. Feel a little bit noob... 🤫

  • Like 1

Share this post


Link to post
Share on other sites

I had a quite some trouble getting this to work with the content from a ProFields Table, as this does not work the same as with a Repeater or PageTable.
But I am happy to say that treating those fields as a 'file' works, by setting this in the options array.  Ie: 

<?php 
	$finder = new RockFinder("template=template_name,limit=10", ["title", "some_field_name"]);
	$field = $finder->addField("my_table_field", ["column_name", "other_column"], ["type" => "file"]);
	$field->seperator = ", ";

 

  • Like 1

Share this post


Link to post
Share on other sites

RockFinder v1.1.0 now supports filters. This makes it possible to use custom access control (eg showing only certain rows on a RockGrid):

https://github.com/BernhardBaumrock/RockFinder/commit/3e4daae0f283280672ab606a1eef1b16497f19fe

 

https://github.com/BernhardBaumrock/RockFinder/blob/3e4daae0f283280672ab606a1eef1b16497f19fe/readme.md#filters--access-control

Filters & Access Control

You can filter the resulting rows by custom callbacks:

// show only rows that have an id > 3
$finder = new RockFinder('id>0, limit=5', ['title', 'status']);
$finder->filter(function($row) {
  return $row->id > 3;
});

img

You can also use these filters for showing only editable pages by the current user. Be careful! This will load pages into memory and you will lose the performance benefits of RockFinder / direct SQL queries.

$finder->filter(function($row) {
  $page = $this->wire->pages->get($row->id);
  return $page->editable();
});

One thing to mention is that you can apply these filters BEFORE or AFTER closures have been applied. If you apply them BEFORE executing the closures it might be a little more performant (because closures will not be executed for rows that have been removed by the filter), but you will not have access to columns that are populated via closures (like page path).

  • Like 5

Share this post


Link to post
Share on other sites

Hey @bernhard I try to show only pages that are published. But right now my query shows all entries.

I tried to use a filter, but it doesn't work. How can I get only published pages?

<?php
$stellen = new RockFinder("template=stelle", ['title', 'stellekosten', 'kostenproclick', 'mitarbeiterid', 'created', 'bookmark','status']);
$stellen->filter(function ($row) {
    $page = $this->wire->pages->get($row->id);
    return $page->isUnpublished();
});

return $stellen->getSQL();

 

Share this post


Link to post
Share on other sites

IT should return only published pages! I can't investigate during the next days, sorry.

Share this post


Link to post
Share on other sites

I have a problem getting to use Rockfinder online correctly.
On my developer network every works fine, but once the site is online, RockFinder stops working.

$finder = new \ProcessWire\RockFinder("template=panddetail,ska_Status='FOR_SALE',limit=0,ska_City!='',sort=ska_City",['ska_City', 'ska_ZipCode']);
$sql = $finder->getSQL();
$finder->sql = "SELECT ska_City AS city, ska_ZipCode AS zip FROM ($sql) AS tmp GROUP BY city";
$ska_Cities = $finder->getArrays();

print_r($ska_Cities);

I don't understand why...

I also used new Rockfinder (without namespaces etc)

I checked PHP version too... 

Is there some specific dependency that Apache or PHP should have for RockFinder?

Share this post


Link to post
Share on other sites
4 hours ago, jmartsch said:

What exactly is the output? Do you get error messages?

 

I've compared the $sql variables on the live server end the local server.  They both are the same.

SELECT `rockfinder`.* FROM /* original pw query */ (SELECT pages.id FROM `pages` JOIN field_ska_status AS field_ska_status ON field_ska_status.pages_id=pages.id AND (((field_ska_status.data='FOR_SALE' ) )) LEFT JOIN field_ska_city AS field_ska_city__blank3 ON field_ska_city__blank3.pages_id=pages.id LEFT JOIN field_ska_city AS _sort_ska_City ON _sort_ska_City.pages_id=pages.id WHERE (pages.templates_id=49) AND (pages.status<2048) AND (((field_ska_city__blank3.pages_id IS NOT NULL AND (field_ska_city__blank3.data!='' OR field_ska_city__blank3.data='0')))) GROUP BY pages.id ORDER BY _sort_ska_City.data ) as `pwfinder` /* rockfinder */ LEFT JOIN ( SELECT `pages`.`id` AS `id`, `ska_City`.`ska_City` AS `ska_City` FROM `pages` /* --- join ska_City --- */ LEFT JOIN (SELECT `pages_id` AS `pageid`, `ska_City`.`data` AS `ska_City` FROM `field_ska_City` AS `ska_City`) AS `ska_City` ON `ska_City`.`pageid` = `pages`.`id` /* --- end ska_City --- */ ) AS `rockfinder` ON `pwfinder`.`id` = `rockfinder`.`id` /* end rockfinder */

SELECT `rockfinder`.* FROM /* original pw query */ (SELECT pages.id FROM `pages` JOIN field_ska_status AS field_ska_status ON field_ska_status.pages_id=pages.id AND (((field_ska_status.data='FOR_SALE' ) )) LEFT JOIN field_ska_city AS field_ska_city__blank3 ON field_ska_city__blank3.pages_id=pages.id LEFT JOIN field_ska_city AS _sort_ska_City ON _sort_ska_City.pages_id=pages.id WHERE (pages.templates_id=49) AND (pages.status<2048) AND (((field_ska_city__blank3.pages_id IS NOT NULL AND (field_ska_city__blank3.data!='' OR field_ska_city__blank3.data='0')))) GROUP BY pages.id ORDER BY _sort_ska_City.data ) as `pwfinder` /* rockfinder */ LEFT JOIN ( SELECT `pages`.`id` AS `id`, `ska_City`.`ska_City` AS `ska_City` FROM `pages` /* --- join ska_City --- */ LEFT JOIN (SELECT `pages_id` AS `pageid`, `ska_City`.`data` AS `ska_City` FROM `field_ska_City` AS `ska_City`) AS `ska_City` ON `ska_City`.`pageid` = `pages`.`id` /* --- end ska_City --- */ ) AS `rockfinder` ON `pwfinder`.`id` = `rockfinder`.`id` /* end rockfinder */

But $ska_Cities array remains empty on live server. 

Yet a copy of site on development server $ska_Cities is print_r:

Array ( [0] => Array ( [city] => Aalst ) [1] => Array ( [city] => Duinbergen ) [2] => Array ( [city] => Heist-aan-Zee ) [3] => Array ( [city] => Knokke-Heist ) [4] => Array ( [city] => Oostende ) [5] => Array ( [city] => Varsenare ) [6] => Array ( [city] => Wenduine ) [7] => Array ( [city] => Zeebrugge ) [8] => Array ( [city] => Zwevegem ) ) 

ps: NO errors

Share this post


Link to post
Share on other sites
9 hours ago, KarlvonKarton said:

I've compared the $sql variables on the live server end the local server.  They both are the same.

Is the MySQL server the same version ?

Second, are you SURE that the template id 49 is the same on both dev and prod site ?

edit: nevermind, I didn't saw `template=panddetail` on first instance..

Share this post


Link to post
Share on other sites
1 hour ago, flydev said:

Is the MySQL server the same version ?

  • Serverversie: 5.6.41-log - MySQL Community Server (GPL) = LIVE
  • Serverversie: 5.6.38 - MySQL Community Server (GPL) = DEVELOPMENT

Share this post


Link to post
Share on other sites

I have tried to execute the SQL with phpmyadmin on both servers (live and dev).

On DEV it works, but not on LIVE.

The problem is that RockFinder switches from uppercase to lowercase in the SQL.
Just check this SQL:

SELECT `rockfinder`.* FROM /* original pw query */ (SELECT pages.id FROM `pages` JOIN field_ska_status AS field_ska_status ON field_ska_status.pages_id=pages.id AND (((field_ska_status.data='FOR_SALE' ) )) LEFT JOIN field_ska_city AS field_ska_city__blank3 ON field_ska_city__blank3.pages_id=pages.id LEFT JOIN field_ska_city AS _sort_ska_City ON _sort_ska_City.pages_id=pages.id WHERE (pages.templates_id=49) AND (pages.status<1024) AND (((field_ska_city__blank3.pages_id IS NOT NULL AND (field_ska_city__blank3.data!='' OR field_ska_city__blank3.data='0')))) GROUP BY pages.id ORDER BY _sort_ska_City.data ) as `pwfinder` /* rockfinder */ LEFT JOIN ( SELECT `pages`.`id` AS `id`, `ska_City`.`ska_City` AS `ska_City` FROM `pages` /* --- join ska_City --- */ LEFT JOIN (SELECT `pages_id` AS `pageid`, `ska_City`.`data` AS `ska_City` FROM `field_ska_City` AS `ska_City`) AS `ska_City` ON `ska_City`.`pageid` = `pages`.`id` /* --- end ska_City --- */ ) AS `rockfinder` ON `pwfinder`.`id` = `rockfinder`.`id` /* end rockfinder */

In the JOIN the field is field_ska_city, but in the last FROM the field is field_ska_City...

Is that normal behavior?  Is this a bug?

 

The code I use:

$finder = new \ProcessWire\RockFinder("template=panddetail,ska_Status='FOR_SALE',ska_City!='',sort=ska_City",['ska_City']);
$sql = $finder->getSQL();
print_r($sql);
$finder->sql = "SELECT ska_city AS city FROM ($sql) AS tmp GROUP BY city";
$ska_Cities = $finder->getValues('city');
print_r($ska_Cities);
Quote

 

Share this post


Link to post
Share on other sites
1 hour ago, KarlvonKarton said:

The code I use:


$finder = new \ProcessWire\RockFinder("template=panddetail,ska_Status='FOR_SALE',ska_City!='',sort=ska_City",['ska_City']);
$sql = $finder->getSQL();
print_r($sql);
$finder->sql = "SELECT ska_city AS city FROM ($sql) AS tmp GROUP BY city";
$ska_Cities = $finder->getValues('city');
print_r($ska_Cities);

 

shouldn't be ? :

 

[...]
$finder->sql = "SELECT ska_City AS city FROM ($sql) AS tmp GROUP BY city";
[...]

 

Share this post


Link to post
Share on other sites
4 hours ago, flydev said:

$finder->sql = "SELECT ska_City AS city FROM ($sql) AS tmp GROUP BY city";

It should, but it doesn't change the outcome.  (I've tried both upper or lowercase)

Whatever I try (upper, lowercase), the SQL statement created by RockFinder is changing the name of the table to 'field_ska_City' while the name of the table is 'field_ska_city'... (in the last FROM - see SQL above)

Share this post


Link to post
Share on other sites

I don't think RF changes any lower/uppercase. Quickly googling for "case sensitivity table mysql" returns tons of results though, suggesting it may have to do with how mySQL handles case-sensitivity on different OS... just one random example: https://searchdatacenter.techtarget.com/answer/Naming-and-renaming-MySQL-files-for-case-sensitivity

 

  • Like 3

Share this post


Link to post
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.


  • Recently Browsing   0 members

    No registered users viewing this page.

  • Similar Content

    • By adrian
      Tracy Debugger for ProcessWire
      The ultimate “swiss army knife” debugging and development tool for the ProcessWire CMF/CMS

       
      Integrates and extends Nette's Tracy debugging tool and adds 35+ custom tools designed for effective ProcessWire debugging and lightning fast development
      The most comprehensive set of instructions and examples is available at: https://adrianbj.github.io/TracyDebugger
      Modules Directory: http://modules.processwire.com/modules/tracy-debugger/
      Github: https://github.com/adrianbj/TracyDebugger
      A big thanks to @tpr for introducing me to Tracy and for the idea for this module and for significant feedback, testing, and feature suggestions.
    • By adrian
      This module allows you to automatically rename file (including image) uploads according to a configurable format
      This module lets you define as many rules as you need to determine how uploaded files will be named and you can have different rules for different pages, templates, fields, and file extensions, or one rule for all uploads. Renaming works for files uploaded via the admin interface and also via the API, including images added from remote URLs.   Github: https://github.com/adrianbj/CustomUploadNames
      Modules Directory: http://modules.processwire.com/modules/process-custom-upload-names/
      Renaming Rules
      The module config allows you to set an unlimited number of Rename Rules. You can define rules to specific fields, templates, pages, and file extensions. If a rule option is left blank, the rule with be applied to all fields/templates/pages/extensions. Leave Filename Format blank to prevent renaming for a specific field/template/page combo, overriding a more general rule. Rules are processed in order, so put more specific rules before more general ones. You can drag to change the order of rules as needed. The following variables can be used in the filename format: $page, $template, $field, and $file. For some of these (eg. $field->description), if they haven't been filled out and saved prior to uploading the image, renaming won't occur on upload, but will happen on page save (could be an issue if image has already been inserted into RTE/HTML field before page save). Some examples: $page->title mysite-{$template->name}-images $field->label $file->description {$page->name}-{$file->filesize}-kb prefix-[Y-m-d_H-i-s]-suffix (anything inside square brackets is is considered to be a PHP date format for the current date/time) randstring[n] (where n is the number of characters you want in the string) ### (custom number mask, eg. 001 if more than one image with same name on a page. This is an enhanced version of the automatic addition of numbers if required) If 'Rename on Save' is checked files will be renamed again each time a page is saved (admin or front-end via API). WARNING: this setting will break any direct links to the old filename, which is particularly relevant for images inserted into RTE/HTML fields. The Filename Format can be defined using plain text and PW $page variable, for example: mysite-{$page->path} You can preserve the uploaded filename for certain rules. This will allow you to set a general renaming rule for your entire site, but then add a rule for a specific page/template/field that does not rename the uploaded file. Just simply build the rule, but leave the Filename Format field empty. You can specify an optional character limit (to nearest whole word) for the length of the filename - useful if you are using $page->path, $path->name etc and have very long page names - eg. news articles, publication titles etc. NOTE - if you are using ProcessWire's webp features, be sure to use the useSrcExt because if you have jpg and png files on the same page and your rename rules result in the same name, you need to maintain the src extension so they are kept as separate files.
      $config->webpOptions = array(     'useSrcExt' => false, // Use source file extension in webp filename? (file.jpg.webp rather than file.webp) ); Acknowledgments
      The module config settings make use of code from Pete's EmailToPage module and the renaming function is based on this code from Ryan: http://processwire.com/talk/topic/3299-ability-to-define-convention-for-image-and-file-upload-names/?p=32623 (also see this post for his thoughts on file renaming and why it is the lazy way out - worth a read before deciding to use this module). 
       
       
      NOTE:
      This should not be needed on most sites, but I work with lots of sites that host PDFs and photos/vectors that are available for download and I have always renamed the files on upload because clients will often upload files with horrible meaningless filenames like:
      Final ReportV6 web version for John Feb 23.PDF

    • By Mike Rockett
      Jumplinks for ProcessWire
      Release: 1.5.56
      Composer: rockett/jumplinks
      Jumplinks is an enhanced version of the original ProcessRedirects by Antti Peisa.
      The Process module manages your permanent and temporary redirects (we'll call these "jumplinks" from now on, unless in reference to redirects from another module), useful for when you're migrating over to ProcessWire from another system/platform. Each jumplink supports wildcards, shortening the time needed to create them.
      Unlike similar modules for other platforms, wildcards in Jumplinks are much easier to work with, as Regular Expressions are not fully exposed. Instead, parameters wrapped in curly braces are used - these are described in the documentation.
      Under Development: 2.0, to be powered by FastRoute
      As of version 1.5.0, Jumplinks requires at least ProcessWire 2.6.1 to run.
      View on GitLab
      Download via the Modules Directory
      Read the docs
      Features
      The most prominent features include:
      Basic jumplinks (from one fixed route to another) Parameter-based wildcards with "Smart" equivalents Mapping Collections (for converting ID-based routes to their named-equivalents without the need to create multiple jumplinks) Destination Selectors (for finding and redirecting to pages containing legacy location information) Timed Activation (activate and/or deactivate jumplinks at specific times) 404-Monitor (for creating jumplinks based on 404 hits) Additionally, the following features may come in handy:
      Stale jumplink management Legacy domain support for slow migrations An importer (from CSV or ProcessRedirects) Feedback & Feature Requests
      I’d love to know what you think of this module. Please provide some feedback on the module as a whole, or even regarding smaller things that make it whole. Also, please feel free to submit feature requests and their use-cases.
      Note: Features requested so far have been added to the to-do list, and will be added to 2.0, and not the current dev/master branches.
      Open Source

      Jumplinks is an open-source project, and is free to use. In fact, Jumplinks will always be open-source, and will always remain free to use. Forever. If you would like to support the development of Jumplinks, please consider making a small donation via PayPal.
      Enjoy! :)
×
×
  • Create New...