bernhard

RockSqlFinder (outdated thread, link to current version inside)

Recommended Posts

Posted (edited)

I am not sure if I would ever have a practical use for this myself (at least not in the foreseeable future), but I must say it's a pleasure to see this project seeing the light of day.

Kudos.

For me, it's always a treat to see what people come up with using PW. Goes to show you never can tell.

PW really is "not just another CMS", but a framework as well.

Hope I find some time taking a test-ride with your module very soon...

Edited by dragan
slight wording edit
  • Like 2

Share this post


Link to post
Share on other sites
7 minutes ago, dragan said:

I am not sure if I would ever have a practical use for this myself (at least not in the foreseeable future), but I must say it's a pleasure to see this project seeing the light of day.

I'm quite sure you will ;) Actually it's a VERY important part of my RockGrid module. This makes building Grids (or Datatables, or "Listers" how we in the PW world would call it) a breeze. You just throw a $pages->findObjects() call to the grid and let it do the rest (sorting, filtering, pagination etc).

Another usecase where this can really be a lifesaver is all kinds of data exports, feed generation etc. - It makes a huge difference to load 10.000 pw pages compared to executing an SQL and a quick php foreach.

I'm working on the module again because it was too limited for my needs. I think I will come up with a good solution the next days! (And proper docs)

  • Like 2

Share this post


Link to post
Share on other sites

Thanks for this module @bernhard ,  I am going to test it heavily today.

In the project I built, similar to what you showed us, we are calling 30,000 pages to 100,000 pages per iteration (17 atm) and doing heavy aggregation on it to generate statistics and chart - a pain, hopefully fixed with your module. At this moment I am using this module quite modified for my need which add custom functions to the Page object by hooking it. I think that today I will re-write my hooks using your module. Feedback coming. Thanks again for the work and congratulation 👍

  • Like 1

Share this post


Link to post
Share on other sites

@flydev that's exactly what it is built for. I think you'll love it combined with RockGrid ;) 

I worked on it quite heavily yesterday because at the moment it is not possible to query referenced page fields easily. See this example:

5acdc7b9823bd_2018-04-1110_30_19-RockFinanceProcessWirefinance_to.png.a9f7c926e01a12643e1bd4b657c19eaf.png

"account" is the bank account where the transaction relates to. "account:title" should show "MyBankAccount 1" and "MyBankAccount 2" but it shows the "title" of the transaction (uniqid).

I'm tackling this today, so if you have some time to wait, that might make sense. Or you might come up with a good solution for that? I thought of adding dot-notation, like "account.title.status" showing the account id, the account title and its pw-status. When I tried that, I realized that it might be better (or the only way) to JOIN all the field's tables so that they are available to query for the related information afterwards. This is not possible with the current subquery setup.

The other option I will check today is to make easy JOINS possible. This might be easier. Something like this:

$pages->findObjects(
    ['template=transaction', ['id', 'title', 'account', 'images:description'], [
        'name' => 'trans',
        'joins' => ['template=account', ['id', 'title', 'status'], ['name' => 'acc', 'on' => 'acc.id = trans.account']],
        'concat' => ['account', 'images'],
    ]]
);

What do you think?

  • Like 1

Share this post


Link to post
Share on other sites

Phew... This was quite hard but finally I made progress.

Please see the dev branch if you want to play around @flydev : https://gitlab.com/baumrock/RockSqlFinder/tree/dev#notes (other parts of the readme are outdated meanwhile, It's just some notes for myself and old examples..., sorry no more time today).

This is an example query with a join. This is necessary for pagefields when you want to get not only the page id's but also the related page's fields information like "title":

5ad0b9bb15404_2018-04-1315_58_10-EditPage_mariarocksqlfinder_test.thumb.png.1e292055dbb12a3d208df5e8dfb4e9fe.png

5ad0b9c5457ab_2018-04-1315_57_59-EditPage_Homerocksqlfinder_test.thumb.png.e7c8b146a09ab299cf0acfc3310105fc.png

 

Next step is to make CONCAT operations possible so that the join will return a concatenated string of ids and titles (miau, cat ..., mimimimi). This should also make aggregations like sum() possible (see here: https://gitlab.com/baumrock/RockSqlFinder/tree/dev#notes -> not much change needed for the resulting sql).

 

The new version of the module is completely recursive, so I hope (haha...) that support for image fields with description (in this case it's not a related table but on the same table like the imagefield) easy. We will see...

PS: Actually I just realized that defining the field "cats" is stupid when we define a join named 'cats'. Because I'm removing this field later anyhow... need to change this. You see there will be some more changes in the future, sorry...

  • Like 4

Share this post


Link to post
Share on other sites

Another update is online (branch DEV3): https://gitlab.com/baumrock/RockSqlFinder/tree/dev3

This is a major upgrade, a complete rewrite and finally a really good solution imho:

  • I changed the syntax to object-notation in preference of nested associative arrays and it's a lot cleaner and easier to setup and read now.
  • I rewrote the logic completely. It is NOT recursive any more but it takes away a LOT of complexity. You can add "fields" to your find-operation and those "fields" are fully customizable classes.
  • The module ships with predefined classes for FieldtypeText (easy as it is just a column in the DB), FieldtypeFile (that shows concatenated filenames and descriptions) and FieldtypeRepeater (that shows concatenated data of the repeateritem's fields).
  • The user can add custom FieldFinders in /site/assets/RockFinder (this will be the final name of the module).
  • Possibility to add custom separators for concatenated fields
  • Easy to create custom SQL by wrapping the created SQL in your own statement (SELECT ... FROM $finder->getSQL() GROUP BY ...)

Examples:

Query 10.000 invoices and return their date and value:

5ad6602ab4fab_2018-04-1722_50_36-EditPage_mariarockfinder_test.png.51c4b9a54f7a1c6e237130ab41f53713.png

 

Query all persons and show content of images (fieldtypeimage) and repeater (fieldtyperepeater):

5ad6614ca8f4b_2018-04-1723_03_51-EditPage_mariarockfinder_test.png.a0e039185fdcd88dded83f855406a541.png

 

Same example but with image descriptions and all repeateritem's fields:

5ad6635c2314f_2018-04-1723_12_34-EditPage_mariarockfinder_test.png.69d0ccb330480ca285908b1688070dc1.png

 

Find 35.000 pages: This takes significantly longer than finds until 10.000 items. Maybe @adrian or @theo you have time to compare these results on your setups?

5ad6649017723_2018-04-1723_17_35-EditPage_mariarockfinder_test.png.3cf0b6ca027a4f4dc6a90d75b4c77a7b.png

 

Todos:

  • Support for FieldtypePage (and other fieldtypes?)
  • Support for Multilanguage (should not be hard, I've developed it with that in mind)
  • Aggregation functions like sum(), avg(), min(), max()
  • Support for closures?

 

What do you guys think? :) 

  • Like 6

Share this post


Link to post
Share on other sites

@bernhard I'm sure it's great what you are doing.

Unfortunately, I have no time for participating or testing and I don't have a setup with 10 000+ pages atm.

But keep up the good work!

Share this post


Link to post
Share on other sites

Another performance test using findMany()

5ad6fe58e9c45_2018-04-1809_53_43-EditPage_Homerockfinder_test.png.5faa6978ae24d438ea3e19af80f9d6d0.png

So this looks like findMany() is a lot faster, but this is not true because creating the proper array of data takes longer than with RockFinder:

 

$selector = 'parent=/data';

$finder = new RockFinder($selector, ['title', 'headline', 'summary']);
t();
$result = $finder->getObjects();
d($rf = t()*1000, 'query time in ms (rockfinder)');
d(count($result), 'items');
d($result[0], 'first item');

t();
$result = $pages->findMany($selector); //$finder->getObjects();
d($fm = t()*1000, 'query time in ms (findmany)');
d($count = count($result), 'items');
d($result[0], 'first item');

t();
$arr = [];
foreach($result as $p) {
    $arr[] = (object)[
        'id' => $p->id,
        'title' => $p->title,
        'headline' => $p->headline,
        'summary' => $p->summary,
    ];
}
d($fm2 = t()*1000, 'create array');
d($arr[0]);

d("$count items: rockfinder = " . round($rf,2) . "ms | findmany = " . round($fm+$fm2,2) . "ms | " . round($rf/($fm+$fm2)*100, 2) . "%");

Result:

Quote

35000 items: rockfinder = 7229.4ms | findmany = 29767.4ms | 24.29%

Some other tests:

$selector = 'parent=/persons';
// 11 items: rockfinder = 3.8ms | findmany = 7ms | 54.29%

$selector = 'parent=/dogs';
// 1000 items: rockfinder = 41ms | findmany = 722.1ms | 5.68%

$selector = 'parent=/cats';
// 5000 items: rockfinder = 221.4ms | findmany = 1660.8ms | 13.33%

$selector = 'parent=/invoices';
// 10002 items: rockfinder = 526.6ms | findmany = 3385.3ms | 15.56%

$selector = 'parent=/data';
// 35000 items: rockfinder = 7161.4ms | findmany = 27722.9ms | 25.83%

$selector = 'parent=/data2';
// 91300 items: rockfinder = 59523.6ms | findmany = 76495.8ms | 77.81%

What is very interesting (and not good), is that the time needed for RockFinder increases disproportionately when querying > 10.000 pages; 10.000 items = 500ms, but 3x10.000 pages = 7.000ms :( Maybe any sql experts have an idea?

5ad710ecba23c_2018-04-1811_33_12-Mappe1-Excel.png.cb999f63bee2ba59af5e051d3f166156.png

Share this post


Link to post
Share on other sites

Sorry guys for all those posts... Found the performance-killer: It is the ORDER BY field(`pages`.`id`, 52066,52067,52068,52069,52070 ... ) part.

Without retaining the sort order of the pages->findIDs it is a LOT faster (4s without sort compared to 60s with sort and 75 using findMany):

91300 items: rockfinder = 4385.5ms | findmany = 74213.9ms | 5.91%

I'll add this as an additional option and switch sort order OFF by default since sorting will be done by RockGrid anyhow :)

1 hour ago, theo said:

Unfortunately, I have no time for participating or testing and I don't have a setup with 10 000+ pages atm.

No problem at all. I need this stuff for my own work, so any help is welcome but of course not expected :)

 

PS: again the tests without sort order

11 items: rockfinder = 4ms | findmany = 6.9ms | 57.97%
1000 items: rockfinder = 35.7ms | findmany = 744.2ms | 4.8%
5000 items: rockfinder = 165ms | findmany = 1675.4ms | 9.85%
10002 items: rockfinder = 327ms | findmany = 3359.5ms | 9.73%
35000 items: rockfinder = 1745.2ms | findmany = 28547.7ms | 6.11%
91300 items: rockfinder = 4385.5ms | findmany = 74213.9ms | 5.91%

5ad718176258e_2018-04-1812_03_48-Mappe1-Excel.png.9f65438c43ac618e38f79469ce1c9acd.png

Now that looks a lot better, doesn't it? O0

  • Like 4
  • Thanks 1

Share this post


Link to post
Share on other sites

@bernhard Glad you found the bottleneck.

But I'd leave sorting on by default and make an option to turn it off ( IIUC ).

  • Like 1

Share this post


Link to post
Share on other sites
5 hours ago, theo said:

But I'd leave sorting on by default and make an option to turn it off ( IIUC ).

JS does the sorting client side, so I guess it is better not to sort it twice...

Share this post


Link to post
Share on other sites
1 minute ago, szabesz said:

JS does the sorting client side, so I guess it is better not to sort it twice...

Sort order in database can affect the results returned. So, unless you are 'finding' everything, it is good to have the option to sort server-side.

  • Like 3

Share this post


Link to post
Share on other sites

The "option" to sort on the server side is of course always there. But I set it to use sorting by default now. I think the worse performance will not matter in most of the cases. If it does, the user will realise it and can easily turn it off. It would be worse to think everything works as expected and then get some unexpected results due to sorting turned off while you think it should actually return a sorted resultset ( like from a selector containing the sort keyword ).

Thanks for your feedback.

  • Like 2

Share this post


Link to post
Share on other sites

I just took a look at the number of pages I got into the system, there is 407446 pages found, expecting around 1.2M pages at the end of the year. I didn't got the time to test your module again deeper but it look promising.

The only thing I am scratching my head in, its the calculation of multiple fields on 100K+ pages at one time... Did you tried something already ? Actually, doing some calculation on 119k pages on one go take about 19seconds and rendering a chart from this number of pages take about two or three minutes! 🙈

 

  • Like 1

Share this post


Link to post
Share on other sites
35 minutes ago, flydev said:

The only thing I am scratching my head in, its the calculation of multiple fields on 100K+ pages at one time... Did you tried something already ? Actually, doing some calculation on 119k pages on one go take about 19seconds and rendering a chart from this number of pages take about two or three minutes! 🙈

Can you be more precise please? What kind of calculations are you talking about? I'll try an answer with a guess ;)

 

This is a quick test i did with 10.000 fake invoices. The result is great (but see later). Here the setup for the Tracy Console:

$selector = 'parent=/invoices, limit=2'; // the limit is for demo of the sql
$finder = new RockFinder($selector, ['value', 'date']);
$finder->sort = false;
$sql = $finder->getSQL();
d($sql, [6,999]);

This finds all invoices and returns their "date" (a datetime field) and their "value" (random number between 1 and 100). The resulting SQL is this:

SELECT
  `pages`.`id` AS `id`,
  `value`.`value` AS `value`,
  `date`.`date` AS `date`
FROM
  `pages`

/* --- join value --- */
LEFT JOIN (SELECT
  `pages_id` AS `pageid`,
  `value`.`data` AS `value`
FROM `field_value` AS `value`) AS `value` ON `value`.`pageid` = `pages`.`id`
/* --- end value --- */


/* --- join date --- */
LEFT JOIN (SELECT
  `pages_id` AS `pageid`,
  `date`.`data` AS `date`
FROM `field_date` AS `date`) AS `date` ON `date`.`pageid` = `pages`.`id`
/* --- end date --- */

WHERE
  `pages`.`id` IN (42063,42064)

 

Now the funny part: You can do all kinds of custom SQL with that returned SQL statement, for example grouping the invoices by month and building a sum of their values:

$results = $this->database->query("SELECT sum(value) as monthsum, DATE_FORMAT(date, '%Y-%m') as date FROM ($sql) AS rockfinder group by date");

And here the absolutely awesome result:

5ad8c107da2d2_2018-04-1918_16_29-EditPage_5ad5fcc0a875crockfinder_test.thumb.png.d8c5efb97dc6a0f82cea120f1c51a8c7.png

 

So you get aggregated data for 12 months in no-time (166ms!!) with some VERY basic SQL. I'll think about how this can be made even easier in the future. Maybe by providing a method that does the query and returns the result. I'm open to suggestions.

The maybe greatest thing about that is that you do not need to take care about processwire page statuses. That has always been a real pain when working with SQL because it makes things so complex. You always need to think about adding WHERE pages.status < ... and so on. That sucks. Now you just build your custom "view" with RockFinder and do some simple SQL on that result.

You get the idea?

Edit: I put $finder->getSQL() inside the timer, now it took 280ms - still great IMHO. And see how easy it is to add an invoice-count for that month for example O0 

5ad8c33378c90_2018-04-1918_25_31-EditPage_5ad5fcc0a875crockfinder_test.thumb.png.11d3f0bb83c8a0f208bb99eb015aeb7b.png

  • Like 3

Share this post


Link to post
Share on other sites
Posted (edited)

Using the module from the dev2 branch, I made a test based on your example but on my side it return an error.

$selector = "template=transactions,date>=1522620000,date<=1523051999"; 
$finder = new RockFinder($selector, ['montant_especes', 'date']);
$finder->sort = false;
$sql = $finder->getSQL();
$results = $this->database->query("SELECT sum(montant_especes) as especes FROM ($sql) AS rockfinder");

 

Quote

Exception: SQLSTATE[42S22]: Column not found: 1054 Unknown column 'field_montant_especes.data' in 'field list' on line: 306 in /www/sites/kp/wwwroot/wire/core/WireDatabasePDO.php

 

The field exist in the database :

montant.png.b8f0d7f33903c1afad31376c0bbf1c3c.png

 

PS: copy/pasting the query returned by Finder in PHPMyAdmin throw the same error.

 

Also, the query returned look weird, check the where clause :

SELECT
  `field_montant_especes`.`data`,
  `field_date`.`data`
FROM (
  SELECT 
    `field_montant_especes`.`data` AS `montant_especes`,
    `field_date`.`data` AS `date`
  FROM `pages`
  LEFT JOIN `field_montant_especes` as `field_montant_especes` on `field_montant_especes`.`pages_id` = `pages`.`id`
  LEFT JOIN `field_date` as `field_date` on `field_date`.`pages_id` = `pages`.`id`
) AS ``
WHERE
  ``.`id` IN (...)

 

Edited by flydev
sql

Share this post


Link to post
Share on other sites

Okay, I tried it yesterday, but got this error on the dev3 branch:

Exception: Class '\ProcessWire\RockFinderFieldText' not found on line: 58 in /www/sites/kp/wwwroot/site/modules/RockSqlFinder/RockFinder.module.php

 

Share this post


Link to post
Share on other sites

Sorry, please try to rename your module's folder to "RockFinder" instead of "RockSqlFinder"

  • Like 1
  • Thanks 1

Share this post


Link to post
Share on other sites

I've deleted the RockSqlFinder repo on gitlab and created a RockFinder repo. This will be the final name once I'm done with testing + writing docs.

I've also added a simple tester process module to quickly inspect and test find operations:

screencapture-360-test-admin-setup-rockfindertester-2018-04-22-21_22_41.thumb.png.c33e8f35116b1e7b385441464adb2a38.png

Note that this processmodule requires RockGrid which is not available for public yet, so this screenshot is just a sneak peak what will come.

  • Like 5

Share this post


Link to post
Share on other sites

Just added support for closures: This makes it very easy to use the PW api for each row but also makes it a lot more imperformant:

screencapture-360-test-admin-setup-rockfindertester-2018-04-23-18_08_04.thumb.png.5c36b4360d05ddefa6d5139f27d20d22.png

As you can see in the SQL it is done by adding an empty column to the sql statement and then looping all rows and executing the closure.

Edit:

Just pushed an update that removes one unnecessary $pages->findIDs() call.

Compare sort disabled with closures:

15.3ms	findIDs	id>0, limit=1000
15.3ms	getSQL	
6847.7ms	executeClosures	
6868.4ms	getObjects	Includes executeClosures
6869.7ms	Overall Inputfield Render	

Sort enabled with closures:

15.2ms	findIDs	id>0, limit=1000
15.3ms	getSQL	
6924.6ms	executeClosures	
6949.9ms	getObjects	Includes executeClosures
6951.1ms	Overall Inputfield Render	

Sort enabled without closures (11.400 rows):

71.9ms	findIDs	id>0, limit=0
71.9ms	getSQL	
0ms	executeClosures	
374.4ms	getObjects	Includes executeClosures
378ms	Overall Inputfield Render	

Sort disabled without closures (11.400 rows):

74.4ms	findIDs	id>0, limit=0
74.5ms	getSQL	
0ms	executeClosures	
126.9ms	getObjects	Includes executeClosures
130.5ms	Overall Inputfield Render	

 

  • Like 1

Share this post


Link to post
Share on other sites

I would like to take a look at all this, but I don't have a real site with thousands of pages.

I know about all the various ways I could create + populate PW pages with some dummy data. But do you think you could provide us some sort of data set (site-profile?) that actually has so many entries it's even worth playing around with it? I can imagine having such a quick setup would greatly facilitate further improvements / tests.

 

  • Like 1

Share this post


Link to post
Share on other sites
On 4/23/2018 at 7:15 PM, dragan said:

I would like to take a look at all this, but I don't have a real site with thousands of pages.

I know about all the various ways I could create + populate PW pages with some dummy data. But do you think you could provide us some sort of data set (site-profile?) that actually has so many entries it's even worth playing around with it? I can imagine having such a quick setup would greatly facilitate further improvements / tests.

 

https://transfer.sh/brOON/site-RockFinderTest.zip (will be deleted in 14 days)

  • Thanks 1

Share this post


Link to post
Share on other sites

Thanks!

There goes the weekend...

  • Haha 2

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Recently Browsing   0 members

    No registered users viewing this page.