bernhard Posted April 10, 2018 Share Posted April 10, 2018 Please see the current version of this module here: https://modules.processwire.com/modules/rock-finder/ 18 Link to comment Share on other sites More sharing options...
dragan Posted April 10, 2018 Share Posted April 10, 2018 (edited) I am not sure if I would ever have a practical use for this myself (at least not in the foreseeable future), but I must say it's a pleasure to see this project seeing the light of day. Kudos. For me, it's always a treat to see what people come up with using PW. Goes to show you never can tell. PW really is "not just another CMS", but a framework as well. Hope I find some time taking a test-ride with your module very soon... Edited April 10, 2018 by dragan slight wording edit 2 Link to comment Share on other sites More sharing options...
bernhard Posted April 10, 2018 Author Share Posted April 10, 2018 7 minutes ago, dragan said: I am not sure if I would ever have a practical use for this myself (at least not in the foreseeable future), but I must say it's a pleasure to see this project seeing the light of day. I'm quite sure you will Actually it's a VERY important part of my RockGrid module. This makes building Grids (or Datatables, or "Listers" how we in the PW world would call it) a breeze. You just throw a $pages->findObjects() call to the grid and let it do the rest (sorting, filtering, pagination etc). Another usecase where this can really be a lifesaver is all kinds of data exports, feed generation etc. - It makes a huge difference to load 10.000 pw pages compared to executing an SQL and a quick php foreach. I'm working on the module again because it was too limited for my needs. I think I will come up with a good solution the next days! (And proper docs) 2 Link to comment Share on other sites More sharing options...
flydev Posted April 11, 2018 Share Posted April 11, 2018 Thanks for this module @bernhard , I am going to test it heavily today. In the project I built, similar to what you showed us, we are calling 30,000 pages to 100,000 pages per iteration (17 atm) and doing heavy aggregation on it to generate statistics and chart - a pain, hopefully fixed with your module. At this moment I am using this module quite modified for my need which add custom functions to the Page object by hooking it. I think that today I will re-write my hooks using your module. Feedback coming. Thanks again for the work and congratulation ? 1 Link to comment Share on other sites More sharing options...
bernhard Posted April 11, 2018 Author Share Posted April 11, 2018 @flydev that's exactly what it is built for. I think you'll love it combined with RockGrid I worked on it quite heavily yesterday because at the moment it is not possible to query referenced page fields easily. See this example: "account" is the bank account where the transaction relates to. "account:title" should show "MyBankAccount 1" and "MyBankAccount 2" but it shows the "title" of the transaction (uniqid). I'm tackling this today, so if you have some time to wait, that might make sense. Or you might come up with a good solution for that? I thought of adding dot-notation, like "account.title.status" showing the account id, the account title and its pw-status. When I tried that, I realized that it might be better (or the only way) to JOIN all the field's tables so that they are available to query for the related information afterwards. This is not possible with the current subquery setup. The other option I will check today is to make easy JOINS possible. This might be easier. Something like this: $pages->findObjects( ['template=transaction', ['id', 'title', 'account', 'images:description'], [ 'name' => 'trans', 'joins' => ['template=account', ['id', 'title', 'status'], ['name' => 'acc', 'on' => 'acc.id = trans.account']], 'concat' => ['account', 'images'], ]] ); What do you think? 1 Link to comment Share on other sites More sharing options...
bernhard Posted April 13, 2018 Author Share Posted April 13, 2018 Phew... This was quite hard but finally I made progress. Please see the dev branch if you want to play around @flydev : https://gitlab.com/baumrock/RockSqlFinder/tree/dev#notes (other parts of the readme are outdated meanwhile, It's just some notes for myself and old examples..., sorry no more time today). This is an example query with a join. This is necessary for pagefields when you want to get not only the page id's but also the related page's fields information like "title": Next step is to make CONCAT operations possible so that the join will return a concatenated string of ids and titles (miau, cat ..., mimimimi). This should also make aggregations like sum() possible (see here: https://gitlab.com/baumrock/RockSqlFinder/tree/dev#notes -> not much change needed for the resulting sql). The new version of the module is completely recursive, so I hope (haha...) that support for image fields with description (in this case it's not a related table but on the same table like the imagefield) easy. We will see... PS: Actually I just realized that defining the field "cats" is stupid when we define a join named 'cats'. Because I'm removing this field later anyhow... need to change this. You see there will be some more changes in the future, sorry... 4 Link to comment Share on other sites More sharing options...
bernhard Posted April 17, 2018 Author Share Posted April 17, 2018 Another update is online (branch DEV3): https://gitlab.com/baumrock/RockSqlFinder/tree/dev3 This is a major upgrade, a complete rewrite and finally a really good solution imho: I changed the syntax to object-notation in preference of nested associative arrays and it's a lot cleaner and easier to setup and read now. I rewrote the logic completely. It is NOT recursive any more but it takes away a LOT of complexity. You can add "fields" to your find-operation and those "fields" are fully customizable classes. The module ships with predefined classes for FieldtypeText (easy as it is just a column in the DB), FieldtypeFile (that shows concatenated filenames and descriptions) and FieldtypeRepeater (that shows concatenated data of the repeateritem's fields). The user can add custom FieldFinders in /site/assets/RockFinder (this will be the final name of the module). Possibility to add custom separators for concatenated fields Easy to create custom SQL by wrapping the created SQL in your own statement (SELECT ... FROM $finder->getSQL() GROUP BY ...) Examples: Query 10.000 invoices and return their date and value: Query all persons and show content of images (fieldtypeimage) and repeater (fieldtyperepeater): Same example but with image descriptions and all repeateritem's fields: Find 35.000 pages: This takes significantly longer than finds until 10.000 items. Maybe @adrian or @theo you have time to compare these results on your setups? Todos: Support for FieldtypePage (and other fieldtypes?) Support for Multilanguage (should not be hard, I've developed it with that in mind) Aggregation functions like sum(), avg(), min(), max() Support for closures? What do you guys think? 6 Link to comment Share on other sites More sharing options...
theo Posted April 18, 2018 Share Posted April 18, 2018 @bernhard I'm sure it's great what you are doing. Unfortunately, I have no time for participating or testing and I don't have a setup with 10 000+ pages atm. But keep up the good work! Link to comment Share on other sites More sharing options...
bernhard Posted April 18, 2018 Author Share Posted April 18, 2018 Another performance test using findMany() So this looks like findMany() is a lot faster, but this is not true because creating the proper array of data takes longer than with RockFinder: $selector = 'parent=/data'; $finder = new RockFinder($selector, ['title', 'headline', 'summary']); t(); $result = $finder->getObjects(); d($rf = t()*1000, 'query time in ms (rockfinder)'); d(count($result), 'items'); d($result[0], 'first item'); t(); $result = $pages->findMany($selector); //$finder->getObjects(); d($fm = t()*1000, 'query time in ms (findmany)'); d($count = count($result), 'items'); d($result[0], 'first item'); t(); $arr = []; foreach($result as $p) { $arr[] = (object)[ 'id' => $p->id, 'title' => $p->title, 'headline' => $p->headline, 'summary' => $p->summary, ]; } d($fm2 = t()*1000, 'create array'); d($arr[0]); d("$count items: rockfinder = " . round($rf,2) . "ms | findmany = " . round($fm+$fm2,2) . "ms | " . round($rf/($fm+$fm2)*100, 2) . "%"); Result: Quote 35000 items: rockfinder = 7229.4ms | findmany = 29767.4ms | 24.29% Some other tests: $selector = 'parent=/persons'; // 11 items: rockfinder = 3.8ms | findmany = 7ms | 54.29% $selector = 'parent=/dogs'; // 1000 items: rockfinder = 41ms | findmany = 722.1ms | 5.68% $selector = 'parent=/cats'; // 5000 items: rockfinder = 221.4ms | findmany = 1660.8ms | 13.33% $selector = 'parent=/invoices'; // 10002 items: rockfinder = 526.6ms | findmany = 3385.3ms | 15.56% $selector = 'parent=/data'; // 35000 items: rockfinder = 7161.4ms | findmany = 27722.9ms | 25.83% $selector = 'parent=/data2'; // 91300 items: rockfinder = 59523.6ms | findmany = 76495.8ms | 77.81% What is very interesting (and not good), is that the time needed for RockFinder increases disproportionately when querying > 10.000 pages; 10.000 items = 500ms, but 3x10.000 pages = 7.000ms Maybe any sql experts have an idea? 1 Link to comment Share on other sites More sharing options...
bernhard Posted April 18, 2018 Author Share Posted April 18, 2018 Sorry guys for all those posts... Found the performance-killer: It is the ORDER BY field(`pages`.`id`, 52066,52067,52068,52069,52070 ... ) part. Without retaining the sort order of the pages->findIDs it is a LOT faster (4s without sort compared to 60s with sort and 75 using findMany): 91300 items: rockfinder = 4385.5ms | findmany = 74213.9ms | 5.91% I'll add this as an additional option and switch sort order OFF by default since sorting will be done by RockGrid anyhow 1 hour ago, theo said: Unfortunately, I have no time for participating or testing and I don't have a setup with 10 000+ pages atm. No problem at all. I need this stuff for my own work, so any help is welcome but of course not expected PS: again the tests without sort order 11 items: rockfinder = 4ms | findmany = 6.9ms | 57.97% 1000 items: rockfinder = 35.7ms | findmany = 744.2ms | 4.8% 5000 items: rockfinder = 165ms | findmany = 1675.4ms | 9.85% 10002 items: rockfinder = 327ms | findmany = 3359.5ms | 9.73% 35000 items: rockfinder = 1745.2ms | findmany = 28547.7ms | 6.11% 91300 items: rockfinder = 4385.5ms | findmany = 74213.9ms | 5.91% Now that looks a lot better, doesn't it? 5 1 Link to comment Share on other sites More sharing options...
theo Posted April 18, 2018 Share Posted April 18, 2018 @bernhard Glad you found the bottleneck. But I'd leave sorting on by default and make an option to turn it off ( IIUC ). 1 Link to comment Share on other sites More sharing options...
szabesz Posted April 18, 2018 Share Posted April 18, 2018 5 hours ago, theo said: But I'd leave sorting on by default and make an option to turn it off ( IIUC ). JS does the sorting client side, so I guess it is better not to sort it twice... Link to comment Share on other sites More sharing options...
kongondo Posted April 18, 2018 Share Posted April 18, 2018 1 minute ago, szabesz said: JS does the sorting client side, so I guess it is better not to sort it twice... Sort order in database can affect the results returned. So, unless you are 'finding' everything, it is good to have the option to sort server-side. 3 Link to comment Share on other sites More sharing options...
bernhard Posted April 18, 2018 Author Share Posted April 18, 2018 The "option" to sort on the server side is of course always there. But I set it to use sorting by default now. I think the worse performance will not matter in most of the cases. If it does, the user will realise it and can easily turn it off. It would be worse to think everything works as expected and then get some unexpected results due to sorting turned off while you think it should actually return a sorted resultset ( like from a selector containing the sort keyword ). Thanks for your feedback. 2 Link to comment Share on other sites More sharing options...
flydev Posted April 19, 2018 Share Posted April 19, 2018 I just took a look at the number of pages I got into the system, there is 407446 pages found, expecting around 1.2M pages at the end of the year. I didn't got the time to test your module again deeper but it look promising. The only thing I am scratching my head in, its the calculation of multiple fields on 100K+ pages at one time... Did you tried something already ? Actually, doing some calculation on 119k pages on one go take about 19seconds and rendering a chart from this number of pages take about two or three minutes! ? 1 Link to comment Share on other sites More sharing options...
bernhard Posted April 19, 2018 Author Share Posted April 19, 2018 35 minutes ago, flydev said: The only thing I am scratching my head in, its the calculation of multiple fields on 100K+ pages at one time... Did you tried something already ? Actually, doing some calculation on 119k pages on one go take about 19seconds and rendering a chart from this number of pages take about two or three minutes! ? Can you be more precise please? What kind of calculations are you talking about? I'll try an answer with a guess This is a quick test i did with 10.000 fake invoices. The result is great (but see later). Here the setup for the Tracy Console: $selector = 'parent=/invoices, limit=2'; // the limit is for demo of the sql $finder = new RockFinder($selector, ['value', 'date']); $finder->sort = false; $sql = $finder->getSQL(); d($sql, [6,999]); This finds all invoices and returns their "date" (a datetime field) and their "value" (random number between 1 and 100). The resulting SQL is this: SELECT `pages`.`id` AS `id`, `value`.`value` AS `value`, `date`.`date` AS `date` FROM `pages` /* --- join value --- */ LEFT JOIN (SELECT `pages_id` AS `pageid`, `value`.`data` AS `value` FROM `field_value` AS `value`) AS `value` ON `value`.`pageid` = `pages`.`id` /* --- end value --- */ /* --- join date --- */ LEFT JOIN (SELECT `pages_id` AS `pageid`, `date`.`data` AS `date` FROM `field_date` AS `date`) AS `date` ON `date`.`pageid` = `pages`.`id` /* --- end date --- */ WHERE `pages`.`id` IN (42063,42064) Now the funny part: You can do all kinds of custom SQL with that returned SQL statement, for example grouping the invoices by month and building a sum of their values: $results = $this->database->query("SELECT sum(value) as monthsum, DATE_FORMAT(date, '%Y-%m') as date FROM ($sql) AS rockfinder group by date"); And here the absolutely awesome result: So you get aggregated data for 12 months in no-time (166ms!!) with some VERY basic SQL. I'll think about how this can be made even easier in the future. Maybe by providing a method that does the query and returns the result. I'm open to suggestions. The maybe greatest thing about that is that you do not need to take care about processwire page statuses. That has always been a real pain when working with SQL because it makes things so complex. You always need to think about adding WHERE pages.status < ... and so on. That sucks. Now you just build your custom "view" with RockFinder and do some simple SQL on that result. You get the idea? Edit: I put $finder->getSQL() inside the timer, now it took 280ms - still great IMHO. And see how easy it is to add an invoice-count for that month for example 3 Link to comment Share on other sites More sharing options...
flydev Posted April 20, 2018 Share Posted April 20, 2018 (edited) Using the module from the dev2 branch, I made a test based on your example but on my side it return an error. $selector = "template=transactions,date>=1522620000,date<=1523051999"; $finder = new RockFinder($selector, ['montant_especes', 'date']); $finder->sort = false; $sql = $finder->getSQL(); $results = $this->database->query("SELECT sum(montant_especes) as especes FROM ($sql) AS rockfinder"); Quote Exception: SQLSTATE[42S22]: Column not found: 1054 Unknown column 'field_montant_especes.data' in 'field list' on line: 306 in /www/sites/kp/wwwroot/wire/core/WireDatabasePDO.php The field exist in the database : PS: copy/pasting the query returned by Finder in PHPMyAdmin throw the same error. Also, the query returned look weird, check the where clause : SELECT `field_montant_especes`.`data`, `field_date`.`data` FROM ( SELECT `field_montant_especes`.`data` AS `montant_especes`, `field_date`.`data` AS `date` FROM `pages` LEFT JOIN `field_montant_especes` as `field_montant_especes` on `field_montant_especes`.`pages_id` = `pages`.`id` LEFT JOIN `field_date` as `field_date` on `field_date`.`pages_id` = `pages`.`id` ) AS `` WHERE ``.`id` IN (...) Edited April 20, 2018 by flydev sql Link to comment Share on other sites More sharing options...
bernhard Posted April 20, 2018 Author Share Posted April 20, 2018 @flydev please use dev3, it's a complete rewrite! 2 Link to comment Share on other sites More sharing options...
flydev Posted April 20, 2018 Share Posted April 20, 2018 Okay, I tried it yesterday, but got this error on the dev3 branch: Exception: Class '\ProcessWire\RockFinderFieldText' not found on line: 58 in /www/sites/kp/wwwroot/site/modules/RockSqlFinder/RockFinder.module.php Link to comment Share on other sites More sharing options...
bernhard Posted April 20, 2018 Author Share Posted April 20, 2018 Sorry, please try to rename your module's folder to "RockFinder" instead of "RockSqlFinder" 1 1 Link to comment Share on other sites More sharing options...
bernhard Posted April 22, 2018 Author Share Posted April 22, 2018 I've deleted the RockSqlFinder repo on gitlab and created a RockFinder repo. This will be the final name once I'm done with testing + writing docs. I've also added a simple tester process module to quickly inspect and test find operations: Note that this processmodule requires RockGrid which is not available for public yet, so this screenshot is just a sneak peak what will come. 5 Link to comment Share on other sites More sharing options...
bernhard Posted April 23, 2018 Author Share Posted April 23, 2018 Just added support for closures: This makes it very easy to use the PW api for each row but also makes it a lot more imperformant: As you can see in the SQL it is done by adding an empty column to the sql statement and then looping all rows and executing the closure. Edit: Just pushed an update that removes one unnecessary $pages->findIDs() call. Compare sort disabled with closures: 15.3ms findIDs id>0, limit=1000 15.3ms getSQL 6847.7ms executeClosures 6868.4ms getObjects Includes executeClosures 6869.7ms Overall Inputfield Render Sort enabled with closures: 15.2ms findIDs id>0, limit=1000 15.3ms getSQL 6924.6ms executeClosures 6949.9ms getObjects Includes executeClosures 6951.1ms Overall Inputfield Render Sort enabled without closures (11.400 rows): 71.9ms findIDs id>0, limit=0 71.9ms getSQL 0ms executeClosures 374.4ms getObjects Includes executeClosures 378ms Overall Inputfield Render Sort disabled without closures (11.400 rows): 74.4ms findIDs id>0, limit=0 74.5ms getSQL 0ms executeClosures 126.9ms getObjects Includes executeClosures 130.5ms Overall Inputfield Render 1 Link to comment Share on other sites More sharing options...
dragan Posted April 23, 2018 Share Posted April 23, 2018 I would like to take a look at all this, but I don't have a real site with thousands of pages. I know about all the various ways I could create + populate PW pages with some dummy data. But do you think you could provide us some sort of data set (site-profile?) that actually has so many entries it's even worth playing around with it? I can imagine having such a quick setup would greatly facilitate further improvements / tests. 1 Link to comment Share on other sites More sharing options...
bernhard Posted May 5, 2018 Author Share Posted May 5, 2018 On 4/23/2018 at 7:15 PM, dragan said: I would like to take a look at all this, but I don't have a real site with thousands of pages. I know about all the various ways I could create + populate PW pages with some dummy data. But do you think you could provide us some sort of data set (site-profile?) that actually has so many entries it's even worth playing around with it? I can imagine having such a quick setup would greatly facilitate further improvements / tests. https://transfer.sh/brOON/site-RockFinderTest.zip (will be deleted in 14 days) 1 Link to comment Share on other sites More sharing options...
dragan Posted May 5, 2018 Share Posted May 5, 2018 Thanks! There goes the weekend... 2 Link to comment Share on other sites More sharing options...
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now