bernhard Posted December 14, 2017
I'm still looking for the best solution for my datatables module and I think there is no way around using direct SQL queries... That makes the setup more complicated, but it saves a lot of other trouble like proper caching, cache maintenance etc. Unfortunately such SQL queries can get quite complex - unless I'm missing some PW magic that makes it easier? This is what I have so far to query all pages of type "basic-page":

select
    id,
    title.data as title,
    headline.data as headline,
    body.data as body,
    (select group_concat(data separator ',') from field_test_page where pages_id = p.id) as test_page,
    test_repeater.data as test_repeater
from pages as p
left join field_title as title on title.pages_id = p.id
left join field_headline as headline on headline.pages_id = p.id
left join field_body as body on body.pages_id = p.id
left join field_test_repeater as test_repeater on test_repeater.pages_id = p.id
where p.templates_id = 29

The repeater is a little tricky, for example. It gets even worse when you have to handle multilanguage fields or when you want references to other pages' field values...

edit: maybe better like this?

select
    id,
    (select data from field_title where pages_id = p.id) as title,
    (select data from field_field1 where pages_id = p.id) as field1,
    (select data11041 from field_field1 where pages_id = p.id) as field1_de,
    parent_id,
    (select data from field_title where pages_id = p.parent_id) as parent_title
from pages as p
where p.templates_id = 44

Thanks for any hints!
1 Link to comment Share on other sites More sharing options...
dragan Posted December 15, 2017
Not sure about repeaters, but to find all pages that use the template basic-page, you could do this:

SET @tid = (SELECT id FROM templates WHERE name = 'basic-page');
SELECT p.id, p.name
FROM pages AS p
WHERE p.templates_id = @tid;

or this (includes page titles):

SET @tid = (SELECT id FROM templates WHERE name = 'basic-page');
SELECT p.id, p.name, ft.data AS title
FROM pages AS p
JOIN field_title AS ft ON ft.pages_id = p.id
WHERE p.templates_id = @tid;
bernhard Posted December 15, 2017 Author Share Posted December 15, 2017 thanks for your suggestion dragan! i modified your example a little bit. it's also clear and easy to extend with other fields... but still i find this one easier to read and maintain (having one line for each field vs. in your example (and my previous tries) it's 3 lines for each field): select id, (select data from field_title where pages_id = p.id) as title, (select data from field_field1 where pages_id = p.id) as field1, (select data11041 from field_field1 where pages_id = p.id) as field1_de, parent_id, (select data from field_title where pages_id = p.parent_id) as parent_title, (select group_concat(data separator ',') from field_test_page where pages_id = p.id) as test_page from pages as p where p.templates_id = 44 i did some tests and it seems that both have the same performance... 65ms for a query over all 10.000 rows 1 hour ago, dragan said: Not sure about repeaters See "test_page" in the example above - also not that hard to get. Only problem could be special characters and quotes - not sure about that... but i think in that situations it would be the best to add a field that gets populated via a save hook then you could populate whatever value you want easily via PHP. Link to comment Share on other sites More sharing options...
kongondo Posted December 15, 2017
16 hours ago, bernhard said:
I think there is no way around using direct sql queries..
Why is this the case?
bernhard Posted December 15, 2017 (Author)
Because it's a LOT more performant. Constructing my data string via pages->find() and a foreach takes 16 seconds for 10,000 rows and 5 columns (and needs some extra seconds for every column added), whereas querying the database directly takes only some milliseconds. My other idea was to cache the table rows on the dedicated templates, but that leads to problems when you have a "parent" or "category" column, because then it would take several seconds to recreate the table's cache whenever the name of the category changes (updating up to thousands of rows' cache). It also leads to a lot of redundant data. All of those problems are solved when the DB is queried directly.
kongondo Posted December 15, 2017 (edited)
56 minutes ago, bernhard said:
because it's a LOT more performant. constructing my datastring via pages->find() and a foreach takes 16 seconds for 10.000 rows and 5 columns (needing some extra seconds for every column added) whereas querying the database directly needs only some milliseconds.
OK, I see. Maybe you've already discussed this elsewhere; if that's the case, apologies. Why would you need to query 10,000 rows at the same time? I suppose because you want to cache the data to make subsequent fetching faster. However, wouldn't pagination work better in this case (if you were to use a $pages->find() with limit)? You could even combine that with incremental caching, both server- and client-side, if you wanted. If you don't want pagination, there is also virtual pagination or infinite scrolling. I am just thinking out loud here. I haven't tested your datatables module yet. If I could avoid writing SQL queries myself, I'd go for it.
Edit: I see you've already discussed this in your other module's thread. So, we better continue there.
Edited December 15, 2017 by kongondo
bernhard Posted December 15, 2017 (Author)
Thanks kongondo, all valid points. The reason I want to have all data loaded at once is that it makes the development of my module a lot easier and has some huge benefits over loading data paginated via AJAX. When I have all the data available on the client, I can use datatables' API to filter, sort, query, draw charts etc.; that would be very hard to achieve using server-side techniques. I was also trying to avoid direct SQL queries - that's why I took other approaches for my first two versions of the module. But it turns out that all of those approaches have some really big drawbacks. Actually, building the queries in SQL manually is not as difficult as I thought (the genius API and the easy page->find() operations were among the main reasons I fell in love with ProcessWire). And I have some nice ideas how to make it even more comfortable and easy. I'm quite sure it will be really easy to use for everybody.
bernhard Posted December 17, 2017 (Author)
After seeing that @adrian also used joins in this example (https://processwire.com/talk/topic/15524-preview-rockdatatables/?do=findComment&comment=158104) I did a quick google, and it seems that there is a slight performance benefit in using joins over subqueries. When writing queries manually (without any helper functions like in adrian's example) I think it is better to use subqueries, as they are a lot easier to write, read and maintain, and they are less likely to return wrong results imho. See also https://stackoverflow.com/questions/2577174/join-vs-sub-query

I also did a quick test on my setup:

t();
$rows = [];
$result = $this->db->query("select
    id,
    title.data as title,
    field1.data as field1
    from pages as p
    left join field_title as title on title.pages_id = p.id
    left join field_field1 as field1 on field1.pages_id = p.id
    where p.templates_id = 44");
while($row = $result->fetch_object()) { $rows[] = $row; }
d($rows);
d(t(), 'joins');

$rows = [];
$result = $this->db->query("select
    id,
    (select data from field_title where pages_id = p.id) as title,
    (select data from field_field1 where pages_id = p.id) as field1
    from pages as p
    where p.templates_id = 44");
while($row = $result->fetch_object()) { $rows[] = $row; }
d($rows);
d(t(), 'subqueries');

Most of the time the joins were a little faster, but not always - so I think it should not matter performance-wise which one you use.
adrian Posted December 17, 2017
Personally I have just always felt more comfortable with JOINs, so I totally understand where the OP in that SO thread is coming from. Don't forget about:

$rows = $result->fetchAll(PDO::FETCH_ASSOC);

That will save the time required to loop through all 10,000 results to populate an array. Also, you should probably be using $database (PDO) rather than $db (mysqli).
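For illustration, adrian's two suggestions combined might look something like this - a sketch, reusing the query and field names from bernhard's test above, and assuming a ProcessWire context where $database is the WireDatabasePDO instance:

```php
<?php
// Sketch: the same benchmark query via ProcessWire's PDO layer ($database)
// instead of mysqli ($db). Table/field names come from the examples above.
$sql = "SELECT
            p.id,
            title.data  AS title,
            field1.data AS field1
        FROM pages AS p
        LEFT JOIN field_title  AS title  ON title.pages_id  = p.id
        LEFT JOIN field_field1 AS field1 ON field1.pages_id = p.id
        WHERE p.templates_id = :tid";

$query = $database->prepare($sql);      // prepared statement, template id bound safely
$query->execute([':tid' => 44]);

// fetchAll() builds the whole result array in one call - no manual while() loop
$rows = $query->fetchAll(PDO::FETCH_ASSOC);
```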
bernhard Posted December 17, 2017 (Author)
On 17.12.2017 at 5:41 PM, adrian said:
That will save the time required to loop through all 10,000 results to populate an array
Thx, this was only for testing.
On 17.12.2017 at 5:41 PM, adrian said:
Also, you should probably be using $database (PDO), rather than $db (mysqli)
Thx, didn't know that! Writing an answer in the datatables thread - I'm working on your findArray hook.
bernhard Posted January 24, 2018 (Author)
Ok, today I hit a very strange problem regarding those mentioned SQL queries. I built a little process module that shows the result of my queries. The queries themselves are stored in PHP files, which makes it possible to create different VIEWs (just like DB views) and include them easily via PHP's include(). Simple example:

// file allpages.php
return "SELECT * FROM pages";

// file publishedpages.php
$allpages = include(__DIR__ . '/allpages.php');
return "SELECT * FROM ($allpages) AS allpages WHERE allpages.status = 1";

This seemed to work great, so I continued to create some more complex queries like the ones mentioned above:

SELECT
    effort.id AS id
    ,effort.status AS status
    ,(SELECT data FROM field_rockprojecteffort_date WHERE pages_id = effort.id) AS date
    ,(SELECT pages_id FROM field_rockproject_efforts WHERE data = effort.id) AS projectid
FROM pages AS effort
WHERE templates_id = 95

Template id 95 is for "rockprojectefforts", that is, items for all kinds of actions related to a project. So far, so good. Now the strange part: I want to create a combined listing of all projects and all efforts (project, firsteffortdate (= when the project started), lasteffortdate (= when it ends), etc.)...

<?php
$projects = include(__DIR__ . '/allprojects.php');
$efforts = include(__DIR__ . '/allefforts-slow.php');
return "SELECT
    *
    ,(SELECT `date` FROM ($efforts) AS `efforts` WHERE `projectid` = `projects`.`id` LIMIT 1) AS `fromdate`
    FROM ($projects) AS `projects`
    WHERE `projects`.`status` = 1
";

This works, but it is terribly slow (3.6 seconds)!
The included SLOW file looks like this:

<?php
// old non-performant query
return "SELECT
    `effort`.`id` AS `id`
    ,`effort`.`status` AS `status`

    /* data for the effort */
    ,(SELECT `data` FROM `field_title` WHERE `pages_id` = `effort`.`id`) AS `title`
    ,(SELECT `data` FROM `field_rockprojecteffort_date` WHERE `pages_id` = `effort`.`id`) AS `date`
    ,(SELECT `data` FROM `field_rockprojecteffort_price` WHERE `pages_id` = `effort`.`id`) AS `price`
    ,(SELECT `data` FROM `field_rockprojecteffort_vat` WHERE `pages_id` = `effort`.`id`) AS `vat`
    ,(SELECT `data` FROM `field_rockprojecteffort_amount` WHERE `pages_id` = `effort`.`id`) AS `amount`

    /* data for the corresponding project */
    ,(SELECT `pages_id` FROM `field_rockproject_efforts` WHERE `data` = `effort`.`id`) AS `projectid`
    ,(SELECT `data` FROM `field_title` WHERE `pages_id` = `projectid`) AS `projecttitle`
    ,(SELECT `data` FROM `field_rockproject_status` WHERE `pages_id` = `projectid`) AS `projectstatus`

    /* we also need the clientid to search for the first effort of a client */
    ,(SELECT `data` FROM `field_rockproject_client` WHERE `pages_id` = `projectid`) AS `clientid`
FROM `pages` AS `effort`
WHERE `templates_id` = 95
";

Whereas this file works just fine (0.03 seconds):

<?php
return "SELECT
    pages.id AS id
    ,pages.status AS status
    ,date.data AS date
    ,efforts.pages_id AS projectid
    ,title.data AS title
FROM pages
JOIN field_rockprojecteffort_date AS date ON date.pages_id = pages.id
JOIN field_rockproject_efforts AS efforts ON efforts.data = pages.id
JOIN field_title AS title ON title.pages_id = pages.id
WHERE templates_id = 95
";

Another strange thing is that both queries as single queries work just fine (one needs 0.07 seconds, the slower one 0.11 seconds - but I don't think that is the issue). I can live with the JOIN even though it is a little less readable and more complicated to set up, but I would really love to know what is going on and why this is the case.
BTW: on the second reload everything is even quicker, so I guess there is some caching in the background. I did not find any options for disabling the cache in the WireDatabasePDO class - could this be a MySQL-internal cache? Thanks for your time!
flydev Posted January 24, 2018 (edited)
48 minutes ago, bernhard said:
could this be a mysql internal cache?
Yes, it's called the MySQL Query Cache. To disable it for a connection:

SET SESSION query_cache_type = OFF;

More details here:
https://www.dbrnd.com/2015/08/mysql-query-cache/
https://www.dbrnd.com/2015/08/mysql-query-cache-configuration/
Edited January 24, 2018 by flydev
FrancisChung Posted January 24, 2018
@bernhard, have you looked at the query plan for both SQL statements using a profiling tool or SQL management tool? I had a quick look at your SQL, and for the type of select query you're doing (lots of fields from lots of tables), an inner join is (almost) always going to be faster than subqueries. There are instances where subqueries can be quicker than a join: if the subset of data returned by a subquery is much smaller than the table used in a join (1,000 rows vs. a million-row table, for example), that is where subqueries can really shine. I'm guessing your tables are not like that? If you can use EXISTS / NOT EXISTS with a subquery, it's probably a good candidate as well, for the aforementioned reasons. Normally I'm a big advocate of readability of code, but I draw the line at SQL, because very often this is where the bottlenecks for entire applications and workflows sit, and it's the one place where a poorly optimised query can really cause issues down the line in too many ways. More readable but less optimised code might only cost you a millisecond or a few seconds; poorly optimised queries can cost you hours or even days, depending on the dataset. Proper indenting, good (SQL) coding standards and concise & detailed commenting can go a long way to improve the readability of those yucky joins.
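For illustration, the EXISTS pattern mentioned above could look like this against the table names used earlier in this thread - a sketch only: the semi-join stops at the first matching row instead of materialising the subquery's full result:

```sql
-- Sketch: find all pages of template 95 that have at least one related
-- effort reference, using an EXISTS semi-join. Table names are the ones
-- appearing earlier in this thread.
SELECT p.id
FROM pages AS p
WHERE p.templates_id = 95
  AND EXISTS (
      SELECT 1
      FROM field_rockproject_efforts AS ef
      WHERE ef.data = p.id
  );
```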
bernhard Posted January 25, 2018 (Author)
18 hours ago, flydev said:
Yes, its called MySQL Query Cache
Thanks! Didn't know about that. This worked to turn it off:

SET GLOBAL query_cache_size = 0;

17 hours ago, FrancisChung said:
@bernhard, have you looked at the query plan for both SQL statements using a profiling tool or SQL Management tool?
No, never heard about such tools. My experience with SQL is quite limited, so I'd be very thankful for some helpful links in this regard. Thank you both for your helpful answers!
SamC Posted January 25, 2018
4 hours ago, bernhard said:
My experience with SQL is quite limited
Sitting here learning MySQL for the first time (along with PHP, i.e. getting some actual core skills...), typing into the monitor stuff like ALTER and ADD. Then reading this thread. Then realising that I've got a long road ahead. Gonna put the kettle on.
FrancisChung Posted January 25, 2018
I'm very well versed in MS SQL Server, not as much in MySQL, but the underlying principles are the same. Pretty succinct explanation here: https://www.sitepoint.com/using-explain-to-write-better-mysql-queries/ And to get in deeper: https://dev.mysql.com/doc/refman/5.5/en/using-explain.html It may also be beneficial to create indexes for some of the fields you're joining on (e.g. efforts.data, pages_id), but best to check the query plan with the EXPLAIN command as above, since your mileage may vary. Indexes with low cardinality (i.e. a low number of unique values) usually perform worse than not having any indexes at all.
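In practice that means prefixing the query with EXPLAIN and only adding an index where the plan shows a full scan - a sketch, using the join from the fast query earlier in this thread (the index name is made up, and PW may already index this column; EXPLAIN will tell you):

```sql
-- Inspect the plan: watch the "type" column (ALL = full table scan)
-- and the "rows" estimate for each table.
EXPLAIN
SELECT pages.id, efforts.pages_id AS projectid
FROM pages
JOIN field_rockproject_efforts AS efforts ON efforts.data = pages.id
WHERE templates_id = 95;

-- If the plan shows a scan on field_rockproject_efforts, an index on
-- the join column may help (index name here is made up):
CREATE INDEX idx_rockproject_efforts_data ON field_rockproject_efforts (data);
```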
adrian Posted January 25, 2018
@bernhard - not sure if this is useful or not, but it shows you the SQL that the PageFinder uses for a given selector.
SamC Posted January 26, 2018
@adrian that's awesome! Need to try this; it will be very helpful to see what queries are actually being run. Thanks. @bernhard I was using MySQL Workbench yesterday, maybe that would be of use to you? You can get visuals etc. I've found it really useful so far. https://dev.mysql.com/doc/workbench/en/wb-performance-explain.html
bernhard Posted January 26, 2018 (Author)
@adrian thanks, I was aware of that, but I don't think it is of big help in my case. @SamC thanks, looks interesting. I've always worked with phpMyAdmin (on most hostings) and HeidiSQL (for local dev with Laragon). The visual explain analyzer looks interesting though. It seems HeidiSQL should also have this feature, but it didn't work here on my first try... I'll play around with that in the future - for now I'm happy with my results, since I decreased the time for the query from a terrible 90s to 900ms (uncached). Thanks for all your input!
Pixrael Posted January 26, 2018
@bernhard Incredible results! Can you summarize which option you used finally?
bernhard Posted January 26, 2018 (Author)
@Pixrael the summary is basically: first do a really bad job with lots of foreach etc., then refactor everything to proper SQL statements that join the related data together in a fraction of the time. At least I'm on step two of "make it work, make it fast, make it pretty". Jokes aside: the background is that I have quite some performance issues with my CRM - there is an admin page that lists all projects and all their revenues. The revenues are themselves pages with additional information (like type of revenue (internal/external/etc.), date and so on). This way it is possible to do all kinds of great stuff with that data (fake data here). Listing all those related items is not so easy; that's why I used the regular PW API before, looping over all projects, summing up all related revenues etc. And of course that is terribly inefficient - but a lot easier than setting up SQL queries with joins, sum(), concat() etc. Since all the joins etc. are quite complex, I created a little ProcessModule that will be part of my datatables module (planned for this year) that lists the queries instantly, makes it possible to combine different views and - what sets it apart from regular SQL tools - can interpret PHP.
So you can for example create custom DB views with variable setups:

Spoiler:

<?php
/**
 * list that shows all revenues related to a project
 */
$s = 0;
$dt3 = $this->modules->get('RockDatatables3');
$year = 2018;
$query = '';

// create queries for all efforttypes
$field = $this->fields->get('rockprojecteffort_type');
foreach($field->type->getOptions($field)->each('value') as $eftypeid => $eftype) {
    if($eftype == 'invoiceitem') continue; // skip "rechnungsposten"
    $s++;

    // values
    $query .= ",(";
    $query .= "SELECT sum(net) FROM {$dt3->getViewQuery('allrevenues')} AS ef_subquery_$s
        WHERE ef_subquery_$s.projectid = sub_revenues.projectid
        AND DATE_FORMAT(ef_subquery_$s.date, '%Y') = $year
        AND type = $eftypeid
    ";
    $query .= ") AS {$eftype}_0";

    // values for each month
    for($i = 1; $i <= 12; $i++) {
        $s++;
        $monthstr = $year.$i;
        $query .= ",(";
        $query .= "SELECT sum(net) FROM {$dt3->getViewQuery('allrevenues')} AS ef_subquery_$s
            WHERE ef_subquery_$s.projectid = sub_revenues.projectid
            AND DATE_FORMAT(ef_subquery_$s.date, '%Y%c') = $monthstr
            AND type = $eftypeid
        ";
        $query .= ") AS {$eftype}_$i";
    }
}

// get revenues
$s++;
$revenues = "SELECT
    projectid
    ,sum(net) AS nettotal
    ,sum(gross) AS grosstotal
    $query
    FROM {$dt3->getViewQuery('allrevenues')} AS sub_revenues
    WHERE projectpwstatus = 1 /* only published projects */
    GROUP BY projectid
";

/**
 * main query
 */
return "SELECT
    *
    FROM {$dt3->getViewQuery('allprojects')} AS projects
    LEFT JOIN ($revenues) AS revenues ON revenues.projectid = projects.id
";

Resulting in this:

PS: Just realized that this topic went quite far from the initial title (best query for pages->find()), so I'll change it. @adrian's suggestion was of course spot on for the old thread title.
dragan Posted January 26, 2018
On 25.1.2018 at 2:38 PM, SamC said:
Then realising that I got a long road ahead
Don't want to divert this thread, but I first got my feet wet with more advanced SQL stuff reading https://philip.greenspun.com/sql/index.html He used AOLserver and Oracle, but the underlying principles and problems are similar across the entire DB world, I guess. What's more, the guy was involved with really huge companies and lists some real-life problems and solution approaches. It was kinda fun to read back in the day (the nerd kind of fun). tl;dr: just keep on learning - today there are a million times more learning opportunities than 30 years ago (free software, online tutorials/courses etc., cheap hosting, and faster internet than dialup...).
FrancisChung Posted January 26, 2018
@bernhard, you might also consider using views or stored procedures to encapsulate the logic in the DB rather than in code. I'm guessing that the user might be filtering the returned dataset, but that's mainly done on the client side, not the server/DB side?
bernhard Posted January 27, 2018 (Author)
16 hours ago, FrancisChung said:
@bernhard, you might also consider using Views or Stored Procedures to encapsulate the logic in the DB rather than code. I'm guessing that the user might be filtering on the dataset returned, but that's mainly done on the client side, not the server/db side?
Thanks, I've already used VIEWs for some of the operations, but the problem I had was that the definition of the view was done inside phpMyAdmin, and this was totally unmaintainable! See here: So maybe there are better tools for such tasks? Not only is the definition of the view a one-line string, I also cannot add any comments to it. My "dynamically created views" that I have now are far better to read and can also use PHP (variables and functions). The latter would not be possible using standard views, would it? I'm talking about something like this:

<?php
// demo-sql-query.php
$year = 2018;
$months = '';
for($i = 1; $i <= 12; $i++) {
    $months .= ",(SELECT month_$i FROM demo WHERE year = $year) AS month_$i";
}
$sql = "SELECT
    id
    $months
    FROM pages
    LEFT JOIN ...
    WHERE ...
";

// result
SELECT
    id
    ,(SELECT month_1 FROM demo WHERE year = 2018) AS month_1
    ...
    ,(SELECT month_12 FROM demo WHERE year = 2018) AS month_12
FROM ...

The problem with that technique is that it is always hard to debug, since you have to execute the command and cannot just click on the view in phpMyAdmin. That's why I created the process module to list those "views": I just select "demo-sql-query.php" and see the resulting table ("view"). But I'm totally open to better solutions and learning new things.
FrancisChung Posted January 27, 2018
@bernhard, have you tried using another tool, like MySQL Workbench for example? Perhaps it has inbuilt formatting that is better? Did you create the views inside phpMyAdmin? If you go down the view / stored procedure route, the normal workflow is to use DB scripts that create the views / stored procedures on the DB. So if you need to make a change to the logic, you would edit the view / stored procedure logic in the script and then run the script against the DB. These scripts can be generated using the generate-script/schema functionality of DB management tools like MySQL Workbench etc. They normally have the following structure (pseudo code):

IF <DB Object> EXISTS THEN
    DROP <DB Object>
CREATE <DB Object>
    <Business logic here>
GRANT <DB Object> permissions

Having these scripts will make automated deployment easier later on. Also, if you create your objects this way, it may solve your issue with the formatting, because you have a direct hand in formatting the SQL that's being used.
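In MySQL, that pseudo-code pattern could look like the following sketch. The view name and the grantee are made up for illustration; the view body is the fast JOIN query from earlier in this thread, and the GRANT assumes the default database is selected:

```sql
-- Sketch of a versionable DB script: drop-if-exists, recreate, grant.
DROP VIEW IF EXISTS all_efforts;

CREATE VIEW all_efforts AS
SELECT
    pages.id     AS id,
    pages.status AS status,
    d.data       AS date,
    ef.pages_id  AS projectid,
    t.data       AS title
FROM pages
JOIN field_rockprojecteffort_date AS d  ON d.pages_id = pages.id
JOIN field_rockproject_efforts    AS ef ON ef.data    = pages.id
JOIN field_title                  AS t  ON t.pages_id = pages.id
WHERE pages.templates_id = 95;

-- grant read access to the application user (user name is made up)
GRANT SELECT ON all_efforts TO 'pwapp'@'localhost';
```

Kept under version control, such a script can be re-run whenever the view logic changes, which is what makes the automated deployment mentioned above possible.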