Hari KT Posted May 10, 2014

Hey,

This has already been asked by others, and I have noticed your replies that ProcessWire is very fast, and that if it is not, you can cache the results. I agree, but I still have a question.

We have around 600,000 (6 lakh) pages, and PW can deliver them in ~1.9 to ~4 seconds, which is really awesome. One concern is that at times it slows down to ~5 to 8 seconds (I am talking about onload speed in the browser, not the rendering speed).

From my findings, the problem is that we have a few cron jobs running via Gearman to insert and update certain images, among other things. We are using the PW page API to do the inserts/updates (maybe a few thousand pages are fetched and processed at a time).

The question is whether there is a way to get the raw SQL queries that we would need to run for a single insert/update of a page. E.g.:

$page = $this->pages->get("id={$data->article_id}");
if ($page->template == 'something') {
    $date = new DateTime();
    $page->update_field = "Some data";
    $page->save();
}

The idea is that if I know the raw SQL query to run, I can insert via PDO without the API and see how fast it processes. Let me know if there is a way to do that. Also, if you have any other suggestions or optimization tips, I am happy to hear them. Thank you.
teppo Posted May 10, 2014

If you want the queries that PW ran while inserting or updating those pages, enable debug mode and use $database->getQueryLog(). That will return the full query log, from which you can grab just the parts you need.

Another method would be updating the database tables manually with custom SQL, but then you would have to update/insert at least pages (including sort fields), pages_access, pages_parents (in some cases), and each field-specific table individually.

You said that you "insert and update certain images". Depending on various factors (number of images, file sizes, resize settings, etc.), updating and/or fetching images can be a very expensive operation. Just saying.
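[Editor's note: a minimal sketch of the query-log approach teppo describes, not part of the original thread. It assumes debug mode is enabled ($config->debug = true in /site/config.php) and reuses the $data->article_id and update_field names from Hari's question.]

<?php
// Editor's sketch (not from the thread): dump the SQL that ProcessWire runs
// for a single page update. Assumes $config->debug = true and that $data,
// 'something' and 'update_field' exist as in the original question.
$page = wire('pages')->get("id={$data->article_id}");
if ($page->template == 'something') {
    $page->of(false);                  // turn off output formatting before saving
    $page->update_field = 'Some data';
    $page->save('update_field');
}

// In debug mode the database object keeps a log of every query executed so far
foreach (wire('database')->getQueryLog() as $query) {
    echo $query . PHP_EOL;
}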
Hari KT Posted May 10, 2014 (author)

Thank you @teppo for pointing out the query log. Sometimes I forget things; my bad.
WillyC Posted May 10, 2014

$page = $this->pages->get($data->article_id);
if ($page->template == 'something') {
    $stmt = $this->database->prepare(
        'update field_update_field set data=:data where pages_id=:pages_id'
    );
    $stmt->bindValue(':data', 'some data');
    $stmt->bindValue(':pages_id', $page->id);
    $stmt->execute();
}

...but would that be any faster than this? I would think not:

$page = $this->pages->get($data->article_id);
if ($page->template == 'something') {
    $page->update_field = 'some data';
    $page->save('update_field');
}

Quoting teppo: "If you want the queries that PW ran while inserting or updating those pages, enable debug mode and use $database->getQueryLog()."

Note that the queries from getQueryLog() are not the full queries: they do not contain the bound values.
Hari KT Posted May 11, 2014 (author)

@WillyC I just want to try out whether that brings any performance improvement when dealing with thousands of pages. I am not sure whether PW uses an active-record or somewhat similar methodology for saving/updating. We will see.
Hari KT Posted May 12, 2014 (author)

I am back with another issue, noticed when running Siege to benchmark things: there is a bottleneck with SQL_CALC_FOUND_ROWS. I went through this blog post http://www.mysqlperformanceblog.com/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/ and ran some tests of my own against ~606,267 pages.

SELECT SQL_CALC_FOUND_ROWS * FROM `pages` LIMIT 5

returns in roughly 0.42 s to 0.572 s. Running it as two separate queries instead,

SELECT * FROM `pages` LIMIT 5

and

SELECT COUNT(*) FROM `pages`

takes about 0.014 s and 0.189 s respectively, a total of ~0.203 s.

So I feel it would be nice not to have SQL_CALC_FOUND_ROWS used on every query. An additional problem is that real queries use joins, and those take even longer: in my case some have been taking ~22 seconds and more.

Thank you.
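[Editor's note: a small PDO sketch, not part of the original thread, for reproducing this comparison directly against the database. The connection credentials are placeholders, and the timings will vary with hardware and caching.]

<?php
// Editor's sketch (not from the thread): time SQL_CALC_FOUND_ROWS against the
// two-query approach directly over PDO. Credentials are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=processwire', 'user', 'pass');

// Approach 1: single query with SQL_CALC_FOUND_ROWS, then FOUND_ROWS() for the total
$t = microtime(true);
$pdo->query("SELECT SQL_CALC_FOUND_ROWS * FROM pages LIMIT 5")->fetchAll();
$total = (int) $pdo->query("SELECT FOUND_ROWS()")->fetchColumn();
printf("CALC:  total=%d in %.4f s\n", $total, microtime(true) - $t);

// Approach 2: plain LIMIT query plus a separate COUNT(*)
$t = microtime(true);
$pdo->query("SELECT * FROM pages LIMIT 5")->fetchAll();
$total = (int) $pdo->query("SELECT COUNT(*) FROM pages")->fetchColumn();
printf("COUNT: total=%d in %.4f s\n", $total, microtime(true) - $t);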
ryan Posted May 24, 2014

Thanks Hari, excellent observation; I will take a closer look at this. While COUNT(*) is very fast in that context, ProcessWire never executes such a query with no WHERE or JOIN conditions. If I recall correctly, that speed difference doesn't hold true once you are counting the results of a rather complex query (as most PW page queries are). Though if we can tweak a little more speed out of using a count rather than a SELECT FOUND_ROWS(), I'd be all for it. I've been through this consideration before and had settled on SQL_CALC_FOUND_ROWS being the better option at the time, but it's been a while and I'd like to revisit it. I'm now working on a project with apeisa where we deal with hundreds of thousands of pages and we're always looking for ways to optimize performance, so this seems like a good opportunity to benchmark it again.

One thing to mention is that the PW dev branch is more optimized in this respect. It is quite a bit more selective about when it uses SQL_CALC_FOUND_ROWS, which ensures it's not adding that overhead in situations where it doesn't absolutely need it. You can also disable the behavior in your $pages->find() call by specifying the 'getTotal' option as false, which tells PageFinder not to attempt to determine the total number of matches. Specifying it wouldn't be necessary unless your selector string includes a limit=n, as PW dev doesn't attempt the count unless you are using a limit (presumably for pagination).

$options = array('getTotal' => false);
$items = $pages->find("selector", $options);
ryan Posted May 24, 2014

I've updated PageFinder to support both the SQL_CALC_FOUND_ROWS and the COUNT(*) methods so that I could compare them, and found some interesting results. My conclusion is that COUNT(*) is indeed faster in many instances, but SQL_CALC_FOUND_ROWS is faster in many others. Sometimes one is even twice as fast as the other, which makes me think we might need to build some logic into PageFinder to help it decide which method to use. Though I'm not yet certain why one counts so much faster than the other in a given case.

Here are the results. Numbers are in seconds. This was tested on a site with ~45k pages.

1. Selector: id>1 (matches 45786 pages)

            AVERAGE     TOTAL (50 finds)
  -------------------------------------------
  CALC:     0.030606    1.5386
  COUNT:    0.026732    1.3451    WINNER

2. Selector: template=facility (matches 38008 pages)

            AVERAGE     TOTAL (50 finds)
  -------------------------------------------
  CALC:     0.197568    9.9118
  COUNT:    0.183608    9.2142    WINNER

3. Selector: template=facility, title*=senior (matches 1207 pages)

            AVERAGE     TOTAL (50 finds)
  -------------------------------------------
  CALC:     0.053222    2.6943    WINNER
  COUNT:    0.08826     4.4469

4. Selector: template=facility, capacity>5 (matches 28616 pages)

            AVERAGE     TOTAL (50 finds)
  -------------------------------------------
  CALC:     0.347964    17.4320
  COUNT:    0.208856    10.4764   WINNER

5. Selector: template=facility, capacity>5, zip!='' (matches 28145 pages)

            AVERAGE     TOTAL (50 finds)
  -------------------------------------------
  CALC:     0.61547     30.8082
  COUNT:    0.407376    20.4028   WINNER

6. Selector: capacity>5, zip!='', limit=2 (matches 28145 pages)

            AVERAGE     TOTAL (50 finds)
  -------------------------------------------
  CALC:     0.435284    21.7988   WINNER
  COUNT:    0.903338    45.2016

7. Selector: capacity<5, zip!='', limit=2 (matches 4713 pages)

            AVERAGE     TOTAL (50 finds)
  -------------------------------------------
  CALC:     0.097256    4.8961    WINNER
  COUNT:    0.176476    8.8574

Here is the script I used to test:

<?php
include("./index.php");

$timers = array();
$selector = "id>1, limit=2"; // change to desired selector
$totalTimer = Debug::timer();
$totalRecords = 0;
$max = 50;

for($n = 1; $n <= $max; $n++) {
    $timer = Debug::timer();
    $items = wire('pages')->find($selector);
    $totalRecords = $items->getTotal();
    $elapsed = Debug::timer($timer);
    $timers[] = $elapsed;
    // clear the find() cache so it doesn't get reused on next find()
    wire('pages')->uncacheAll();
}

$total = 0.0;
foreach($timers as $elapsed) $total += (float) $elapsed;
$average = $total / count($timers);

echo "Selector:\t$selector\n";
echo "Average:\t" . $average . " seconds per find()\n";
echo "Total Time:\t" . Debug::timer($totalTimer) . " seconds for $max find()s\n";
echo "Total Pages:\t$totalRecords\n";
Beluga Posted May 24, 2014

Can someone test with MariaDB 10?
Hari KT Posted May 25, 2014 (author)

Thank you @Ryan for sharing the information and the test scripts. I will try to benchmark with them in the coming days. I have more than 50k pages.
bwakad Posted May 25, 2014

The only thing I notice here is:

1: single number - COUNT wins
2: single word - COUNT wins
3: multiple words - CALC wins
4: single word AND single number - COUNT wins
5: single word AND single number - COUNT wins
6: multiple numbers - CALC wins
7: multiple numbers - CALC wins

Based on Ryan's test, looking at the selectors, I could say:

- using multiple references of the SAME kind = CALC
- using a single reference of ANY kind = COUNT
- using a single reference of a DIFFERENT kind = COUNT

Hope this fresh view helps...
ryan Posted May 25, 2014

If anyone is interested in experimenting with this, grab the latest dev branch. In your selector to $pages->find(), you can specify get_total=count to make it use the count method rather than the calc method. You can also specify get_total=0 to disable total counting entirely. Note that it only counts totals if you have a "limit=n" (where n is any number) in the selector, so don't bother changing the count method if you aren't using limit=n somewhere in your query. Example:

// calculate total using count method
$t = Debug::timer();
$items = $pages->find("template=basic-page, limit=1, get_total=count");
$total = $items->getTotal();
echo "<p>Found $total items in " . Debug::timer($t) . " seconds using count</p>";

// calculate total using calc method (default)
$t = Debug::timer();
$items = $pages->find("template=basic-page, limit=1, get_total=calc");
$total = $items->getTotal();
echo "<p>Found $total items in " . Debug::timer($t) . " seconds using calc</p>";

// bypass any kind of counting
$t = Debug::timer();
$items = $pages->find("template=basic-page, limit=1, get_total=0");
$total = $items->getTotal(); // will be 0 or 1
echo "<p>Found $total items in " . Debug::timer($t) . " seconds with total count disabled.</p>";

One thing is certain: counting totals adds significant overhead regardless of the count method. For large-scale usage, having the option to disable total counting with get_total=0 will be valuable for sure. For instance, if you wanted to retrieve the 3 newest news items but didn't care how many there were in total, get_total=0 would be worthwhile. Likewise, if you are using a "Next Page / Prev Page" type of pagination (rather than numbered pagination links), then you also don't need the total count.
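[Editor's note: a hedged sketch, not part of the original thread, of the "Next Page / Prev Page" case Ryan mentions. By fetching one extra item you can tell whether a next page exists without any total count. The template name, page-number URL prefix, and markup are assumptions.]

<?php
// Editor's sketch (not from the thread): next/prev pagination without a total
// count, using the get_total=0 selector option described above.
$limit   = 10;
$pageNum = wire('input')->pageNum;   // requires page numbers enabled on the template
$start   = ($pageNum - 1) * $limit;

// Fetch one extra item only to detect whether a next page exists
$items = wire('pages')->find("template=basic-page, start=$start, limit=" . ($limit + 1) . ", get_total=0");

$hasNext = count($items) > $limit;
if ($hasNext) $items->pop();         // drop the extra detection item

foreach ($items as $item) echo "<h2>{$item->title}</h2>";

// 'page' URL prefix assumed here; adjust to your pagination settings
$prev = $pageNum > 2 ? "./page" . ($pageNum - 1) : "./";
if ($pageNum > 1) echo "<a href='$prev'>Prev</a> ";
if ($hasNext) echo "<a href='./page" . ($pageNum + 1) . "'>Next</a>";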
bwakad Posted May 25, 2014

Unfortunately, my development is still ongoing and I do not have many pages to test with. Assuming the dev branch is an update to the core... But I would like to help analyze test results if that helps.
adrian Posted May 25, 2014

Quoting bwakad: "Assuming the dev branch is an update to the core..."

You can get the latest dev from here: https://github.com/ryancramerdesign/ProcessWire/archive/dev.zip. This version will become 2.5 fairly soon.
bwakad Posted May 25, 2014

So, installing that and then importing my pages and fields? Although I am not sure how to accomplish that without importing the old core...
kongondo Posted May 25, 2014 (edited)

Quoting bwakad: "So, installing that and then importing my pages and fields? Although I am not sure how to accomplish that without importing the old core..."

If you are using ProcessWire 2.4 (which I assume you are), all you probably need to copy over is the folder /wire/ (deleting/replacing the old /wire/ first). ProcessWire never touches your site's folders, i.e. /site/. In some upgrades, replacing the old index.php and .htaccess may also be necessary; if so, it is specified in the upgrade instructions.

If you only need to copy over /wire/, instead of replacing the old /wire/ you can just rename it to, say, /wire-old/. This way you can quickly test the old (2.4) and the new (2.5) by renaming the respective folders. Note: only do this if you know what you are doing. In addition, as much as upgrading PW is relatively safe and easy, always make a backup of your database (which you should be doing anyway, regardless of upgrades).
Hari KT Posted June 6, 2014 (author)

Sorry, I was a bit late to give you the benchmarking details; I got busy with other stuff. Here is the code, run against around 701,407 `some-page` pages and 1,158 `different-page` pages. There are more types of pages, so the total number of pages is higher.

<?php
require dirname(__DIR__) . '/index.php';

$pages = wire('pages');

// calculate total using count method
$t = Debug::timer();
$items = $pages->find("template=some-page, limit=1, get_total=count");
$total = $items->getTotal();
echo "<p>Found $total some-pages in " . Debug::timer($t) . " seconds using count</p>" . PHP_EOL;
wire('pages')->uncacheAll();

// calculate total using calc method (default)
$t = Debug::timer();
$items = $pages->find("template=some-page, limit=1, get_total=calc");
$total = $items->getTotal();
echo "<p>Found $total some-pages in " . Debug::timer($t) . " seconds using calc</p>" . PHP_EOL;
wire('pages')->uncacheAll();

$t = Debug::timer();
$total = $pages->count("template=some-page");
echo "<p>Found $total some-pages using count() in " . Debug::timer($t) . " seconds using count</p>" . PHP_EOL;
wire('pages')->uncacheAll();

$t = Debug::timer();
$items = $pages->find("template=different-page, limit=1, get_total=count");
$total = $items->getTotal();
echo "<p>Found $total different-pages in " . Debug::timer($t) . " seconds using count</p>" . PHP_EOL;
wire('pages')->uncacheAll();

// calculate total using calc method (default)
$t = Debug::timer();
$items = $pages->find("template=different-page, limit=1, get_total=calc");
$total = $items->getTotal();
echo "<p>Found $total different-pages in " . Debug::timer($t) . " seconds using calc</p>" . PHP_EOL;
wire('pages')->uncacheAll();

$t = Debug::timer();
$total = $pages->count("template=different-page");
echo "<p>Found $total different-page count() in " . Debug::timer($t) . " seconds using count</p>" . PHP_EOL;
wire('pages')->uncacheAll();

$t = Debug::timer();
$first = $pages->get("template=some-page, sort=-some-page_created");
echo "<p>Found {$first->title} in " . Debug::timer($t) . " seconds</p>" . PHP_EOL;
wire('pages')->uncacheAll();

$t = Debug::timer();
$first = $pages->get("template=some-page, sort=-some-page_created, get_total=0");
echo "<p>Found {$first->title} with get_total=0 in " . Debug::timer($t) . " seconds</p>" . PHP_EOL;
wire('pages')->uncacheAll();

Result:

<p>Found 701407 some-page in 2.4525 seconds using count</p>
<p>Found 701407 some-page in 2.7801 seconds using calc</p>
<p>Found 701407 some-page using count() in 2.6784 seconds using count</p>
<p>Found 1158 different-page in 0.0328 seconds using count</p>
<p>Found 1158 different-page in 0.0166 seconds using calc</p>
<p>Found 1158 source count() in 0.0028 seconds using count</p>
<p>Found some title in 3.5964 seconds</p>
<p>Found some title with get_total=0 in 0.0188 seconds</p>

Indeed, the last one shows a tremendous improvement with `get_total=0`. Thank you.