Hari KT

Optimizing code / queries

Hey,

So this has already been asked by others, and I have noticed your replies that ProcessWire is very fast, and that if it is not, you can cache the results.

I agree with that, but I still have a question.

We have around 600,000 (6 lakh) pages, and PW can deliver a page in ~1.9 to ~4 seconds, which is really awesome. One concern is that at times it slows down to ~5 to 8 seconds (I am talking about the onload speed in the browser, not the rendering speed).

From my findings, the problem seems to be that we have a few cron jobs running via Gearman to insert and update certain images, and more besides ;). We are using the PW Page API to do the inserts/updates (maybe a few thousand pages are fetched and processed at a time).

The question is whether there is a way to get the raw SQL queries that run on a single insert/update of a Page. E.g.:

$page = $this->pages->get("id={$data->article_id}");
if ($page->template == 'something') {
    $date = new DateTime();
    $page->update_field = "Some data";
    $page->save();
}

The idea is that if I know the raw SQL queries to run, we can insert via PDO without the API and see how fast that is.

Let me know if there is a way to do this.

Also, if you have any other suggestions / optimization tips, I am happy to hear them.

Thank you.


If you want queries that PW did while inserting or updating those pages, in debug mode use $database::getQueryLog(). That'll return you the full query log, from which you can grab just the parts you need.
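For example, a minimal sketch of reading the query log after a save; this assumes a bootstrapped ProcessWire install with `$config->debug = true`, and the page ID and field used here are hypothetical:

```php
<?php
// Assumes ProcessWire is bootstrapped and debug mode is on ($config->debug = true).
// The page ID and the title change are examples only.
require './index.php';

$page = wire('pages')->get("id=1234");
$page->of(false);               // turn off output formatting before saving
$page->title = 'Updated title';
$page->save();

// Dump every query PW ran during this request, including the save above
foreach(wire('database')->getQueryLog() as $query) {
    echo $query . PHP_EOL;
}
```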

Another method would be updating database tables manually with custom SQL, but then you'd have to update/insert at least pages (including sort fields), pages_access, pages_parents (in some cases) and each field-specific table individually.

You said that you "[...] insert and update certain images [...]". Depending on various factors (number of images, file sizes, resize settings, etc.), updating and/or fetching images can be a very expensive operation. Just saying.


Thank you @teppo for pointing out getQueryLog(). Sometimes I forget things, my bad.

$page = $this->pages->get($data->article_id);
if ($page->template == 'something') {
    $stmt = $this->database->prepare( 'update field_update_field set data=:data where pages_id=:pages_id' );
    $stmt->bindValue(':data', 'some.data'); 
    $stmt->bindValue(':pages_id', $page->id); 
    $stmt->execute(); 
}
 
But would that be any faster than this? I would think not:
 
$page = $this->pages->get($data->article_id);
if ($page->template == 'something') {
    $page->update_field = 'some data';
    $page->save( 'update_field' ); 
}

If you want queries that PW did while inserting or updating those pages, in debug mode use $database::getQueryLog(). That'll return you the full query log, from which you can grab just the parts you need.

Note that getQueryLog() does not return the full queries: they do not contain the bound values.


@WillyC I just want to try out whether that brings any performance improvement when dealing with thousands of pages.

I am not sure whether PW is using an active-record or similar pattern for saving/updating.

We will see :)


I am here with another issue, noticed while running Siege to benchmark things.

There is a bottleneck with SQL_CALC_FOUND_ROWS. I was going through this blog post

http://www.mysqlperformanceblog.com/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/

and some of my own tests.

I am running and testing on ~606,267 pages.

SELECT SQL_CALC_FOUND_ROWS * FROM `pages` LIMIT 5

you will get the result in between ~0.42 s and ~0.572 s.

When you are running it in two queries, say

SELECT * FROM `pages` LIMIT 5

and

SELECT COUNT(*) FROM `pages`

you will get the results in ~0.014 s and ~0.189 s respectively, so a total of ~0.203 s.

So I feel it would be nice not to have SQL_CALC_FOUND_ROWS always used when running queries.

An additional problem to consider is that the queries will use joins, and those take even longer. In my case they have been taking ~22 seconds or more.
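To illustrate the two-query alternative on a joined query, here is a rough PDO sketch; the joined table follows ProcessWire's `field_*` naming convention, but the connection details and the specific join are examples only:

```php
<?php
// Hypothetical two-query pattern replacing SQL_CALC_FOUND_ROWS on a joined query.
// Connection details and the joined table are examples only.
$pdo = new PDO('mysql:host=localhost;dbname=pw', 'user', 'secret');

// 1) Fetch just the page of results
$stmt = $pdo->query(
    "SELECT pages.* FROM pages " .
    "JOIN field_title ON field_title.pages_id = pages.id " .
    "LIMIT 5"
);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

// 2) Count the matches with the same joins/conditions, but no LIMIT
$total = (int) $pdo->query(
    "SELECT COUNT(*) FROM pages " .
    "JOIN field_title ON field_title.pages_id = pages.id"
)->fetchColumn();
```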

Thank you.


Thanks Hari, excellent observation and I will take a closer look at this. While COUNT(*) is very fast in that context, ProcessWire never executes such a query with no where or join conditions. If I recall correctly, that speed difference doesn't hold true once you are counting the results of a rather complex query (as most PW page queries are). Though if we can tweak a little more speed out of using a count rather than a select found_rows() then I'd be all for it. I've been through this consideration before and had settled on sql_calc_found_rows being the better option at the time, but it's been a while and I'd like to revisit it. I'm now working on a project with apeisa where we deal with hundreds of thousands of pages and we're always looking for ways to optimize performance, so it seems like a good opportunity to benchmark this again.

One thing to mention also is that the PW dev branch is more optimized in this respect. It is quite a bit more selective as to when it uses SQL_CALC_FOUND_ROWS, which ensures it's not putting that overhead on top of situations where it doesn't absolutely need it. You can also disable the behavior in your $pages->find() call by specifying the 'getTotal' option as false, which tells PageFinder not to attempt to determine the total number of matches. But specifying it wouldn't be necessary unless your selector string included a limit=n in it, as PW dev doesn't attempt the count unless you are using a limit (presumably for pagination). 

$options = array('getTotal' => false); 
$items = $pages->find("selector", $options); 

I've updated PageFinder to support both the SQL_CALC_FOUND_ROWS and the COUNT(*) method so that I could compare them. I found some interesting results. My conclusion is that COUNT(*) is indeed faster in many instances, but SQL_CALC_FOUND_ROWS is faster in many other instances. Sometimes one is even twice as fast as the other, making me think we might need to build some logic into PageFinder to help it decide which method to use. Though I'm not yet certain why one is counting so much faster than the other at present. Here are the results. Numbers are in seconds. This was testing on a site with ~45k pages.

1. Selector: id>1 (matches 45786 pages)

       AVERAGE       TOTAL (50 finds)
-------------------------------------------
CALC:  0.030606      1.5386 
COUNT: 0.026732      1.3451 WINNER


2. Selector: template=facility (matches 38008 pages)

       AVERAGE       TOTAL (50 finds)
-------------------------------------------
CALC:  0.197568      9.9118
COUNT: 0.183608      9.2142 WINNER


3. Selector: template=facility, title*=senior (matches 1207 pages)

       AVERAGE       TOTAL (50 finds)
-------------------------------------------
CALC:  0.053222      2.6943 WINNER
COUNT: 0.08826       4.4469


4. Selector: template=facility, capacity>5 (matches 28616 pages)

       AVERAGE       TOTAL (50 finds)
-------------------------------------------
CALC:  0.347964      17.4320
COUNT: 0.208856      10.4764 WINNER


5. Selector: template=facility, capacity>5, zip!='' (matches 28145 pages)

       AVERAGE       TOTAL (50 finds)
-------------------------------------------
CALC:  0.61547       30.8082
COUNT: 0.407376      20.4028 WINNER


6. Selector: capacity>5, zip!='', limit=2 (matches 28145 pages)

       AVERAGE       TOTAL (50 finds)
-------------------------------------------
CALC:  0.435284      21.7988 WINNER
COUNT: 0.903338      45.2016


7. Selector: capacity<5, zip!='', limit=2 (matches 4713 pages)

       AVERAGE       TOTAL (50 finds)
-------------------------------------------
CALC:  0.097256      4.8961 WINNER
COUNT: 0.176476      8.8574

Here is the script I used to test:

<?php

include("./index.php");

$timers = array();
$selector = "id>1, limit=2"; // change to desired selector
$totalTimer = Debug::timer();
$totalRecords = 0;
$max = 50;

for($n = 1; $n <= $max; $n++) {
  $timer = Debug::timer();
  $items = wire('pages')->find($selector);
  $totalRecords = $items->getTotal();
  $elapsed = Debug::timer($timer);
  $timers[] = $elapsed;
  // clear the find() cache so it doesn't get reused on next find()
  wire('pages')->uncacheAll(); 
}

$total = 0.0; 
foreach($timers as $elapsed) $total += (float) $elapsed; 
$average = $total / count($timers); 

echo "Selector:\t$selector\n"; 
echo "Average:\t" . $average . " seconds per find()\n";
echo "Total Time:\t" . Debug::timer($totalTimer) . " seconds for $max find()s\n";
echo "Total Pages:\t$totalRecords\n";


Thank you @Ryan for sharing the information and the test script.

I will try to benchmark with the same in the coming days. I have more than 50K pages ;).


The only thing I notice here is:

1: single number - bottom winner

2: single word - bottom winner

3: multiple words - top winner

4: single word AND single number - bottom winner

5: single word AND single number - bottom winner

6: multiple numbers - top winner

7: multiple numbers - top winner

Based on Ryan's tests, looking at the selectors, I could say:

Using multiple references of the SAME kind = CALC

Using a single reference of ANY kind = COUNT

Using single references of DIFFERENT kinds = COUNT

Hope this fresh view helps...


If anyone is interested in experimenting with this one, grab the latest dev branch. In your selector to $pages->find(), you can specify get_total=count to make it use the count method rather than the calc method. You can also specify get_total=0 to disable any kind of total counting. Though note that it's only counting totals if you have a "limit=n" (where n is any number), so don't bother trying to change the count method if you aren't using limit=n somewhere in your query. Example:

// calculate total using count method
$t = Debug::timer(); 
$items = $pages->find("template=basic-page, limit=1, get_total=count"); 
$total = $items->getTotal();
echo "<p>Found $total items in " . Debug::timer($t) . " seconds using count</p>";

// calculate total using calc method (default)
$t = Debug::timer();
$items = $pages->find("template=basic-page, limit=1, get_total=calc"); 
$total = $items->getTotal();
echo "<p>Found $total items in " . Debug::timer($t) . " seconds using calc</p>";

// bypass any kind of counting
$t = Debug::timer();
$items = $pages->find("template=basic-page, limit=1, get_total=0"); 
$total = $items->getTotal(); // will be 0 or 1
echo "<p>Found $total items in " . Debug::timer($t) . " seconds with total count disabled.</p>";

One thing is for certain: counting totals takes up significant overhead regardless of count method. For large scale usages, having the option to disable total counting with get_total=0 will be valuable for sure. For instance, if you wanted to retrieve the 3 newest news items, but didn't care how many there were total, get_total=0 would be worthwhile. Likewise, if you are using a "Next Page / Prev Page" type of pagination (rather than numbered pagination links) then you also wouldn't need the total count. 
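For instance, the "3 newest news items" case could be written like this; the template name, sort field, and markup are hypothetical:

```php
// Total count disabled: we only want the 3 newest items, not the total.
// "news-item" is a hypothetical template name.
$news = $pages->find("template=news-item, sort=-created, limit=3, get_total=0");
foreach($news as $item) {
    echo "<li>{$item->title}</li>";
}
```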


Unfortunately, my development is still ongoing, and I do not have many pages to work with for this. I am assuming the dev branch is an update to the core...

But I would like to help analyze the test results if that helps.


So, install that, and then import my pages and fields? Although I am not sure how to accomplish that without importing the old core...


So, install that, and then import my pages and fields? Although I am not sure how to accomplish that without importing the old core...

If you are using ProcessWire 2.4 (which I assume you are), all you probably need to copy over is the /wire/ folder (deleting/replacing the old /wire/ first). ProcessWire never touches your site's folders, i.e. /site/. In some upgrades, replacing the old index.php and .htaccess may also be necessary; if so, it is specified in the upgrade instructions. If you only need to copy over /wire/, then instead of replacing the old /wire/ you can just rename it to, say, /wire-old/. This way you can quickly switch between the old (2.4) and the new (2.5) by renaming the respective folders. Note: only do this if you know what you are doing :D. In addition, as much as upgrading PW is relatively safe and easy, always make a backup of your database (which you should be doing anyway, regardless of upgrades).
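The swap described above can be sketched in shell, here against a throwaway directory so nothing real is touched; on a live server the paths will differ and you would dump the database first (e.g. with mysqldump):

```shell
# Demo of the /wire/ swap using a throwaway layout; real paths will differ.
# On a live site, back up the database first, e.g.:
#   mysqldump -u user -p pw_db > pw_db_backup.sql
rm -rf /tmp/pw-demo
mkdir -p /tmp/pw-demo/site /tmp/pw-demo/wire /tmp/pw-demo/new-core/wire
mv /tmp/pw-demo/wire /tmp/pw-demo/wire-old          # keep the old core around
cp -r /tmp/pw-demo/new-core/wire /tmp/pw-demo/wire  # drop in the new core
ls /tmp/pw-demo                                     # shows new-core, site, wire, wire-old
```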


Sorry I was a bit late to give you the details of the benchmarking.

I got busy with different things. Here is the code, which ran against around 701,407 `some-page` pages and 1,158 `different-page` pages. There are more page types, so the total number of pages is higher.

<?php
require dirname(__DIR__) . '/index.php';
$pages = wire('pages');

$t = Debug::timer();
$items = $pages->find("template=some-page, limit=1, get_total=count");
$total = $items->getTotal();
echo "<p>Found $total some-pages in " . Debug::timer($t) . " seconds using count</p>" . PHP_EOL;

wire('pages')->uncacheAll();

// calculate total using calc method (default)
$t = Debug::timer();
$items = $pages->find("template=some-page, limit=1, get_total=calc");
$total = $items->getTotal();
echo "<p>Found $total some-pages in " . Debug::timer($t) . " seconds using calc</p>" . PHP_EOL;

wire('pages')->uncacheAll();

$t = Debug::timer();
$total = $pages->count("template=some-page");
echo "<p>Found $total some-pages using count() in " . Debug::timer($t) . " seconds using count</p>" . PHP_EOL;

wire('pages')->uncacheAll();

$t = Debug::timer();
$items = $pages->find("template=different-page, limit=1, get_total=count");
$total = $items->getTotal();
echo "<p>Found $total different-pages in " . Debug::timer($t) . " seconds using count</p>" . PHP_EOL;

wire('pages')->uncacheAll();

// calculate total using calc method (default)
$t = Debug::timer();
$items = $pages->find("template=different-page, limit=1, get_total=calc");
$total = $items->getTotal();
echo "<p>Found $total different-pages in " . Debug::timer($t) . " seconds using calc</p>" . PHP_EOL;

wire('pages')->uncacheAll();

$t = Debug::timer();
$total = $pages->count("template=different-page");
echo "<p>Found $total different-page count() in " . Debug::timer($t) . " seconds using count</p>" . PHP_EOL;

wire('pages')->uncacheAll();

$t = Debug::timer();
$first = $pages->get("template=some-page, sort=-some-page_created");
echo "<p>Found {$first->title} in " . Debug::timer($t) . " seconds</p>" . PHP_EOL;
wire('pages')->uncacheAll();

$t = Debug::timer();
$first = $pages->get("template=some-page, sort=-some-page_created, get_total=0");
echo "<p>Found {$first->title} with get_total=0 in " . Debug::timer($t) . " seconds</p>" . PHP_EOL;
wire('pages')->uncacheAll();

Result

<p>Found 701407 some-page in 2.4525 seconds using count</p>
<p>Found 701407 some-page in 2.7801 seconds using calc</p>
<p>Found 701407 some-page using count() in 2.6784 seconds using count</p>

<p>Found 1158 different-page in 0.0328 seconds using count</p>
<p>Found 1158 different-page in 0.0166 seconds using calc</p>
<p>Found 1158 different-page count() in 0.0028 seconds using count</p>

<p>Found some title in 3.5964 seconds</p>
<p>Found some title with get_total=0 in 0.0188 seconds</p>

Indeed, the last one shows a tremendous improvement when `get_total=0` is set.

Thank you

