Repeater performance problem - and a solution

nik · September 25, 2012

We encountered a performance problem with a custom search today, finding out that one particular database query lasted seconds. Digging deeper revealed it has to do with repeaters when there are enough of them.

The site has pages for companies (about 1500 of them) each of which has repeater items attached. Most of them have one single item, total amount of repeater items being under 2000. This leads to about 3500 extra pages in the tree as a penalty for using repeaters (yes, there would have been other possible approaches as well). So, the amount of pages isn't anything massive.

Now, when a find("parent=/companies/, title%=keyword") is executed, the repeater fieldtype catches the use of title field (used in the repeaters as well) and tries to ensure no repeater pages get returned by adding !has_parent=<repeaters-page-id> to the selector string. While this is logically right, the resulting SQL query becomes slow due to the extra join used to filter out repeater pages.

With those 1500 + 1500 + 2000 = 5000 pages there are about 9000 rows in the pages_parents table (which is used for implementing has_parent selector). This isn't that much either, but it becomes a problem when there's a join between pages and pages_parents.

I'm not saying it wouldn't be possible to optimize the query, but as I'm not certainly skilled enough to do it, I went down another road. And the solution is dead simple - actually it's possible to catch it by just looking at those bolded selector strings above .

Ready? Ok, if we're searching for pages whose parent is /companies/ why should pages under /processwire/repeaters/ be explicitly filtered out? There's no way they'll match, so we'll just skip the extra repeater filter. And if you we're actually trying to find pages under /processwire/repeaters/ you didn't want the filter to kick in either (well, don't do that anyways, they're internals).

Here's a little diff showing the change I made to fix this thing for us (there's a pull request on the way Ryan):


$ git diff
diff --git a/wire/modules/Fieldtype/FieldtypeRepeater/FieldtypeRepeater.module b/wire/modules/Fieldtype/FieldtypeRepeater/FieldtypeRepeater.module
index 20bb52c..ab9db28 100644
--- a/wire/modules/Fieldtype/FieldtypeRepeater/FieldtypeRepeater.module
+++ b/wire/modules/Fieldtype/FieldtypeRepeater/FieldtypeRepeater.module
@@ -138,6 +138,10 @@ class FieldtypeRepeater extends Fieldtype implements ConfigurableModule {
// ensure that the repeaters own queries work since they specify a templates_id
if($name == 'templates_id' && in_array($selector->value, $templatesUsedByRepeaters)) $includeAll = true;

+ // optimization: if parent (or parent_id) is given, there's no need to explicitly exclude repeaters
+ // TODO: has_parent with values other than parents of repeaters-page could be catched as well
+ if($name == 'parent' || $name == 'parent_id') $includeAll = true;
+
if($includeAll) break;
}
}

Another performance issue solved. Not a very common one I think, but still worth a fix.

Edit: Ryan, no pull request this time. Didn't manage to do it right. Next time there'll be a branch before any changes, maybe that'll do it.

Edited September 25, 2012 by nik

ryan · September 25, 2012

Thanks Nik, this is a good find and fix. I am updating the code in the dev branch to include the line you added (no need for a pull request here). Using the same logic, I've also added a couple more things:

1. If wire('user')->isGuest(); then there's no need to exclude repeater pages because the user already doesn't have access to them. Meaning, they aren't going to show up in the results anyway, so no need to have the extra filter.

2. if 'has_parent' is specified with a value other than homepage, we don't need to have the filter either. Yes, they could still be included if you did a $pages->find('has_parent=/processwire/, title*=something'); but I think that's okay and maybe still an expected behavior. Though I think this is a rare query anyway.

Another related idea is that I could bypass the "has_parent!=n" filter entirely and just have it exclude by template. For example, rather than adding "has_parent!=n", it could add "templates_id!=1|2|3" (where 1, 2, 3 are templates used internally by repeaters), which would probably be more efficient than the join that comes from has_parent.

Last thing I want to mention is that the "%=" uses a MySQL "LIKE" which is non-indexed and inherently slow. You can get better performance by using "*=", which uses a fulltext index. Though the difference in speed isn't noticeable on smaller sites, but might make a difference in your case.

ryan · September 25, 2012

Nik, this has been pushed to the dev branch:

https://github.com/ryancramerdesign/ProcessWire/commit/eecd862a6ecf90f2c6b1c173c4d129c5152a31e7

In addition adding in the isGuest() check and the has_parent check, I changed the line you added to be this:

// optimization: if parent, parent_id, template, or templates_id is given, and an equals '=' operator is used, 
// there's no need to explicitly exclude repeaters since the parent and/or template is specific
if(in_array($name, array('parent', 'parent_id', 'template', 'templates_id')) && $selector->operator == '=') $includeAll = true;

Basically added in 'template' and 'templates_id' since those are also specific enough so as to warrant exclusion of the filter. And added an extra check that the operator was "=", since if they used something like not equals (!=), or greater than/less than (>< <= >=), that would still warrant inclusion of the filter. But just in case I got any of this wrong, I've kept it in the dev branch for further testing.

nik · September 25, 2012

You sure are fast. And it was expected you came up with other similar optimizations as well.

Changes are looking good and performance gain is what I hoped for: using the dev branch cut the time on that single query from about 3 seconds down to less than 20 milliseconds. And the resulting data is the same.

Another related idea is that I could bypass the "has_parent!=n" filter entirely and just have it exclude by template. For example, rather than adding "has_parent!=n", it could add "templates_id!=1|2|3" (where 1, 2, 3 are templates used internally by repeaters), which would probably be more efficient than the join that comes from has_parent.

+1 for this! As template_id is a native field in pages, I'm sure that would perform much better just because of that. In addition that change would boost this kind of queries even more as MySQL would be able to drop more rows based on where conditions. So go for it.

Actually all of this has revealed a performance issue with "has_parent!=n". That still performs poorly when there are lots of pages, especially in a deep structure resulting in lots of rows in pages_parents and thus large join between pages and pages_parents. Luckily that particular type of selector isn't one of the most used. Still possibly worth thinking if there was something that could be done. (Edit: also without the negation - I'll have to look into this more!)

Last thing I want to mention is that the "%=" uses a MySQL "LIKE" which is non-indexed and inherently slow. You can get better performance by using "*=", which uses a fulltext index. Though the difference in speed isn't noticeable on smaller sites, but might make a difference in your case.

This is what we suspected ourselves at first also as the query in question had several fields using "%=". But it turned out that dropping all of those from the query didn't help at all. And those tests I mentioned above had these "LIKE"-operators in place again, so no problem there. I guess we need more data to hit performance issues with "LIKE".

And added an extra check that the operator was "=", since if they used something like not equals (!=), or greater than/less than (>< <= >=), that would still warrant inclusion of the filter. But just in case I got any of this wrong, I've kept it in the dev branch for further testing.

Yes, this operator thing popped into my mind also when driving home from work. Didn't see that coming at first, good thing you did.

By the way, testing these kind of things needs MySQL query cache to be turned off as it screws up results when same exact queries are repeated. I did a little addition like this to wire/core/DatabaseQuerySelect.php during my tests:


$ git diff
diff --git a/wire/core/DatabaseQuerySelect.php b/wire/core/DatabaseQuerySelect.php
index b9cbcee..4f82d2d 100644
--- a/wire/core/DatabaseQuerySelect.php
+++ b/wire/core/DatabaseQuerySelect.php
@@ -98,6 +98,8 @@ class DatabaseQuerySelect extends DatabaseQuery {
}
if(!$sql) $sql = "SELECT ";

+ $sql .= "SQL_NO_CACHE ";
+
foreach($select as $s) $sql .= "$s,";
$sql = rtrim($sql, ",") . " ";
return $sql;

This doesn't catch every single query but vast majority of them. I was wondering if a config variable could be added so that this SQL_NO_CACHE addition could be controlled from site/config.php?

It was also helpful to add "Queries"-part of wire/templates-admin/debug.inc to the very template that had the problem. Maybe a little module should be written to display that debug data on every page when debug flag is on. Actually there are other things I'd also like to see in the bottom of every page when developing things - so yet another entry on my growing modules-to-do-list.

Edited September 25, 2012 by nik

DaveP · September 26, 2012

@nik +1 for the idea of a module to show debug info within a front end template. This was one of the things I found particularly helpful in CodeIgniter. (It shows things like $_GET & $_POST, db queries etc.) Probably beyond my skills, but would be really welcome if someone has the time to put something together.

ryan · September 26, 2012

Good ideas nik! I've added a new $config->dbCache variable. When set to boolean false (in your /site/config.php file) it adds the SQL_NO_CACHE in exactly the same way you posted. This will appear on the dev branch this week.

Sign In

Repeater performance problem - and a solution

Recommended Posts

nik

Link to comment

Share on other sites

ryan

Link to comment

Share on other sites

ryan

Link to comment

Share on other sites

nik

Link to comment

Share on other sites

DaveP

Link to comment

Share on other sites

ryan

Link to comment

Share on other sites

Create an account or sign in to comment

Create an account

Sign in

Recently Browsing 0 members

Similar Content

FieldtypeRepeater items accidentally removed from page. How to restore?

Selector works for admin but not for guest

[Solved] Title Field in Repeaters Not Saving???

Repeater field doesn't open (incorrect use / bug)

The STRANGEST BUG I have ever seen!!!

Browse

Activity

My Activity Streams

Store

My Details

Support