Performance problem, possibly a bug in Pages class?


nik

I've been diving into the core of PW trying to understand performance problems when there are lots of pages, especially ones with repeater fields in their templates. The problem is that creating a page takes seconds (rather than milliseconds) when certain conditions are met.

Setup #1

  • one container page, say /countries/
  • hundreds of subpages with children, /countries/<country>[/<state>]/<city>/<data> for example
  • create a new page, parent being /countries/

Setup #2

  • a large page hierarchy with not that many pages directly under one parent, but hundreds of pages with template(s) that each have the same repeater field
  • the structure becomes flat again, as repeaters internally use a page hierarchy of this kind: /processwire/repeaters/for-field-123/for-page-456/1234567890-1234-1
    (--> .../for-page-nnn/... are all siblings even when the actual pages are not)
  • create a page with a repeater field used all over the site, parent is irrelevant

And of course I've got both of these on one site :).

Now, what I don't quite understand is what the Pages class method ___save() does just before calling some hooks at the end (from wire/core/Pages.php):

if(($isNew && $page->parent->id) || ($page->parentPrevious && !$page->parent->numChildren)) {
	$page = $page->parent; // new page or one that's moved, lets focus on it's parent
	$isNew = true; // use isNew even if page was moved, because it's the first page in it's new parent
}
if($page->numChildren || $isNew) {
	// check if entries aren't already present perhaps due to outside manipulation or an older version
	$n = 0;
	if(!$isNew) {
		$result = $this->db->query("SELECT COUNT(*) FROM pages_parents WHERE parents_id={$page->id}");
		list($n) = $result->fetch_array();
		$result->free();
	}
	// if entries aren't present, if the parent has changed, or if it's been forced in the API, proceed
	if($n == 0 || $page->parentPrevious || $page->forceSaveParents === true) {
		$this->saveParents($page->id, $page->numChildren + ($isNew ? 1 : 0));
	}
}

So, for new (and moved) pages it always ends up calling saveParents() for the parent page of the page being saved. saveParents() then works recursively deeper into the tree, calling itself for every child page that has children of its own.

Given setup #1, this means going through every .../<country>/, .../<state>/, and .../<city>/ page there is. With setup #2, on the other hand, every single repeater instance (.../<for-page-nnn>/) with data inside will be handled. It doesn't really surprise me anymore that this takes a while: saveParents() is deleting rows from pages_parents and inserting new ones for thousands of pages, unnecessarily if I'm right.
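
For anyone following along without the source open, here's roughly what saveParents() does. This is a heavily simplified sketch - the actual queries in wire/core/Pages.php differ in their details - but the shape of it is the point:

// simplified sketch of Pages::saveParents(), not the literal core code
protected function saveParents($pages_id, $numChildren) {
	$pages_id = (int) $pages_id;

	// 1. throw away this page's existing rows in pages_parents...
	$this->db->query("DELETE FROM pages_parents WHERE pages_id=$pages_id");
	if(!$numChildren) return true;

	// 2. ...and re-insert one row per ancestor of the page:
	// INSERT INTO pages_parents (pages_id, parents_id) VALUES (...), (...), ...

	// 3. then do the same for every child that itself has children
	$result = $this->db->query(
		"SELECT pages.id, COUNT(children.id) AS numChildren " .
		"FROM pages JOIN pages AS children ON children.parent_id=pages.id " .
		"WHERE pages.parent_id=$pages_id GROUP BY pages.id"
	);
	while($row = $result->fetch_array()) {
		$this->saveParents($row['id'], $row['numChildren']);
	}
	$result->free();
	return true;
}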

It does seem necessary to call saveParents() for the parent page if the page being saved is its first child (pages_parents has a row for each ancestor of the parent page, but not for the page or the parent page itself), but the recursion goes all over the place. I think it should somehow be restricted to the new/moved page and its children (and so on) only. Also, the second argument to saveParents() gets a wrong value when $page refers to the parent page, as numChildren has already been incremented for the parent (though this doesn't actually break anything).

As long as new pages can't have children when they're saved for the first time, saveParents() shouldn't be called at all when saving a new page. If child pages may exist it's of course not that straightforward, but the saveParents() recursion should still be restricted as described above.
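
Just to illustrate the idea (untested, and not something I've actually tried in the core - the parentHasEntries() helper here is made up), I'm picturing something along these lines in ___save():

// untested illustration only, not a proposed patch
if($isNew) {
	// a brand new page has no children, so it needs no pages_parents rows itself;
	// only its parent might, and only if it doesn't have them already
	if($page->parent->id && !$this->parentHasEntries($page->parent)) { // hypothetical helper
		$this->saveParents($page->parent->id, 1);
	}
} else if($page->parentPrevious || $page->forceSaveParents === true) {
	// moved (or forced) page: rebuild rows for this page and its own descendants,
	// instead of recursing through everything under the new parent
	$this->saveParents($page->id, $page->numChildren);
}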

I hope my explanation makes sense to you. It's quite possible there is something lying under the hood that I just can't see, having studied this thing only for a while (well, hours). But it sure seems like a bug to me :).


Nik, thanks for your efforts in debugging this. I'm impressed by the depth to which you've gone into this part. This is a pretty amazing first post. :)

As you might have guessed, the whole point of this code is to enable the has_parent selector (used by any $page->find() command) to support isolation of results to the structure within. The logic is a little tricky (and makes my head hurt) though thankfully the amount of code behind it is small.
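
In case it helps anyone following this thread: pages_parents simply pairs a page with each of its ancestors, one row per ancestor, so a has_parent clause can be answered with a join rather than by climbing the tree page-by-page at find() time. Very roughly - this is an illustration only, not the SQL that PageFinder actually generates:

// rough illustration of how pages_parents lets has_parent=1234 become a join:
// a page is below 1234 if its parent is 1234 itself, or if its parent lists
// 1234 among its ancestors in pages_parents
$sql = "SELECT DISTINCT pages.id FROM pages " .
	"LEFT JOIN pages_parents ON pages_parents.pages_id=pages.parent_id " .
	"WHERE pages.parent_id=1234 OR pages_parents.parents_id=1234";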

I think you are definitely on to something here, and I need to get to where you are. Just to confirm, it sounds like what we have here is an unnecessary bottleneck, but not something more? (i.e. nothing causing incorrect values to be returned from API functions?) You mentioned a noticeable slowdown of seconds when creating a page. I've not experienced this before, despite what I think may be similar structures and scale on my own sites, so the first thing I need to do is reproduce the problem that led you into this. Is it possible for me to get a copy of the database that you are using where the problem appears? (email ryan at this domain or PM me). Or if not, can I see the site map (like one generated by this script). You mentioned countries, cities, states in a hierarchy, which I can certainly figure out how to re-create. But I've already got structures like that on almost every site I make, some which run into the tens of thousands of pages, so I may be missing something key.

The closer I can get to the data/structure you are actually working with, the better I can understand and isolate it. Once I fully understand it from your context, it sounds like I'll be able to reproduce it in any context. Have you tried making any code changes consistent with the items you brought up in an attempt to fix? If so, it'd be helpful for me to get a look at that too.

Once I can reproduce something similar to what you are getting, I'm going to put a debug timer at the beginning and end of the code segment you pasted, just to confirm that this is the place where the bottleneck is. You may have already done this, in which case let me know and I'll skip that part.

Thanks again for your efforts here,

Ryan


Just to confirm, it sounds like what we have here is an unnecessary bottleneck, but not something more? (i.e. nothing causing incorrect values to be returned from API functions?)

Yes, it's just a bottleneck. Everything works right and data is correct in the database.

Is it possible for me to get a copy of the database that you are using where the problem appears? (email ryan at this domain or PM me).

The original database has customer data in it, but I created a test environment where I'm able to reproduce all of this. Working from a fresh installation of PW 2.2.2, this is how to get there (for those who are interested):

Setup #1

  • create page /countries/
  • measure time taken to save a new child page (./do.php test)
  • add 200 countries with 2 cities in each of them (./do.php populate)
  • measure time taken to save a new child page again (./do.php test)
  • --> page creation slowed down by a factor of 5-15 (varies of course depending on the environment)

Setup #2

  • install module "FieldtypeRepeater"
  • create a repeater field "dummy", add field "title" to it (not really a useful repeater...)
  • create a no-file template "basic-page-with-repeater" (duplicate fields from "basic-page")
  • add field "dummy" to template "basic-page-with-repeater"

  • create page /countries-repeater/
  • measure time taken to save a new child page (./do.php repeater-test)
  • add 100 countries with 50 cities in each of them (./do.php repeater-populate)
  • measure time taken to save a new child page again (./do.php repeater-test)
  • --> page creation slowed down by a factor of 100 or more

And here's the dirty little script (do.php) I'm referring to above. Place it in the "site" directory (and fix the shebang if needed).


#!/usr/bin/php
<?php

include("../index.php");

switch($argv[1]) {
	case 'populate':
		populate();
		break;
	case 'test':
		test();
		break;
	case 'repeater-populate':
		repeaterPopulate();
		break;
	case 'repeater-test':
		repeaterTest();
		break;
	default:
		echo "Unknown op {$argv[1]}\n";
		break;
}

function addPage($parent, $name, $tpl='basic-page') {
	$p = new Page();
	$p->parent = $parent;
	$p->template = $tpl;
	$p->title = $name;

	$start = microtime(true);
	echo "addPage $name\n";
	$p->save();
	$diff = microtime(true) - $start;
	printf(".. {$p->path} saved in %.6f seconds\n", $diff);

	return $p;
}

function populate() {
	$container = wire('pages')->get("/countries/");

	foreach(range(1, 100) as $countryN) {
		$country = addPage($container, "country-$countryN");
		foreach(range(1, 2) as $cityN) {
			$city = addPage($country, "city-$cityN");
		}
	}
}

function test() {
	addPage(wire('pages')->get('/countries/'), 'test');
}

function repeaterPopulate() {
	$container = wire('pages')->get("/countries-repeater/");

	foreach(range(1, 100) as $countryN) {
		$country = addPage($container, "country-$countryN", "basic-page-with-repeater");
		foreach(range(1, 50) as $cityN) {
			$city = addPage($country, "city-$cityN", "basic-page-with-repeater");

			$rep = $city->dummy->getNew();
			$rep->title = 'repeater title';
			$city->save();
		}
	}
}

function repeaterTest() {
	$test = addPage(wire('pages')->get('/countries-repeater/'), 'test', 'basic-page-with-repeater');
	$rep = $test->dummy->getNew();
	$rep->title = 'repeater title';
	$test->save();
}

Ryan, I also emailed you the database dump with fields, templates, and container pages set up, so that you could work from exactly where I'm at (starting from the actual tests in both setups).

Have you tried making any code changes consistent with the items you brought up in an attempt to fix? If so, it'd be helpful for me to get a look at that too.

No, I haven't had time to go there yet. But I will, unless you come up with a working solution first ;).

Once I can reproduce something similar to what you are getting, I'm going to put a debug timer at the beginning and end of the code segment you pasted, just to confirm that this is the place where the bottleneck is. You may have already done this, in which case let me know and I'll skip that part.

Well, yes. That's where I started from =). Here's a git diff for Pages.php with debug timers and output. Put it in place before populating and you'll notice the strange behaviour immediately.


diff --git a/wire/core/Pages.php b/wire/core/Pages.php
index c40042d..1b96c7c 100644
--- a/wire/core/Pages.php
+++ b/wire/core/Pages.php
@@ -565,7 +565,11 @@ class Pages extends Wire {
 				}
 				// if entries aren't present, if the parent has changed, or if it's been forced in the API, proceed
 				if($n == 0 || $page->parentPrevious || $page->forceSaveParents === true) {
+					$start = microtime(true);
+					echo "  saveParents({$page->id})\n";
 					$this->saveParents($page->id, $page->numChildren + ($isNew ? 1 : 0));
+					$diff = microtime(true) - $start;
+					printf("  ..took %.6f seconds\n", $diff);
 				}
 			}

@@ -660,7 +664,11 @@ class Pages extends Wire {
 			);

 		while($row = $result->fetch_array()) {
+			$start = microtime(true);
+			echo "	recursion: saveParents({$row['id']})\n";
 			$this->saveParents($row['id'], $row['numChildren']);
+			$diff = microtime(true) - $start;
+			printf("	..took %.6f seconds\n", $diff);
 		}
 		$result->free();

As you can see, saveParents() gets called extensively. In there, rows first get deleted and then inserted again into the pages_parents table. I checked the rows in that table before and after inserting a new page with setup #2, and the only difference was the parents information for the newly created page - as expected.

Niklas


Niklas,

Thanks, this has been hugely helpful. Your test scripts and database really saved me a lot of time and helped me get a handle on the issue quickly. Actually, your do.php is a pretty outstanding example of using the API with a shell script for automation. I think that I've now got it fixed, though it needs more eyes on it to be sure. But here are the page creation times with the fixes implemented:

Setup 1

do.php test BEFORE populate

.. /countries/test7807394638/ saved in 0.008997 seconds

.. /countries/test1201072181/ saved in 0.010174 seconds

.. /countries/test4845661424/ saved in 0.010104 seconds

do.php test AFTER populate

.. /countries/test13939a4379/ saved in 0.010184 seconds

.. /countries/test8963721816/ saved in 0.010156 seconds

.. /countries/test8775327378/ saved in 0.009929 seconds

Setup 2

do.php repeater-test BEFORE repeater-populate

.. /countries-repeater/test1328343619/ saved in 0.027560 seconds

.. /countries-repeater/test1178892434/ saved in 0.029362 seconds

.. /countries-repeater/test1854160826/ saved in 0.029181 seconds

do.php repeater-test AFTER repeater-populate

.. /countries-repeater/test2117723102/ saved in 0.031516 seconds

.. /countries-repeater/test1829907000/ saved in 0.031705 seconds

.. /countries-repeater/test1610663476/ saved in 0.031419 seconds

I didn't take comparison measurements before, as I went straight to work once I confirmed there was an issue. However, you had mentioned it would take measurable seconds before, so I'm thinking this is a major improvement? I'm also not surprised by the very slight increase in time after repeater-populate, given that repeater-populate added more than 30,000 pages to the system. One thing is for sure: 'do.php populate' runs dramatically faster than before.

Page names have random numbers in them to prevent ProcessWire from consuming time in a loop trying to make the page name unique (which would skew the numbers). So rather than making each page name 'test', I made it 'test'.mt_rand();

The repeater test pages take at least 2x as long to create as the non-repeater test pages, because the repeater tests populate a repeater item on every page creation, which the other test isn't doing. Basically the repeater test creates 2x the number of pages on each call.

The Fix

ProcessWirePagesParentsFix.zip

Can you try replacing your /wire/core/Pages.php and /wire/core/PageFinder.php with the files attached? Please let me know your results. These fix it for me, but this is something with broad scope and it definitely needs more eyes on it.

I honestly don't know what I was thinking when I wrote the piece of code you originally quoted from Pages.php. It may have worked in terms of producing accurate results, but had a real legibility problem, and of course the bottleneck problem. Part of the issue was that pages_parents didn't originally come with ProcessWire, so it was added in later. We had to deal with situations where people might not yet have a pages_parents index, which led to some code that doesn't make as much sense in today's context. I went ahead and rewrote that part, among some other details.

Test Package

Because some of this logic had to be rewritten, I'm naturally concerned about breaking stuff. So I put together a quick test package that runs through various tests to confirm that it's returning all the results it should. This test script assumes the setup you've described, with /countries/ and cities below them. It tries moving a few pages around, executing find()s, and confirming they return the right pages. I'm including it here just in case you or anyone else wants to help test or can think of other tests we should perform with it. This should be placed in /site/ like the do.php script.

#!/usr/bin/php
<?php

include("../index.php");

/**
 * Perform a find test calling $parent->find($selector)
 *
 * If the quantity found is different from the provided $expectedCount
 * then the test fails.
 *
 * @param Page|string $parent Parent page to call find() from, either Page or path (string) to it.
 * @param string $selector Selector string to use in the find()
 * @param int $expectedCount Quantity required to PASS the test.
 *
 */
function findTest($parent, $selector, $expectedCount) {
	$expectedCount = (int) $expectedCount;
	if(is_string($parent)) $parent = wire('pages')->get($parent);
	echo "\n\t+ pages.get({$parent->url}).find($selector); ";
	$matches = $parent->find($selector);
	$numMatches = (int) count($matches);
	if($numMatches == $expectedCount) {
		echo "PASS ($expectedCount)\n";

	} else {
		echo "*****FAILED***** ($numMatches != $expectedCount)\n";
	}
}

/**
 * Move $page to $parent
 *
 * @param Page $page
 * @param Page|string $parent May be parent page object or path to it (string)
 *
 */
function movePage($page, $parent) {
	if(!is_object($parent)) $parent = wire('pages')->get($parent);
	echo "\n\n----------------------------------------------------------";
	echo "\nMoving {$page->path} to {$parent->path}{$page->name}/\n";
	$page->parent = $parent;
	$page->save();
}

/**
 * Start the tests
 *
 */

$page = wire('pages')->get('name=country-97');
$selector = "name={$page->name}, has_parent!=/countries-repeater/";

movePage($page, '/');
findTest('/', $selector, 1);
findTest('/countries/', $selector, 0);

movePage($page, '/countries/');
findTest('/', $selector, 1);
findTest('/countries/', $selector, 1);

movePage($page, '/countries/country-99/');
findTest('/', $selector, 1);
findTest('/countries/', $selector, 1);
findTest('/countries/country-99/', $selector, 1);

movePage($page, '/countries/country-99/city-1/');
findTest('/', $selector, 1);
findTest('/countries/', $selector, 1);
findTest('/countries/country-99/', $selector, 1);
findTest('/countries/country-99/city-1/', $selector, 1);
findTest('/countries/country-99/city-2/', $selector, 0);
findTest('/countries/country-96/', $selector, 0);
findTest('/countries/country-95/city-1/', $selector, 0);

movePage($page, '/countries/');
findTest('/', $selector, 1);
findTest('/countries/', $selector, 1);
findTest('/countries/country-99/', $selector, 0);
findTest('/countries/country-99/city-1/', $selector, 0);
findTest('/countries/country-99/city-2/', $selector, 0);
findTest('/countries/country-96/', $selector, 0);
findTest('/countries/country-95/city-1/', $selector, 0);

$countries = wire('pages')->get('name=countries');
movePage($countries, '/about/');
findTest('/countries/', $selector, 0);
findTest('/about/', $selector, 1);
findTest('/about/countries/country-99/', $selector, 0);
findTest('/about/countries/', $selector, 1);

movePage($countries, '/');

echo "\n";

Thanks again for your help in finding and debugging this issue.


Ryan,

Great, it seems to me you nailed it! My own tests now execute as they should, no longer slowing down as the number of pages grows. And the new code in Pages.php makes more sense to me as well. So at least there are no obvious flaws that I can find.

Creating pages with repeaters does take more time than creating pages without them, but as you said that's expected behaviour. This has a huge impact on the project I've been working on. Actually, I had to drop repeaters from the first version because of this performance issue. I think I didn't mention it in the beginning, but on top of being unbearably slow, memory consumption wasn't at an acceptable level either when handling large amounts of pages. I haven't tested that part yet, but I'll be sure to tell you if it didn't get fixed as well (it wouldn't be that surprising if it actually did). Anyway, now I'm able to return to my original plan and utilize repeaters the way they deserve to be utilized.

I also ran your test package successfully. One comment on the script though: the very first get() call trusts that the data has been populated in the right order:

$page = wire('pages')->get('name=country-97');

If "do.php repeater-populate" is run before "do.php populate", this will return country-97 under /countries-repeater/ and fail at the second move (Chosen parent '/countries/' already has a page named 'country-97'). So has_parent=/countries/ or something else needs to be added there - or of course one can populate everything in the right order..

I have to say I really like the work you're doing with ProcessWire - keep it up. Once I managed to get the issue out in public, it got taken care of instantly. So, thanks!

More posts to come from here now that I've started :). I'm glad to be of help wherever I can.


Just wanted to say that the memory consumption issue I mentioned in my previous post most probably was about having debug mode on (as I usually do on a development site). So that part seems to be ok as well. And thanks go to Ryan once again - just read about this from a tweet by @processwire :).


Great, it seems to me you nailed it! My own tests now execute as they should, no longer slowing down as the number of pages grows. And the new code in Pages.php makes more sense to me as well. So at least there are no obvious flaws that I can find.

Great! Thanks for testing it out and reporting back. I'm going to give it a couple more days here to test before I move it into the public source, but so far so good. Likewise, please let me know if you run into any concerns.

More posts to come from here now that I've started :). I'm glad to be of help wherever I can.

Thanks, we are lucky to have you here!

Just wanted to say that the memory consumption issue I mentioned in my previous post most probably was about having debug mode on (as I usually do on a development site). So that part seems to be ok as well. And thanks go to Ryan once again - just read about this from a tweet by @processwire

Your do.php repeater-populate is actually what prompted me to write that tweet. :) I ran out of memory before it could create all the pages. Then I realized I had debug mode on, which explained why it was running out of memory. Now I'm debating putting a limit on the number of queries it'll save in memory for debugging purposes, to prevent this problem.


  • 3 weeks later...

Just a note for those following this thread : The fix was committed by Ryan on 07/22/12.

https://github.com/ryancramerdesign/ProcessWire/commit/cca37369a0c68a76f6e824430e2b0b57f77dac54

Nice work Nik, and a quick fix/improvement by Ryan. @Ryan - maybe you should change Nik's forum status from "Newbie" to something more reflective of his skill. A manual override. ;)


Nice work Nik, and a quick fix/improvement by Ryan. @Ryan - maybe you should change Nik's forum status from "Newbie" to something more reflective of his skill. A manual override. ;)

Well, thank you Jeff. I think I'll just have to write a couple more posts here on the forum to get rid of that status - but not only nonsense like this one :).


You can actually change it by editing your profile if you like - click on your name at the top-right of the page, then on My Profile, then click Edit My Profile near the top right of that screen and you can change your Member Title halfway down the page :)


@Pete: Either I'm blind or Member Title isn't shown there for me. Could it be that you see it because you're an admin? Not that I was about to change it, just curious ;).

Nope, it's not just that apparently (@adamkiss). Maybe it's not there for Newbies? Or plain Members?

Well, one post closer to becoming a Jr. Member again...


Nope, there's nothing between the "Edit my About Me page" button and the "My Birthday" dropdowns. At least there wasn't before this very post :D.

Edit: and there still isn't. But I think it's a good thing after all. Let moderators and admins be whatever they want though :).


  • 7 years later...

I know this is ancient, but I've also run into a bottleneck at PagesEditor::saveParents() on one of my projects running PW 3.0.123. 

The scenario is as follows:

1. I am cloning a single page (not recursively) which is in a list with several thousand siblings. 

2. I am manually looping through the original page's children and cloning them over as children of the new page. (I am doing this manually because sometimes I don't want to clone some of the children)
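
In code the loop is essentially this (simplified - the page id, template name, and skip condition are placeholders, not my actual logic):

$pages = wire('pages');
$original = $pages->get(1234); // the page being copied (placeholder id)

// non-recursive clone of the page itself
$copy = $pages->clone($original, $original->parent, false);

// then clone its children over one by one, skipping some of them
foreach($original->children() as $child) {
	if($child->template == 'some-skipped-template') continue; // placeholder condition
	$pages->clone($child, $copy, false); // the second and later calls take ~7 seconds each
}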

I've timed this entire process at around 7 seconds for a page with just 2 children. Using the Debug timer, I discovered that the initial page and its first child clone over at just .1s each, but the second child clone (and any subsequent children) each take around 7 seconds to complete!

Although I am manually looping through the children and cloning them, using PW's recursive clone seems to encounter the same issue. In fact, my test came in at 8 seconds for the same set of pages.

The relevant code begins at line 733 of PagesEditor.php where saveParents() is called and passed the parent’s parent when a page has changed parents and it has more than one sibling. It seems that the code is recursing into all of the parent page's thousands of siblings, perhaps unnecessarily.

It looks like this line was added at some point after the fix Ryan provided in this thread.


I found the commit where that line was added, and the issue that it was related to:

Commit: https://github.com/ryancramerdesign/ProcessWire/commit/8d126246772633be773d72f5262acdf14f4c1e31#diff-bbe4731226c86c286f0e4e95a4756eda

Issue: https://github.com/ryancramerdesign/ProcessWire/issues/1942

One question that comes to mind is whether a clone should be considered a "moved" page (have a parentPrevious property set), since the original page still exists. If it were considered new instead of moved, I think the line in question would not apply.

It still seems like there is a lot of overhead here for a move operation. But I still need to study the issue report linked above more closely. 

