ryan Posted May 15, 2020 Share Posted May 15, 2020 This week I was back to focusing on the core and got quite a lot done. A lot of GitHub issue reports were resolved, plus several minor tweaks and additions were made in 3.0.156 as well. But the biggest update was the addition of the $pages->parents() API, which is something that I think you’ll likely have zero use for (and why I’m not putting it into a blog post) but something that the core itself will use quite a bit, and is a really nice improvement for the system and its scalability. So if you don’t mind some technical reading, read on. Whenever you call a $page->find() method ($page, not $pages) or use a “has_parent=“ in a selector, ProcessWire joins in a special table for the purpose called pages_parents. It uses this table to keep track of family relationships that aren’t otherwise apparent. For instance, let’s say we have page “g” that lives at path /a/b/c/d/e/f/g/. Page “g” only knows that it has page “f” as its parent. It doesn’t know that page “e” is its grandparent unless or until page “f” is loaded. Once “f” is loaded, then “f” can reveal its parent “e”. It works the same for every relationship down to the root parent “a”. So the pages are like a linked list or blockchain of sorts, where only 1 relationship forward or backward is known per page. The “pages_parents” table fills in this gap, enabling PW to quickly identify these relationships without having to load all the pages in the family. This is particularly useful in performing find() operations that you want to limit to a branch started by a particular parent. It’s the reason why we have both $pages->find() that searches the entire site, and $page->find() that limits the search within the branch started by $page. I haven’t paid much attention to the code behind this pages_parents table because it generally just worked, needing little attention. But I came across a couple of cases where the data in the table wasn’t fully accurate with the page tree, without a clear reason why. Then I became aware of one large scale case from a PW user where it was a huge bottleneck. It involved a large site (250k+ pages) and a recursive clone operation that appeared to involve hundreds of pages. But that operation was taking an unreasonable 10 minutes, so something wasn’t right. It came down to something going on with the pages_parents table. Once I dove into trying to figure out what was going on, I realized that if I was to have any chance of keeping track of it, we needed a dedicated API for managing these relationships and the table that keeps track of them. So that’s what got a lot of attention this week. While still testing here, it does appear initially that the 10 minute clone time has gone down to a few seconds, and everything about this relationship management is now rewritten, optimized and significantly improved. It was a lot of work, but absolutely worth it for PW. Rebuilding the entire table from scratch now takes between 2-3 seconds on a site with 250k pages and 150k relationships. The new API can be accessed from $pages->parents(). This API is really useful to the work that I do here (maintaining the core) but I’ll be honest, it’s probably not useful to most others, so I won’t go into the details here, other than to say I’m happy with it. But if you are interested, there are methods finding all the parents in a site, or a particular branch, and methods for rebuilding the pages_parents table, among others. Maybe more will be added to it later that actually would be useful in the public API, but for now I’ll likely leave it out of our public API docs. The $sanitizer->selectorValue() method also got a full rewrite this week (actually, one the last few weeks). It’s now quite a bit more comprehensive and configurable (see the new method options). The previous version was just fine, and actually still remains — you can use it by specifying [ ‘version’ => 1 ] in the $options argument to the selectorValue() method. But the new version is coded better and covers more edge cases, plus provides a lot more configurability for the times when you need it. 22 6 Link to comment Share on other sites More sharing options...
Kiwi Chris Posted May 16, 2020 Share Posted May 16, 2020 That's quite a timely update. Just this week I've been working on a site where there are a lot of selectors that depend on various levels of parent/child relationships, and I was wondering about performance, so if this area of Processwire has had a big performance improvement it's very well timed. 2 Link to comment Share on other sites More sharing options...
Raymond Geerts Posted May 18, 2020 Share Posted May 18, 2020 This is great news for our dev team. I can image it took some effort to dig into this but i am glad you found the bottle neck. I can imagine, when the parents table will contain less page parent references, this might give an overall performance boost with large sites that have a lot of repeater (matrix) pages. 3 Link to comment Share on other sites More sharing options...
thetuningspoon Posted May 22, 2020 Share Posted May 22, 2020 Thank you Ryan! The slow rebuild of the page parents table after doing a clone is something that we've run into on one or two of our projects and I'm delighted to hear that this has been addressed. Link to comment Share on other sites More sharing options...
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now