Jump to content
ryan

PW 3.0.156 – Core updates

Recommended Posts

This week I was back to focusing on the core and got quite a lot done. A lot of GitHub issue reports were resolved, plus several minor tweaks and additions were made in 3.0.156 as well. But the biggest update was the addition of the $pages->parents() API, which is something that I think you’ll likely have zero use for (and why I’m not putting it into a blog post) but something that the core itself will use quite a bit, and is a really nice improvement for the system and its scalability. So if you don’t mind some technical reading, read on. 

Whenever you call a $page->find() method ($page, not $pages) or use a “has_parent=“ in a selector, ProcessWire joins in a special table for the purpose called pages_parents. It uses this table to keep track of family relationships that aren’t otherwise apparent. 

For instance, let’s say we have page “g” that lives at path /a/b/c/d/e/f/g/. Page “g” only knows that it has page “f” as its parent. It doesn’t know that page “e” is its grandparent unless or until page “f” is loaded. Once “f” is loaded, then “f” can reveal its parent “e”. It works the same for every relationship down to the root parent “a”. So the pages are like a linked list or blockchain of sorts, where only 1 relationship forward or backward is known per page. The “pages_parents” table fills in this gap, enabling PW to quickly identify these relationships without having to load all the pages in the family. This is particularly useful in performing find() operations that you want to limit to a branch started by a particular parent. It’s the reason why we have both $pages->find() that searches the entire site, and $page->find() that limits the search within the branch started by $page. 

I haven’t paid much attention to the code behind this pages_parents table because it generally just worked, needing little attention. But I came across a couple of cases where the data in the table wasn’t fully accurate with the page tree, without a clear reason why. Then I became aware of one large scale case from a PW user where it was a huge bottleneck. It involved a large site (250k+ pages) and a recursive clone operation that appeared to involve hundreds of pages. But that operation was taking an unreasonable 10 minutes, so something wasn’t right. It came down to something going on with the pages_parents table. 

Once I dove into trying to figure out what was going on, I realized that if I was to have any chance of keeping track of it, we needed a dedicated API for managing these relationships and the table that keeps track of them. So that’s what got a lot of attention this week. While still testing here, it does appear initially that the 10 minute clone time has gone down to a few seconds, and everything about this relationship management is now rewritten, optimized and significantly improved. It was a lot of work, but absolutely worth it for PW. Rebuilding the entire table from scratch now takes between 2-3 seconds on a site with 250k pages and 150k relationships. The new API can be accessed from $pages->parents(). This API is really useful to the work that I do here (maintaining the core) but I’ll be honest, it’s probably not useful to most others, so I won’t go into the details here, other than to say I’m happy with it. But if you are interested, there are methods finding all the parents in a site, or a particular branch, and methods for rebuilding the pages_parents table, among others. Maybe more will be added to it later that actually would be useful in the public API, but for now I’ll likely leave it out of our public API docs. 

The $sanitizer->selectorValue() method also got a full rewrite this week (actually, one the last few weeks). It’s now quite a bit more comprehensive and configurable (see the new method options). The previous version was just fine, and actually still remains — you can use it by specifying [ ‘version’ => 1 ] in the $options argument to the selectorValue() method. But the new version is coded better and covers more edge cases, plus provides a lot more configurability for the times when you  need it. 

  • Like 21
  • Thanks 6

Share this post


Link to post
Share on other sites

That's quite a timely update. Just this week I've been working on a site where there are a lot of selectors that depend on various levels of parent/child relationships, and I was wondering about performance, so if this area of Processwire has had a big performance improvement it's very well timed.

  • Like 2

Share this post


Link to post
Share on other sites

This is great news for our dev team. I can image it took some effort to dig into this but i am glad you found the bottle neck.

I can imagine, when the parents table will contain less page parent references, this might give an overall performance boost with large sites that have a lot of repeater (matrix) pages.

  • Like 3

Share this post


Link to post
Share on other sites

Thank you Ryan! The slow rebuild of the page parents table after doing a clone is something that we've run into on one or two of our projects and I'm delighted to hear that this has been addressed. 

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Recently Browsing   0 members

    No registered users viewing this page.

×
×
  • Create New...