Internal optimisation

nicolasbui · January 21, 2011

Hi,

First, I would like to congratulate you on the great work. It's a very good idea that I had been thinking of but never had the time to build. The results of your work are really nice.

My wish for the future concerns the internal model. Currently you are using a simple adjacency list model for storing the page tree, and it works (that's the most important thing!).

My wish is to switch to a nested set model, as I expect we could get a real boost in query performance.
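For readers unfamiliar with the two models, here is a minimal sketch in Python with SQLite. The table layout, the column names (`parent_id`, `lft`, `rgt`) and the sample pages are assumptions made for illustration; this is not ProcessWire's actual schema. It stores both representations side by side so the queries can be compared.

```python
import sqlite3

# Hypothetical page tree (illustration only):
# home
#   about
#     team
#   blog
rows = [
    # (id, name, parent_id, lft, rgt)
    (1, "home", None, 1, 8),
    (2, "about", 1, 2, 5),
    (3, "team", 2, 3, 4),
    (4, "blog", 1, 6, 7),
]

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE pages (id INTEGER PRIMARY KEY, name TEXT, "
    "parent_id INTEGER, lft INTEGER, rgt INTEGER)"
)
db.executemany("INSERT INTO pages VALUES (?, ?, ?, ?, ?)", rows)


def children_adjacency(page_id):
    """Adjacency list: direct children are a single lookup on parent_id."""
    cur = db.execute(
        "SELECT name FROM pages WHERE parent_id = ? ORDER BY id", (page_id,)
    )
    return [r[0] for r in cur]


def descendants_nested_set(page_id):
    """Nested set: all descendants, at any depth, fall inside the lft/rgt interval."""
    cur = db.execute(
        "SELECT c.name FROM pages AS p "
        "JOIN pages AS c ON c.lft > p.lft AND c.rgt < p.rgt "
        "WHERE p.id = ? ORDER BY c.lft",
        (page_id,),
    )
    return [r[0] for r in cur]


def ancestors_nested_set(page_id):
    """Nested set: ancestors are the rows whose interval encloses the page's."""
    cur = db.execute(
        "SELECT a.name FROM pages AS c "
        "JOIN pages AS a ON a.lft < c.lft AND a.rgt > c.rgt "
        "WHERE c.id = ? ORDER BY a.lft",
        (page_id,),
    )
    return [r[0] for r in cur]
```

The trade-off is on the write side: inserting or moving a page in a nested set means renumbering `lft`/`rgt` for many rows, whereas the adjacency list only changes one `parent_id`.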

Best regards,

Nicolas

Adam Kiss · January 21, 2011

Just a note from Wikipedia about performance:

Queries using nested sets can be expected to be faster than queries using a stored procedure to traverse an adjacency list, and so are the faster option for databases which lack native recursive query constructs, such as MySQL[4]. However, recursive SQL queries can be expected to perform comparably for 'find immediate descendants' queries, and much faster for other depth search queries, and so are the faster option for databases which provide them, such as PostgreSQL[5], Oracle[6], and Microsoft SQL Server[7].

Seems nice.
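As a rough illustration of the "native recursive query constructs" the quote mentions, the sketch below walks an adjacency list with a recursive common table expression. SQLite is used here only as a convenient stand-in, and the `pages` table and its contents are hypothetical. (MySQL itself only gained `WITH RECURSIVE` in version 8.0, well after this thread.)

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER)")
db.executemany(
    "INSERT INTO pages VALUES (?, ?, ?)",
    [(1, "home", None), (2, "about", 1), (3, "team", 2), (4, "blog", 1)],
)


def descendants(page_id):
    """Return (name, depth) for every descendant, using only parent_id links."""
    cur = db.execute(
        """
        WITH RECURSIVE subtree(id, name, depth) AS (
            -- anchor: the direct children of the starting page
            SELECT id, name, 1 FROM pages WHERE parent_id = ?
            UNION ALL
            -- recursive step: children of rows already in the subtree
            SELECT p.id, p.name, s.depth + 1
            FROM pages AS p JOIN subtree AS s ON p.parent_id = s.id
        )
        SELECT name, depth FROM subtree ORDER BY depth, id
        """,
        (page_id,),
    )
    return cur.fetchall()
```

With this, the adjacency list needs no stored procedure and no extra columns to answer "everything below this page".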

ryan · January 21, 2011

Nicolasbui,

Thanks for your feedback. I am impressed at the level with which you've looked into ProcessWire. I am sure there is lots of room for optimization and improvement in many parts of the software, as the code is all quite young. And finding ways to optimize it is one of the most satisfying parts in my opinion. It's also one of those things that I am very thankful to have people like yourself helping me with. I would like the software to be as optimized as possible.

I agree with you in theory about using a nested set model. While I've not spent a lot of time using the nested set model, I am fascinated by it. But let me explain why I believe the adjacency tree model is preferable in ProcessWire. Then, please correct me if you think I'm wrong on any points.

• I view the nested set model as a compromise for the structure because it would take it far away from the methodology of the API. Currently, the structure matches the terminology and methodology of the API, maintaining a good level of consistency between DB and API. I believe this makes it more accessible for one to understand should they want to break out of the API and into MySQL. While it's rare for someone to need that in producing a site, I think it may be more common for people developing modules.

• With the queries used for page rendering and common API use, there are very few in ProcessWire that would actually benefit from the nested set model (at least from my understanding of it, and in the sites I produce). And, I may be wrong on this, but I can only identify one page render dependent query that would see a possible performance boost, and this query is most commonly executed only once per request. Not to mention, it's already quite fast, executing in 0.0003s (average, on my server with a page 3 levels deep).

• On the sites that I run, the database is rarely the bottleneck. Of course, this will depend on how you are using the API. But most of ProcessWire's queries avoid the tree entirely and are instead focused on the fields that make up a page.

• Should the query mentioned above ever become a bottleneck due to a change in the API or new needs, etc., the plan is to index page paths in the database or in the file system cache. While a compromise to support speed over normalization, the result would be faster than the nested set model. Of course, it would also be more expensive when moving a page, but an okay compromise.

• I think the nested set model may show measurable performance improvement in ProcessWire with sites that have very deep hierarchies. However, such a level of depth is unusual on web sites and not that practical to manage. I don't believe I've ever gone more than 4 or 5 levels deep in a hierarchy, even on a very large site. Even if you see a URL deeper than that (at least on my sites), there's a good chance it's one or more urlSegments used for branching and not an actual page in the hierarchy.
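The points above about path lookups can be sketched as follows. This is a guess at the general shape of the technique, not ProcessWire's code: the `pages` and `page_paths` tables, the function names and the sample data are all hypothetical. The first function resolves a URL on an adjacency list with one self-join per path segment, which is why depth matters; the second uses a denormalized path index like the one described as a fallback plan.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER)")
db.executemany(
    "INSERT INTO pages VALUES (?, ?, ?)",
    [(1, "", None), (2, "about", 1), (3, "team", 2), (4, "blog", 1)],
)
ROOT_ID = 1  # hypothetical id of the root page, whose path is "/"


def find_by_path(path):
    """Adjacency list: one self-join per URL segment, so cost grows with depth."""
    segments = [s for s in path.split("/") if s]
    if not segments:
        return ROOT_ID
    n = len(segments)
    sql = f"SELECT p{n - 1}.id FROM pages AS p0"
    for i in range(1, n):
        sql += f" JOIN pages AS p{i} ON p{i}.parent_id = p{i - 1}.id"
    sql += " WHERE p0.parent_id = ?"
    sql += "".join(f" AND p{i}.name = ?" for i in range(n))
    row = db.execute(sql, [ROOT_ID, *segments]).fetchone()
    return row[0] if row else None


def build_path_index():
    """Denormalized alternative: store every page's full path once."""
    db.execute("CREATE TABLE page_paths (path TEXT PRIMARY KEY, page_id INTEGER)")

    def walk(page_id, path):
        db.execute("INSERT INTO page_paths VALUES (?, ?)", (path, page_id))
        children = db.execute(
            "SELECT id, name FROM pages WHERE parent_id = ?", (page_id,)
        ).fetchall()
        for child_id, name in children:
            walk(child_id, f"{path}{name}/")

    walk(ROOT_ID, "/")


def find_by_indexed_path(path):
    """Path index: a single primary-key lookup, regardless of depth."""
    row = db.execute(
        "SELECT page_id FROM page_paths WHERE path = ?", (path,)
    ).fetchone()
    return row[0] if row else None


build_path_index()
```

The cost mentioned above shows up when a page moves: with the index, the stored path of the page and of every descendant has to be rewritten, while the adjacency list only updates one `parent_id`.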

These are the reasons why I think the current model is preferable in ProcessWire, but there may be more benefits to the nested set model applicable to ProcessWire than I realize. Let me know your thoughts. I'm always interested in creating a new branch to experiment if it makes sense. Thanks again for taking the time to look at ProcessWire in this depth.

Thanks,

Ryan

nicolasbui · January 21, 2011

@adamkiss: Yeah, I was thinking of this solution too... and MySQL 5.1 can do nested queries, can't it?

@ryan: Thanks for your reply. I fully understand your decision to use the adjacency model. It will surely work nicely in almost all situations.

I was thinking a bit too far ahead about what I would like to do in the future, and I sometimes have very, very deep hierarchies.

Currently ProcessWire semantically treats pages as "pages", not as nodes. So after looking into the schema, I think that a page can have multiple parents?

So it's clear that the adjacency model is made for it. It would be a headache to use a nested set.

Since it is promoted as a jQuery-like API, I kept thinking of pages as nodes (did you mean that? That would be a great challenge to Drupal), so that I would be able to select multiple pages at once using a selector and then run a processing job on them. For example, quickly regenerating a big sitemap, or creating an index for a virtual book or its sections (some of my clients have very, very deep hierarchies).

But as you said, it fits the system and it makes complete sense!

And is PHP 5.3 on your roadmap?

Best regards

Nicolas BUI

ryan · January 21, 2011

Currently ProcessWire semantically treats pages as "pages", not as nodes. So after looking into the schema, I think that a page can have multiple parents? So it's clear that the adjacency model is made for it. It would be a headache to use a nested set.

A page can have multiple parents in the hierarchy (parent, grandparent, great grandparent, etc.) of course, but each page only has one direct parent. On the other hand every page can have any number of relations which are also pages. You could think of them as parents or categories but the use/terminology depends on what the relations are used for.

Since it is promoted as a jQuery-like API, I kept thinking of pages as nodes (did you mean that? That would be a great challenge to Drupal), so that I would be able to select multiple pages at once using a selector and then run a processing job on them. For example, quickly regenerating a big sitemap, or creating an index for a virtual book or its sections (some of my clients have very, very deep hierarchies).

Pages are essentially the same thing as nodes in Drupal. Though I don't like the term 'node' because it confuses the heck out of my clients (the people that ultimately have to use the CMS day to day). You can use a page for whatever you want; it doesn't have to be a literal page on the site. Though every page has a URL, which you can think of as its GUID. Whether you choose to render content at that URL or not is up to you. I prefer URLs as a globally-unique identifier (GUID) because that's the way search engines treat them, possibly even penalizing the same content at multiple URLs (a part of my full time job is search accessibility and optimization). Of course ProcessWire will let you pull content from anywhere and do whatever you want with it, but it at least associates that content as having a primary association with a page's URL.

You are right that you can select multiple pages using any of their properties, and then process them in any way you want. To generate a site map, see these:

http://processwire.com/talk/index.php/topic,26.0.html

http://processwire.com/api/include/

Note that there is definitely overhead with what ProcessWire does in translating selectors to queries, finding them, and creating resulting Page objects. I think it's well worth it. But if you are used to selecting just what you want in SQL and generating a giant site map directly from that, you might be disappointed in the performance if dealing with lots and lots of pages (and/or lots of autojoined fields on those pages). If I'm generating a site map for a large site, I'm usually caching the output so that it doesn't have to be generated on each view.
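The caching approach described above might look roughly like the following. It is a generic time-based output cache written in Python for illustration, not ProcessWire's caching API; `OutputCache`, `render_sitemap` and `render_count` are hypothetical names, and the rendered markup is a placeholder for the expensive page queries.

```python
import time


class OutputCache:
    """Keep rendered output for ttl_seconds so it is not rebuilt on every view."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock        # injectable so expiry can be tested
        self.entries = {}         # key -> (expires_at, value)

    def get(self, key, render):
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]         # still fresh: skip the expensive render
        value = render()
        self.entries[key] = (now + self.ttl, value)
        return value


render_count = {"n": 0}           # counts how often the expensive work runs


def render_sitemap():
    """Stand-in for selecting many pages and building the markup."""
    render_count["n"] += 1
    return "<ul><li>/</li><li>/about/</li><li>/blog/</li></ul>"
```

A view would then call something like `cache.get("sitemap", render_sitemap)`, paying for the page queries only when the cached copy has expired.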

While I've done my best to optimize all of this, I'm sure there is still lots of room for optimization. For instance, I am pushing an update today or tomorrow for better caching selectors and their resulting PageArrays. I'm always tweaking this to make it better/faster, and the more eyes on it, I think the better performance we'll get out of it.

And is PHP 5.3 on your roadmap?

Yes, namespacing is badly needed for those that plan to include ProcessWire for its API from other apps and CMSs. But to do that, we have to drop PHP 5.2 support, and I don't think it's safe to do that quite yet.

Thanks,

Ryan

Robert Zelník · August 22, 2011

I am from a Drupal world, but I understand that the term 'node' is confusing, so I understand the use of the term 'page' instead. Anyway, there is another confusion for me now: How can we differentiate between a general 'page' (aka Drupal's node) and a particular page type also called a 'page' (one of the default templates used in a new installation)?

apeisa · August 22, 2011

There is at least one more case where a similar "problem" occurs:

Template (as a content type)

Template (as a template file).

Page (as a node)

Page (as a field type)

I am not sure how much of a problem this is. I myself have started talking about templates and template files. It works fine if the person already knows what a template is. But everyone outside PW thinks of templates as template files. I had a conversation full of misunderstandings when I tried to explain how user access in PW 2.1 works:

"It is configured at the template level" "At the template level? Sounds interesting..." "Well, that is actually pretty good, since there are far fewer templates than pages." "Yes, I understand, but templates... it doesn't sound good to me to have user access logic in your template files." "Ah, not in template files, but in templates!" "Hmm... not sure I understand. That doesn't make any sense to me." "Templates are like content types in Drupal." "Ah, now I see."

I haven't had a similar conversation yet about pages and the page field type, but that day will probably come. We have discussed these issues before here in the forums, and I think Ryan had good reasoning for this. I really like that PW is not bloated with different terms. That is probably one of the main reasons that PW is so easy to learn.

ryan · August 22, 2011

How can we differentiate between a general 'page' (aka Drupal's node) and a particular page type also called a 'page' (one of the default templates used in a new installation)?

Perhaps we shouldn't have a template called 'page' in the default profile. That was just the naming convention I used there to mean that it was for use by any generic/general/non-specific page. But it has no real relation to the PW term 'page', other than that. And thinking about it more, Drupal also has a template-file called 'page.php', which likewise means something different.

You are right, this is a bit confusing. Thanks for pointing it out. I'm going to rename that 'page' template to something else (maybe 'general') in the default profile.

apeisa · August 22, 2011

Oh, I missed the page template. Renaming that will definitely make things easier.

Robert Zelník · August 22, 2011

My suggestions for default template name: simple-page, basic-page, static-page...

ryan · August 23, 2011

Good suggestions, I think these are even better (will use one of them).

Robert Zelník · August 23, 2011

Btw, template vs. template file is also confusing for me. Check this:

http://processwire.com/api/templates/ <-- URL

Template Files <-- Title

These template files are located in this directory: /site/templates/. <-- files location

It is natural for me to think of templates as files for rendering the web site layout (the View in the MVC model). Wouldn't it be better to keep the name "template" for template files and rename the current "templates" to "page structure", "page configuration" or something like that?

Pete · August 23, 2011

Essentially in the backend templates are just "forms", but I was less confused about template files and admin templates personally because they're directly related. That might be my brain just being able to work that out easier.

apeisa · August 23, 2011

Terminology is a funny thing. At first I felt bad about calling PW templates "templates"; I felt they were content types (or maybe "page types"). But after using PW for a while it feels natural and good. The only occasion when this causes problems is when using that jargon with someone who doesn't know PW but has used another CMS (like my example dialogue above).

That is a pretty minor concern, and having "duplicates" keeps the number of terms needed low. And I say "duplicates" because I feel that "Template" and "Template file" are different enough, especially because they are so tightly related.

ryan · August 23, 2011

Different CMSs use different terminology. My opinion is that I'm not comfortable letting another CMS define our terminology. I think we have to stick with the actual definitions of terms. So here is the dictionary:

tem·plate
noun /ˈtemplət/

templates, plural

I can't think of any term that better describes what Templates are in ProcessWire:

1. A shaped piece of metal, wood, card, plastic, or other material used as a pattern for processes such as painting, cutting out, shaping, or drilling

Oops, not that one, sorry. This one:

2. Something that serves as a model for others to copy

...and...

3. A preset format for a document or file, used so that the format does not have to be recreated each time it is used

This is what Templates are in PW. A template in PW also has these components:

- fields

- file

- access

- urls

- cache

One large component of a template in PW is its file. I think the term for that naturally should be: "template file". It could also be "output file", but a template file doesn't necessarily have to produce output so I don't want to use a term with any built-in limitations. Of course, the term "template file" also fits with the definitions above when qualified with "file". A template file also has a natural dependency on the fields used by the page (template) it is processing. The two are interconnected and part of the same thing. I think that changing the terminology or splitting these things up is unnecessary complexity. Some people may want to split them on occasion for technical reasons (and PW will let you in advanced mode), but I want to always target simplicity first. My opinion is that terms like "template" and "template file" fit the definitions and relate to each other in the appropriate manner. But I'll also qualify that by saying that it's also what I'm used to, and I think we all naturally prefer what we're used to.

ryan · August 23, 2011

Sorry, I didn't read high enough up on the page before and missed this:

http://processwire.com/api/templates/ <-- URL
Template Files <-- Title

These template files are located in this directory: /site/templates/. <-- files location

I understand what you are saying here and these are good points. But I want to clarify that the file is a component of a template. They are part of the same thing. It's like a car and its engine. When we are on the file system, it's assumed we are dealing with files. If we open the hood on the car it's assumed we're going to find an engine… yet we're still talking about a car. When you are working with template files, you are working with the template's representation on the file system. There is no other representation of a template on the file system. Given that, I don't think it matters whether we call it ./templates/, /files-for-templates/ or ./template-files/, so I'll choose whatever is easiest to type.

In that URL on processwire.com, I think it's more just a matter of trying to make URLs as short as possible. Template-files would technically be a more accurate fit in that URL. Though everything you do with the API takes place in files, and /api/templates/ at least provides context to the term. I agree that template-files would have been a better URL name here, but I'm also not sure it matters enough to change it (it has already been there for a long time).
