SiNNuT

(Template) Caching

I'm working on a site in ProcessWire that has a lot of pages that don't change frequently. Although they seem to render fast enough without caching, it seems optimal to cache these pages. Enabling caching works fine, but whenever I update a page the cached version doesn't get deleted. I could delete the cached file manually, but this isn't really an option once a client administers the site.

I don't know how caching is implemented but would it be possible to have some sort of 'on update delete cached file' functionality?

Hello,

Does it actually 'fail'? Or does the cache file still have the same timestamps? (The save process may write a new cache file during the save itself.)

Or how is the site reacting? (The cache file may actually be deleted or rewritten on the first visit to the updated page.)

Not sure if it fails, because I don't know how it is supposed to work. It saves a cache file on the first page visit. No matter whether you update the page later on, the cache file stays the same (I guess until the configured cache time expires). It would be nice if it were aware of changes: when a page with a certain ID is updated, delete the associated cache file.

Is there an easy way to add more intelligence to the cache? Meaning rules like "if a page with template news-post is saved, clear the cache of page /news/"?

This might be good for a simple module, but could these also be settings on the template's cache tab? What do you guys think? I think the most beneficial page to cache is the homepage, but I'm not sure how to do that now, since it usually pulls data from many other pages.

PS: I really like the possibility to bypass the cache with predefined POST and GET variables! Though it might be a good idea to always disable the cache if a "CommentForm_submit" POST variable is present. Or does that open the door to a DoS attack?
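To make the POST/GET bypass idea concrete, here is a minimal sketch in plain PHP. It is not ProcessWire's implementation: the function name `shouldBypassCache` and its parameters are hypothetical, and the only assumption is that the cache is skipped whenever one of the configured variable names is present in the request.

```php
<?php
// Hypothetical helper, not part of ProcessWire: decide whether a request
// should skip the page cache based on which GET/POST variables are present.
function shouldBypassCache(array $get, array $post, array $noCacheGet, array $noCachePost): bool
{
    foreach ($noCacheGet as $name) {
        if (array_key_exists($name, $get)) {
            return true; // a configured GET variable is present
        }
    }
    foreach ($noCachePost as $name) {
        if (array_key_exists($name, $post)) {
            return true; // a configured POST variable is present
        }
    }
    return false; // nothing matched, the cached copy may be served
}

// A comment form submission bypasses the cache; a plain page view does not.
var_dump(shouldBypassCache([], ['CommentForm_submit' => '1'], [], ['CommentForm_submit']));
var_dump(shouldBypassCache(['page' => '2'], [], [], ['CommentForm_submit']));
```

On the DoS question: anyone can send that POST variable, so each such request costs a full, uncached render. That is the trade-off to weigh when adding names to the bypass list.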

On a large site, there can really be a lot of cache files. And every page can support up to 1k cache files for URL segments and page numbers. So as the scale increases, it can really slow down the save to selectively clear some stuff and not others. I've found that good compromises are:

1. Use low cache times with the current system on pages you don't want to risk having old content (seconds or minutes rather than hours). 

2. Or, set the cache to wipe entirely on every page save.

The second option was what PW1 used. It can be done without much overhead because PW's cache looks in a "lastgood" file that has an mtime timestamp of when the cache was last considered good. Any cache files older than the date of that file are considered expired, whether they exist or not. So PW can uncache everything just by updating the mtime of that one file.
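The "lastgood" mechanism can be sketched in plain PHP as follows. This is only an illustration of the idea described above, not the actual CacheFile code: the function names, file names and temp directory are all hypothetical, and it assumes a cache file written at the same second as the marker still counts as fresh.

```php
<?php
// Sketch of the "lastgood" idea with hypothetical names.
// A cache file is fresh only if it exists and is not older than the marker file.
function isCacheFresh(string $cacheFile, string $lastGoodFile): bool
{
    clearstatcache(); // make sure filemtime() sees current values
    if (!is_file($cacheFile)) {
        return false;
    }
    if (!is_file($lastGoodFile)) {
        return true; // nothing has ever expired the cache
    }
    return filemtime($cacheFile) >= filemtime($lastGoodFile);
}

// Expire every cache file at once by bumping the marker's mtime.
function expireAll(string $lastGoodFile, ?int $time = null): bool
{
    return touch($lastGoodFile, $time ?? time());
}

// Demo with explicit timestamps so the result does not depend on timing.
$dir = sys_get_temp_dir() . '/pw-cache-sketch-' . uniqid();
mkdir($dir);
$cache = "$dir/page.cache";
$lastGood = "$dir/lastgood";

file_put_contents($cache, '<html>cached</html>');
touch($cache, 1000);    // cache written at t=1000
touch($lastGood, 900);  // cache last declared good at t=900

var_dump(isCacheFresh($cache, $lastGood)); // cache is newer than the marker
expireAll($lastGood, 2000);                // one touch() expires everything
var_dump(isCacheFresh($cache, $lastGood)); // marker is now newer than the cache
```

The point of the design is that expiring the whole site costs one `touch()` call, no matter how many cache files exist; stale files are simply ignored and overwritten the next time their page is rendered.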

Given the above, it would be relatively easy for me to add an option to the template cache settings that says "When a page using this template is saved, clear: 1) this page's cache file; or 2) cache files from all pages." Anything beyond that could involve significantly more overhead, short of major changes to the current CacheFile class (which can certainly be done in the future).

I see what you mean about the overhead. And that would be kind of micromanagement anyway. I really like the option to wipe the whole cache.

Sounds good, I'll plan to implement this – I think we're already almost there.

The second way is how it works in MODx - well, their older branch at least. I do agree that it could do with a template-based option as well as a lot of the time it is only a few pages that might change as you guys say.

Wiping the entire cache on every page save is a good idea to at least provide. In a system like PW, a given page might pull from several others that are determined at runtime. So there's no way the CMS can know all the possible interrelationships ahead of time. As a result, expiring the cache on every page save is the only way to know for sure that the site is delivering fully up-to-date pages. On the other hand, it's rare that I actually need the entire cache to expire on every save... and if I really need it for some reason, I go to Modules > Page Render > Clear Cache. But caching can be a difficult concept for a client to really understand sometimes, so having the entire cache expire on every page save can reduce the support burden. So in one of these near-term commits, I'm going to go ahead and add an option to the Template editor that says:

When I save a page:

    1. Expire the cache for the saved page only.

    2. Expire the cache for all pages.

Being able to specify that at the template level will provide a lot of flexibility.

Just an idea that came up today. On the lukio.fi site (which gets a good amount of traffic; I can't give numbers since I haven't asked permission) I cache normal content pages pretty heavily (for one week). They don't have anything dynamic except the navigation menu. If you remove or add children, the cached menu on the parent page goes out of date.

So instead of clearing the whole cache, it would be beneficial to have a third option:

3. Expire the cache for the saved page and its parent page.

I think that would be perfect in many situations, since it would allow caching a normal news-front template: the cache expires when someone edits, adds or removes news-item pages under that page. What do you think?

Are you then going to run into issues elsewhere though as other pages could also potentially be using data from that page? I guess if you put a note next to that option explaining any potential issues then that would work.

What would be great in theory is if there was some way to track wherever a $pages->find call (or other such bits of code) in a template file returns the page you're saving in its results, as well as any pages that use InputPageSelect (and other such field types), and clear the cache for those pages as well. So basically any page that makes use of the data in the page you're saving should have its cache cleared. Unfortunately that's impossible in practice for the template file side of things (it should be doable for fields) unless you do a preg_match call on every template...

...maybe that wouldn't be so hard to do actually...? It would require looking for any code inside PHP tags that's selecting pages to list and working out which ones relate to the current page you're saving.

Actually no, I think that would get quite messy and depending on the number of templates could take a bit of time. It also wouldn't be fool-proof - the minute you start putting common template bits into other files that you might include that PW doesn't know about (think header.inc, but yourname.inc <- PW wouldn't know that even existed).

So yeah, ignore my train of thought ;)

I agree that the #3 option makes sense. Though thinking we might change the word "parent" to "parents", so that it clears all the way up the tree to the homepage. This seems simple enough.

Pete, those are great ideas about locating all the pages with references to the current page and clearing those as well. Though I'm afraid to go there because the only way to really guarantee that a site is up-to-date is to expire the entire cache. We can track some things (like page references) but not others. So I worry about the ambiguity of any cache clearing options that involve a set of pages that isn't known ahead of time. Offering them may make some people think that PW is smart enough to figure out everything that needs to be cleared.

If there is one thing that causes confusion among clients, it's always caching, in my experience. The client says:

I accidentally misspelled Shilo Toilolo's name in our press release. I went and fixed it right away, but just got a call from the CEO that it says SHITO TOILETO on our homepage! Help!

I respond "go save the homepage, or just wait an hour, it's on a cache". I've dealt with so many of these support calls in the past, that I tend to use the cache sparingly. :)

A possible 4th option would be one that Antti mentioned earlier, which would be to provide an InputfieldPageListSelectMultiple that lets you specifically select all the pages that should be cleared. While I'd rather make it "clear pages using these templates", the truth is that clearing specific pages (rather than pages using specific templates) is quite a bit simpler to implement in the current system. Though I'm going to toy around with the current cache system sometime to see if there might be a way I can get that per-template cache clearing.

Caching is always a compromise… my opinion is that most people should start with no caching, and only turn it on when they find they need it. And if they find they need it, they should take a close look at the MarkupCache too. But I'll work to expand the caching options, as I think these open a lot of doors to PW's use in high traffic sites.

Ah, now that 4th option sounds interesting. You could have options on a page to clear a list of other pages then as you say, but another approach is from the template side of things where for each template you could say something like ANY page using this template clears all other pages using this template when saved.

I guess I've got a few scenarios in my head, but we'd need to jot down all possible scenarios to get this right. The problem is that when you start clearing the cache on entire sections of the site then, like you say, you may as well clear the whole thing! Maybe a better way would be to have a per-page or per-template option to stop a page and its children, or every page using a template, from being cleared from the cache, and approach it from that angle? Clear the cache for everything but the pages/templates you specify when a page is saved? That might be less hassle, I don't know. It might also be less intuitive, but I was thinking that you probably know of sections/pages that, once they're up on the site, will rarely or never change, or at least shouldn't have an impact elsewhere.

The second option was what PW1 used. It can be done without much overhead because PW's cache looks in a "lastgood" file that has an mtime timestamp of when the cache was last considered good. Any cache files older than the date of that file are considered expired, whether they exist or not. So PW can uncache everything just by updating the mtime of that one file.

It would be very nice to know how often our clients save their pages. I think it is a pretty rare operation on many sites (not even every day), while other sites edit or add content a few times an hour. I think that even on actively edited sites, "wipe the whole cache" could often be a very good solution (given that wiping the whole cache is a "cheap" operation, and doing it only for templates that get pulled through the API, usually something like news, events etc.). I don't know how much overhead it adds to write the cache files over and over again on big sites (over a thousand pages). Or is that always cheaper than letting PW query the DB? (Of course this also depends on how popular the site is: if pages get very few views, then we can forget caching altogether :))

I've been meaning to upgrade the caching options in the templates editor. It's easy to do, so I went ahead and put it in place and it's now committed to the source. Now you can choose any of these cache clearing options:

When page is saved:

  • Clear the page's cache (default)
  • Clear the entire site's cache
  • Clear the page's cache and its parents (including homepage)
  • Clear specific pages (with page list selection)
  • Don't clear anything

Attached is a screenshot with the #4 option selected.
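As a rough illustration of how the five options above map a saved page to the set of caches to expire, here is a sketch in plain PHP. None of these names come from ProcessWire: the constants, `parentPaths` and `pathsToExpire` are hypothetical, pages are represented by their URL paths, `['*']` stands for "the whole site", and whether "clear specific pages" also includes the saved page is an assumption (here it does not).

```php
<?php
// Hypothetical option names; illustrative only, not ProcessWire's own.
const CLEAR_SELF = 'self';
const CLEAR_SITE = 'site';
const CLEAR_PARENTS = 'parents';
const CLEAR_SELECTED = 'selected';
const CLEAR_NOTHING = 'nothing';

// Return the parent paths of a page path, up to and including the homepage "/".
function parentPaths(string $path): array
{
    $parts = array_values(array_filter(explode('/', $path), 'strlen'));
    $parents = [];
    while (count($parts) > 0) {
        array_pop($parts); // step one level up the tree
        $parents[] = count($parts) ? '/' . implode('/', $parts) . '/' : '/';
    }
    return $parents;
}

// Resolve which cached paths to expire when $savedPath is saved.
function pathsToExpire(string $option, string $savedPath, array $selected = []): array
{
    switch ($option) {
        case CLEAR_SELF:
            return [$savedPath];
        case CLEAR_SITE:
            return ['*']; // special marker: expire everything
        case CLEAR_PARENTS:
            return array_merge([$savedPath], parentPaths($savedPath));
        case CLEAR_SELECTED:
            return $selected; // assumption: only the hand-picked pages
        case CLEAR_NOTHING:
            return [];
        default:
            throw new InvalidArgumentException("Unknown option: $option");
    }
}

// Saving a news item with the "parents" option walks up to the homepage.
print_r(pathsToExpire(CLEAR_PARENTS, '/news/2011/item/'));
```

Only the "entire site" option guarantees fully up-to-date output; the others trade that guarantee for fewer re-renders.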

In ProcessWire, using the cache is definitely not required. I leave it off for smaller or lower traffic sites, and then use it only on some templates with higher traffic sites. But now that there are more clearing options, I may start using it a lot more. But my goal is always to keep ProcessWire fast whether you have the cache turned on or not. But there's no doubt that caching can make a big difference on pages where you are performing heavy operations.

post-1-132614279109_thumb.gif

Just brilliant! Thanks for integrating these awesome options ;D

Awesome stuff Ryan. You are treating us too well!

Just testing this cache for the first time. It doesn't really save any cache files for me so far, except for one case where I edited a summary of a page and a .cache file was saved. Folders are generated but no .cache file. What's going on? ... I just noticed I wasn't paying attention to the fact that it has the option "for guest users" ... :) ... now it's flowing.

Wow! This is a really cool feature. Though I'm not sure I'll be building sites that heavy on traffic anytime soon.  :-[

Anyway, thank you, Ryan!

Thanks guys. The main thing I want to add is the ability to select which pages should have their cache cleared by way of what template they are using (i.e. clear all pages using template 'basic-page'). But that's a little more complex, so holding off until it comes up again. :)

At the moment the cache clearing options only appear if you specify a cache time > 0. Is it possible to have the options always selectable?

This is my use case:

I built a multi-language site using the "Display a different language based on URL pre-segment" approach and want my pages cached. The only pages I can enable caching on are my language gateways; otherwise the first requested language will be used for all languages.

At the moment I am using a simple custom module which clears the cache of every page on any page save, which works fine, but I would prefer to use your cache clearing options.

Does this make sense or is there some logical error on my side?

The reason those options don't appear if the cache time is set to 0 is because 'cache time' is not just a time, but also a toggle. When non-zero, cache is enabled. When zero, cache is not enabled. When cache is not enabled, none of the cache options are even considered by ProcessWire.

But if you want to bypass ProcessWire's cache settings at runtime, you can do so. For example, you could use the 'URL pre-segment' approach, enable the cache, and disable it manually before rendering the page:

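// $path is assumed to hold the path of the page to render (e.g. built from the URL segments)
$mypage = $pages->get($path);
$mypage->template->cache_time = 0; // zero disables the cache at runtime
echo $mypage->render();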

Thank you Ryan. I did not know about template->cache_time as it is missing on the cheatsheet. This looks like a nice solution for my use case!
