(Template) Caching


24 replies to this topic

#1 SiNNuT

SiNNuT

    Sr. Member

  • Members
  • 366 posts
  • 231

Posted 30 January 2011 - 10:44 AM

I'm working on a site in ProcessWire that has a lot of pages that don't change frequently. Although they seem to render fast enough without caching, it seems optimal to cache these pages. Enabling caching works fine, but whenever I update a page the cached version doesn't get deleted. I could delete the cached file manually, but this isn't really an option once a client administers the site.

I don't know how caching is implemented, but would it be possible to have some sort of 'on update, delete cached file' functionality?

#2 adamkiss

adamkiss

    Master of the universe

  • Moderators
  • 1,078 posts
  • 289

Posted 30 January 2011 - 10:46 AM

Hello,

Does it actually 'fail'? Or does the cache file still have the same timestamps? (The save may write a new cache file as part of the saving process.)

Or how does the site react? (The cache file may actually be deleted or rewritten on the first visit to the updated page.)

#3 SiNNuT

Posted 30 January 2011 - 12:11 PM

Not sure if it fails, because I don't know how it is supposed to work. It saves a cache file on the first page visit. No matter if you update the page later on, the cache file stays the same (I guess until the set cache time expires). It would be nice if it were aware of changes: when a page with a certain ID is updated, delete the associated cache file.

#4 ryan

ryan

    Hero Member

  • Administrators
  • 5,753 posts
  • 3102

  • LocationAtlanta, GA

Posted 31 January 2011 - 09:47 AM

SiNNuT is right. This was a bug! It has been fixed in the latest commit. Thanks, SiNNuT!

https://github.com/r...06c058e3ee4e76f

#5 apeisa

apeisa

    Hero Member

  • Moderators
  • 2,517 posts
  • 842

  • LocationVihti, Finland

Posted 12 August 2011 - 01:43 PM

Is there an easy way to add more intelligence to the cache? Meaning stuff like "if a page with template news-post is saved, clear the cache for page /news/"?

This might be good for a simple module, but these could also be settings on the template -> cache tab? What do you guys think? I think the most beneficial page to cache is the homepage, but I'm not sure how to do it now, since it usually pulls data from many other pages.

PS: I really like the possibility to skip the cache with predefined POST & GET variables! Though it might be a good idea to always disable the cache if there is a "CommentForm_submit" POST variable present. Or does that open the door to a DoS attack?
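
As a rough illustration of the bypass idea in the PS above, here is a minimal Python sketch (not ProcessWire code). The function name and the `bypass_get` / `bypass_post` parameters are hypothetical; only the `CommentForm_submit` variable name comes from the post.

```python
def should_bypass_cache(get_vars, post_vars, bypass_get=(), bypass_post=()):
    """Return True when any configured GET or POST variable is present.

    get_vars / post_vars are dicts of the request's variables.
    bypass_get / bypass_post are the variable names that should skip the cache.
    """
    if any(name in get_vars for name in bypass_get):
        return True
    if any(name in post_vars for name in bypass_post):
        return True
    return False


# A comment submission skips the cache; a plain page view does not.
comment_post = {"CommentForm_submit": "1", "text": "Nice post"}
skip_for_comment = should_bypass_cache({}, comment_post,
                                       bypass_post=("CommentForm_submit",))
skip_for_view = should_bypass_cache({"page": "2"}, {},
                                    bypass_post=("CommentForm_submit",))
```

In this sketch every request that carries a listed variable takes the uncached path, which is the cost the DoS question is getting at.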

#6 ryan

Posted 12 August 2011 - 04:51 PM

On a large site, there can really be a lot of cache files. And every page can support up to 1k cache files for URL segments and page numbers. So as the scale increases, it can really slow down the save to selectively clear some stuff and not others. I've found that good compromises are:

1. Use low cache times with the current system on pages where you don't want to risk serving old content (seconds or minutes rather than hours).

2. Or, set the cache to wipe entirely on every page save.

The second option was what PW1 used. It can be done without much overhead because PW's cache looks at a "lastgood" file whose mtime records when the cache was last considered good. Any cache files older than that file's mtime are considered expired, whether they exist or not. So PW can uncache everything just by updating the mtime of that one file.

Given the above, it would be relatively easy for me to add an option to the template cache settings that says "When a page using this template is saved, clear: 1) this page's cache file; or 2) cache files from all pages." Anything beyond that could involve significantly more overhead, short of major changes to the current CacheFile class (which can certainly be done in the future).
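
The "lastgood" idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ProcessWire's CacheFile class: the `FileCache` name, its methods, the `.cache` file naming, and the explicit `now` parameter (used to keep the example deterministic) are all hypothetical.

```python
import os
import time


class FileCache:
    """Hypothetical file cache with a site-wide 'lastgood' marker file."""

    def __init__(self, root, max_age):
        self.root = root
        self.max_age = max_age  # seconds a single cache file stays valid
        self.lastgood = os.path.join(root, "lastgood")
        os.makedirs(root, exist_ok=True)
        if not os.path.exists(self.lastgood):
            # Start at the epoch so that no cache file counts as expired yet.
            self._touch(self.lastgood, 0)

    @staticmethod
    def _touch(path, when):
        # Create the file if needed, then set its mtime explicitly.
        with open(path, "a"):
            pass
        os.utime(path, (when, when))

    def _path(self, key):
        return os.path.join(self.root, key + ".cache")

    def save(self, key, data, now=None):
        now = time.time() if now is None else now
        path = self._path(key)
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(data)
        os.utime(path, (now, now))

    def get(self, key, now=None):
        now = time.time() if now is None else now
        path = self._path(key)
        if not os.path.exists(path):
            return None
        mtime = os.path.getmtime(path)
        if now - mtime > self.max_age:
            return None  # the per-file cache time has run out
        if mtime < os.path.getmtime(self.lastgood):
            return None  # written before the last site-wide expiry
        with open(path, encoding="utf-8") as fh:
            return fh.read()

    def expire_all(self, now=None):
        # One mtime update expires every cache file; nothing is deleted.
        now = time.time() if now is None else now
        self._touch(self.lastgood, now)
```

Calling `expire_all()` costs the same no matter how many cache files exist, which is why wiping the whole cache on save stays cheap in this scheme. Stale files are only overwritten later, when their pages are rendered and saved again.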

#7 apeisa

Posted 12 August 2011 - 05:46 PM

I see what you mean with overhead. And that would be kind of micro management anyway. I really like the option to wipe whole cache.

#8 ryan

Posted 12 August 2011 - 08:28 PM

Sounds good, I'll plan to implement this – I think we're already almost there.

#9 Pete

Pete

    Administrator

  • Administrators
  • 1,754 posts
  • 652

  • LocationChester, England

Posted 15 August 2011 - 07:16 AM

The second way is how it works in MODx - well, their older branch at least. I do agree that it could do with a template-based option as well, since a lot of the time it is only a few pages that might change, as you guys say.

#10 ryan

Posted 15 August 2011 - 08:27 AM

Wiping the entire cache on every page save is a good option to at least provide. In a system like PW, a given page might pull from several others that are determined at runtime. So there's no way the CMS can know all the possible interrelationships ahead of time. As a result, expiring the cache on every page save is the only way to know for sure that the site is delivering fully up-to-date pages. On the other hand, it's rare that I actually need the entire cache to expire on every save... and if I really need it for some reason, I go to Modules > Page Render > Clear Cache. But caching can be a difficult concept for a client to really understand sometimes, so having the entire cache expire on every page save can reduce the support burden. So in one of these near-term commits, I'm going to go ahead and add an option to the Template editor that says:

When I save a page:
    1. Expire the cache for the saved page only.
    2. Expire the cache for all pages.

Being able to specify that at the template level will provide a lot of flexibility.

#11 apeisa

Posted 31 August 2011 - 06:48 AM

Just an idea that came up today. On the lukio.fi site (which gets a good amount of traffic; I can't give numbers since I haven't asked permission) I cache normal content pages pretty heavily (for one week). They don't have anything dynamic except the navigation menu. If you remove or add children, that causes problems on the cached page.

So instead of clearing the whole cache, it would be beneficial to have a third option:

3. Expire the cache for the saved page and its parent page.

I think that would be perfect in many situations, since it would allow caching a normal news-front template: its cache expires when someone edits/adds/removes news-item pages under that page. What do you think?

#12 Pete

Posted 31 August 2011 - 07:28 AM

Are you then going to run into issues elsewhere though as other pages could also potentially be using data from that page? I guess if you put a note next to that option explaining any potential issues then that would work.

What would be great in theory is if there was some way to track wherever a $pages->find call (or other such bits of code) is made in a template file that returns in its results the page you're saving, as well as any pages that use InputPageSelect (and other such field types), and clear the cache for those pages as well - so basically any page that makes use of the data in the page you're saving should have its cache cleared. Unfortunately that's impossible in practice for the template file side of things (should be do-able for fields) unless you do a preg_match call on every template...

...maybe that wouldn't be so hard to do actually...? It would require looking for any code inside PHP tags that's selecting pages to list and working out which ones relate to the current page you're saving.

Actually no, I think that would get quite messy and, depending on the number of templates, could take a bit of time. It also wouldn't be fool-proof: it breaks the minute you start putting common template bits into other include files that PW doesn't know about (think header.inc, but named yourname.inc <- PW wouldn't know that even existed).

So yeah, ignore my train of thought ;)

#13 ryan

Posted 31 August 2011 - 09:39 AM

I agree that the #3 option makes sense. Though I'm thinking we might change the word "parent" to "parents", so that it clears all the way up the tree to the homepage. This seems simple enough.

Pete, those are great ideas about locating all the pages with references to the current page and clearing those as well. Though I'm afraid to go there, because the only way to really guarantee that a site is up-to-date is to expire the entire cache. We can track some things (like page references) but not others. So I worry about the ambiguity of any cache clearing option that involves a set of pages that isn't known ahead of time. Offering one may make some people think that PW is smart enough to figure out everything that needs to be cleared.

If there is one thing that causes confusion among clients, it's always caching, in my experience. A client says:

"I accidentally misspelled Shilo Toilolo's name in our press release. I went and fixed it right away, but just got a call from the CEO that it says SHITO TOILETO on our homepage! Help!"

I respond, "Go save the homepage, or just wait an hour, it's on a cache." I've dealt with so many of these support calls in the past that I tend to use the cache sparingly. :)

A possible 4th option would be one that Antti mentioned earlier, which would be to provide an InputfieldPageListSelectMultiple that lets you specifically select all the pages that should be cleared. While I'd rather make it "clear pages using these templates", the truth is that clearing specific pages (rather than pages using specific templates) is quite a bit simpler to implement in the current system. Though I'm going to toy around with the current cache system sometime to see if there might be a way I can get that per-template cache clearing.

Caching is always a compromise… my opinion is that most people should start with no caching, and only turn it on when they find they need it. And if they find they need it, they should take a close look at the MarkupCache too. But I'll work to expand the caching options, as I think these open a lot of doors to PW's use in high traffic sites.


#14 Pete

Posted 31 August 2011 - 09:49 AM

Ah, now that 4th option sounds interesting. You could have options on a page to clear a list of other pages then as you say, but another approach is from the template side of things where for each template you could say something like ANY page using this template clears all other pages using this template when saved.

I guess I've got a few scenarios in my head, but we'd need to jot down all possible scenarios to get this right. The problem is that when you start clearing the cache on entire sections of the site then, like you say, you may as well clear the whole thing! Maybe a better way would be to have a per-page or per-template option to stop a page and its children, or every page using a template, from being cleared from the cache, and approach it from that angle? Clear the cache for everything but the pages/templates you specify when a page is saved? Might be less hassle, I don't know - it might also be less intuitive, but I was thinking that you probably know of sections/pages that, once they're up on the site, will rarely/never change, or at least shouldn't have an impact elsewhere.

#15 apeisa

Posted 21 September 2011 - 09:15 AM

Quoting ryan: "The second option was what PW1 used. It can be done without much overhead because PW's cache looks in a "lastgood" file that has an mtime timestamp of when the cache was last considered good. Any cache files older than the date of that file are considered expired, whether they exist or not. So PW can uncache everything just by updating the mtime of that one file."

It would be very nice to know how often our clients save their pages. I think it is a pretty rare operation on many sites (not even every day), while some sites edit or add content a few times an hour. I think that even on actively edited sites "wipe the whole cache" could often be a very good solution, given that wiping the whole cache is a "cheap" operation, and if it is enabled only on templates that get pulled through the API (usually something like news, events, etc.). I don't know how much overhead it adds to write the cache files over and over again on big sites (over a thousand pages). Or is that always cheaper than letting PW query the DB? (Of course this also depends on how popular the site is. If pages get very few views, we can forget caching altogether :))

#16 ryan

Posted 21 September 2011 - 12:47 PM

I've been meaning to upgrade the caching options in the templates editor. It's easy to do, so I went ahead and put it in place and it's now committed to the source. Now you can choose any of these cache clearing options:

When page is saved:

  • Clear the page's cache (default)
  • Clear the entire site's cache
  • Clear the page's cache and its parents (including homepage)
  • Clear specific pages (with page list selection)
  • Don't clear anything

Attached is a screenshot with the #4 option selected.
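
To make the five options above concrete, here is a minimal Python sketch of how a save handler might decide which cached pages to expire. It is an illustration only, not ProcessWire's implementation: `ClearOnSave`, `parent_paths`, `pages_to_expire`, and the `EXPIRE_SITE` sentinel are hypothetical names, and the sketch assumes that "Clear specific pages" expires only the selected pages.

```python
from enum import Enum


class ClearOnSave(Enum):
    PAGE = "page"          # clear the page's cache (default)
    SITE = "site"          # clear the entire site's cache
    PARENTS = "parents"    # clear the page's cache and its parents
    SELECTED = "selected"  # clear specific pages
    NOTHING = "nothing"    # don't clear anything


EXPIRE_SITE = "*"  # sentinel meaning "expire every cached page"


def parent_paths(path):
    """Return the ancestors of a page path, nearest first, ending at '/'."""
    parts = [p for p in path.split("/") if p]
    parents = []
    for depth in range(len(parts) - 1, 0, -1):
        parents.append("/" + "/".join(parts[:depth]) + "/")
    if parts:
        parents.append("/")  # the homepage is the last ancestor
    return parents


def pages_to_expire(saved_path, mode, selected=()):
    """Return the page paths whose cache should expire after a save."""
    if mode is ClearOnSave.NOTHING:
        return []
    if mode is ClearOnSave.SITE:
        return [EXPIRE_SITE]
    if mode is ClearOnSave.PARENTS:
        return [saved_path] + parent_paths(saved_path)
    if mode is ClearOnSave.SELECTED:
        # Assumption: only the selected pages are expired.
        return list(selected)
    return [saved_path]  # ClearOnSave.PAGE


# Saving a news item with the "parents" option walks up to the homepage.
expired = pages_to_expire("/news/2011/item/", ClearOnSave.PARENTS)
```

With the "parents" option, a save under /news/ also expires the /news/ listing and the homepage, which covers the news-front case raised earlier in the thread.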

In ProcessWire, using the cache is definitely not required. I leave it off for smaller or lower traffic sites, and then use it only on some templates with higher traffic sites. But now that there are more clearing options, I may start using it a lot more. But my goal is always to keep ProcessWire fast whether you have the cache turned on or not. But there's no doubt that caching can make a big difference on pages where you are performing heavy operations.

Attached Thumbnails

  • pw-templates-cache.gif


#17 Soma

Soma

    Hero Member

  • Moderators
  • 3,183 posts
  • 1733

  • LocationSH, Switzerland

Posted 21 September 2011 - 01:28 PM

Just brilliant! Thanks for integrating these awesome options ;D

@somartist | modules created | support me, flattr my work flattr.com


#18 apeisa

Posted 21 September 2011 - 01:39 PM

Awesome stuff Ryan. You are treating us too well!

#19 Soma

Posted 21 September 2011 - 02:05 PM

Just testing this cache for the first time. It doesn't really save any cache files for me so far, except in one case where I edited the summary of a page and a .cache file was saved. Folders are generated but no .cache file. What's going on? ... I just noticed I wasn't paying attention to the "for guest users" option ... :) ... now it's flowing.


#20 Sevarf2

Sevarf2

    Sr. Member

  • Members
  • 301 posts
  • 13

  • LocationBratislava

Posted 21 September 2011 - 02:09 PM

That's what I was looking for ;D Thanks
