ian last won the day on February 19 2016

ian had the most liked content!

Community Reputation

23 Excellent

About ian

  • Rank
    Jr. Member

Recent Profile Visitors

908 profile views
  1. Thanks @Robin S. That makes perfect sense. I found a way round it for logging but good to know the reasoning behind it.
  2. I've come across some strange behaviour that I can't figure out while bootstrapping PW from another script. I have a few PHP scripts that bootstrap PW in the recommended way, by including ./index.php. This works fine and I can access the API as normal. I use these to retrieve information for old or external links that still reference pages from my pre-ProcessWire days.
     The scripts work fine - I can retrieve the necessary record, the information is displayed correctly in the browser and the request returns a 200 status code (or is correctly redirected - some have 301 redirects to the appropriate new URL). However, this seems to somehow trigger some aspect of the 404 handling routine. I say some aspect because Chrome dev tools doesn't show a 404, and the actual 404 page isn't displayed, but if I monitor sessions (I have Session Database Handler installed) each call to one of these scripts displays /http404/ as the URL in the active sessions in the admin. (Now, as I write this I wonder if that's the intended behaviour, given that this is not a true page in the page tree.)
     But if I want to monitor actual 404s (e.g. via the logging method described here) so I can look for missing pages, out-of-date links etc., the log fills up with spurious "false 404s" from calls to my bootstrapped scripts. I have a lot of page views to the site like this (many are from an iOS app that retrieves data from the site), and the site gets a lot of visits from search engine spiders, with many old links that I want to pin down and redirect, so it would be nice if I could exclude these spurious 404s from being monitored.
     Can anyone throw any light on this? Running PW 3.0.94 with a number of modules, but the same behaviour is exhibited on a pretty much vanilla PW install.
  3. Yes, I think so - and you posted while I was writing my reply!
  4. Success (or at least I think so). Thanks Ryan, of course you were right all along. Please ignore the nonsense above about the page request lifecycle. On reading it back it's obvious that's not possible.
     Here's where I believe I was going wrong: on the first page (page 1 of the paginated list) there is no page number identifier, e.g. the URL is like /thumbnails/noctuidae. Subsequent pages are like /thumbnails/noctuidae/page2, /thumbnails/noctuidae/page3 etc. Nothing unusual here.
     The setting $config->maxUrlSegments is set to 2. This includes the page number segment if it's there, but crucially, if it's not, it allows another URL segment on the first page (e.g. /thumbnails/noctuidae/somerandomstringofcharacters). I was validating $input->urlSegment1 for valid names, but wasn't checking $input->urlSegment2 at all. Hence somerandomstringofcharacters was getting through as segment 2 on the first page only, and finding its way into the cached page. Thus, any pagination links on the first cached page were stuffed with these characters. My validation routine only worked on pages other than the first.
     Using $input->urlSegmentStr as Ryan suggests and validating against this solves the problem. Thanks Ryan and others for your patience!
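
For anyone reading along, the whole-string check described in the post above might look roughly like the sketch below. It is not Ian's actual code: isValidSegmentString() and the $knownFamilies list are hypothetical names, it is written for PHP 7 or later, and it makes no assumption about whether the pageN segment appears in the string (it tolerates it either way). In a template the first argument would presumably be $input->urlSegmentStr.

```php
<?php
/**
 * Hypothetical helper (not from the original post): accepts the complete
 * URL segment string and returns true only if it is empty, or a single
 * known family name optionally followed by one "pageN" segment.
 */
function isValidSegmentString(string $segmentStr, array $knownFamilies): bool
{
    $segmentStr = trim($segmentStr, '/');

    // No segments at all: the base /thumbnails/ page, which is valid.
    if ($segmentStr === '') {
        return true;
    }

    $segments = explode('/', $segmentStr);

    // Tolerate one trailing page-number segment such as "page2".
    if (count($segments) > 1 && preg_match('/^page[1-9][0-9]*$/', end($segments)) === 1) {
        array_pop($segments);
    }

    // Exactly one segment must remain, and it must be a known family.
    return count($segments) === 1
        && in_array($segments[0], $knownFamilies, true);
}
```

Because the check looks at the full string rather than segment 1 alone, a URL like /thumbnails/noctuidae/somerandomstringofcharacters is rejected on the first page as well as on later ones; the caller would throw a 404 whenever the function returns false.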
  5. Thanks Ryan, I will certainly try $input->urlSegmentStr - that simplifies it somewhat; I didn't know about that. However, those links are resulting in a 404 now since I added the $config->maxUrlSegments=2 setting. What I can't figure out is how those links are getting into the cached pages if visiting such a URL results in a 404.
     Would I be correct in thinking that the ProCache cached page is generated earlier in the request lifecycle than my urlSegment/validation checking? I guess that's something I can test on my dev later, but is it possible something like this could happen?
       • Page is not cached yet, or has recently been cleared/deleted by maintenance
       • Some user visits the page and (perhaps inadvertently) puts bogus junk in the URL (/thumbnails/gracillariidae/some-bogus-junk)
       • ProCache detects the page isn't cached and caches it to disk, creating the pagination links containing some-bogus-junk
       • My $input->urlSegmentStr and isValidFamily() routines run and detect an invalid URL
       • User is shown the 404 page but the page has been cached with wrong pagination URLs
     I've no idea whether that's even remotely possible, but that's how it appears.
     Note: I realise now that $config->maxUrlSegments counts the page number segment, so $config->maxUrlSegments=2 is correct in my case.
  6. Update: Unfortunately I still haven't got to the bottom of this - the problem is still occurring. Here's what I've tried:
       • Upgraded the live site to PW 2.7.2 to correlate better with my dev system.
       • Uninstalled the Email Obfuscation module and used my own routine.
       • Set $config->maxUrlSegments to 2 (*see note below) and cleared the ProCache cache.
       • Ensured validation of the urlSegment that pertains to the family against a list of known values (and cleared the ProCache cache).
     The ProCache is set to expire at 24 hours, but at random times before this (after clearing the cache and revisiting the page) the first page (Page 1) seems to be regenerated and, if so, seems to exhibit the problem. I can't reproduce this myself, it just happens on the site, and when bypassing ProCache (when logged in etc.) it seems OK. Once Page 1 is in the cache it breaks any subsequent page links (now that I've set $config->maxUrlSegments) unless you bypass it manually by typing /page2 or /page3 in the address bar.
     * I'm using just one urlSegment beyond the page itself, but when using maxUrlSegments I have to set this to 2 for it to work - is this the expected behaviour?
     Here's a snippet of my code:
         $metatitle = "British Moths | Thumbnail List by Family | UKMoths";
         $title = "Thumbnails by Family";
         $sanitizedfam = "";
         if (strlen($input->urlSegment(1))) {
             // Sanitize the family segment, then check it against known values
             $sanitizedfam = $sanitizer->text($input->urlSegment(1));
             if (!isValidFamily($sanitizedfam)) throw new Wire404Exception();
             // A valid family with no matching species is also treated as a 404
             $specieslist = $pages->find("template=species, fam={$sanitizedfam}, limit=12");
             if (!count($specieslist)) throw new Wire404Exception();
             $metatitle = "British Moths | Thumbnail List by Family | " . ucfirst($sanitizedfam);
             $title = "Families: " . ucfirst($sanitizedfam);
         }
     The isValidFamily() function is my new validator against known values; a 404 is thrown if the value is not valid. If there's a urlSegment(1) then the code returns the species belonging to that family or, if there are none, throws a 404.
     Attached is a partial screenshot of the source of the current cached version of this page, which is (at the time of writing) exhibiting the problem: http://ukmoths.org.uk/thumbnails/gracillariidae/. I do appreciate any thoughts or further suggestions! Thanks, Ian.
  7. Oh, actually Ryan, I've just re-read your answer about validating the URL segments. I'll look into that. Thanks, Ian.
  8. Hmm - looks like I spoke too soon - it's still happening :-( I'm still trying to eliminate various things but haven't pinned it down yet. I've now uninstalled the Email Obfuscation module but some of these odd links have reappeared since. If I clear the ProCache (or just delete the specific subtree in the ProCache folder), I can revisit the pages and hence regenerate the cached versions. These seem OK, but some time later when the cache has expired, I revisit and the odd links can be there again (they are in the cached versions too). It all seems rather intermittent. I did set $config->maxUrlSegments=2 for a while, which returned a 404 when an affected link was visited, but didn't prevent the oddity. It just frustrated my visitors! For anyone who's interested, here's the link to the 'base' thumbnail page: http://ukmoths.org.uk/thumbnails/ - any of the thumbnails with more than 12 species will link to paginated versions and could be affected. Appreciate all your help,
  9. Thanks Ryan, I think you may well have hit the nail on the head with the Email Obfuscation module. I found that you can disable this on a template-by-template basis. Having disabled it for the thumbnails template and cleared the ProCache cache again, things seem to be OK at the moment. Obviously I'll have to keep an eye on it. The template is using URL segments - the page itself is /thumbnails/ and each moth family is a URL segment (crambidae etc.). This then pulls out just the thumbnails for species belonging to that family. The number of families is relatively static but can change. Probably I should set $config->maxUrlSegments to 1 and then throw a 404, as you say, for invalid values. Cheers!, Ian.
  10. @cstevensjr: Thanks, yes - I didn't provide much background information. The live site uses PW 2.6.1 currently, with ProCache, FormBuilder, All In One Minify and Email Obfuscation (EMO). PHP version is 5.3.28. My dev setup is MAMP Pro on Mac, running 5.5.10 and I've updated the dev site to PW 2.7.2 but haven't had chance to update the live site. So yes, there are a few differences! I recently installed Google Analytics but don't know if this behaviour coincided with that - I only noticed it when looking at the urls reported in GA. I'll have to try some things over the weekend to narrow it down and remove a few variables.
      @Robin S: Only briefly - the first couple of clicks to /page2 and /page3 seemed OK after clearing the cache, then I moved to /page4 and the behaviour returned, even on pages 2 and 3. Once the urls are formatted this way, it's bypassing ProCache, I presume because the rewrite doesn't work.
      @tpr: Yes, the full string of characters in the URL is visible in the view source, and also in the cached file in the ProCache folder.
      Much appreciated all! Ian.
  11. Hi all, I've just noticed a strange issue with some paginated pages on my site UKMoths (http://ukmoths.org.uk). I have a series of pages showing thumbnails of moths by family, here: http://ukmoths.org.uk/thumbnails. The opening page shows the families but as you drill down, it displays all the species within a certain family. If there are more than 12 then the output is paginated using standard MarkupPagerNav functionality.
      On some, however, I've noticed long strings of random characters between the base url and the page notifier. For example the crambidae list has 140 species so has about 12 pages. Page 1 is fine, showing /thumbnails/crambidae, but for pages beyond this, instead of the urls being like /thumbnails/crambidae/page2 they are something like /thumbnails/crambidae/BVXAz1div6cNWKM3P5NDP7EoP4WA .... (cut for brevity) ... CSCHd6.9c6Nhh/page2.
      I can't for the life of me figure out why this is happening. It seems to be the case for both ProCache version pages and non-cached (when logged in). If I look at the ProCache folder in the assets, the structure looks to be correct - i.e. a crambidae folder and then page2, page3 etc. folders. I should point out that the pages render correctly, even with these odd urls. It doesn't happen across the board though - it just seems to be certain ones - the /thumbnails/elachistidae folder pagination is fine - yet they're all using the same template. And the same site on my dev system is fine. Confused! Anyone have any thoughts? Thanks, Ian.
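
To make the expected URL pattern from the post above concrete, here is a small worked sketch in plain PHP (7 or later). It does not use MarkupPagerNav and is not how ProcessWire builds its links; expectedPageUrls() is a hypothetical helper that only reproduces the arithmetic and the URL shape the post describes: 12 items per page, no page segment on page 1, and /pageN from page 2 onwards.

```php
<?php
/**
 * Hypothetical helper (not part of ProcessWire): lists the pagination URLs
 * one would expect for a family page, given the item count and page size.
 */
function expectedPageUrls(string $baseUrl, int $totalItems, int $perPage): array
{
    if ($perPage < 1) {
        throw new InvalidArgumentException('perPage must be at least 1');
    }

    $base = rtrim($baseUrl, '/');
    $pageCount = (int) ceil($totalItems / $perPage);

    // Page 1 carries no page-number segment.
    $urls = [$base];

    // Pages 2..N append a "pageN" segment directly after the family name.
    for ($n = 2; $n <= $pageCount; $n++) {
        $urls[] = $base . '/page' . $n;
    }

    return $urls;
}
```

Calling expectedPageUrls('/thumbnails/crambidae', 140, 12) gives 12 entries, since 140 / 12 rounds up to 12, running from /thumbnails/crambidae to /thumbnails/crambidae/page12. Anything between the family name and the pageN segment, like the long random string quoted above, falls outside this pattern.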
  12. No pun intended - but now I wish I had thought of that! Yes, each moth is a page, and in fact each photo is a child of the 'moth' page, so has its own page too. This is a historical thing - lots of other sites are linking directly to the individual photo pages from the old site, so I made the new site replicate that structure, albeit now with friendly urls. It probably would have been a lot simpler to have the photos as a repeater field in the moth template but I needed individual urls for these external links. Thanks Dave. I submitted it but I think it's still working its way through the system.
  13. Nothing special as far as the photos are concerned. I do usually spend some time optimising the photos before adding them to the site. Of course the 'best' ones feature on the home page! The logo is the only retina-enabled image as I recall, using css and @2x background pngs, but that logic came with the template, not really my work.
  14. Thanks dragan, I'm not sure how many pages there are in Google index terms but there are around 16,500 entries in the pages table. There's a lot of pagination going on too and I think some of my old stuff is still indexed. PW handles it all with no problems though.
  15. http://ukmoths.org.uk Hi, I launched this back in late 2015 but just decided to post it here. It's a complete rebuild in ProcessWire of a site that's been running and growing for at least 15 years in one form or another. The site aims to illustrate all of Britain's moths (not quite there yet!), and despite the obscure subject matter, gets quite a lot of traffic. Hence I've used ProCache to keep things snappy and I'm pleased with the performance. There are over 7000 photos on the site. I should say that the design is not mine, but a purchased template. It took a while to find a template that I thought could portray these under-appreciated creatures in a good light. The most challenging aspects were importing all the data and images from the old system, and getting aspects of the search to work in the way I wanted. The best part is that it's so much easier to add new content! Thanks for ProcessWire, and thanks Ryan and everyone else in the forums! Cheers, Ian.