Mysterious blank page, no errors to be found and/or how to use alternate URLs path() + pages_paths table.

artfulrobot · February 21

Symptoms:

As the title says: just a blank page with no content, but a successful HTTP response.
Nothing logged in any of PW's log files, or in PHP's error log.
Cleared caches, still the same problem.
Other pages using the same template all work fine.
The page is loadable with pages(1234) but *not* with pages('/path/name-of-page'); the latter returns a null page object.

Workaround:

I changed the page's name, saved, then changed it back again, and now it seems to work.

Possible(?) confounding factors:

These news pages live under /news/; however, I wanted the URLs to include the published date, like /news/2024-02-21/some-page.
They are accessed via a URL hook on /news/[0-9-]{10}/(title[^/]+), which loads the page with $event->pages->get("/news/$event->title");
The page class overrides the `path()` method, which contains `return '/news/' . date('Y-m-d', $this->published) . '/' . $this->name . '/';`
This has worked fine for all the other pages.
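The routing described above boils down to stripping the date segment and looking the page up by its real parent/name path. Here is a minimal, framework-free sketch of that mapping. It does not use the ProcessWire API; it only mirrors the regex from the URL hook, and the function name is hypothetical.

```php
<?php
// Sketch (assumption): split a date-prefixed facade URL such as
// /news/2024-02-21/some-page back into the page's real, hierarchical
// path /news/some-page. Function name is hypothetical.
function facadeToHierarchicalPath(string $url): ?string
{
    // Group 1 is the date (ignored for lookup), group 2 the page name.
    if (!preg_match('#^/news/([0-9-]{10})/([^/]+)/?$#', $url, $m)) {
        return null; // not a facade URL
    }
    return '/news/' . $m[2];
}

echo facadeToHierarchicalPath('/news/2024-02-21/some-page'), "\n"; // /news/some-page
var_dump(facadeToHierarchicalPath('/news/some-page'));              // NULL
```

The date is deliberately not used for the lookup, which is why the page can still be found even if the date in the URL is wrong.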

Any ideas what might have happened? What could cause pages() to fail to load this page by its path? Possible asides: I'm not 100% sure how to give pages different URLs, but what I have seems (or seemed) to work. For example, the template is configured not to use a trailing slash, but I found I needed path() to return one. pages('news/the-page') (without the date) loads the page, so I think path() is only used for rendering. ~~It didn't look like the date-prefix is persisted in the db.~~

EDIT: pages_paths table

When it was broken, I did a mysqldump, and after it was fixed I repeated it. I then used neovim to diff the dumps (mmm, fun). Here's what I found.

I believe the key difference is in the pages_paths table: the broken version had 'news/1970-01-01/the-page' and the working version had 'news/2024-02-21/the-page'.

My hunch is that something populated the pages_paths table before the page was published, so the date came out as the Unix epoch (a published timestamp of 0).
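A timestamp of 0 is the Unix epoch, so if the path was computed while the published value was 0, `date()` yields 1970-01-01, which matches the stale row. A quick check, written as a plain function that builds the path the same way as the path() override described above. The function name is hypothetical, and it assumes UTC because `date()` output depends on the configured timezone.

```php
<?php
// Sketch: the facade path built like the path() override, as a plain
// function so it runs outside ProcessWire. Assumption: an unset
// "published" value reaches date() as 0.
date_default_timezone_set('UTC'); // date() output depends on the timezone

function facadePath(string $name, int $published): string
{
    return '/news/' . date('Y-m-d', $published) . '/' . $name . '/';
}

echo facadePath('the-page', 0), "\n";          // /news/1970-01-01/the-page/
echo facadePath('the-page', 1708473600), "\n"; // /news/2024-02-21/the-page/
```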

So I need to understand the relationship between a page class's path() method and loading with pages($path).

artfulrobot · March 12

Here's how I ~~solved~~ bodged it:

```php
public function path() {
  // We're including a trailing slash, even though it's configured not to use them.

  if (!$this->published) {
    // Before a page is published, return the basic, hierarchical path.
    // This will get stored in the pages_paths table, which is important so the
    // page can be found by its hierarchical path.
    return '/news/' . $this->name . '/';
  }
  // On a published page, return our facade.
  return '/news/' . date('Y-m-d', $this->published) . '/' . $this->name . '/';
}
```

When path() is called on an unpublished page, it returns a plain path that matches the actual hierarchy, so the correct value is written to the pages_paths table. On a published page, it returns the facade.

artfulrobot · March 13

Update: so this doesn't solve it. Why?

Because the automatic link selector offers you the facade links to pick, but when you save, it complains that the link is wrong. Although the page has saved, it won't let you leave the screen, because it treats this as a validation error.

:shrug:

I think I'm going to need to grok the pages() get code to see if there's a way around this. The constraint that public-facing URLs must match the internal hierarchy is a real bind.

artfulrobot · March 13

So the link validator strictly imposes the hierarchy: given a path /a/b/c, it looks for a page named 'c' whose parent is named 'b' and whose grandparent is named 'a'. That is why it fails.
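A strict parent/name walk like this can be modelled in a few lines, which makes it obvious why the date segment breaks it: no page is named `2024-02-21`. This is a toy model with a hypothetical in-memory page list (id => [name, parent id], root id 1), not ProcessWire's actual lookup code.

```php
<?php
// Sketch (assumption): strict parent/name resolution, one segment at a
// time. $pages is hypothetical: id => [name, parent_id], root is id 1.
function resolveByHierarchy(array $pages, string $path): ?int
{
    $parentId = 1; // start at the root page
    foreach (array_filter(explode('/', $path), 'strlen') as $segment) {
        $found = null;
        foreach ($pages as $id => [$name, $parent]) {
            if ($name === $segment && $parent === $parentId) {
                $found = $id;
                break;
            }
        }
        if ($found === null) {
            return null; // no child with this name under the current parent
        }
        $parentId = $found; // descend one level
    }
    return $parentId;
}

$pages = [
    1    => ['', 0],          // home
    1000 => ['news', 1],
    1234 => ['the-page', 1000],
];

var_dump(resolveByHierarchy($pages, '/news/the-page/'));            // int(1234)
var_dump(resolveByHierarchy($pages, '/news/2024-02-21/the-page/')); // NULL
```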

I can quieten the false link-validation errors if I enable URL segments on the /news/ template, but then the link validator thinks /news/anything exists because it has a handler, even though the handler might return a 404.

I would like to see a different path-to-page resolution mechanism that could accommodate these other layouts.

I'm also unclear how and when the PagePaths module is brought in to assist in this process. I understand it's supposed to act as a sort of cache to speed up lookups.

I'd think something like:

1. Does the path map to a page in pages_paths? Great: a super-quick indexed lookup, use that. If not...
2. A hookable path-to-page method that first tries the original way (above). If that fails...
3. If the page is handled via URL segments, have some way to ask the controller which page to use (it may be itself or some other page). If not...
4. Check the path history.
5. (What about URL hooks?)
6. Otherwise, invalid.
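The proposed lookup order can be sketched as a chain of fallbacks. Everything here is hypothetical: plain arrays and callables stand in for pages_paths, the hierarchy walk, a URL-segments controller and the path history table, and URL hooks are left out. It is only meant to show the ordering, not ProcessWire's API.

```php
<?php
// Sketch of the proposed resolution order (all names hypothetical).
function resolvePath(
    string $path,
    array $pagesPaths,     // normalised path => page id (like pages_paths)
    callable $byHierarchy, // fn(string): ?int, the original parent/name walk
    callable $bySegments,  // fn(string): ?int, ask a urlSegments controller
    array $history         // old normalised path => page id
): ?int {
    $key = trim($path, '/');
    return $pagesPaths[$key]   // 1. indexed lookup
        ?? $byHierarchy($path) // 2. original hierarchical way
        ?? $bySegments($path)  // 3. let the controller decide
        ?? $history[$key]      // 4. path history
        ?? null;               // 6. invalid
}

$pagesPaths = ['news/2024-02-21/the-page' => 1234];
$noMatch = fn(string $p): ?int => null;

var_dump(resolvePath('/news/2024-02-21/the-page/', $pagesPaths, $noMatch, $noMatch, [])); // int(1234)
var_dump(resolvePath('/news/nope/', $pagesPaths, $noMatch, $noMatch, []));                // NULL
```

Putting the indexed table first means a stored facade path resolves without the hierarchy ever being consulted, which is exactly the case the strict walk cannot handle.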

artfulrobot · March 14

I have no idea whether this is of interest to anyone else, but it might be useful for me to document my journey of discovery, so I'll keep spouting into the void!

Having looked into how PW fetches pages by path, what have I learnt?

$pages->pathFinder() is described as a newer, more capable way to find pages by path, but it's not the method used by $pages->get() / the PageLoader class. (PageLoader::getByPath() does/can use pathFinder, but only when its searches turn up multiple possibilities.)

The QA/link checker code introduces a concept of sleep/wake. As far as I can guess, sleep is called when the HTML is written to the database and tries to add an attribute identifying, by ID, the page a link goes to; wake is called when loading the HTML, possibly replacing URLs with something based on the previously stored ID. I'm not 100% sure.
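The sleep/wake idea, as guessed above, can be modelled with two small string transforms: sleep tags each known link with its page ID, and wake rebuilds the href from that ID so links survive renames. This is a toy model, not the QA code itself; the attribute name `data-page-id` and both lookup maps are hypothetical.

```php
<?php
// Sketch (assumption): tag links with page IDs on save, rewrite hrefs
// from those IDs on load. Attribute name and maps are hypothetical.
function sleepLinks(string $html, array $pathToId): string
{
    return preg_replace_callback(
        '/<a href="([^"]*)"/',
        function (array $m) use ($pathToId): string {
            $id = $pathToId[$m[1]] ?? null;
            // Unknown targets are left untouched.
            return $id === null ? $m[0] : sprintf('<a data-page-id="%d" href="%s"', $id, $m[1]);
        },
        $html
    );
}

function wakeLinks(string $html, array $idToPath): string
{
    return preg_replace_callback(
        '/<a data-page-id="(\d+)" href="[^"]*"/',
        fn(array $m): string => isset($idToPath[(int) $m[1]])
            ? sprintf('<a data-page-id="%s" href="%s"', $m[1], $idToPath[(int) $m[1]])
            : $m[0],
        $html
    );
}

$stored = sleepLinks('<a href="/news/old-name/">x</a>', ['/news/old-name/' => 1234]);
echo $stored, "\n";                                         // <a data-page-id="1234" href="/news/old-name/">x</a>
echo wakeLinks($stored, [1234 => '/news/new-name/']), "\n"; // <a data-page-id="1234" href="/news/new-name/">x</a>
```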

The QA code calls $pages->getByPath() up to twice. The first time, given /a/b/c, it looks for a page named c whose parent is named b and whose grandparent is named a; if that fails, it checks whether the path is covered by a template using urlSegments; if that fails, it checks the history table (if enabled). If the page still isn't found, it calls again without urlSegments, which makes it also search for any page named c; if there's only one, it's happy.
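The lenient second pass explains why a facade link can sometimes pass validation: only the final segment matters, and only if it is unique. A minimal model of that rule, assuming a hypothetical id => name map rather than the real database query:

```php
<?php
// Sketch (assumption): ignore the parents and accept the link only if
// exactly one page has the final name. $names is hypothetical: id => name.
function resolveByUniqueName(array $names, string $path): ?int
{
    $segments = array_values(array_filter(explode('/', $path), 'strlen'));
    if ($segments === []) {
        return null;
    }
    $last = end($segments);
    $matches = array_keys($names, $last, true); // ids whose name is $last
    return count($matches) === 1 ? $matches[0] : null;
}

$names = [1000 => 'news', 1234 => 'the-page', 1300 => 'about', 1301 => 'about'];

var_dump(resolveByUniqueName($names, '/news/2024-02-21/the-page/')); // int(1234)
var_dump(resolveByUniqueName($names, '/team/about/'));               // NULL (ambiguous)
```

The flip side is that the same facade link would start failing validation as soon as a second page with the same name appeared anywhere in the tree.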

I'm still puzzling over:

- why the QA method seems to differ from the normal method. Perhaps it's because QA can't know the access restrictions of the user who will follow the link, so it only checks existence?
- why a page class's path() output seems to be ignored. I realise the only way not to ignore it would be to rely on the optional PagePaths module, but that seems reasonable, and it is hard-wired into core in other places anyway (e.g. multiple matches). If getByPath() used it, perhaps via pathFinder, would that fix things, I wonder?

artfulrobot · March 25

I think my problem may have been caused by the code in my earlier post: https://processwire.com/talk/topic/29716-mysterious-blank-page-no-errors-to-be-found-andor-how-to-use-alternate-urls-path-pages_paths-table/?do=findComment&comment=239935

because $page->published is not how to tell whether a page is published(!): it has a value for unpublished pages, too. Instead I should have been using $page->isUnpublished().
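The difference between "has a published timestamp" and "is published" can be shown with a status bit check. In this sketch, `STATUS_UNPUBLISHED` is intended to mirror ProcessWire's `Page::statusUnpublished` flag (2048); the array-based page and the helper names are hypothetical, and `gmdate()` is used so the output does not depend on the timezone.

```php
<?php
// Sketch: check the unpublished status bit instead of the timestamp.
// The page arrays and function names are hypothetical.
const STATUS_UNPUBLISHED = 2048;

function isUnpublished(array $page): bool
{
    return ($page['status'] & STATUS_UNPUBLISHED) !== 0;
}

function newsPath(array $page): string
{
    if (isUnpublished($page)) {
        return '/news/' . $page['name'] . '/'; // plain hierarchical path
    }
    return '/news/' . gmdate('Y-m-d', $page['published']) . '/' . $page['name'] . '/';
}

// An unpublished page can still carry a non-zero "published" value.
$draft = ['name' => 'the-page', 'status' => 1 | STATUS_UNPUBLISHED, 'published' => 1708473600];
$live  = ['name' => 'the-page', 'status' => 1, 'published' => 1708473600];

echo newsPath($draft), "\n"; // /news/the-page/
echo newsPath($live), "\n";  // /news/2024-02-21/the-page/
```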

I also needed this to ensure that the PagePaths module kept its table up to date:

```php
$pagePathModule = modules('PagePaths');
$this->pages->addHook('published', $pagePathModule, 'hookPageMoved');
$this->pages->addHook('unpublished', $pagePathModule, 'hookPageMoved');
```
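Why those hooks matter can be seen with a toy path cache that, like pages_paths, stores whatever path() returned when the row was written. Unless it is refreshed when the publish state changes, it keeps the old facade. The class, its methods and the closure are hypothetical; this is a model of the problem, not the PagePaths module.

```php
<?php
// Sketch (assumption): a path cache that only knows what it was last told.
final class PathCache
{
    /** @var array<int, string> page id => stored path */
    private array $rows = [];

    public function store(int $id, string $path): void
    {
        $this->rows[$id] = trim($path, '/');
    }

    public function idFor(string $path): ?int
    {
        $id = array_search(trim($path, '/'), $this->rows, true);
        return $id === false ? null : $id;
    }
}

$pathOf = fn(int $published): string =>
    '/news/' . gmdate('Y-m-d', $published) . '/the-page/';

$cache = new PathCache();
$cache->store(1234, $pathOf(0));              // written before publishing
var_dump($cache->idFor($pathOf(1708473600))); // NULL: stale 1970 row

// What the published/unpublished hooks achieve: re-store the path.
$cache->store(1234, $pathOf(1708473600));
var_dump($cache->idFor($pathOf(1708473600))); // int(1234)
```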

artfulrobot · April 9

The problem goes deeper.

As well as the hooks above (published/unpublished), I needed to add one on Pages::saved(1:published), because my paths are based on the published date.
I believe there's a bug in core that prevents us from implementing path() in a page class successfully:
https://github.com/processwire/processwire-issues/issues/1906
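Taken together, the three triggers can be summarised as a single predicate. This is a plain-PHP model, not ProcessWire's hook matcher: my reading of `saved(1:published)` is "run only when argument 1 of Pages::saved(), the list of changed fields, includes 'published'", which is an assumption. The function name is hypothetical, and `match` needs PHP 8.

```php
<?php
// Sketch (assumption): when should the stored path be refreshed?
function needsPathRefresh(string $event, array $changes = []): bool
{
    return match ($event) {
        'published', 'unpublished' => true,
        // Only when the published date itself changed.
        'saved' => in_array('published', $changes, true),
        default => false,
    };
}

var_dump(needsPathRefresh('saved', ['title', 'body']));      // bool(false)
var_dump(needsPathRefresh('saved', ['published', 'title'])); // bool(true)
```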
