
Mysterious blank page, no errors to be found and/or how to use alternate URLs path() + pages_paths table.


artfulrobot

Symptoms:

  • As the title says, just a blank page: no content, but a successful HTTP response.
  • Nothing logged in any of PW's log files, or in PHP's error log.
  • Cleared caches; still the same problem.
  • Other pages using the same template all work fine.
  • The page loads with pages(1234) but *not* with pages('/path/name-of-page'); the latter returns a null page object.

Workaround:

  • I changed the page's name, saved, then changed it back again, and it seems to work.

Possible(?) confounding factors

  • These news pages live under /news/, but I wanted the URLs to include the published date, e.g. /news/2024-02-21/some-page
  • The page is accessed via a URL hook on /news/[0-9-]{10}/(title[^/]+), which calls $event->pages->get("/news/$event->title");
  • The page class overrides `path()`, which contains `return '/news/' . date('Y-m-d', $this->published) . '/' . $this->name . '/';`
  • This has worked fine for all the other pages.
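To make the two directions of this scheme concrete, here's a minimal standalone sketch in plain PHP (no ProcessWire). `facadePath()` and `hierarchicalPathFromFacade()` are hypothetical helper names, and the regex only approximates the URL hook pattern above: the first builds the dated URL that the overridden path() returns, the second maps it back to the real hierarchical path that the hook hands to `$pages->get()`.

```php
<?php
// Hypothetical helpers, not ProcessWire APIs.
// Assumption: the site timezone is UTC, so date() output is predictable.
date_default_timezone_set('UTC');

// Outbound: the public "facade" URL, as the overridden path() builds it.
function facadePath(string $name, int $publishedTimestamp): string
{
    return '/news/' . date('Y-m-d', $publishedTimestamp) . '/' . $name . '/';
}

// Inbound: strip the date segment to get the real hierarchical path,
// roughly what the URL hook does before calling $pages->get().
function hierarchicalPathFromFacade(string $url): ?string
{
    if (!preg_match('#^/news/[0-9-]{10}/([^/]+)/?$#', $url, $m)) {
        return null; // not a dated news URL
    }
    return '/news/' . $m[1];
}

echo facadePath('some-page', 1708473600), "\n";                      // /news/2024-02-21/some-page/
echo hierarchicalPathFromFacade('/news/2024-02-21/some-page'), "\n"; // /news/some-page
```

The key point is that only the outbound direction is defined by path(); the inbound direction lives entirely in the hook.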

Any ideas what might have happened? What could cause pages() to fail to load this page by its path?

As an aside, I'm not 100% sure how to give pages different URLs, but what I have seems (or seemed) to work. For example, the template is configured not to use a trailing slash, but I found I needed path() to return one. pages('news/the-page') (without the date) loads the page, so I think path() is only used for rendering. It didn't look like the date prefix is persisted in the db.

EDIT: pages_paths table

When it was broken, I did a mysqldump, and I repeated it after it was fixed. I then used neovim to diff the dumps (mmm, fun). Here's what I found.

I believe the key difference is in the pages_paths table, where the broken version had 'news/1970-01-01/the-page' and the working version had 'news/2024-02-21/the-page'.

My hunch is that something populated the pages_paths table before the page was published, thereby getting the epoch date.
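The hunch is easy to reproduce outside ProcessWire: if the dated path is computed while the published value is still empty, `date()` formats timestamp 0. A minimal sketch, assuming a UTC site timezone (west of UTC the epoch would format as 1969-12-31 instead); `datedPath()` is a hypothetical helper that mimics the stored pages_paths format (no leading slash):

```php
<?php
// Assumption: UTC timezone; elsewhere timestamp 0 may land on 1969-12-31.
date_default_timezone_set('UTC');

// Hypothetical helper mirroring the format seen in the pages_paths dump.
function datedPath(string $name, $published): string
{
    // (int) turns an empty/null "published" value into timestamp 0 (the epoch).
    return 'news/' . date('Y-m-d', (int) $published) . '/' . $name;
}

echo datedPath('the-page', null), "\n";       // news/1970-01-01/the-page
echo datedPath('the-page', 1708473600), "\n"; // news/2024-02-21/the-page
```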

So I need to understand the relationship between a page class's path() method and loading with pages($path).

  • 3 weeks later...

Here's how I solved (well, bodged) it:

```php
public function path() {
  // We're including a trailing slash, even though it's configured not to use them.

  if (!$this->published) {
    // Before a page is published, return the basic, hierarchical path.
    // This will get stored in pages_paths table, which is important so the
    // page can be found by its hierarchical path.
    return '/news/' . $this->name . '/';
  }
  // On a published page, return our facade.
  return '/news/' . date('Y-m-d', $this->published) . '/' . $this->name . '/';
}
```

When path() is called on an unpublished page, it returns a plain path that matches the actual hierarchy, so the correct value gets written to the pages_paths table. On a published page, it returns the facade.


Update: so this doesn't solve it. Why?

Because the link selector offers you the facade links to choose from, but when you save, it complains that the link is wrong. Although the page has been saved, it won't let you leave the screen, because it treats this as a validation error.

:shrug:

I think I'm going to need to grok the $pages->get() code to see if there's a way around this. The constraint that public-facing URLs must match the internal hierarchy is a real bind.


So the link validator strictly enforces the hierarchy. Given a path /a/b/c, it looks for a page named 'c' whose parent is named 'b', whose parent in turn is named 'a'. That's why it fails.
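A toy model of that strict lookup shows why the dated URL fails: there's simply no page named `2024-02-21` under /news/. This is a sketch over hypothetical in-memory data (`id => [name, parentId]`), not ProcessWire's implementation; it walks top-down from the root, which is equivalent to the bottom-up parent-name check described above.

```php
<?php
// Hypothetical page tree: id => [name, parentId]. Page 1 is the home page.
$pages = [
    1  => ['', 0],
    10 => ['news', 1],
    11 => ['the-page', 10],
];

// Strict resolution: every URL segment must be the name of a child
// of the page matched by the previous segment.
function resolveStrict(array $pages, string $path): ?int
{
    $current = 1; // start at home
    foreach (array_filter(explode('/', $path), 'strlen') as $segment) {
        $next = null;
        foreach ($pages as $id => [$name, $parent]) {
            if ($parent === $current && $name === $segment) {
                $next = $id;
                break;
            }
        }
        if ($next === null) {
            return null; // no child with that name, so the path "doesn't exist"
        }
        $current = $next;
    }
    return $current;
}

var_dump(resolveStrict($pages, '/news/the-page/'));            // int(11)
var_dump(resolveStrict($pages, '/news/2024-02-21/the-page/')); // NULL
```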

I can silence the false link-validation errors by enabling URL segments on the /news/ template, but then the validator thinks /news/anything exists because there's a handler for it, even though the handler might return a 404.

I'd like to see a different path-to-page resolution mechanism in use, one that could accommodate these other layouts.

I'm unclear how and when the PagePaths module is brought in to assist in this process, too. I understand it's supposed to be a sort of cache to speed up lookups.

I'd think something like:

  1. Does the path map to a page in pages_paths? Great: a super-quick indexed lookup, so use that. If not...
  2. A hookable path-to-page method. It first tries the original way (above); if that fails...
  3. If the page is handled via URL segments, have some way to ask the controller which page to use (it may be itself, or some other page); if not...
  4. Check the path history.
  5. (What about URL hooks?)
  6. Invalid.
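The lookup order proposed above could be sketched as a chain of resolvers, each returning a page ID, or null to pass control to the next. Everything here is hypothetical: toy arrays stand in for pages_paths, the hierarchy and the path history, none of the names are ProcessWire APIs, and step 5 is left out because the list leaves it open.

```php
<?php
// Try each resolver in order; the first non-null page ID wins.
/** @param array<callable(string): ?int> $resolvers */
function resolvePath(string $path, array $resolvers): ?int
{
    foreach ($resolvers as $resolver) {
        $id = $resolver($path);
        if ($id !== null) {
            return $id;
        }
    }
    return null; // 6. Invalid.
}

// Hypothetical stand-in data.
$pagesPaths = ['news/2024-02-21/the-page' => 11]; // pages_paths-style table
$hierarchy  = ['news/the-page' => 11];            // real parent/child paths
$history    = ['news/old-name' => 11];            // path history

$resolvers = [
    // 1. Indexed lookup in the pages_paths-style table.
    fn(string $p): ?int => $pagesPaths[trim($p, '/')] ?? null,
    // 2. The original strict hierarchical lookup.
    fn(string $p): ?int => $hierarchy[trim($p, '/')] ?? null,
    // 3. URL segments: ask the /news/ "controller" which page it would serve.
    function (string $p) use ($hierarchy): ?int {
        if (!preg_match('#^/news/[0-9-]{10}/([^/]+)/?$#', $p, $m)) {
            return null;
        }
        return $hierarchy['news/' . $m[1]] ?? null;
    },
    // 4. Path history.
    fn(string $p): ?int => $history[trim($p, '/')] ?? null,
];

var_dump(resolvePath('/news/old-name/', $resolvers)); // int(11), found via history
var_dump(resolvePath('/news/nope/', $resolvers));     // NULL
```

A chain like this keeps each strategy independent, so a hookable step could be inserted or reordered without touching the others.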

I have no idea if this is of interest to anyone else, but it might be useful for me to document my discovery journey, so I'll continue spouting into the void!

Having looked into how PW fetches pages specified by path, what have I learnt?

$pages->pathFinder() is described as a newer, more capable way to find pages by path, but it's not the method used by $pages->get() / the PageLoader class. (PageLoader::getByPath() can use pathFinder, but only when its searches turn up multiple possibilities.)

The QA/link-checker code introduces a concept of sleep/wake. As far as I can guess, sleep is called when the HTML is written to the database, and it tries to add an attribute identifying the linked page by ID; wake is called when the HTML is loaded, possibly replacing URLs with something based on the previously stored ID. I'm not 100% sure.

The QA code calls $pages->getByPath() up to twice. The first time, given /a/b/c, it looks for a page named c whose parent is named b, whose parent in turn is named a; if that fails, it checks whether the page is covered by a template using urlSegments; if that fails, it checks the history table (if enabled). If the page still isn't found, it calls getByPath() again without urlSegments, which makes it also search for any page named c; if there's only one, it's happy.
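A simplified model of those two passes, under stated assumptions: the hierarchy is a precomputed path-to-ID map, the urlSegments and history checks are omitted, and `qaLookup()` is a hypothetical name, not the actual QA code. It shows how the second, name-only pass can accept a dated link as long as the page name is unique:

```php
<?php
// Hypothetical data: real hierarchical paths, and the name of each page.
$byHierarchy = ['news/the-page' => 11, 'news/intro' => 12, 'events/intro' => 21];
$names       = [11 => 'the-page', 12 => 'intro', 21 => 'intro'];

function qaLookup(string $path, array $byHierarchy, array $names): ?int
{
    $clean = trim($path, '/');

    // Pass 1: the full /a/b/c chain must exist
    // (urlSegments and history checks omitted in this sketch).
    if (isset($byHierarchy[$clean])) {
        return $byHierarchy[$clean];
    }

    // Pass 2: match on the last segment's name alone, accepted only if unique.
    $segments = explode('/', $clean);
    $last = end($segments);
    $matches = array_keys($names, $last, true);
    return count($matches) === 1 ? $matches[0] : null;
}

var_dump(qaLookup('/news/2024-02-21/the-page/', $byHierarchy, $names)); // int(11)
var_dump(qaLookup('/blog/2024-02-21/intro/', $byHierarchy, $names));    // NULL
```

If two pages share a name (here, both `intro` pages), the fallback can't choose, so the link stays unresolved.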

I'm still puzzling over:

  • why the QA method seems to differ from the normal method. Perhaps it's because the QA can't know the access restrictions of the user who will follow the link, so it only checks for existence?
  • why a page class's path() output seems to be ignored. I realise the only way not to ignore it would be to rely on the optional PagePaths module, but that seems reasonable, and it's hard-wired into core elsewhere anyway (e.g. for multiple matches). If getByPath() used it, perhaps via pathFinder, would that fix things, I wonder?

  • 2 weeks later...

I think my problem may have been caused by the code in https://processwire.com/talk/topic/29716-mysterious-blank-page-no-errors-to-be-found-andor-how-to-use-alternate-urls-path-pages_paths-table/?do=findComment&comment=239935

because $page->published is not how to tell whether a page is published(!): it has a value for unpublished pages, too. I should have been using $page->isUnpublished() instead.
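Here's a standalone sketch of the difference in plain PHP. `FakePage` is a hypothetical stand-in for a page class, and the flag values are assumptions meant to mirror ProcessWire's `Page::statusUnpublished` (2048) and `Page::statusOn` (1); check them against your version. It shows that a non-empty timestamp says nothing about publication state, while the status flag does:

```php
<?php
date_default_timezone_set('UTC'); // assumption: UTC, for predictable dates

const STATUS_ON = 1;             // assumed to mirror Page::statusOn
const STATUS_UNPUBLISHED = 2048; // assumed to mirror Page::statusUnpublished

// Hypothetical stand-in, not a real ProcessWire page class.
final class FakePage
{
    public function __construct(
        public string $name,
        public int $published, // a timestamp; may be set even while unpublished
        public int $status,    // bitmask of status flags
    ) {}

    public function isUnpublished(): bool
    {
        return ($this->status & STATUS_UNPUBLISHED) !== 0;
    }

    public function path(): string
    {
        // Decide on the status flag, not on whether the timestamp is empty.
        if ($this->isUnpublished()) {
            return '/news/' . $this->name . '/';
        }
        return '/news/' . date('Y-m-d', $this->published) . '/' . $this->name . '/';
    }
}

$draft = new FakePage('the-page', 1708473600, STATUS_UNPUBLISHED);
$live  = new FakePage('the-page', 1708473600, STATUS_ON);
echo $draft->path(), "\n"; // /news/the-page/
echo $live->path(), "\n";  // /news/2024-02-21/the-page/
```

With the earlier `if (!$this->published)` check, `$draft` would have taken the dated branch, because its timestamp is non-zero.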

I also needed this to ensure that the PagePaths module kept its table up-to-date:

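```php
// Have the PagePaths module refresh its pages_paths entries when a page is (un)published.
$pagePathModule = modules('PagePaths');
$this->pages->addHook('published', $pagePathModule, 'hookPageMoved');
$this->pages->addHook('unpublished', $pagePathModule, 'hookPageMoved');
```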


  • 2 weeks later...

The problem goes deeper.

  • As well as the hooks above (published/unpublished), I needed to add one on Pages::saved(1:published), because my paths are based on the published date.
  • I believe there's a bug in core that prevents us from implementing path() in a page class successfully:
    https://github.com/processwire/processwire-issues/issues/1906