ProcessWire 3.0.173 core updates: New URL hooks

ProcessWire 3.0.173 adds several new requested features and this post focuses on one of my favorites: the ability to hook into and handle ProcessWire URLs, independent of pages.

With this new version on the development branch, several smaller feature requests have been added, which were covered in ProcessWire Weekly 353, 354 and 355. This week the focus was on two larger feature requests: 1) the addition of an Inputfield module for entering and selecting tags; and 2) the ability to hook into ProcessWire's request URL routing to add your own custom handlers. The tags Inputfield is mostly done, but there are still a few bits and pieces to wrap up before committing it to the core. In this post we'll take a look at a new type of hook added to ProcessWire 3.0.173 that enables you to add custom handlers for URLs, independent of pages.

Introducing URL/path hooks

New type of hook for handling request URLs

In ProcessWire, request URLs have always directly referred to and mapped to a Page. Requested URLs that don't directly map to a Page instead go to the 404 Page. There are some exceptions such as when URL segments are in use, but every requested URL still must map to a page at some point in the URL structure.

ProcessWire 3.0.173 changes that, as you can now add hooks to handle URLs. This type of hook is attached during the "init" or "ready" states of ProcessWire's boot process. That means you would place the hook code in /site/init.php or /site/ready.php, and autoload modules could do it from their init() or ready() methods. (I recommend using the "init" file or method, since it comes first.) To demonstrate, let’s say that we wanted to output "Hello World" when the requested URL was /hello/world. Here's how we could do it in our /site/init.php or /site/ready.php file:

$wire->addHook('/hello/world', function($event) {
  return 'Hello World';
}); 

If you wanted to output it yourself, then you would just return boolean true, to let PW know you are handling the output instead:

$wire->addHook('/hello/world', function($event) {
  echo 'Hello World';
  return true;
});

When this hook is in place, accessing the URL /hello/world simply outputs Hello World and nothing else. Your hook has full control of the request, while having the entire ProcessWire API at your fingertips.

The format of these hooks is exactly the same as any other hook in ProcessWire, except that you have the option of using return $value; (like we are doing above) instead of $event->return = $value; in your hook function. My examples have the function inline, as part of the addHook() call, just to keep things simple. But you can use objects and strings referring to method/function names just like with any other addHook() call.

Telling ProcessWire what page to render

If you wanted it to render a specific Page for that URL, you can return the Page to render, and ProcessWire will take care of making it the $page API variable and rendering it:

$wire->addHook('/hello/world', function($event) {
  return $event->pages->get('/about/contact/');
});

Handling multiple URLs with 1 hook

Rather than just handling 1 URL, let's say that we wanted our hook to be able to handle several: /hello/earth and /hello/mars and /hello/jupiter. This is where things get interesting:

$wire->addHook('/hello/(earth|mars|jupiter)', function($event) {
  return "Hello " . $event->arguments(1);
}); 

Depending on which version of the URL you access, it'll output "Hello earth", "Hello mars" or "Hello jupiter". Notice that we surround the part we want to remember in parentheses, and we express the OR condition by separating the values we accept with pipes |.

The portion in parentheses gets remembered and populated to $event->arguments(1). If we had additional parts in parentheses, then they would get populated to $event->arguments(2), $event->arguments(3) and so on. Should you want it, $event->arguments(0) always represents the entire URL that was matched.
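To make the numbering concrete, here is a small plain-PHP analogy using preg_match() rather than a ProcessWire hook. The second group (morning|evening) is a made-up addition for illustration, and the pattern is written by hand; ProcessWire builds its own pattern internally, so treat this only as a sketch of how the indexes line up:

```php
<?php
// Plain preg_match() analogy; this is not ProcessWire's own matching code.
// The second group is hypothetical and only here to show a second index.
$pattern = '#^/hello/(earth|mars|jupiter)/(morning|evening)$#';
preg_match($pattern, '/hello/mars/evening', $matches);

// $matches[0] is the whole match, comparable to $event->arguments(0).
// $matches[1] and $matches[2] are the groups in order, comparable to
// $event->arguments(1) and $event->arguments(2).
$greeting = "Good $matches[2], $matches[1]";
```

In a hook defined with the path /hello/(earth|mars|jupiter)/(morning|evening), the same two values would be read with $event->arguments(1) and $event->arguments(2).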

Using named arguments

Perhaps we prefer to name our arguments rather than using index numbers. There are two different ways to do this. If you want your named argument to match any valid URL segment (or portion of one), use simple named arguments. If you want your named argument to match specific things, use pattern matching named arguments. Both are discussed below.

Simple named arguments

A simple named argument can match any valid text in a URL segment, and the matched text is assigned the name that you give it. Simple named arguments are created by wrapping the name you want to use in curly brackets, like {this}. Below is how we could add a hook with an argument named {planet} and also how we could use it in our hook function:

$wire->addHook('/hello/{planet}', function($event) {
  return "Hello " . $event->arguments('planet');
}); 

Since you've specifically named the argument, you can also just access it directly from $event if you prefer:

$wire->addHook('/hello/{planet}', function($event) {
  return "Hello " . $event->planet;
}); 

Because simple named arguments can match any valid text, depending on the case, you may want to filter them in your hook function by returning false when given a value you don't support. This results in a "404 page not found" page being displayed:

$wire->addHook('/hello/{planet}', function($event) {
  if($event->planet === 'earth' || $event->planet === 'mars') {
    return "Hello " . $event->planet;
  } else {
    return false;
  }
});

One simple named argument will match at most one URL segment at a time. But it can also match a partial URL segment. If we only wanted to match URLs having a planet name that started with "great-", like "great-earth" and only return the portion after "great-" to the hook function (i.e. "earth"), then we could do so like this:

$wire->addHook('/hello/great-{planet}', function($event) {
  return "Hello " . $event->planet; // i.e. "Hello earth", etc.
});

Pattern matching named arguments

Pattern matching named arguments will match specific values or a regular expression. They provide more opportunity to filter what URLs get sent to your hook, while still gaining the value of named arguments. To use them, just place the name you want to use inside the parentheses, like (name:value), for example:

$wire->addHook('/hello/(planet:earth|mars|jupiter)', function($event) {
  return "Hello " . $event->arguments('planet');
}); 

The named argument pattern above ensures that only the URLs /hello/earth, /hello/mars or /hello/jupiter will call our hook function. Since you've specifically named the argument, you can also just access it directly from $event if you prefer:

$wire->addHook('/hello/(planet:earth|mars|jupiter)', function($event) {
  return "Hello " . $event->planet;
}); 

When using pattern matching named arguments, note that you can use any regular expression pattern in the "value" portion of "(name:value)". Actually, this is true for more than just named arguments. You can use regular expressions anywhere in your path matching definition. More on that below.

You can use any regular expression

If you are curious about more ways that you can match, I should cut to the chase and tell you right now that it'll accept any PCRE regular expression to match the URL (should you want to use them). Don't be intimidated, though: for most cases where you would use this feature, you really don't need to know regular expressions.

Still, for those interested, behind the scenes, PW converts your match path to a regular expression (if it isn't one already), and it converts named arguments like {name} or (name:value) to PCRE named capture groups. This is all to make it simpler to look at and simpler to use, since ProcessWire URLs and named arguments are much simpler than regular expressions and capture groups.
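For the curious, the conversion described above could look something like the following in plain PHP. This is only an illustrative sketch and not ProcessWire's actual code: the function name matchPathToRegex() is made up, the choice of [^/]+ for {name} is an assumption based on a simple named argument matching at most one URL segment, and trailing-slash handling is left out entirely:

```php
<?php
// Illustrative sketch only: NOT ProcessWire's real conversion code.
// matchPathToRegex() is a hypothetical name used for this example.
function matchPathToRegex(string $path): string {
  // (name:value) becomes a named capture group wrapping the given pattern.
  // Nested parentheses inside "value" are not handled by this sketch.
  $path = preg_replace('/\(([A-Za-z_]\w*):([^)]+)\)/', '(?P<${1}>${2})', $path);
  // {name} becomes a named capture group that stays within one URL segment.
  $path = preg_replace('/\{([A-Za-z_]\w*)\}/', '(?P<${1}>[^/]+)', $path);
  // Use # as the delimiter, since the path itself is full of slashes.
  return '#^' . $path . '$#';
}

// Try it with a partial-segment match like the "great-" example.
preg_match(matchPathToRegex('/hello/great-{planet}'), '/hello/great-earth', $m);
$planet = $m['planet'];
```

The named groups are what make a value available by name (as with $event->planet in the hook examples) instead of by index number.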

Should you want to go all in on a regular expression, feel free to, by using one of the following characters !@#% as the starting and ending delimiters, and any regular expression in between. Don't use the common slash / as your regular expression delimiter because we are matching URLs/paths, which use the slash for another purpose already.

Practical examples

Outputting JSON data about any page when the last part of the URL is "json"

Let's say that we want to support a site-wide feature where appending "/json" to any URL makes it render a JSON string of information about the page. For instance, /about/history/json could be the JSON output for the /about/history/ URL. When you return an array from your hook, ProcessWire automatically converts it to JSON and sends the application/json content-type header as well. In this case, we'll be matching "json" at the end of the URL, and remembering everything that comes before it in order to retrieve the matching Page:

$wire->addHook('(/.*)/json', function($event) {
  $page = $event->pages->findOne($event->arguments(1));
  if($page->viewable()) return [
    'id' => $page->id,
    'url' => $page->url,
    'title' => $page->title,
    'summary' => $page->summary
  ];
}); 

Testing this out on this site (development version), accessing the URL /blog/posts/stripe-payment-processor-form-builder/json returns the following:

{
  "id": 2784,
  "url": "/blog/posts/stripe-payment-processor-form-builder/",
  "title": "Stripe Payment Processor for FormBuilder",
  "summary": "This week a second new module for processing..."
}

Making short URLs of all blog posts

Let's say that we want all of the blog posts on this site to be accessible at short ID-based URLs, like processwire.com/2784. We could have ProcessWire render them at those short URLs like this:

$wire->addHook('/([0-9]+)', function($event) {
  $id = $event->arguments(1);
  $post = $event->pages->findOne("template=blog-post, id=$id");
  if($post->viewable()) return $post;
});

Or maybe you want to instead redirect to the actual post:

$wire->addHook('/([0-9]+)', function($event) {
  $id = $event->arguments(1);
  $post = $event->pages->findOne("template=blog-post, id=$id");
  if($post->viewable()) $event->session->redirect($post->url);
});

Additional details

Trailing slashes vs. non-trailing slashes

ProcessWire will enforce the trailing-slash state of the request to be consistent with your hook definition. So if you do a $wire->addHook('/foo/bar/', ...) with the trailing slash, then a request for /foo/bar would get 301 redirected to /foo/bar/, and then your hook would execute. Likewise, if your hook was defined as $wire->addHook('/foo/bar', ...) without the trailing slash, then the reverse would be true, and a request for /foo/bar/ would 301 redirect to /foo/bar before your hook would be executed.

If you want to allow for either case (trailing slash or no trailing slash) then append a slash and a question mark to your pattern, like this: $wire->addHook('/foo/bar/?', ...).
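The trailing /? is ordinary regular expression syntax for "an optional slash". As a quick plain-PHP check of what such a pattern accepts (a stand-in using preg_match() with a hand-written pattern; the helper name matchesFooBar() is made up, and ProcessWire's own matching may differ in detail):

```php
<?php
// Stand-in check with preg_match(); not ProcessWire's own matching code.
// matchesFooBar() is a hypothetical helper for this illustration.
function matchesFooBar(string $path): bool {
  // "/?" at the end means zero or one trailing slash
  return preg_match('#^/foo/bar/?$#', $path) === 1;
}

$withoutSlash = matchesFooBar('/foo/bar');   // accepted
$withSlash = matchesFooBar('/foo/bar/');     // accepted
$deeper = matchesFooBar('/foo/bar/baz');     // rejected
```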

Handling pagination

By default, path hooks will not be executed when the request URL includes a pagination number, i.e. /foo/bar/page2. If you want to use pagination, append the simple named argument {pageNum} to the end of your path. When present in your match path, the pagination number will be populated to ProcessWire and used automatically where applicable, just as it would be if PW were rendering a page. It will also be provided in the $event->pageNum argument to your hook (as an integer).

$wire->addHook('/foo/bar/{pageNum}', function($event) {
  return "You are on page $event->pageNum";
}); 

Please note the following about using pagination:

  • If there is no pagination number present in the URL then the default value is 1 (i.e. first pagination).

  • Because pagination number 1 is already implied, a URL ending with pagination number 1 (i.e. /foo/bar/page1) will redirect to a URL without the /page1 (i.e. /foo/bar/).

  • The pagination number can be accessed from $event->pageNum, $event->arguments('pageNum') or the pageNum property of ProcessWire's $input API variable, i.e. wire()->input->pageNum

  • The {pageNum} must be the last segment in the URL and it must not have a trailing slash.

  • URLs with pagination numbers do not use trailing slashes after the pagination number, regardless of what you specify in your hook.

  • A trailing slash is enforced on the first pagination URL (the one that has no /page[n] number segment). For example, pagination 1 would be /foo/bar/ and pagination 2 would be /foo/bar/page2

  • If you receive an out-of-bounds pagination number, it's a good idea to return false so that ProcessWire will produce a 404.
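As a sketch of that last point, the hook below returns false whenever the requested pagination has no items. The helper itemsForPage(), the 25 placeholder items and the page size of 10 are all made up for illustration; only the {pageNum} argument and the return-false-for-404 behavior come from the notes above:

```php
<?php
// Hypothetical helper: returns the items for one pagination, or false
// when $pageNum is out of bounds (including when there are no items).
function itemsForPage(array $items, int $pageNum, int $perPage = 10) {
  $start = ($pageNum - 1) * $perPage;
  if($pageNum < 1 || $start >= count($items)) return false;
  return array_slice($items, $start, $perPage);
}

// $wire exists only inside ProcessWire (e.g. in /site/init.php). The
// isset() check just lets this sketch also run as a standalone script.
if(isset($wire)) {
  $wire->addHook('/foo/bar/{pageNum}', function($event) {
    $items = range(1, 25); // placeholder data: 25 items, so 3 paginations
    $slice = itemsForPage($items, $event->pageNum, 10);
    if($slice === false) return false; // out of bounds: 404
    return "Page $event->pageNum: " . implode(', ', $slice);
  });
}
```

With this placeholder data, /foo/bar/page3 would list items 21 through 25 and /foo/bar/page4 would produce a 404.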

Return values

The value returned from a URL handling hook determines what happens next. These are the possible return values and results:

  • None: 404 response
  • String: Output of string is sent
  • Page: Returned Page is rendered and made the current $page API variable
  • Array: Converted to JSON and output with "application/json" content-type header
  • True: Boolean true indicates you are handling the URL and are outputting directly
  • False: Boolean false is the same as None, which means a 404 response

In the examples above you see us using return statements directly, rather than $event->return = $value; as you might have seen in other hooks. In fact, you can use either here. The $event->return is used by other hooks because it is a value that can be passed around and modified by multiple hooks. In the case of these URL-handling hooks, I thought it was more likely that a matching hook would dictate and finalize what happens with the request, so using $event->return wasn't really necessary. However, there may still be a use case for it; you'll have to decide. Here's an example to demonstrate.

Let's say you have one URL handling hook in /site/init.php and another in /site/ready.php, they both match the same URL, and the string return value from both is used:

// in init.php
$wire->addHook('/foo/bar', function($event) {
  $event->return = 'Foo';
});

// in ready.php
$wire->addHook('/foo/bar', function($event) {
  $event->return .= 'Bar';
});

Accessing the URL /foo/bar outputs "FooBar" (rather than just "Foo" or "Bar") because both hooks executed in order and the 2nd hook could see the return value of the first, and append "Bar" to it. This passing-around and modification of a return value is more common in other ProcessWire hooks, but perhaps there's a use case for it here too, so I just wanted to mention that the option is there.
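If the mechanics seem abstract, the same idea can be simulated in plain PHP without ProcessWire. Here a stdClass object stands in for the hook event and a simple array stands in for the list of attached hooks; both are assumptions made purely for illustration and say nothing about how ProcessWire dispatches hooks internally:

```php
<?php
// Plain-PHP stand-in for two hooks sharing one event object.
// stdClass and the $hooks array are illustrative, not ProcessWire classes.
$event = new stdClass();
$event->return = '';

$hooks = [
  function($event) { $event->return = 'Foo'; },   // like the init.php hook
  function($event) { $event->return .= 'Bar'; },  // like the ready.php hook
];

// Run the hooks in the order they were attached. Each one receives the
// same object, so the second sees what the first stored in ->return.
foreach($hooks as $hook) $hook($event);

$output = $event->return;
```

Because the value lives on the shared event object rather than in a function's return statement, a later hook can extend it instead of replacing it.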

Conditional cases

It may be that you want to handle specific URLs only when a POST request is present, or when AJAX is expected, etc. I know some other solutions might make this part of their URL routing conditions. But rather than try to figure out all the cases you might want to create conditions for and build new APIs for them, I thought it was most efficient and simplest to just recommend that they be part of your conditions for attaching the hook. For example:

if($input->is('POST')) {
  $wire->addHook('/foo/bar/', function($e) { ... });
}

if($config->ajax) {
  $wire->addHook('/foo/bar/', function($e) { ... });
}

3rd party modules

This feature request originally came thanks to Bernhard. He also helped me focus in on a direction for it. As I understand it, his interest in and request for this originated from a desire for his modules to provide features at specific URLs without the module having to create and maintain custom pages, templates, fields, etc. for the purpose. And without the module having to hack around PW's 404 process. I thought that was a really good point. In fact, I think this ability to hook and handle ProcessWire URLs opens a lot of possibilities for 3rd party modules.

Modules can now handle any URLs they want without having to create a page, template, template file or fields for them. Imagine an XML sitemap module that automatically responds to the /sitemap.xml URL, for example. Imagine modules being able to provide web services, documentation or examples at whatever URLs they were developed or configured to use. Yes, maybe you could do some of this before by hacking around a bit, but it was a lot more difficult. Now it's really easy, and a native part of the system.

In this post I've tried to cover a lot of simple use cases, but I suspect we are just scratching the surface. I look forward to seeing how you use this.

One of the highlights of every weekend for me is reading the ProcessWire Weekly. It is always fantastic reading and I always learn something new from it. I really enjoy seeing the Site of the Week too. If you are a ProcessWire user, chances are you already read it every week. But if you are new around here, you are in for a treat. Keep up-to-date with ProcessWire, the latest in web development, and much more by reading Teppo’s ProcessWire Weekly every week, and you might want to subscribe to the weekly email as well. Thanks for reading and I hope that you have a great weekend!

Comments

  • Niko

    This is such a great new feature!!! Love it! ❤

  • adrianbjones_gmail.com

    Damn - this is really fantastic, but literally 24 hours too late :)

    I set up a lot of janky urlsegment stuff last night for handling complex redirects from an old site.

    Time to redo it I suppose - at least it will be way quicker and better this time!

  • dotnetic

    Wow. This is so great. Love this new feature. It is similar to routes in Laravel.

  • Adam Spruijt (4 years ago)

    Amazing addition. I'm going to use this sooo much. Thanks Ryan & Bernhard!

  • Norbert (4 years ago)

    Absolutely awesome, don't know what else to say...
