Special Chars -- URL Hooks, URL Segments, Page Names -- A lost cause?


Jim Bailie

Taking inspiration from @Robin S's recent tutorial, I told a client we could map all their old PDF file names/links to their newly migrated and sanitized files in PW. No problem, right?

Last night I set up the URL hook "/concepts/{filename}", and /concepts/My_Test_File4$$$.pdf doesn't even fire the hook and throws a 404.

So I disabled the hook, created the page "concepts" with a template that allows all url segments. "/concepts/My_Test_File4$$$.pdf" throws a 404.

I've read EVERYTHING google/forum will give me pertaining to URL hooks and special chars in page names and such.

Just looking for one more trick before I have to go native and just duplicate all the migrated files in their original directory structure. Thanks!!


Hello,

According to the documentation, this is the expected behavior:

Quote

URL segments must follow the same format as page names. Meaning, they can be any combination of lowercase ASCII letters (a-z), numbers (0-9), dashes, underscores and periods.

But you can use a GET parameter: /concepts/?fileName=My_Test_File4$$$.pdf

wire()->input->get->text('fileName');

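It's probably better to URL-encode the PDF name with PHP's urlencode(): /concepts/?fileName=My_Test_File4%24%24%24.pdf

PHP already decodes GET values, so reading the parameter needs no extra urldecode() (decoding twice could mangle names that contain "%" or "+"):

wire()->input->get->text('fileName');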
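To make the quoted rule and the encoding step concrete, here is a minimal plain-PHP sketch. The helper looksLikeValidSegment() is hypothetical and only mirrors the wording of the quoted documentation; it is not ProcessWire's own validation code.

```php
<?php
// Hypothetical helper: mirrors the quoted rule (lowercase a-z, 0-9, dashes,
// underscores and periods). This is an illustration, not ProcessWire's code.
function looksLikeValidSegment(string $segment): bool
{
    return preg_match('/^[-_.a-z0-9]+$/', $segment) === 1;
}

$legacy = 'My_Test_File4$$$.pdf';

// The legacy name fails the quoted rule; a sanitized variant passes.
var_dump(looksLikeValidSegment($legacy));              // bool(false)
var_dump(looksLikeValidSegment('my_test_file4.pdf'));  // bool(true)

// Encoding the name for use as a GET parameter value.
echo urlencode($legacy), PHP_EOL;                      // My_Test_File4%24%24%24.pdf

// PHP decodes query-string values on its own; parse_str() applies the same decoding.
parse_str('fileName=' . urlencode($legacy), $query);
var_dump($query['fileName'] === $legacy);              // bool(true)
```

The last check shows why the value read from the GET parameter should already match the original file name without further decoding.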

@da² Thanks, but the problem is that these full links need to be preserved because they're widely published inside this organization and in their clients' and customers' sales materials, so they have to stay the same.

I was hoping to have PW "fully" manage the files because it would be such a relief to the admins and would allow the org to track the viewership to a certain extent.

I thought some of the information in this article would help, https://processwire.com/blog/posts/page-name-charset-utf8/ , but I just don't think PW is going to have anything to do with having a "$" anywhere near a url. 🤷‍♂️


I tried to replicate this problem, and it turns out that you can add some characters to $config->pageNameWhitelist and .htaccess all you want, but this blacklist in the core still won't have it: https://github.com/processwire/processwire/blob/master/wire/core/Sanitizer.php#L911 Dollar signs are FORBIDDEN!!!!

However, don’t despair yet, you’re never more than a finite amount of disgusting hacks away from success:

wire()->addHookBefore('ProcessPageView::pageNotFound', function(HookEvent $event) {
    if (stripos($_SERVER['REQUEST_URI'], '/concepts/') === 0) {
        //serve file
    }
});

This needs to go into init.php and runs whenever PW would otherwise 404, that is, it can’t find a page or url segments or url hooks for the requested path. Some things will not work as expected in this hook. For example input() will not be populated.

To make the $ work you’ll need to update your .htaccess, though. For example you can just add the $ to the default rewrite condition:

  # PW-PAGENAME
  # ----------------------------------------------------------------------------------------------- 
  # 16A. Ensure that the URL follows the name-format specification required by PW
  # See also directive 16b below, you should choose and use either 16a or 16b. 
  # ----------------------------------------------------------------------------------------------- 

  RewriteCond %{REQUEST_URI} "^/~?[-_.a-zA-Z0-9/\$]*$"
  
  # ----------------------------------------------------------------------------------------------- 
  # 16B. Alternative name-format specification for UTF8 page name support. (O)
  # If used, comment out section 16a above and uncomment the directive below. If you have updated 
  # your $config->pageNameWhitelist make the characters below consistent with that. 
  # ----------------------------------------------------------------------------------------------- 
  
  # RewriteCond %{REQUEST_URI} "^/~?[-_./a-zA-Z0-9æåäßöüđжхцчшщюяàáâèéëêěìíïîõòóôøùúûůñçčćďĺľńňŕřšťýžабвгдеёзийклмнопрстуфыэęąśłżź]*$"

Or use the other one below and add it there. Or do whatever is applicable for nginx? Not ideal but still better than messing with the core, I guess.
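For the "//serve file" step in the hook above, the first task is getting a safe file name out of the request URI. Here is a minimal plain-PHP sketch of that part only. The function name legacyFileName() and its default prefix are hypothetical, and the actual file lookup and sending are left out because they depend on how the migrated files are stored.

```php
<?php
// Hypothetical helper: extracts the legacy file name from a request URI
// below a given prefix. Returns null when the URI is outside that prefix.
function legacyFileName(string $requestUri, string $prefix = '/concepts/'): ?string
{
    // Keep only the path so a query string cannot leak into the name.
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path) || stripos($path, $prefix) !== 0) {
        return null;
    }

    // Decode percent-escapes, then let basename() drop any directory parts
    // (for example "../") so only a bare file name is left.
    $name = basename(rawurldecode((string) substr($path, strlen($prefix))));

    return $name === '' ? null : $name;
}

var_dump(legacyFileName('/concepts/My_Test_File4$$$.pdf'));
var_dump(legacyFileName('/concepts/My_Test_File4%24%24%24.pdf?ref=mail'));
var_dump(legacyFileName('/other/file.pdf'));
```

Inside the hook, the returned name could then be matched against the migrated files before anything is sent to the browser; a null result would simply let the normal 404 continue.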


Thank you all for your input. "$"s in PW proper are likely a lost cause for now.

But I think I have the right solution. I dug up an old nginx rewrite rule from a while back and it works!

rewrite ^/concepts/(.*)$ /downloads/?q=$1 last;

So the filename in "/concepts/My_Test_File4$$$.pdf/" can now be captured via $input->get('q') unmolested.
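Once the raw name arrives as a GET value, it still has to be matched to the migrated file. A minimal sketch of that lookup in plain PHP, assuming a simple array that maps legacy names to sanitized ones: the map contents, the sanitized name and the function resolveLegacyFile() are all hypothetical, since the thread does not say how the migrated files are stored.

```php
<?php
// Hypothetical lookup table: legacy file name => migrated, sanitized file name.
$legacyMap = [
    'My_Test_File4$$$.pdf' => 'my_test_file4.pdf',
];

// Hypothetical helper: resolves the captured value to a migrated file name,
// or null when the legacy name is unknown.
function resolveLegacyFile(string $q, array $map): ?string
{
    // The example URL above ends with a slash, so strip slashes first.
    $name = trim($q, '/');

    return $map[$name] ?? null;
}

var_dump(resolveLegacyFile('My_Test_File4$$$.pdf/', $legacyMap));
var_dump(resolveLegacyFile('unknown.pdf', $legacyMap));
```

An explicit map like this avoids guessing how each legacy name was sanitized, at the cost of having to generate the table once during migration.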


If item 2 is enabled in .htaccess then PW will still handle 404s for URLs that don't conform to its rules so you don't necessarily need to change items 16A or 16B.

You could do something like this in /site/init.php:

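$wire->addHookBefore('ProcessPageView::pageNotFound', function(HookEvent $event) {
	// Use only the path so a query string can't affect the extension check
	$path = (string) parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
	$info = pathinfo($path);
	// Redirect or do whatever
	if(!empty($info['extension']) && strtolower($info['extension']) === 'pdf') {
		// Encode the name so characters like "$" survive as a GET value
		$filename = rawurlencode(rawurldecode($info['basename']));
		$event->wire()->session->location("/find-file/?filename=$filename");
	}
});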

Unfortunately it seems that Tracy Debugger doesn't work on this kind of 404, which makes debugging harder, but I've asked about this in the Tracy subforum and @adrian might have a solution.

 

 


9 hours ago, Robin S said:

If item 2 is enabled in .htaccess then PW will still handle 404s for URLs that don't conform to its rules so you don't necessarily need to change items 16A or 16B.

It’s handled, but I wasn’t able to send a 200 status unless allowing the $ in .htaccess. I don’t know what’s going on there, I imagine it’s overridden by Apache after PW is done. Other status codes worked for me, though. Only 200 always turned into 404.

