
PDF viewing issue with pagefileSecure


Arcturus

Hello,

I'm developing a website where all files are only viewable to signed-in members, and I've used the pagefileSecure method to achieve that. The site's similar to a journal library, wherein the end of almost every use case is viewing a PDF file, and sometimes multiple PDF files. The problem I've run into is how various browsers handle PDFs loaded via the pagefileSecure method. Firefox loads the PDF in-browser without fanfare. Edge prompts the user to either view or download the file (neither great nor terrible). However, Chrome will not show the file in-browser at all. It will only download the file, and this presents a significant usability problem, particularly when this is the browser of choice for this site's audience.

I suspect that there's a headers issue at work, but can't seem to see where I would even experiment with that hypothesis. My understanding was that pagefileSecure executes a simple redirect based on a user's permissions, but perhaps there's more to it than that? Has anyone found a solution to this? Interestingly, Chrome doesn't have this problem with protected JPEGs.

Having to implement an alternative file protection scheme would be quite painful at this point.


What code are you using to display the PDF or send it to the browser? Are you using $files->send()? With that method you have control over the headers.

Firefox uses Mozilla's PDF.js for PDF display. If you want to have a consistent display of PDFs across browsers you might want to consider implementing it on your site.  

This might be of interest:

 

And this: https://pdfobject.com/


1 hour ago, Arcturus said:

I'm using normal HTML text links to these PDFs (pagefileSecure's part of the ProcessWire core).

To get more control over what happens in the user's browser, you can output your PDFs through a specific template file.

Example implementation:

Create a new template "pdf", in the URLs tab allow URL Segments

Create a hidden page named "pdf" under home with that template. URL for that page will be: "/pdf/"

Change text links for the PDFs to a format like "/pdf/{$pageId}/name-of-pdf-file.pdf"

Sample code to create text links:

/** @var Page $page page that the PDF file lives on */
/** @var Page $pdfPage page that outputs the PDF file */
$pdfPage = $pages->get('template=pdf');
/** @var Pagefile $pageFile object that holds the PDF */
// use ->url here: casting a Page to a string yields its ID, not its URL
$link = $pdfPage->url . $page->id . '/' . $pageFile->basename; // example result: /pdf/1234/name-of-pdf-file.pdf

Now in site/templates/pdf.php you can use URL segment logic to retrieve and output the PDF:

    if ($input->urlSegment1 && $input->urlSegment2) {
        $pageId = $sanitizer->int($input->urlSegment1);
        $baseName = $sanitizer->filename($input->urlSegment2);
        // $pages->get() returns a NullPage (id 0) when nothing matches, so check the id first
        $filesPage = $pages->get($pageId);
        /** @var Pagefile|null $pageFile object that holds the PDF; file_field is the file field on that page */
        $pageFile = $filesPage->id ? $filesPage->file_field->getFile($baseName) : null;
        if ($pageFile) {
            $filePath = $pageFile->filename;
            // see https://processwire.com/api/ref/wire-file-tools/send/ for more options
            /** @var WireFileTools $files PW Files API */
            if($filePath) $files->send($filePath, array('forceDownload' => false));
        }
    }

Instead of sending the file directly to the browser, in pdf.php you could implement https://pdfobject.com/ to embed it for every browser


Yeah, my links are already opening in a new tab. That's not the issue.

The issue is that, when using pagefileSecure, files are being delivered to the browser from a ProcessWire process rather than a normal file request directly from the server. That's intended; however, browsers sometimes have issues interpreting files delivered by PHP, and that's often due to the headers used (Chrome might think it's being presented with a file it can't display, so it goes directly to a download dialog). I was hoping to not have to develop a downloader of my own to replace the user side of pagefileSecure, particularly for what should be such a basic use case, but looks like I'll need to attempt that.
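
To make the headers hypothesis concrete, here is a minimal sketch in plain PHP of the two header sets a script can send with a PDF. It is not what pagefileSecure does internally; the helper name buildPdfHeaders is made up for this illustration. The part that usually decides between "display" and "download" is the Content-Disposition value (inline versus attachment).

```php
<?php
/**
 * Build the response headers for serving a PDF from PHP.
 * Hypothetical helper for illustration only; not part of ProcessWire.
 */
function buildPdfHeaders(string $filePath, bool $forceDownload): array {
    // "attachment" asks the browser to download, "inline" allows in-browser display
    $disposition = $forceDownload ? 'attachment' : 'inline';

    // Strip characters that would break the quoted filename parameter
    $name = str_replace(['"', "\r", "\n"], '', basename($filePath));

    $headers = [
        'Content-Type: application/pdf',
        'Content-Disposition: ' . $disposition . '; filename="' . $name . '"',
    ];

    // Only add a length when the file actually exists on disk
    if (is_file($filePath)) {
        $headers[] = 'Content-Length: ' . filesize($filePath);
    }
    return $headers;
}
```

In a script that serves the file, each returned string would be passed to header() before calling readfile() on the path.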

Thanks for the suggestions gebeer.

Update: I figured that I was going to have to make a number of additions to gebeer's code above, such as working in the pagefileSecure prefix and various header changes... but it worked right away, pretty much as is! No issues at all in Chrome or Edge. Sweet. I'm still going to have to come up with a textformatter for links that exist within CKEditor fields, and will circle back to this thread once I have that coded and working.

 


6 hours ago, Arcturus said:

Update: I figured that I was going to have to make a number of additions to gebeer's code above, such as working in the pagefileSecure prefix and various header changes... but it worked right away, pretty much as is! No issues at all in Chrome or Edge. Sweet. I'm still going to have to come up with a textformatter for links that exist within CKEditor fields, and will circle back to this thread once I have that coded and working.

Great that this is working for you. Please share your additions to make it work with pagefileSecure. I have never used it, but I would be interested in seeing how you implemented it.


On 11/17/2022 at 2:26 PM, Arcturus said:

My understanding was that pagefileSecure executes a simple redirect based on a user's permissions, but perhaps there's more to it than that? Has anyone found a solution to this?

It took some investigating and it would be great to have this made more obvious in any documentation for $config->pagefileSecure...

Behind the scenes pagefileSecure is using $files->send():

Quote

This function utilizes the $config->fileContentTypes to match file extension to content type headers and force-download state.

And $config->fileContentTypes forces download for certain extensions based on whether the content type is preceded by a + sign.

You can override the default for the pdf extension in your /site/config.php and then the files should display in the browser:

$config->fileContentTypes('pdf', 'application/pdf'); // No plus sign before the content type
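
As a rough illustration of the plus-sign convention described above, the sketch below resolves an extension to a content type and a force-download flag. This is not ProcessWire's actual implementation: the function name resolveContentType, the fallback type, and the sample map values are assumptions made for the example.

```php
<?php
/**
 * Illustrates the "+" convention: a leading plus sign on the content type
 * means "force download". Hypothetical helper, not ProcessWire core code.
 *
 * @return array Content type (without the plus sign) and force-download flag
 */
function resolveContentType(string $ext, array $types): array {
    // Unknown extensions fall back to a forced generic download (assumed default)
    $type = $types[strtolower($ext)] ?? '+application/octet-stream';
    $forceDownload = strpos($type, '+') === 0;
    return [ltrim($type, '+'), $forceDownload];
}

// Sample map in the same shape as $config->fileContentTypes (values assumed)
$types = [
    'pdf' => '+application/pdf', // plus sign: the browser is told to download
    'jpg' => 'image/jpeg',       // no plus sign: the browser may display inline
];

[$type, $force] = resolveContentType('pdf', $types);

// After an override like the one in the post, the plus sign is gone
$types['pdf'] = 'application/pdf';
[$type2, $force2] = resolveContentType('pdf', $types);
```

This would also fit the observation earlier in the thread that protected JPEGs display fine in Chrome while PDFs do not.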

 


Thanks Robin, I was able to confirm that the $config->fileContentTypes addition works, and is obviously an easier solution to implement for the problem I described originally.

gebeer, the code I used is unchanged from what you wrote. I expected to have to manually handle the URL correction to use the proper prefix, but the functions related to pagefileSecure were firing after your code while continuing to respect your 'forceDownload' => false parameter.

It looks like I won't have a chance to work on that textformatter until sometime next week. It's not necessary when using Robin's solution, but I kind of like the extra level of file location obfuscation that the download template provides.


Nearly bit off more than I could chew with that related Textformatter, but I have the following in place and working nicely with my CKEditor fields.
Fair warning: the following is the franken-result of several hundred Stack Overflow queries. You may wish to shift your server to an explosion-proof container before testing.

[the usual textformatter boilerplate leads into...]

public function formatValue(Page $page, Field $field, &$value){

  $value = preg_replace_callback('/href="(.*?)"/', function($matches) use ($page) {

    $match = $matches[0]; // the full attribute, e.g. href="/path/to/file.pdf"
    $url = $matches[1];   // only the URL, without the surrounding quotes

    if (strpos($url, '.pdf') !== false){ // Adjust your conditions accordingly
      $pieces = explode('/', $url);
      // Build from the bare URL so the closing quote is not doubled
      $match = 'href="/download/'.$page->id.'/'.end($pieces).'"';
    }

    return $match;

  }, $value);
}

Your textformatter seems like a valid solution, although I am not sure whether the regex for getting the file URL is 100% reliable. I'm not a regex pro, but in general it is not a good idea to rely on regex for parsing HTML. You could use PHP's SimpleXML to parse and replace the href attributes. Adapt something like this: https://stackoverflow.com/questions/27326066/simplexml-php-change-value-from-any-node
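
A parser-based version of that rewrite could look roughly like the sketch below. Note that it swaps in DOMDocument rather than SimpleXML, on the assumption that rich-text output is HTML and not always well-formed XML. The function name rewritePdfLinks is hypothetical, and the /download/ prefix simply follows the examples in this thread.

```php
<?php
/**
 * Rewrite links to PDF files using a DOM parser instead of a regex.
 * Hypothetical helper for illustration; not part of ProcessWire.
 */
function rewritePdfLinks(string $html, int $pageId): string {
    if (trim($html) === '') return $html;

    // Wrap the fragment so the parser sees a complete UTF-8 document
    $wrapped = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '</head><body>' . $html . '</body></html>';

    $dom = new DOMDocument();
    // Collect parser warnings internally instead of emitting them
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($wrapped);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('a') as $a) {
        // Look only at the path part so query strings do not hide the extension
        $path = (string) parse_url($a->getAttribute('href'), PHP_URL_PATH);
        if (strtolower(pathinfo($path, PATHINFO_EXTENSION)) === 'pdf') {
            $a->setAttribute('href', '/download/' . $pageId . '/' . basename($path));
        }
    }

    // Return only the children of <body>, i.e. the original fragment
    $body = $dom->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

Inside a textformatter, the call would presumably be something like $value = rewritePdfLinks($value, $page->id). The serialized markup can differ slightly from the input (for example in entity encoding), so it is worth testing against real field content.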

But I think you don't even need a textformatter for the task. A simple hook can intercept all URLs to PDF files and return the desired URL under /download/.

Place this in site/init.php

<?php
namespace ProcessWire;
// intercepts all file URLs and returns PDF URLs in the desired format
wire()->addHookAfter('Pagefile::url', function(HookEvent $event) {
    /** @var Pagefile $pageFile */
    $pageFile = $event->object;
    if(strtolower($pageFile->ext) === 'pdf') {
        // ID of the page the file lives on; $event->page would refer to the page currently being viewed
        $pageId = $pageFile->page->id;
        $event->return = '/download/' . $pageId . '/' . $pageFile->basename;
    }
});

Now all URLs to PDF files will have the desired format. Even when you add a link to a file inside CKEditor, the desired URL will be inserted.

[Attached image: custom-file-url.gif]

 
