Protected files


raydale

Hi, I was wondering whether there was an easy way to protect files so that they are only available to the page that links to them?

For example: if I have a client page template and add a file field type to it, then create the page and attach a file - I would like to be able to secure that file so that it can only be accessed by the person viewing the page. This way I can create a secure page and file that can only be accessed by a client logging in and accessing their page.


We don't have something like this built-in at present, but here are a few options:

1. Security through obscurity. Use a filename that is going to be impossible for someone to guess without actually seeing the link to it. The problem with this is that someone can save the link and forward it on to others.

2. Place the file somewhere outside of your web accessible files and use a PHP script (or PW template) as a download passthrough. I can provide an example if you are interested.

3. Use http authentication on the page's /assets/files/ directory. Using cPanel, SSH or whatever tool you have, secure the directory with a password. However, the user viewing the page will need to enter a password before it'll let them download the file.


Thanks Ryan,

I would be very grateful if you could give me a brief rundown of solution #2 please as this is largely for a personal one-off site that I will manage, so any complexity there is fine for me.

Going forward - I'm probably going to need a more user friendly method that can be managed by a client user through the admin. Is this sort of functionality on the agenda for PW in the future? If it is - the way Drupal handles private files with D7 is one of the better methods I've seen (if a blueprint was needed).


Just wondering: what if I used htaccess to determine the referrer? So, if I placed a .htaccess file into the 'assets/files' directory with something like the following?:

IndexIgnore *
Options +FollowSymlinks
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^http://(www\.)?domain-name\.com/ [NC]
RewriteCond %{REQUEST_URI} !hotlink\.(gif|png|jpg|doc|xls|pdf|html|htm|xlsx|docx) [NC]
RewriteRule .*\.(gif|png|jpg|doc|xls|pdf|html|htm|xlsx|docx|zip)$ http://www.domain-name.com/filenotexist.txt [NC]

This isn't my code and just something I just found out there and slightly modified when searching for .htaccess possibilities. Obviously, 'domain-name.com' should be replaced by the proper domain you are trying to secure.

I'm not a developer so I'm not sure if there are any potential pitfalls in going this route. What's the opinion here? Would this be a decent way of going about it? For example, would this slow the website down with large numbers of attached files? etc.


Just going back to the option of storing it outside of the web root, it doesn't look like it would be too hard to create a new fieldtype based on the InputfieldFile.module

Obviously this doesn't help you in your current situation, but I'm thinking out loud so ryan could possibly comment on this :)

ryan - it seems that there is a $this->destinationPath variable. Whilst you can't link directly to a document outside the document root, is it technically possible to extend the module and have it create a folder outside the root, then set this variable to point to it (and on a page-by-page basis have it create a subfolder with the document ID as the subfolder name?).

If so, then with a little work it wouldn't be too hard to have the path configurable (I think). Since you can't automatically serve a file that's above the document root directly, there would need to be a function in the module that deals with this as well, but I think that if this sounds do-able to ryan then I might take a stab at it at some point in the future (next few weeks are pretty busy, but after that then maybe :)).
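A minimal sketch of the path handling such a module would need, assuming a private base directory outside the document root with one subfolder per page ID. The function name privateFilePath() and the directory layout are hypothetical and not part of ProcessWire; the point is only that the requested name must be reduced to a bare filename before it is joined to the private path.

```php
<?php
/**
 * Resolve a requested file inside a private, per-page directory.
 * Returns the absolute path, or null if the request is invalid.
 * Hypothetical helper: name and layout are assumptions for illustration.
 */
function privateFilePath(string $baseDir, int $pageId, string $requested): ?string
{
    if ($pageId < 1) {
        return null;
    }
    // basename() drops any directory part, so "../../config.php" becomes "config.php"
    $name = basename($requested);
    if ($name === '' || $name[0] === '.') {
        return null; // refuse empty names and dotfiles such as .htaccess
    }
    $path = rtrim($baseDir, '/') . '/' . $pageId . '/' . $name;
    return is_file($path) ? $path : null;
}
```

A passthrough script would call this with the page ID and filename from the request and send a 404 whenever it returns null.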


Thanks Pete. I'm looking at both the htaccess and outside of the web root as viable options. I'd be interested to see which would be the best recommendation going forward as it seems both might be valid right now.

Regarding the htaccess route, I'm guessing it wouldn't be too difficult to build a module that could handle this automatically by showing a checkbox to set something like 'Set Private Files'. It would then create the htaccess file dynamically if it didn't already exist. With something like this I could start writing such a module but would need a lot of help with code cleanup etc.


Good ideas. I think this is definitely an option we'll want to have at some point soon. This is actually something the first iteration of ProcessWire (called Dictator) had back in 2004. If you removed 'guest' access from a page, then it would move its files directory to a non-web-accessible location and use a passthru script to deliver files so that they could only be accessed if the user was logged into an account with the appropriate access. The need hadn't come up since then, which is the only reason why it's not already in PW. But it's one of those things that's very important for private intranets and the like, so that you don't get confidential company PDFs floating around publicly or getting indexed by Google.


The following solution seems to work for me. I'm not sure what downfalls there may be to my approach, if any; I'm not an htaccess or regex expert. My solution is a mix of htaccess (a solution raydale touched upon) and one of Ryan's suggested options (downloading through a passthrough PHP script/PW template).

Firstly, the htaccess file that I manually placed into the protected page's file directory. (Example: placed into mydomain/site/assets/files/1072)

IndexIgnore *
Options +FollowSymlinks
RewriteEngine On
RewriteCond %{REQUEST_URI} ^/site/assets/files/(.*)/(.*)$
RewriteRule (.*) http://www.mydomain.com/file-download/?page=%1&file=%2 [L]

Any requests for any file in that directory get redirected to my passthrough page. (A hidden PW page using the following template.)

<?php
// Get page that owns the file
if(!$input->get->page) throw new Wire404Exception();
$page = $pages->get((int) $input->get->page);
if(!$page->id) throw new Wire404Exception();

if ($page->viewable()) { // Visitor has access to the page

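 // Build the file's disk path. basename() strips any directory part,
 // so a request like ?file=../../config.php cannot escape the page's folder
 $fileName = basename((string) $input->get->file);
 $filePath = $config->paths->files.$page->id.'/'.$fileName;
 if($fileName === '' || $fileName[0] === '.' || !is_file($filePath)) throw new Wire404Exception();

 // Force download of the file
 header("Content-type: application/force-download");
 header("Content-Transfer-Encoding: Binary");
 header("Content-disposition: attachment; filename=\"".$fileName."\"");
 readfile($filePath);
 exit; // stop here so nothing else is appended to the file output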

} else { // Visitor does not have access to the page

 $redirectURL = $pages->get("/login/")->httpUrl;

   /**
  * Or alternatively, redirect to Processwire's login page if you don't have a custom login page
  * $redirectURL = substr($pages->get('/')->httpUrl, 0, -1).$config->urls->admin;
  */

 $session->redirect($redirectURL, false);
}
?>

The template checks to see if the page is viewable to the site visitor. If so, it forces the download of the file. If the user does not have page view access, it forwards them to a login page.

It seems to work really well for me. Thoughts? Does anyone see any downfalls? At some point, the creation of the htaccess file can be automated based on the template's access settings.

(One standing, though non-critical, question I have is, why does $config->urls->root always return a forward slash for me?)


(One standing, though non-critical, question I have is, why does $config->urls->root always return a forward slash for me?)

You're probably expecting $config->urls->root to return a full URL (I'm guessing) in which case you need

$pages->get('/')->httpUrl; 

See this thread for info:


Hani I think you've got a good and smart solution here. Nice work coming up with this. My only concern is just that it sends all file accesses through that script rather than just the ones it needs... but obviously there's no simple answer to this problem. There is significant overhead with doing a passthrough, relative to a regular file access, and I think the effect would be noticeable. So a passthrough is one of those things you only want to use when you have to. But if we're not talking about a high traffic or performance-critical situation, then this solution may be just fine. For a core solution, I think we'd take a similar approach, but in a manner that can separate the private from the public really easily, so that no public files are going through passthroughs.

One suggestion though: Change your $pages->get($_GET['page']) to this below, just to ensure you aren't sending unfiltered data through an API call, and that you are dealing with a page that exists:

if(!$input->get->page) throw new Wire404Exception(); 
$page = $pages->get((int) $input->get->page);
if(!$page->id) throw new Wire404Exception();

At some point, the creation of the htaccess file can be automated based on the template's access settings.

I'm not sure we could take this approach unless the /site/assets/files/ directories get moved to another location to separate them from the 'public' files. Otherwise, that htaccess file would have to know every page ID for protected pages. But I think separating the dirs is the right way to go. Either that, or renaming protected page file directories with a "." in front of them, i.e. /site/assets/files/.1234/ That way, the htaccess file can recognize them ahead of time in a predefined manner. And PW's htaccess already blocks any files/dirs that start with a period.

(One standing, though non-critical, question I have is, why does $config->urls->root always return a forward slash for me?)

Because your site is installed in the root of your domain. If it was installed in a subdirectory, then it would return the subdirectory. Also, like Pete mentioned, ProcessWire does not usually include the schema and domain in any 'url' function/property unless you ask it to.

Also - like I said, I'm no htaccess guru. Does the htaccess file settings also block any indexing of the files by search engines?

At this point I think you are more of an htaccess guru than 99% of web developers. :) But to answer your question, unless you put in USER_AGENT rules, your htaccess file doesn't know or care whether it's a search engine spider or a real user coming through. It's going to exhibit the same behavior on both. Meaning, it will prevent indexing of protected files.


Passthroughs in PHP seem to be the standard for a lot of file download systems to be honest. I might be wrong, but with some cleverness you can also allow it to resume interrupted downloads this way which I'm not sure you can do with a direct link.
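For what it's worth, web servers such as Apache generally honor Range requests for static files on their own, whereas a passthrough script has to implement them itself. A rough sketch of the parsing step, assuming only single-range requests need to be supported; parseByteRange() is a hypothetical helper name, not something from the script quoted in this post.

```php
<?php
/**
 * Parse a single-range HTTP Range header such as "bytes=500-" or "bytes=0-499".
 * Returns [start, end] (inclusive byte offsets) or null if the header is
 * missing, malformed, multi-range, or unsatisfiable for a file of $size bytes.
 */
function parseByteRange(?string $header, int $size): ?array
{
    if ($header === null || !preg_match('/^bytes=(\d*)-(\d*)$/', trim($header), $m)) {
        return null;
    }
    if ($m[1] === '' && $m[2] === '') {
        return null; // "bytes=-" carries no information
    }
    if ($m[1] === '') {
        // Suffix form "bytes=-N": the last N bytes of the file
        $length = (int) $m[2];
        if ($length === 0) {
            return null;
        }
        $start = max(0, $size - $length);
        $end = $size - 1;
    } else {
        $start = (int) $m[1];
        $end = ($m[2] === '') ? $size - 1 : min((int) $m[2], $size - 1);
    }
    return ($size > 0 && $start <= $end && $start < $size) ? [$start, $end] : null;
}
```

When a range comes back, the script would answer with status 206, a "Content-Range: bytes start-end/size" header, and only that slice of the file (fseek() to the start offset, then output end - start + 1 bytes). When it returns null, sending the whole file as usual is a safe fallback.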

This is the function that actually handles the download in the script I've used previously (it's getting on a bit now ;)).

// Print the http header
header("Cache-Control: private"); // fix for IE to correctly download files
header("Pragma: no-cache");       // fix for http/1.0
header("Content-Type: ".$filetype);
if ( $myrow["realsize"] ) {
   // only send a length when the database row knows the real file size
   header("Content-Length: ".(string)($myrow["realsize"]) );
}
header("Content-Disposition: attachment; filename=\"".$filename."\"");
header("Content-Transfer-Encoding: binary");
// Open and display the file..
fpassthru( $fh );
@fclose( $fh );
exit();
} // closes an enclosing block not shown here; there was some error message generation code at this point that's not relevant

The above uses some variables from a database query. This system actually masked the filenames with some form of MD5 hashing so you can't tell what's what if you look at the files on the file system itself, and it pulls various bits from the database to pass the file to the user with the correct name, increment download counts, etc. All of that is a bit over the top, so I've just copied in the relevant bit in case it's useful.


Thanks for the feedback, guys! Really helpful.

My only concern is just that it sends all file accesses through that script rather than just the ones it needs...

Not entirely sure what you mean, Ryan. The htaccess file that I manually placed was not in the root files folder, but rather the specific page's file folder. If I am correct (which I may not be) and it acts the way I intended it to, only files being accessed in that folder will be running through the pass through. So for instance, if my page (id #1900, for example) is only viewable by a certain role, then only those logged in with that role can access files found in /site/assets/files/1900. So with my solution, an htaccess file is required for each protected page and located under that page's file folder.

So I guess that means, if I were to have 30 protected pages, I'd have 30 copies of the same htaccess file located across 30 different file folders. While it's functional and "invisible", I can definitely agree that it's "messy".

Change your $pages->get($_GET['page']) to this below

Cool, thanks!

renaming protected page file directories with a "." in front of them, i.e. /site/assets/files/.1234/ That way, the htaccess file can recognize them ahead of time in a predefined manner. And PW's htaccess already blocks any files/dirs that start with a period.

Love that! I think that's a great idea. I guess that's a great solution if the htaccess file is located under /site/assets/files. It seems less messy than having an htaccess file in multiple folders.


Either that, or renaming protected page file directories with a "." in front of them, i.e. /site/assets/files/.1234/ That way, the htaccess file can recognize them ahead of time in a predefined manner. And PW's htaccess already blocks any files/dirs that start with a period.

Just wanted to talk about this a little more to see if I understand it correctly. Since PW blocks access to any files/dirs that start with a period, it would have to change so that it blocks access to those files/dirs EXCEPT those located in the files dir. Instead, for those folders, it would have to reference a passthrough similar to how I've done it.

Is that what you're thinking?


The htaccess file that I manually placed was not in the root files folder, but rather the specific page's file folder.

Sorry, I misunderstood. If those files are only going in the places that need it, then the concern I had is not applicable.

Since PW blocks access to any files/dirs that start with a period, it would have to change so that it blocks access to those files/dirs EXCEPT those located in the files dir. Instead, for those folders, it would have to reference a passthrough similar to how I've done it.

Not necessarily. When a file is on a protected page, ProcessWire could spit out a link to the passthrough script as the $file->url(), like domain.com/files/123/somefile.pdf, rather than to its protected location in /site/assets/files/.123/. The $file->filename() property would still refer to the actual disk path since htaccess limits aren't applicable there.
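To make the split between URL and disk path concrete, here is a small sketch assuming the dot-prefix convention and the /files/ passthrough prefix from the example paths above. The function fileLocations() is hypothetical and only illustrates what $file->url() and $file->filename() might hand back; it is not how ProcessWire implements either.

```php
<?php
/**
 * Sketch: protected pages keep their files in a dot-prefixed directory
 * and expose them through a passthrough URL instead of the real location.
 * Function name and URL prefixes are assumptions for illustration.
 */
function fileLocations(int $pageId, string $basename, bool $protected): array
{
    $dir = ($protected ? '.' : '') . $pageId;
    return [
        // What templates would link to
        'url' => $protected
            ? "/files/$pageId/" . rawurlencode($basename)
            : "/site/assets/files/$pageId/" . rawurlencode($basename),
        // Where the file lives on disk, relative to the site root
        'filename' => "/site/assets/files/$dir/$basename",
    ];
}
```

With this split, public files never touch the passthrough, and protected ones are never reachable by their real location because the dot-prefixed directory is already blocked by htaccess.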


When a file is on a protected page, ProcessWire could spit out a link to the passthrough script as the $file->url(), like domain.com/files/123/somefile.pdf, rather than to its protected location in /site/assets/files/.123/. The $file->filename() property would still refer to the actual disk path since htaccess limits aren't applicable there.

Ooooooh! Got it! I feel like that's a great way to go since it simply builds off of ProcessWire's current functionality (of ignoring folders starting with a ".").


  • 2 years later...

Two years later, what's the status on this issue?

I am building a site where some files need to be secured, and will be made available for purchase.

I like the idea of using e.g. a ".private/" folder prefix - that would work fine for me.

But it appears there's no way to configure a path prefix for a File field still?

And I didn't find a third-party module that adds this feature either.

So how are you guys currently going about protecting assets?


To implement a module to do this, I would need to hook the path generation at the lowest level, which appears to be PagefilesManager::path() or actually the static method PagefilesManager::_path() at the very lowest level?

Ryan, any chance you can make this hookable?

A module could then implement the path modification, e.g. adding ".protected/" to the path, and adding a new setting to FieldtypeFile, and a checkbox to InputfieldFile.

Does that sound feasible?


Would making this hookable open file-uploads up to storing outside the site-root too? If so is just adding a checkbox to InputfieldFile to control adding the fixed string to the path going to be enough or should a more flexible scheme for modifying the path be attempted from the outset?


On the site I built that has protected files, I'm still using the .htaccess solution I described. It has the pitfalls Ryan mentioned, and the solution isn't perfect or easily scalable, but it works! I would definitely be excited if this feature made its way onto the ProcessWire roadmap.


  • 3 months later...

Hi Hani,

Quick question: How are you redirecting to the passthrough page?

I've created a template file with the PHP code you posted and created a page using that template but now I'm a little stuck.

I've also got the .htaccess file in assets/files/xxxx/


Sorry it took me a few days to respond, Jonathan.  Well, the .htaccess file that's in the page's assets directory should automatically redirect any requests to the pass-through page.  So, if I were to try to get 

http://www.mydomain.com/site/assets/files/xxxx/mydocument.pdf

the request will instead be redirected to

http://www.mydomain.com/file-download/?page=xxxx&file=mydocument.pdf

The template file is used on the page that I've created at http://www.mydomain.com/file-download

Does that help at all?

At this point, the .htaccess file has to be manually copied into the files folder for each page that you want assets protected.  However, after developing a lot more with PW, I realize that this can be automated by hooking into saveready.  The idea is that if there is a flag set to protect the files for this page, copy the .htaccess file to the files directory.  Otherwise, delete it if it exists.
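A rough sketch of that automation, assuming a checkbox field named protect_files on the page (a hypothetical name) and that the rules to write are the same .htaccess contents shown earlier in the thread. The helper itself is plain PHP; the ProcessWire wiring is left as a comment because it is an untested assumption about how the hook would be attached.

```php
<?php
/**
 * Add or remove the redirect .htaccess in a page's files directory.
 * Hypothetical helper; returns true if the directory ends up in the wanted state.
 */
function syncProtection(string $filesDir, bool $protect, string $rules): bool
{
    $target = rtrim($filesDir, '/') . '/.htaccess';
    if ($protect) {
        // Flag set: (re)write the rules file
        return file_put_contents($target, $rules) !== false;
    }
    // Flag not set: delete the rules file if it exists
    return !is_file($target) || unlink($target);
}

// Assumed wiring inside ProcessWire, e.g. in /site/ready.php.
// "protect_files" is a hypothetical checkbox field:
//
// $wire->addHookAfter('Pages::saveReady', function (HookEvent $event) use ($rules) {
//     $page = $event->arguments(0);
//     syncProtection($page->filesManager()->path(), (bool) $page->protect_files, $rules);
// });
```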

At one point Ryan had said:

But I think separating the dirs is the right way to go. Either that, or renaming protected page file directories with a "." in front of them, i.e. /site/assets/files/.1234/ That way, the htaccess file can recognize them ahead of time in a predefined manner. And PW's htaccess already blocks any files/dirs that start with a period.

I haven't had a chance to really dive back into this issue (because the solution I've implemented works and it's a small-scale website) - but I think that if a module were built and hooked into saveready - instead of copying an .htaccess file, it simply renames the folder and adds the period at the beginning.  Of course, this has trickle down effects because PW itself would have to know that that's the new "location" of the folder so it doesn't always look for "files" instead of ".files".


Just re-read your question.  I may have given you too much detail and not answered your question directly.

The .htaccess file that I've manually added to the /site/assets/files/xxxx directory handles the redirect.


  • 2 months later...

I'm eager to know how this issue is progressing. My next project will probably need secure file downloads.

I'm curious: with the method provided by Hani, how do people upload the files (to be secured) to the server? Through FTP, or through a template with a file field (InputfieldFile)?
