
ProcessWire's dependence on .htaccess


ethanbeyer


I've introduced ProcessWire (gleefully) to quite a few devs I know. One, who is quite a bit more experienced than I am, raised a question that I couldn't answer:

Does ProcessWire rely on .htaccess *too much*?

I realize that .htaccess handles a lot of baseline security and routing issues within the ProcessWire ecosystem. Inevitably, this leads to a large reliance on Apache, unless one is willing to try to convert those rules to NGINX configurations.

This has led me to wonder, is there a better way (PHP based?) of handling routing and security than .htaccess?


@ethanbeyer I can't address the "better way" part of your post, but you can use NGINX if you want, although NGINX support isn't officially part of the codebase.

Several people have successfully converted the ruleset to NGINX, and you should be able to find their configurations here in the forum if you want to review them. Another option is to pull a preconfigured LEMP + PW stack from Docker Hub.


Yeah, I don't know the "better way" either, at least not in terms of concrete methodology. I suppose I was wondering whether anyone had experience in this area and could help the ProcessWire community at large move away from treating Apache as a requirement out of the box. Does that make sense?


45 minutes ago, ethanbeyer said:

I realize that .htaccess handles a lot of baseline security and routing issues within the ProcessWire ecosystem. Inevitably, this leads to a large reliance on Apache, unless one is willing to try to convert those rules to NGINX configurations.

This has led me to wonder, is there a better way (PHP based?) of handling routing and security than .htaccess?

Using .htaccess is efficient. You could, in theory, handle (nearly) everything in PHP, but this would add serious overhead, both in memory and in speed. Others have adapted the rules for NGINX, and a few, like me, use IIS with URL Rewrite to power PW, but in all cases it makes sense to filter and rewrite requests in the web server itself.

The topic of supporting more platforms than just Apache out of the box has already been brought up a few times here in the forums. Ryan himself is not opposed to it, but he lacks the time to develop and test the rule sets, and he is understandably wary of including anything that hasn't been well tested or that might end up without active support (any change he makes to .htaccess would need to be ported quickly to the other platforms). So I guess it would take a team of knowledgeable volunteers who develop the rules, adapt the installer script, test everything well, and provide quick support before he would consider integrating it into the PW project.


Relying on .htaccess is not really uncommon for PHP CMSs out there. Some use Apache only minimally; ProcessWire is probably more on the "take what we can get" side. Having PHP serve everything, especially static assets, is just not performant enough. That's the reason for relying on the web server in front of PHP in the first place. The reason for Apache specifically is .htaccess: other web servers are usually only statically configurable, which is a deal-breaker for shared hosting, where a (global) web server cannot be restarted whenever a single user needs to change their configuration. So if you only support one web server, it had better be Apache.


Sorry to be the guy who comes back with "wait a minute!", but I'm still thinking about this.

Originally, I was asking, "does ProcessWire need .htaccess, and if so, why?"

All the answers boil down to: yes, PHP CMSs need an .htaccess file, and the reasons are speed, security, and delegating to the HTTP server what the HTTP server does best. I hear you all! Thank you again for those replies.

Now I want to ask...is the ProcessWire .htaccess file doing too much?

Here it is, with comments stripped:

Options -Indexes
Options +FollowSymLinks
# Options +SymLinksifOwnerMatch

ErrorDocument 404 /index.php

<Files favicon.ico>
  ErrorDocument 404 "The requested file favicon.ico was not found.
</Files>

<Files robots.txt>
  ErrorDocument 404 "The requested file robots.txt was not found.
</Files>

<IfModule mod_headers.c>
  Header always append X-Frame-Options SAMEORIGIN 
  Header set X-XSS-Protection "1; mode=block"
  # Header set X-Content-Type-Options "nosniff" 
</IfModule>

<FilesMatch "\.(inc|info|info\.json|module|sh|sql)$|^\..*$|composer\.(json|lock)$">
  <IfModule mod_authz_core.c>
    Require all denied
  </IfModule>
  <IfModule !mod_authz_core.c>
    Order allow,deny
  </IfModule>
</FilesMatch>

<IfModule mod_php5.c>
  php_flag magic_quotes_gpc     off
  php_flag magic_quotes_sybase      off
  php_flag register_globals     off
</IfModule>

DirectoryIndex index.php index.html index.htm

<IfModule mod_rewrite.c>

  RewriteEngine On
  AddDefaultCharset UTF-8
  
  # RewriteCond %{HTTPS} off
  # RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
  # RewriteCond %{HTTP:X-Forwarded-Proto} =http 
  # RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

  <IfModule mod_env.c>
    SetEnv HTTP_MOD_REWRITE On
  </IfModule>

  # RewriteBase /
  # RewriteBase /pw/
  # RewriteBase /~user/
  RewriteRule "(^|/)\.(?!well-known)" - [F]
  # RewriteCond %{HTTP_HOST} !^www\. [NC]
  # RewriteRule ^ http://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
  # RewriteCond %{REQUEST_URI} "[^-_.a-zA-Z0-9/~]"
  # RewriteCond %{REQUEST_FILENAME} !-f
  # RewriteCond %{REQUEST_FILENAME} !-d
  # RewriteRule ^(.*)$ index.php?it=/http404/ [L,QSA]
  RewriteCond %{REQUEST_URI} !(^|/)site-[^/]+/install/[^/]+\.(jpg|jpeg|png|gif)$
  RewriteCond %{REQUEST_URI} (^|/)\.htaccess$ [NC,OR]
  RewriteCond %{REQUEST_URI} (^|/)(site|site-[^/]+)/assets/(cache|logs|backups|sessions|config|install|tmp)($|/.*$) [OR]
  RewriteCond %{REQUEST_URI} (^|/)(site|site-[^/]+)/install($|/.*$) [OR]
  RewriteCond %{REQUEST_URI} (^|/)(site|site-[^/]+)/assets.*/-.+/.* [OR]
  RewriteCond %{REQUEST_URI} (^|/)(wire|site|site-[^/]+)/(config|index\.config|config-dev|env)\.php$ [OR]
  RewriteCond %{REQUEST_URI} (^|/)(wire|site|site-[^/]+)/templates-admin($|/|/.*\.(php|html?|tpl|inc))$ [OR]
  RewriteCond %{REQUEST_URI} (^|/)(site|site-[^/]+)/templates($|/|/.*\.(php|html?|tpl|inc))$ [OR]
  RewriteCond %{REQUEST_URI} (^|/)(site|site-[^/]+)/assets($|/|/.*\.php)$ [OR]
  RewriteCond %{REQUEST_URI} (^|/)wire/(core|modules)/.*\.(php|inc|tpl|module|info\.json)$ [OR]
  RewriteCond %{REQUEST_URI} (^|/)(site|site-[^/]+)/modules/.*\.(php|inc|tpl|module|info\.json)$ [OR]
  RewriteCond %{REQUEST_URI} (^|/)(COPYRIGHT|INSTALL|README|htaccess)\.(txt|md|textile)$ [OR]
  RewriteCond %{REQUEST_URI} (^|/)site-default/
  RewriteRule ^.*$ - [F,L]
  RewriteCond %{REQUEST_URI} "^/~?[-_.a-zA-Z0-9/]*$"
  # RewriteCond %{REQUEST_URI} "^/~?[-_./a-zA-Z0-9æåäßöüđжхцчшщюяàáâèéëêěìíïîõòóôøùúûůñçčćďĺľńňŕřšťýžабвгдеёзийклмнопрстуфыэęąśłżź]*$"
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_FILENAME} !-d
  RewriteCond %{REQUEST_FILENAME} !(favicon\.ico|robots\.txt)
  # RewriteCond %{REQUEST_FILENAME} !\.(jpg|jpeg|gif|png|ico)$ [NC]
  # RewriteCond %{REQUEST_FILENAME} !(^|/)site/assets/
  RewriteRule ^(.*)$ index.php?it=$1 [L,QSA]
  # RewriteRule ^(.*)$ /index.php?it=$1 [L,QSA]

</IfModule>

That's 86 lines.
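The last rule group in the file above (the RewriteConds ending in `RewriteRule ^(.*)$ index.php?it=$1 [L,QSA]`) is the front controller that hands "pretty" URLs to PHP. As a rough sketch only, here is that decision in Python, assuming the .htaccess sits in the document root with no RewriteBase and that the input is the decoded path without its query string ([QSA] would append the original query string; the sketch ignores it). `route` and the `"passthrough"` result are hypothetical names, not part of ProcessWire.

```python
import re
from pathlib import Path

# Copied from: RewriteCond %{REQUEST_URI} "^/~?[-_.a-zA-Z0-9/]*$"
ALLOWED_URI = re.compile(r"^/~?[-_.a-zA-Z0-9/]*$")
# Copied from: RewriteCond %{REQUEST_FILENAME} !(favicon\.ico|robots\.txt)
EXCLUDED = re.compile(r"(favicon\.ico|robots\.txt)")


def route(uri: str, docroot: Path) -> str:
    """Hypothetical helper: what the front-controller rules do with a path.

    Returns the internal rewrite target when every condition holds,
    otherwise "passthrough" (Apache handles the request itself).
    """
    # Per-directory rules in the document root see the path without the leading slash.
    relative = uri.lstrip("/")
    # Rough stand-in for %{REQUEST_FILENAME}.
    filename = docroot / relative
    if (
        ALLOWED_URI.match(uri)                     # only "safe" characters
        and not filename.is_file()                 # RewriteCond ... !-f
        and not filename.is_dir()                  # RewriteCond ... !-d
        and not EXCLUDED.search(str(filename))     # not favicon.ico / robots.txt
    ):
        return "index.php?it=" + relative          # RewriteRule ^(.*)$ index.php?it=$1
    return "passthrough"
```

So an existing file such as `/site/assets/logo.png` never touches PHP, while `/about/` is rewritten to `index.php?it=about/`.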

For comparison, here's Laravel's:

<IfModule mod_rewrite.c>
    <IfModule mod_negotiation.c>
        Options -MultiViews -Indexes
    </IfModule>

    RewriteEngine On

    # Handle Authorization Header
    RewriteCond %{HTTP:Authorization} .
    RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]

    # Redirect Trailing Slashes If Not A Folder...
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_URI} (.+)/$
    RewriteRule ^ %1 [L,R=301]

    # Handle Front Controller...
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule ^ index.php [L]
</IfModule>

 

And WordPress's:

<IfModule mod_rewrite.c>
	RewriteEngine On
	RewriteBase /
	RewriteRule ^index\.php$ - [L]
	RewriteCond %{REQUEST_FILENAME} !-f
	RewriteCond %{REQUEST_FILENAME} !-d
	RewriteRule . /index.php [L]
</IfModule>

 

I am wondering if the .htaccess has to do as much as it's doing right now, or is some of this stuff redundant to validation already in Core?

I fully agree we need it, but I wonder if we could move some of the common configurations out of it, because at the end of the day, the app should be able to work out of the box on either Apache or NGINX with limited knowledge of either system and not need gobs of HTTP configuration. I don't personally know how to spearhead this effort, but other systems have found a way to do so, and I think ProcessWire could garner more market share if the initial configuration appeared hella simple. The .htaccess (and by extension, conversion to NGINX) scares some people.

I think it would be worthwhile to discuss the possibility and plausibility of moving some of those .htaccess configs to the core in some form or fashion.


A lot of the rules prevent direct HTTP access to sensitive files such as config, logs, sessions, and templates. Most other frameworks and CMSs keep this sensitive stuff out of the web root, so HTTP access to it is impossible by design. In ProcessWire, the installation root is the web root, so we need to protect access to these files. On the plus side, ProcessWire can be installed on hosts that do not allow changing the web root. The drawback is that we need to be extra careful that nobody can access sensitive/private files.
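To make this concrete, here is a minimal Python sketch that runs a subset of the deny rules from the .htaccess posted earlier in this thread against a few request paths. It is an approximation, not Apache's actual evaluation: only some of the RewriteCond patterns are copied (plus the dot-file RewriteRule), `re.search` stands in for mod_rewrite's unanchored matching, and the grouping is modelled as "not an install image AND any deny pattern". `is_forbidden` is a hypothetical helper name.

```python
import re

# Copied from the dot-file rule: RewriteRule "(^|/)\.(?!well-known)" - [F]
DOT_PATH = re.compile(r"(^|/)\.(?!well-known)")

# Copied from the first (negated) RewriteCond: images under site-*/install stay reachable.
INSTALL_IMAGE = re.compile(r"(^|/)site-[^/]+/install/[^/]+\.(jpg|jpeg|png|gif)$")

# A subset of the [OR]-chained RewriteConds that end in: RewriteRule ^.*$ - [F,L]
DENY_PATTERNS = [
    re.compile(r"(^|/)\.htaccess$", re.IGNORECASE),  # [NC]
    re.compile(r"(^|/)(site|site-[^/]+)/assets/(cache|logs|backups|sessions|config|install|tmp)($|/.*$)"),
    re.compile(r"(^|/)(site|site-[^/]+)/install($|/.*$)"),
    re.compile(r"(^|/)(wire|site|site-[^/]+)/(config|index\.config|config-dev|env)\.php$"),
    re.compile(r"(^|/)(site|site-[^/]+)/templates($|/|/.*\.(php|html?|tpl|inc))$"),
    re.compile(r"(^|/)(site|site-[^/]+)/assets($|/|/.*\.php)$"),
]


def is_forbidden(uri: str) -> bool:
    """Hypothetical helper: True if this subset of rules would answer 403."""
    if DOT_PATH.search(uri):
        return True
    # mod_rewrite groups these as: NOT install image AND (pattern1 OR pattern2 OR ...)
    if INSTALL_IMAGE.search(uri):
        return False
    return any(p.search(uri) for p in DENY_PATTERNS)
```

With these patterns, `/site/config.php` and `/site/assets/logs/errors.txt` are refused, while an uploaded image like `/site/assets/files/1234/photo.jpg` is still served directly.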


1 hour ago, ethanbeyer said:

or is some of this stuff redundant to validation already in Core?

It's more about server/hosting security than validation, since it happens on the server side, before any request reaches ProcessWire.

1 hour ago, ethanbeyer said:

at the end of the day, the app should be able to work out of the box on either Apache or NGINX with limited knowledge of either system and not need gobs of HTTP configuration

ProcessWire already works out of the box in about 95-99% of situations. Maybe not with NGINX, but that's another matter. Knowing more about systems and their configuration isn't a bad thing at all. I wouldn't want to give up anything I've learned about servers and how to handle them.

1 hour ago, ethanbeyer said:

I think ProcessWire could garner more market share if the initial configuration appeared hella simple.

Have you ever installed TYPO3? It is (or at least was) a pain in the a**, and they still have a nice market share.

1 hour ago, ethanbeyer said:

The .htaccess (and by extension, conversion to NGINX) scares some people.

I'm more afraid of doing my taxes, and I still have to deal with them. There are professionals for those things.

 

I wouldn't spend too much time and effort on replacing some lines in the .htaccess file.

Why bother with something that's already working almost all of the time?

Debugging those installation or migration issues within the core of a CMS is way more difficult than (un-)commenting some lines in a plain-text file.

And to be honest... I'm quite shocked that WordPress has just a few lines in its .htaccess.

 

Last but not least...

1 hour ago, ethanbeyer said:

Now I want to ask...is the ProcessWire .htaccess file doing too much?

In my opinion: No... I think the .htaccess could do even more in some cases. For example: https://processwire.com/blog/posts/optimizing-404s-in-processwire/


All my test sites run on a virtual server managed by Plesk, with NGINX in front of Apache. I've never had to bother with any configuration details: PW installs without any special attention to .htaccess or similar. It simply works out of the box for me.


So there are a few things to unpack:

3 hours ago, ethanbeyer said:

Here it is, with comments stripped:

By stripping the comments, you removed exactly what someone unfamiliar with Apache needs to know when working with the .htaccess file. There are about two settings in there that are commonly changed: RewriteBase (shared hosts often need it set) and the HTTPS redirect. That's it. Everything else is either static or only needs touching in certain corner cases.

3 hours ago, ethanbeyer said:

I am wondering if the .htaccess has to do as much as it's doing right now, or is some of this stuff redundant to validation already in Core?

If you have rules that let the web server serve files directly (instead of routing them through PHP, which hurts performance), then you need to disallow certain folders that should not be accessible. Things like caches or logs are prone to leaking important information about your website, and simply making every file in the system accessible (see WordPress) is not good practice at all. This is also not something the core can handle, because you don't want PHP involved for static files, yet you also don't want all static files to be accessible. For example: say you deploy your composer.json (you shouldn't, but still) along with all the other files of your project. People could happily look for packages you use that have known security issues, and you'd even help them by revealing the version constraints you've set up.
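The composer.json case is handled by the `<FilesMatch>` block in the .htaccess posted earlier. `<FilesMatch>` tests its pattern against the file's basename (the last path component) only, so a rough Python equivalent, assuming nothing beyond that pattern (`denied_by_filesmatch` is a hypothetical name), looks like this:

```python
import re
from posixpath import basename

# Copied from: <FilesMatch "\.(inc|info|info\.json|module|sh|sql)$|^\..*$|composer\.(json|lock)$">
FILES_MATCH = re.compile(r"\.(inc|info|info\.json|module|sh|sql)$|^\..*$|composer\.(json|lock)$")


def denied_by_filesmatch(uri_path: str) -> bool:
    """Hypothetical helper: True if the FilesMatch block would deny this file.

    <FilesMatch> compares against the basename, so the directory part
    of the path is ignored.
    """
    return FILES_MATCH.search(basename(uri_path)) is not None
```

Note that this is a plain denylist: `/package.json`, for example, is not matched by this pattern.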

As already noted, with Laravel most of the important stuff is not even web-accessible by default, because only its public folder is meant to be served by the web server. That's not possible for ProcessWire, again because of shared hosts, where you often can't store files outside the web root to begin with.

3 hours ago, ethanbeyer said:

the app should be able to work out of the box on either Apache or NGINX

That I can support. NGINX configs shipped with the core would be really nice and would probably make ProcessWire more approachable for use cases where people rent their own servers and want, e.g., the performance of NGINX.

3 hours ago, ethanbeyer said:

The .htaccess (and by extension, conversion to NGINX) scares some people.

I'd say "the conversion" is scary if you lack the knowledge, and I can understand that. But the mere presence of a somewhat longer .htaccess didn't bother me much, even before I knew what any of it did. In most cases you won't even notice it, as the installation instructions are basically: download ProcessWire, unzip, and open the URL. There are cases where that's not enough, but in the large majority of cases it will just work.

3 hours ago, ethanbeyer said:

I think it would be worthwhile to discuss the possibility and plausibility of moving some of those .htaccess configs to the core in some form or fashion.

This is misguided, as roughly 95% of those rules would never even reach the core unless static files were served through PHP, or they concern things PHP can't change at all.

So my final comment would be: the current .htaccess is usefully more elaborate than those of certain other systems. That makes it more complex to port to other web servers, but about 50% of it is just a bunch of regexes securing certain files/folders, so I'd say porting it is more tedious than difficult.


Thanks again for the responses, @Wanze, @wbmnfktr, @Autofahrn, and @LostKobrakai. All of your answers (again) help me make sense of things I didn't know or didn't understand as fully as I thought I did. I've long thought that the .htaccess file ProcessWire uses was far more robust (read: better) than those of other CMSs that don't keep most of the app outside the web root, but I hadn't thought about why most of it is necessary. I think all of my thoughts/worries have been sufficiently addressed.

