Adam Kiss

Hacking the core to have no trail slash as default

Hi all,

I need to rebuild a WP site as a PW site (silly, just turn it around, right? :D ).

Currently I have around 200 pages, none of which have trailing slashes. I'd like to keep it that way, to keep as much SEO juice as possible. I'm also very lazy, so during the main template creation (which basically happens once), I'd like to have 'no trailing slash' set as the default option.

As mentioned in the title, I'm quite okay with hacking the core, as long as I can switch it back after the initial setup.

So, how to do this? :)

Thanks!

Just a quick question - would this even affect your SEO? I would have thought that either way either link would take you to the same place so it shouldn't affect SEO.

I might be wrong though - entirely possible ;)

Well look at it this way: With 301 redirect, you should not lose any 'SEO juice'. However, you have to remember to correctly redirect every page, which can be quite error-prone and you may introduce a wild range of errors to your site, from looped redirects to 404 with misclicked redirects.

Also, it can be quite tedious, not to mention possible screw-ups of inbound links (you change thing A and site B linking to your page A is screwed, non-working)

This can be set on a per-template basis in your template settings. I don't think you can hack the core for this, because the PW admin requires trailing slashes, so making non-trailing the default setting in the core may cause problems elsewhere in the system. But if you want to match the slash behavior of your WP URLs, I would set this for the page templates where applicable.

Okay, let me enhance my question :)

"during the main template creation (which basically happens once), I'd like to set 'no trailing slash' as default option."

I'd like to set 'no trailing slash' as default option to be checked while creating new templates, so I don't have to check it (and possibly forget to check it).

:D

I see what you mean. If you are okay with setting it in your /site/config.php file, I can make it happen. This doesn't involve adding any real overhead or complexity to the system, because these types of config options are optional.

1., that would be awesome

2., I'm quite okay with trailing slashes for most websites – ProcessWire taught me that, actually. But I really don't want to spend a few hours trying to set up sensible redirects and/or play with regex for .htaccess, so this site is special :)

3., I'm really okay with temporarily rewriting one thing somewhere in the core, until I have set up all fields and templates.

Thanks Ryan!

This has been kept as a per-template setting. I opted not to make it a system wide setting because I thought it would create some support burden and make it more difficult for people to use. Every time we post an example that includes some kind of link, we'd have to qualify it with "depending on your slashes setting". I recommend sticking to slashes with your pages (defined pages) and optionally omitting slashes with non-defined pages: URL segments or page numbers. Another thing to mention is that Process modules actually require use of trailing slashes for defined pages, so don't disable trailing slashes on your 'admin' template.

It really seems to me that this is a technical limitation, or implementation artifact - not something users should have to deal with on a case-by-case basis.

I think it is totally appropriate to have some conventions and not configuration for everything. Trailing slashes are - in my opinion - a good convention.

Hack no good, tool better:

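// Run where the ProcessWire API is available (e.g. in a template file), so $templates is defined.
foreach($templates as $t) {
    // Skip the admin template: Process modules require trailing slashes there.
    if($t->name == "admin") continue;
    $t->slashUrls = 0; // 0 = no trailing slash on URLs of pages using this template
    $t->save();
}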

Done.

Edited by Soma
added excluding admin template

"Trailing slashes are - in my opinion - a good convention."

Care to elaborate?

In my opinion, it makes every page look like it's a folder - which the majority of pages are not. So in that sense, the convention favors the minority.

I also don't think the trailing slash looks "right" on printed matter or other public links - you don't want to make people type in an extra slash. And of course, you can leave it off - but then the server needs to do a redirect, in which case why do we need the slash in the first place?

I think it's an odd convention. I understand PW needs it for the admin pages, and that's fine, but I can't think of a single reason why you would want trailing slashes on all your public URLs.

Why do you favor that convention?

+1 for 'looks like a folder', totally my opinion, although I was never able to come up with this simple reason I don't like trailing slashes.

(on that matter, this is just a note. I still prefer no trailing slashes, but don't really care enough to be vocal about it)

For me the biggest problem with no trailing slash is that the relative linking strategy isn't logical in a tree structure. No trailing slash is fine for bucket-based, non-hierarchical systems, but is problematic in a tree structure. Compare these relative linking strategies. When there is no trailing slash, you are always in the context of the parent rather than the current page:

No trailing slash:

<a href="./">Link to parent</a>
<a href="this-page-name/child-page-name">Link to child</a>

With trailing slash:

<a href="../">Link to parent</a>
<a href="child-page-name/">Link to child</a>
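
To see why the two link sets above differ, here is a minimal sketch using Python's standard `urllib.parse.urljoin`, which resolves relative references in the same way browsers do. The `example.com` URLs and the page names `products`, `ipod` and `specs` are made up for illustration and are not from this thread.

```python
from urllib.parse import urljoin

# Hypothetical page URLs; the only difference is the trailing slash.
NO_SLASH = "https://example.com/products/ipod"
WITH_SLASH = "https://example.com/products/ipod/"


def resolve(base: str, href: str) -> str:
    """Resolve a relative href against the URL of the current page."""
    return urljoin(base, href)


# Without a trailing slash the base context is the parent ("products/"),
# so "./" reaches the parent and a child link must repeat the page name.
parent_no_slash = resolve(NO_SLASH, "./")
child_no_slash = resolve(NO_SLASH, "ipod/specs")

# With a trailing slash the base context is the page itself,
# so "../" reaches the parent and a child link is just the child name.
parent_with_slash = resolve(WITH_SLASH, "../")
child_with_slash = resolve(WITH_SLASH, "specs/")

print(parent_no_slash, child_no_slash)
print(parent_with_slash, child_with_slash)
```

Both variants reach the same parent, but only the trailing-slash form lets a page link to its children without repeating its own name.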

Understood.

Maybe there should be a third option called "default" - where the use of a trailing slash or not is conditional on whether the Page allows subpages?

Given what you explained, perhaps this is actually the correct default behavior? - so we would have nice, "final" looking URLs for end-nodes, trailing slashes for potential parents.

The default behavior could also be conditional on whether or not the Page has any subpages, but this seems riskier, since URLs would change when you add/remove pages, so probably not safe.

It seems safe to assume though, that a Page that isn't allowed to have sub-pages, won't ;)

Thoughts?

Edit: a better name for this option would be "if Page can have children".

I don't see any reason to add more settings that are totally irrelevant or purely aesthetic for 99% of projects. URLs with trailing slashes look nicer (imo) and work nicer (like Ryan explained).

Overthinking and overconfiguring every possible option is a certain way to bloat.

A page that isn't allowed to have sub-pages might be allowed to have sub-pages later on. Also, the family tab settings only apply to admin usage - children can still be created through the API. So I wouldn't tie any functionality that could break code to that setting and assumption.

"Maybe there should be a third option called "default" - where the use of a trailing slash or not is conditional on whether the Page allows subpages?"

I would be fine with this. But then we'd be left again with the relative URL problem where on some pages relative parent path is "../" and others it is "./", depending on whether they allow children. Same would go for relative sibling links. Even if a page doesn't allow children, we also have URL segments and page numbers to factor into the equation. I guess I think configuring the setting on the URLs tab of the template settings (where it is now) gives us the most predictable control over the slash setting. I do sometimes use this for end-point pages and omit the slash. Ultimately I totally understand the desire to get rid of the trailing slash, but kind of feel like this is something that should be done where specifically intended. I also think use of trailing slash does lead to less confusion in a tree-based system, but for advanced users it's not as much of an issue.

"Page that isn't allowed to have sub-pages might be allowed to have sub-pages later on."

Agreed. Family settings are easy to change and if URLs are based on those (or actually any other variable that could potentially be altered later) you're going to end up with multiple URL variations for one page, which can't be a good thing no matter how you look at it. In my humble opinion it's much better to stick consistently with one than alter it based on each specific scenario.

I also believe that (again for consistency) URLs without trailing slashes should always indicate files, though based on what @mindplay.dk said way up there ("it makes every page look like it's a folder - which the majority of pages are not") the whole concept of "file" in this environment might be slightly shady.. :)

Actually, I think I'm going to have to withdraw that statement. In ProcessWire, every page is a folder - there is really only one type of node.

I guess the underlying problem or cause of confusion, is that the URI protocol, as such, distinguishes folders from files, and ProcessWire does not.

This is one of the things I like about ProcessWire, so I'm not trying to argue against it, but it really does result in rather ambiguous URLs that are not interpreted the way a URI is normally interpreted.

Case in point, consider the URI "foo/bar/":

The way a URI is normally interpreted, this means: give me the default document under the "foo/bar" folder.

The way ProcessWire interprets it, this means: give me the "bar" document under the "foo" folder.

Now consider the URI "foo/bar": (without the trailing slash)

The way a URI is normally interpreted, this means: give me the "bar" document under the "foo" folder.

In this case, ProcessWire is consistent with normal URI interpretation.

One way to look at this, is to say that documents in ProcessWire do not have names. That is, the URI always indicates a folder-name, and it just so happens that ProcessWire always has a single, nameless, default document associated with every folder.

From that point of view, eliminating the trailing slash is actually wrong - and the problems you have with relative URLs are just a result of the folder/document ambiguity that results from doing so...

I somehow adapted to the slash and also used it in CodeIgniter. It works, it is consistent, and you just don't need to think about it. Once you start messing around with it, it can become a burden.

I also see it as if it's all a folder and a document at the same time. Consider this:

/templates/

same as

/templates/index.php

And both work in PW. Or is it just my local install? Anyway.

Edit: yep, it only works if URL segments are enabled.

Come to think of it though, the problem is actually this:

The trailing slash option is presented to you as a merely cosmetic thing, which really isn't consistent with the way that URIs work - the trailing slash is not just cosmetic, it actually gives the URL a different meaning, hence the problems with relative URLs, as well as (evidently) with a person's interpretation of it.

The disconnect here, is the fact that "can the page have children?" is presented as a separate option from the trailing slash in the first place.

Think about it: "foo/bar" is like "foo/bar.html" - it references a named document "bar" or "bar.html" in the folder "foo".

Meanwhile, "foo/bar/" is like "foo/bar/index.html", which isn't the same thing at all - it references a default document in the folder "foo/bar".

In other words, the trailing slash indicates (to browsers as well as to people) whether the last literal in a URL is a folder or a document.

In a physical file hierarchy, only folders "can have children" - documents of course cannot "have children".

In other words, for this behavior to be consistent with the standard URI scheme, things that can "have children" are folders, which is indicated by the trailing slash, while things that cannot "have children" are documents, which is indicated by no trailing slash.

From this point of view, you really ought to have only one option: "can the page have children?" - if so, it's a folder, and it gets a trailing slash, otherwise it's a document, and does not get a trailing slash. The option to manually control the presence of the slash shouldn't be present, because it enables you to break URI conventions.

Just presenting a point of view here - I'm not trying to dictate anything, just giving the subject a good turn-over :)

@mindplay.dk: if I'm following your post correctly, you're saying that by default trailing slash should be added to pages which can (currently) have children and omitted from pages which (currently) can't? I do see where that's coming from, but I still have to disagree here, mostly based on the previously mentioned fact that those things (family settings etc.) can be changed quite easily while URLs changing is generally speaking a bad thing.

What you're saying about files and folders is absolutely right; in this context files/folders act exactly as they would in, say, pretty much any UNIX style operating system. Thus /foo/bar/ would refer to folder /bar/ in another folder /foo/ and /foo/bar to a document called bar in folder /foo/. Up to this point we're probably seeing things exactly the same way. That's also where things get slightly more complicated:

I prefer to think that each ProcessWire page really is in a (virtual) folder, so when you're accessing /foo/bar/ you're actually being served the default file in that particular folder; in this case /foo/bar/index.* (the suffix doesn't matter here.) There's nothing wrong with this: it's always consistent, very common, and has been used all around the web for many, many years. Even if that particular page had only one file and thus didn't really need to exist in a folder, this is still completely valid.

If pages that can't currently have children had to be accessed as files instead of folders, strictly speaking that should also mean that the same address with a trailing slash added should not work -- otherwise they would be folders and not files, and like you said that's a different beast altogether.

Just give it a try in whatever OS you're currently running. Most likely, if you create a folder /foo/ and try to access it as foo (without the trailing slash) you're still taken to the correct folder, but if you're trying to edit / view a file called foo and add that trailing slash you'll only end up with an error ("foo/: Not a directory" or something similar.)
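
The experiment described above can be sketched with Python's standard library. This assumes a POSIX-like system such as Linux (Windows may treat trailing separators differently), and the names `foo` and `bar` are just placeholders created in a temporary directory.

```python
import os
import tempfile


def exists_as_written(path: str) -> bool:
    """Return True if the OS accepts the path exactly as written."""
    try:
        os.stat(path)
        return True
    except OSError:
        # On Linux, a regular file followed by "/" fails with "Not a directory".
        return False


def probe() -> dict:
    """Create one folder and one file, then try each with and without a slash."""
    with tempfile.TemporaryDirectory() as root:
        folder = os.path.join(root, "foo")    # placeholder folder name
        document = os.path.join(root, "bar")  # placeholder file name
        os.mkdir(folder)
        with open(document, "w"):
            pass
        return {
            "folder": exists_as_written(folder),
            "folder/": exists_as_written(folder + "/"),
            "document": exists_as_written(document),
            "document/": exists_as_written(document + "/"),
        }


results = probe()
print(results)
```

On such a system only the file-with-slash case should be rejected, which is the asymmetry the post relies on: a folder tolerates a missing slash, but a file does not tolerate an added one.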

To sum this up, the current behavior is imho very much consistent with how the OS beneath your site actually behaves, and that's another reason it should not be changed (unless you're actually suggesting that a user accessing /foo/bar/ should only get a 404 in case the admin has decided that bar can't have children, thus making it a file instead of a folder.)

See where I'm going with this? :)

@teppo I don't disagree with any of this - you're basically saying what I said in my previous post, that every node in PW is a folder with a default document attached to it.

I guess at least in part, what gives rise to confusion here is that if you created a node called "products", it's probably hard to accept that "products/ipod" is actually a folder - since this is the individual product, and most likely the end-node, why would you think of it as a folder? Some nodes are invariably leaf-nodes, and are unlikely to ever have children.

In some ways, I would almost find it preferable to have an extension that indicates explicitly whether you're referring to a document or a folder - or rather, whether you're referring to the node/folder itself, or to the document attached to it.

If you were building a static site (without a CMS), this is how you would do it - "products.html" is obviously a document, while "products/ipod.html" is obviously referring to a document in the folder named "products", which is conceptually a different thing from the document named "products.html", or in the case of ProcessWire, the document named "products".

I think the confusion arises from "products" essentially being two things in ProcessWire: "products" the folder, and "products" the document. This is not consistent with filesystems, where you can't have both a file and a folder with the same name.
