Sevarf2

URL schema

Hello,

First, I want to say thanks for this great CMS.

Now the question: is it possible to change the URL structure, for example from /page-name/ to /page-name.html? And for subpages, /main-page/subpage.html?

Thanks

Hello,

Welcome to the forums. I'll let Ryan answer this one, but I'm just curious: do you have any reason why you would like to accomplish this, or is it simply a matter of personal preference?

Adam

The reason is SEO. It's just a small improvement, but I always use this kind of URL structure.

If it's not too complicated, that would be fine.

Sidenote: I don't think search engines care about your URL schema; it's more about things like keyword density, inbound/outbound links, etc.

But Ryan (the creator) should be here in around two hours, so he'll answer your question :)

As I said, it's a small improvement, a kind of "fine tuning".

P.S.: Keywords are no longer considered by Google.

Could you try this?

1. Create a dedicated function that does something like this:

function URLize($url) {
    // Drop the trailing slash (if any), then append the extension.
    // Note: the root URL "/" would become ".html", so skip it in templates.
    return rtrim($url, '/') . '.html';
}

then in your templates:

 echo URLize($that_page->url);

2. Edit line 43 of .htaccess to something like this (I'm not sure it's 100% correct):

RewriteRule ^(.*)\.html$ index.php?it=$1 [L,QSA]

And I'm like 80% sure there is currently no better solution.

ProcessWire will support .html extensions. You just have to make them your page names, i.e. "about.html" rather than "about", or look for them in urlSegments. I've actually done this before, though for specific pages (like /sitemap.xml), not on a site-wide basis. But I don't see any problem with it conceptually.
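To illustrate the urlSegments route mentioned above, here is a minimal sketch of how a template might recognize a segment ending in ".html". The helper name stripHtmlExtension is hypothetical and not part of ProcessWire; in a real template the segment would presumably come from something like $input->urlSegment1, which is an assumption here and requires URL segments to be enabled for the template.

```php
<?php
/**
 * Hypothetical helper (not part of ProcessWire): if a URL segment ends in
 * ".html", return the part before the extension; otherwise return null.
 */
function stripHtmlExtension(string $segment): ?string
{
    $suffix = '.html';
    $suffixLength = strlen($suffix);

    // Require at least one character before the extension.
    if (strlen($segment) <= $suffixLength) {
        return null;
    }
    // The segment must end with the extension exactly.
    if (substr($segment, -$suffixLength) !== $suffix) {
        return null;
    }
    return substr($segment, 0, -$suffixLength);
}
```

A template could call this on the first URL segment, look up the child page by the returned name, and respond with a 404 when the result is null.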

There is one implementation problem in that PW2 enforces slashes at the end of the URL. However, I think I should be able to make that part optional as part of each template's advanced configuration. Actually, I'd meant to make that optional even before this question came up. So let me work on that part, and I think this will be an easy addition.

As for the value of keywords, I think you guys were talking about two different things. I think Sevarf2 is talking about meta keywords. Meta keywords have never been used by Google, and they were thrown out of most other major search engines about a decade ago. But meta keywords are still valuable if you are running your own spider, perhaps something indexing across multiple platforms on a very large site. But excluding that, there's not much point in using the meta keywords tag. I think it's better to leave them out: a spammy meta keywords tag can still hurt you, even though a meta keywords tag can't help you.

I think Adam was referring to keyword density, like the fact that key words/phrases have to be present on a page (or links to it) in order for it to be matched by Google. And the strength of those keywords can relate to where they are placed in the markup (i.e. <title>, anchor text and headlines carry more weight, for starters). And then there are some formulas about density, but they are a matter of fine tuning as well. Roughly 80% of SEO is what happens on other sites, not yours (i.e. who's linking to you), so I usually tell people to just focus on making the site highly accessible with high quality semantic markup and content, and focus on making it something that people would want to link to, and leave it at that.

My experience is that the best results come from logical URL structures that are highly readable and contain keywords. I don't think that Google actually ranks based on keywords in your path, but they certainly highlight them in the SERPs, which is worth it right there.

I would be surprised if there was any benefit to using ".html" in your URLs on a new site (as opposed to an old site). At one time, I think there was a benefit, just like there was a benefit to using subdomains over paths. Such benefits don't last because they just exploit a temporary vulnerability in Google's algorithm. Ending with ".html" is kind of a legacy thing, present on sites that have been around a long time. Benefits from ".html" might be a side effect of the age of sites where they are used, when in fact it's the age that drives the result.

Where I think the real benefit would lie is if you are trying to maintain old URLs that end with ".html". If you've had a page at /page.html for a really long time and it's well indexed in Google, it's to your benefit not to change it. While technically Google should transfer page rank to your new page (via a 301 redirect), it doesn't always work and/or throws you in the sandbox for a little while. Other times, it works as planned, but it's a risk on a high traffic page. So if you are trying to maintain legacy URLs that end with ".html" there is most certainly an SEO benefit to keeping them at ".html".

But on a new site, I think I would avoid using ".html" in your URLs. I don't see any real world evidence that ".html" influences performance, and I don't view dust crawling as being applicable for the type of pages we are talking about. But let's just assume that there was some benefit. Using .html in URLs that aren't actually composed of static files that end with ".html", with the intention of swaying Google, would likely cross the line on their webmaster guidelines (short term benefits lead to long term pain). I would avoid .html in your URLs unless it's literally a static page where the file extension drives the mime type. Otherwise, it's trying to deceive Google a little bit, and that's never a good long term strategy.

All of this is speculation of course, I don't know anyone working at Google, but I do enjoy the subject! If you know something more about this, please keep the conversation going.

I've been once again defeated by the simplicity of PW! :)

@Ryan, is it possible somehow (now or in the future) to remove the trailing slash?

@OP: Also, it should be possible to add a hook on page save that automatically adds the '.html' extension to your page name (URL slug), if we can solve the trailing slash question.

Adam: I'm going to make the trailing slash configurable on a per-template basis. It'll be "on", "off" or "either", with the default being "on".

Ryan: I don't think 'either' is a good idea! The system should have only on/off settings, with a 301 redirect from the other form to the active setting.

E.g. you have trailing slashes off, so /page/subpage works, but /page/subpage/ redirects to it.

This is especially important for avoiding duplicate content!
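The on/off-with-redirect idea can be sketched roughly as follows. This is only an illustration of the logic, not how ProcessWire implements it; the function name canonicalRedirectTarget and its parameters are hypothetical.

```php
<?php
/**
 * Hypothetical helper: given a request path and whether trailing slashes
 * are enforced, return the path to 301-redirect to, or null when the
 * request already uses the canonical form.
 */
function canonicalRedirectTarget(string $path, bool $slashesOn): ?string
{
    $trimmed = rtrim($path, '/');

    // Root-like paths ("/" or "") are left alone regardless of the setting.
    if ($trimmed === '') {
        return null;
    }

    // Build the one canonical form for the active setting.
    $canonical = $slashesOn ? $trimmed . '/' : $trimmed;

    // Only redirect when the request differs from the canonical form.
    return $canonical === $path ? null : $canonical;
}
```

With slashes off, a request for /page/subpage/ yields /page/subpage as the target, which the caller could send with header('Location: ' . $target, true, 301). Since exactly one form ever returns content, there is no duplicate URL for search engines to index.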

You are right about that, there should not be an either option, not sure what I was thinking.

A while back I saw some independent tests that showed keywords in URLs definitely counted in some SE results (Google and Yahoo included). Plus, if you read some of the comments from Google themselves, they seem to suggest there's some benefit:

From Sitepoint: "What Is the URL structure preferred by Google?

Google’s Matt Cutts replied:

I would recommend

long-haired-dogs.html

long_haired_dogs.html

longhaireddogs.html

in that order. If your site is already live on the web, it’s probably not worth going back to change from one method to another, but if you’re just starting a new site, I’d probably choose the URLs in that order of preference. I can only speak for Google; you’ll need to run your own tests to see what works best with Microsoft, Yahoo, and Ask."

However, I think Ryan's comments are spot-on regarding the file extension part; i.e. it has no effect, except for pages from an old site that Google has already ranked with a specific file extension.

Rgds M

As far as I know, there isn't [for now at least]

but I think it will be there, at least soon, since you're not the first to ask [and frankly, I like it without trailing slashes better too]

I made the slash configurable by template. If you download the latest commit, you'll see it as a new advanced setting for each template.

Adam, I also made the page number prefix configurable now with $config->pageNumUrlPrefix = 'your_prefix'; If not specified, then it defaults to 'page', as before, i.e. 'page1', 'page2', 'page3', ...
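As a rough illustration of what a configurable page number prefix means for URL handling, the sketch below parses a segment such as 'page2' into a page number. It is not ProcessWire's actual implementation; the function name parsePageNumber is hypothetical, and the prefix argument stands in for the $config->pageNumUrlPrefix value.

```php
<?php
/**
 * Hypothetical helper: extract the page number from a URL segment made of
 * a configurable prefix followed by digits (e.g. "page2" with prefix
 * "page"). Returns null when the segment does not match.
 */
function parsePageNumber(string $segment, string $prefix = 'page'): ?int
{
    // Quote the prefix so characters like "." or "+" are matched literally.
    $pattern = '/^' . preg_quote($prefix, '/') . '(\d+)$/';

    if (preg_match($pattern, $segment, $matches) !== 1) {
        return null;
    }
    return (int) $matches[1];
}
```

With the default prefix, 'page3' parses to 3, while a segment like 'about' returns null and can be treated as a normal page name.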

Great stuff!

But if I may add something: a template-only setting is a bit redundant here, isn't it? Is there a site-wide setting too, so you set it once and only set something different where you want to override the site-wide setting?

This is good, but it would be even better if, when I choose "No" for this setting, I could add a custom ending like .html or whatever else instead of a /  ;D

I will look at adding that option. Though this definitely falls into the court of being something I wouldn't ever use on my own sites, and I would question the value of doing it. If it's for maintaining legacy URLs, you are better off using Apache to 301 redirect them away from the legacy URLs. Also, you can always make pages end with .html by making that the page name, i.e. "mypage.html" rather than "mypage", and turning off trailing slashes.

Adam, the page number prefix is site-wide, not template. The slash setting is by template. The default state is for it to enforce slashes, as before. Nothing has changed unless you go into a template and specifically set it to not enforce the slashes.

Ryan: I know, I saw the commit [already patched my fork]

I just think that slash/noslash is a matter of personal preference. I actually feel like pages shouldn't have slashes [that's highly subjective].

However, people often have these preferences, and if it's a quick hack for you [e.g. one text field and something], why not do it that way, so even heavily biased people want to use PW?

I remember that when I started doing websites, I preferred .htm over .html. Then I preferred .php over .phtml or .php3. Everyone has these little preferences; nobody is saying that the slashes vs. no-slashes question has some huge SEO impact.

Good solution adding .html in the page name...

I do want to make sure people have the flexibility to do it any way they want, so I think that's what we've got now (they can set it according to their preference). I've been meaning to add this slashes setting, so figured now was the time with this most recent request.

My preference for the slashes is because a page can be both a container for data (fields) and a container for pages (children). As a matter of consistency, I want to treat all pages the same (at least on my own sites) so that my site's API code doesn't always have to be looking for the presence of slashes when working with selectors, relative paths, url segments and such. I don't want to have to always consider these things when developing a site. As for adding extensions like ".html", that would kill the ability to use page URLs/paths in selectors, unless you actually named your page with the ".html" extension. So if we start adding automatic extensions, I think we start creating a lot more work for the site developers and general confusion... at least I would find it confusing. :) Sure there might be solutions around the issues, but if something is going to be used on less than 30% of sites then it doesn't belong in the core (which would make extensions a possible good module idea).

Ryan, now you're saying out loud what I totally believe in: if it's not a major thing, don't add it to core!

Anyway, I still think that the slash setting should be site-wide. I can't see any reason now for it to be a template setting. I mean: if you prefer no-slash URLs, you prefer them on every page you have, not only on some.
