How to noindex URLs that end in query strings like ?register=1


Marco Ro

Hi guys,

I notice a lot of errors in Google Search Console about specific links. For example, site/page is fine, but site/page/?register=1 or site/page/?forgot=1 give me a lot of problems: duplicate metadata and duplicate titles.

I already added rel='nofollow' to the specific links, but it's not enough. That only works for the direct links.

In robots.txt I added Disallow: *?forgot=1 and Disallow: *?register=1, but I'm not sure this works. Earlier I tried Disallow: /?forgot=1, but that did not work.

 

Does anyone know how to stop these URLs from being indexed?

Thank you!

 

 


Thank you @entschleunigung.

Interesting reading. I also found this link https://geoffkenyon.com/how-to-use-wildcards-robots-txt/ which covers wildcard usage with examples.

In my case it seems the correct syntax is:

Disallow: /*?forgot=1
Disallow: /*?register=1

or

Disallow: /*?forgot=*
Disallow: /*?register=*

 

I don't think there's much difference from Disallow: *?forgot=1, which I've seen used on other websites. Anyway, we'll see if this works. I hope I stop seeing all those errors in Search Console.
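One thing to be aware of when testing these rules offline: Python's standard urllib.robotparser does not implement Googlebot-style `*` wildcards, so it can't be used to check them. Here is a minimal sketch of how that wildcard matching works (per Google's robots.txt documentation; the `$` end-anchor is not handled here, and the paths are placeholders for illustration):

```python
import re

def robots_rule_matches(rule: str, path: str) -> bool:
    """Return True if a robots.txt Disallow rule containing '*'
    wildcards matches the given URL path (prefix match, as Googlebot
    interprets it)."""
    # '*' matches any run of characters; everything else is literal.
    pattern = "".join(".*" if ch == "*" else re.escape(ch) for ch in rule)
    return re.match(pattern, path) is not None

# The rules from this thread, against example paths:
print(robots_rule_matches("/*?forgot=1", "/page/?forgot=1"))      # True  -> blocked
print(robots_rule_matches("/*?register=1", "/page/?register=1"))  # True  -> blocked
print(robots_rule_matches("/*?forgot=1", "/page/"))               # False -> still crawlable
```

This also shows why `Disallow: /*?forgot=1` and `Disallow: *?forgot=1` behave the same for Googlebot: rules are effectively prefix matches with `*` filling any gap.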


For those particular cases, your strategy seems fine, since the site/page/?forgot=1 and site/page/?register=1 are entirely different content from site/page/ without the query string (I'm presuming, as it looks like LoginRegister). Another way would be to block them with a meta robots tag in your document <head>, i.e.

// in the template, before rendering the rest of the <head> markup
if($input->get('forgot') || $input->get('register')) {
  echo '<meta name="robots" content="noindex, nofollow">';
}

Or, you could just deliver unique title tags, and omit the meta description tags, for those cases. But that would take a little more code, and these page variations presumably aren't useful to search engines anyway. So I think what you are doing, or the solution above, is a good way to go. 

For other cases, where the query string doesn't indicate entirely different content (such as a GET var that changes the sort of a list, or something like that), you'd probably want to use a canonical <link> tag. This will tell the search engine that the request can be considered the same as the one given in the href attribute. And the URL in the href attribute is the canonical, or main one. This will prevent the Google Search Console from flagging duplicate titles or meta descriptions on the query string variations of the page.

<link rel='canonical' href='<?=$input->httpUrl()?>'>

Btw, $input->httpUrl() includes URL segments and pagination numbers (when applicable), so it's usually preferable to $page->httpUrl() when it comes to canonical link tags.


thank you @ryan!

Yes, you understood well, it's about the LoginRegister module. I will keep your solution as plan B in case the robots.txt file doesn't work for some reason!

About the rel canonical, I just added it too. But I had added it this way:

<link rel="canonical" href="<?php echo 'https://'.$_SERVER['HTTP_HOST'].$page->url; ?>" />

I don't know if that's the correct way. I tried your <?=$input->httpUrl()?> and it gives me back the same URL.


Like a lot of the stuff coming from PHP's superglobals, using $_SERVER['HTTP_HOST'] in that manner isn't actually safe, because it comes from the request headers, which can be manipulated by the user (a type of user input). $input->httpUrl() on the other hand is safe, or for just the equivalent of $_SERVER['HTTP_HOST'] use $config->httpHost instead (which is validated). If your server uses both http and https (meaning the same page is accessible via http OR https), then you'll want to point to the https one as your canonical version, like you are doing in your example above. Here's how you might rewrite that example. If you aren't using URL segments or pagination, then it's also fine to replace $input->url() with $page->url() like you did. The primary difference between $input->url() and $page->url() is that the $input version represents the actual request (page URL plus URL segments, page numbers, and optionally the query string), rather than just the static URL of the $page that was loaded.

<link rel="canonical" href="<?php echo "https://$config->httpHost" . $input->url(); ?>" />

 

