
Unusual URL path showing up in Google Analytics - anyone else seen this?


daniel-not-dan

I was looking over my site's Analytics data the other day and noticed that some of the pages people land on have a strange relative URL that's not part of my URL scheme.

Site: http://thesharktankproducts.com

Example URL: thesharktankproducts.com/categories/food

Weird URL showing up in Google Analytics: thesharktankproducts.com/index.php?it=categories/food

Both URLs show up in Analytics, so it's not a case of the correct one missing.

I've figured out that almost all the traffic to these weird URLs comes from Yahoo and Bing, so it may be something I need to fix with them. Still, it's weird that this URL A) works and B) somehow got into Yahoo's and Bing's search results.

Has anyone seen this before? Should I set up redirects for the pages where this is happening? Is there another way to fix this inside PW?

Thanks!


That GET param ("it") is what ProcessWire uses internally: .htaccess passes a request for example.com/categories/food/ to index.php as example.com/index.php?it=categories/food. This explains why these URLs work, but not why they're showing up in your analytics data :)
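For reference, here is a simplified sketch of the kind of rule at the end of ProcessWire's stock .htaccess that performs this hand-off. It is a trimmed-down approximation, not a verbatim copy; the preceding conditions differ between ProcessWire versions, so compare it with your own .htaccess.

```apache
# Simplified approximation of ProcessWire's final rewrite (not verbatim;
# check your own .htaccess for the exact conditions in your version)
<IfModule mod_rewrite.c>
  RewriteEngine On

  # ...ProcessWire's other rules and conditions come first...

  # Skip requests for files or directories that actually exist
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_FILENAME} !-d

  # Internally rewrite categories/food/ to index.php?it=categories/food/,
  # appending any original query string (QSA)
  RewriteRule ^(.*)$ index.php?it=$1 [L,QSA]
</IfModule>
```

Because this is an internal rewrite, visitors normally never see the "it" parameter; it only leaks out if some later rule issues an external redirect using the rewritten path.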

So far my best guess is that, for some reason, an error occasionally happens somewhere and makes these URLs visible to visitors, including robots. I did notice that Google has indexed some of these URLs, so it's possible that the error no longer even happens. Google is quite good at holding on to indexed pages, even when nothing links to them anymore :)

If you want to debug this further, the first question is whether you've added any custom rules to your .htaccess file or anything like that, something that could clash with the existing rules there. I'd also suggest checking your error logs (Apache and PW) for anything unusual related to these URLs.


Thanks for the speedy response! 

I have made a few changes to my .htaccess. Last year, Analytics told me I had redundant hostnames, so I added a snippet I found somewhere to the end of my .htaccess file, and that seemed to resolve the issue.

However, now I'm seeing a very clear correlation between when Analytics says the redundant hostnames issue was resolved and when traffic to the index.php?it= pages started! There was a big spike in traffic to those URLs starting around 11/22/14 (lasting a few weeks) and again around 2/21/15 (still ongoing). Google Analytics reported the redundant hostnames as resolved on 11/25/14 and again on 2/24/15, so there definitely seems to be a correlation. I'm pretty sure I added the extra bit to the .htaccess at one of those two times, but I can't be sure which.

Here's the bit I added to .htaccess (I wish I'd noted where I'd stolen it from, but alas I didn't):

RewriteCond %{HTTP_HOST} ^www\.(.+) [NC]
RewriteRule ^(.*) http://%1/$1 [R=301,NE,L]

I included it right after this:

#################################################################################################
# END PROCESSWIRE HTACCESS DIRECTIVES
#################################################################################################

This seems like the likely culprit, right?

There weren't any errors in the PW logs about those URLs, and... I'm not sure where to find my Apache logs!

In the meantime, I've also added <link rel='canonical' href='http://domain.com/path/to/url' /> tags to my page headers, so hopefully the search engines will start weeding out those ugly URLs.


Yeah, that explains it perfectly. In fact, if I open www.thesharktankproducts.com/products/ I end up at index.php?it=products/. The problem is that your custom redirect runs after ProcessWire has already rewritten the URL to index.php?it=some-url, so the redirect sends visitors (and crawlers) to that rewritten URL :)

You might want to take a look at ProcessWire's default .htaccess rules. There's a similar www redirect there, though it works the other way around: non-www URLs are prefixed with www. The important thing is the position of this rule in the .htaccess file; you'll want to add your custom rule in a similar position.

Generally speaking, though, redirects like this should be placed as early as possible in the .htaccess file, because that's the most efficient: there's no point processing extra rules if a redirect is going to send the request through that very same process again anyway.
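A sketch of that placement, assuming a non-www preference like the snippet above. In per-directory (.htaccess) context, mod_rewrite runs the rule set again after an internal rewrite, so a www redirect placed after ProcessWire's catch-all sees index.php (with it=... carried along as the query string) instead of the original path. Placing the redirect before that rule avoids this. The ProcessWire rules are abbreviated here and are not a verbatim copy of the stock file.

```apache
<IfModule mod_rewrite.c>
  RewriteEngine On

  # Redirect www.example.com/anything to example.com/anything (301).
  # Placed near the top, before ProcessWire rewrites the URL, so that
  # $1 is still the original path (e.g. products/) and not index.php.
  RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
  RewriteRule ^(.*)$ http://%1/$1 [R=301,NE,L]

  # ...ProcessWire's other rules stay here, unchanged...

  # ProcessWire's catch-all rewrite (abbreviated) remains the last rule
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_FILENAME} !-d
  RewriteRule ^(.*)$ index.php?it=$1 [L,QSA]
</IfModule>
```

With this order, a request for www.thesharktankproducts.com/products/ should be redirected straight to thesharktankproducts.com/products/ before ProcessWire ever rewrites it.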


You rock! 

My preference is to have all traffic go to the non-www URL, so I've taken the custom rule I added at the end and moved it to the section you pointed out in your response: 

The important thing is the position of this rule in the .htaccess file; you'll want to add your custom rule to similar position.

Now I guess I'll just keep watching it to see if the ugly URLs start to dwindle. 

Many many thanks!

