Unusual URL path showing up in Google Analytics - anyone else seen this?


daniel-not-dan

I was looking over the Analytics data for my site the other day, and I noticed that some of the pages people are landing on have a strange relative URL that's not part of my URL scheme.

Site: http://thesharktankproducts.com

Example url: thesharktankproducts.com/categories/food

Weird url showing up in Google analytics: thesharktankproducts.com/index.php?it=categories/food

Both urls are showing up in Analytics, so it's not a question of the correct one not showing up.

I have deduced that almost all the traffic to these weird urls is coming from Yahoo and Bing, so it may be something I need to fix with them. But, still, it's weird that this url A) works and B) somehow got into the search page results at Yahoo and Bing.

Has anyone seen this before? Should I set Redirects for the pages this is happening on? Is there another way inside PW to fix this?

Thanks!


That GET param ("it") is what ProcessWire internally uses; a request to example.com/categories/food/ gets passed from .htaccess to index.php as example.com/index.php?it=categories/food. This explains why these work, but not why they're showing up in your analytics data :)
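
For reference, the catch-all rule near the end of ProcessWire's stock .htaccess looks roughly like the sketch below. This is an approximation of the default file, so treat the exact conditions as an assumption and compare them against your own copy:

```apache
# Only rewrite requests that don't map to a real file or directory
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Hand the requested path to index.php in the "it" GET parameter
RewriteRule ^(.*)$ index.php?it=$1 [L,QSA]
```

So a request for /categories/food is served internally as index.php?it=categories/food, and a direct request for the latter works too, because index.php is a real file and is executed as-is.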

So far my best guess would be that, for some reason, sometimes an error happens somewhere, causing such URLs to become visible to visitors, including robots. I did notice that Google has indexed some of these URLs, so it's actually possible that this error doesn't even happen anymore. Google is quite good at holding on to indexed pages, even if there are no more links to those pages :)

If you want to debug this further, the first question would be if you have any custom rules in your .htaccess or anything like that. Something that could clash with existing rules there? I'd also suggest taking a look at your error log files (Apache and PW) to see if anything weird shows up there related to these URLs.


Thanks for the speedy response! 

I have made a few changes to my .htaccess. Last year, Analytics was telling me I had redundant hostnames, so I added a bit I found somewhere to the end of my .htaccess file, and that seemed to resolve the issue.

However, now I'm seeing a very definite correlation between when Analytics says the redundant hostnames issue was resolved, and when the traffic to the index.php?it= pages started happening! There was a big spike in traffic to those URLs starting around 11/22/14 (and lasting a few weeks) and then again around 2/21/15 (and still on-going). Google Analytics detected redundant hostnames had been resolved on 11/25/14 and again on 2/24/15. Definitely seems like a correlation there. I'm pretty sure I added the extra bit to the htaccess at one of those two times, but can't be sure which.

Here's the bit I added to .htaccess (I wish I'd noted where I'd stolen it from, but alas I didn't):

RewriteCond %{HTTP_HOST} ^www\.(.+) [NC]
RewriteRule ^(.*) http://%1/$1 [R=301,NE,L]

I included it right after this:

#################################################################################################
# END PROCESSWIRE HTACCESS DIRECTIVES
#################################################################################################

This would seem to be the likely culprit, I would imagine, right?

There weren't any errors in the PW logs about those URLs, and... I'm not sure where to find my Apache logs!

In the meantime, I've also created <link rel='canonical' href='domain.com/path/to/url' /> tags in my headers so hopefully the search engines start weeding out those ugly URLs.


Yeah, that explains it perfectly. In fact, if I open www.thesharktankproducts.com/products/ I end up at index.php?it=products/. The problem is that you're doing your own custom redirect after ProcessWire has already rewritten the URL to index.php?it=some-url  :)

You might want to take a look at ProcessWire's default .htaccess rules. There's a similar www redirect there, though it works the other way around: non-www URLs are prefixed with www. The important thing is the position of this rule in the .htaccess file; you'll want to add your custom rule in a similar position.

Generally speaking, though, redirects like that should be placed as early as possible in the .htaccess file, since that's the most efficient way: there's no point in evaluating any extra rules if a redirect is going to send the request through that very same process again anyway.
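
To make the ordering issue concrete, here is a minimal sketch of the relevant part of an .htaccess file with the www-to-non-www redirect moved ahead of ProcessWire's catch-all rule. The redirect lines are the ones quoted earlier in this thread; the catch-all is an approximation of ProcessWire's default and its exact conditions are an assumption, so check it against your own file.

```apache
RewriteEngine On

# 1. Host redirect first: www.example.com/foo -> example.com/foo.
#    At this point the path is still the one the visitor requested.
RewriteCond %{HTTP_HOST} ^www\.(.+) [NC]
RewriteRule ^(.*) http://%1/$1 [R=301,NE,L]

# 2. ProcessWire's catch-all afterwards (approximate, see note above).
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?it=$1 [L,QSA]
```

With the order reversed, the catch-all rewrites the request to index.php?it=... first. In .htaccess context Apache then runs the rules again on the rewritten URL; the catch-all no longer matches because index.php is a real file, so the host redirect fires on the internal URL and sends its query string out to the visitor. That is how the index.php?it= form ends up in search results.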


You rock! 

My preference is to have all traffic go to the non-www URL, so I've taken the custom rule I added at the end and moved it to the section you pointed out in your response: 

The important thing is the position of this rule in the .htaccess file; you'll want to add your custom rule in a similar position.

Now I guess I'll just keep watching it to see if the ugly URLs start to dwindle. 

Many many thanks!


