Google Analytics: Referral Spam


bernhard

hi everyone!

my google analytics for my personal website looks like this:

[screenshot: google analytics report]

you see two things:

  1. i have very few visitors :D
  2. 90% of them are spam!

on this site, i tried using htaccess blocking like this: http://blog.raventools.com/stop-referrer-spam/ but as you can see: no success :(

of course i have also activated the analytics-built-in option to filter known bots and spiders: also useless.

only way that seems to work is to manually create filters in analytics. but that's a lot of stupid work if you manage more websites... and with every single new spambot you have to update ALL your websites one by one. that feels so ancient!

how do you guys handle this? wouldn't a module be great that has a global library of known bots that we could update with one click? maybe with an option to use the global list of bots and explicitly allow some of them, if there is a need?

thanks :)

PS: for search indexing some other keywords: semalt, darodar, buttons-for-website

Link to comment
Share on other sites

Blocking using .htaccess only stops those bots that actually visit your site. Most of these bots hijack your Analytics ID and never visit your website at all, so they still show up. The only way I know to block them completely is to create a filter in GA *and* use .htaccess, so that your data is more reliable. It seems to be getting worse and worse. New ones show up every week. Google should update their list more often  :)

Piwik shares a list of these spammers, so you can easily add them to your .htaccess and files.
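A minimal sketch of how such a list could be turned into .htaccess rules, assuming a plain-text list with one domain per line. The function name `htaccess_referrer_block` and the sample domains are only for illustration. Keep in mind that this can only stop bots that really request pages from your server with a spam Referer header.

```python
def htaccess_referrer_block(domains):
    """Build mod_rewrite lines that answer 403 when the Referer matches a listed domain."""
    cleaned = []
    for raw in domains:
        domain = raw.strip().lower()
        # Skip blank lines and comment lines in the list.
        if domain and not domain.startswith("#"):
            cleaned.append(domain)
    if not cleaned:
        # Without conditions the RewriteRule would block every request.
        return ""
    lines = ["RewriteEngine On"]
    for i, domain in enumerate(cleaned):
        # The last condition must not carry the OR flag.
        flags = "[NC]" if i == len(cleaned) - 1 else "[NC,OR]"
        # Escape dots so they match a literal dot only.
        pattern = domain.replace(".", "\\.")
        lines.append(
            f"RewriteCond %{{HTTP_REFERER}} ^https?://([^/]+\\.)?{pattern} {flags}"
        )
    lines.append("RewriteRule .* - [F]")
    return "\n".join(lines)


if __name__ == "__main__":
    print(htaccess_referrer_block(["semalt.com", "darodar.com"]))
```

The empty-list check is there on purpose: a bare `RewriteRule .* - [F]` with no conditions in front of it would lock out all visitors.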

Link to comment
Share on other sites

thanks arjen!

is it possible to set google analytics filters via API? that would be great. you could then automate this process using piwiks referrer spam list and update all your analytics IDs on the fly. maybe a very useful update for nicos analytics module?
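No API call here, just a rough sketch of the preparation step such an automation would probably need. It assumes that a single filter pattern has a maximum length (255 characters is used as the default below, which is an assumption to verify against the analytics documentation), so a long spammer list has to be split into several patterns. The name `chunk_patterns` is made up for this example.

```python
def chunk_patterns(domains, max_len=255):
    """Join escaped domains with '|' into patterns of at most max_len characters each."""
    patterns = []
    current = ""
    for domain in domains:
        # Escape dots so "semalt.com" does not also match "semaltXcom".
        escaped = domain.strip().replace(".", "\\.")
        if not escaped:
            continue
        if len(escaped) > max_len:
            raise ValueError(f"domain too long for one pattern: {domain!r}")
        candidate = escaped if not current else current + "|" + escaped
        if len(candidate) > max_len:
            # Current pattern is full: store it and start a new one.
            patterns.append(current)
            current = escaped
        else:
            current = candidate
    if current:
        patterns.append(current)
    return patterns


if __name__ == "__main__":
    for pattern in chunk_patterns(["semalt.com", "darodar.com", "ilovevitaly.com"], max_len=30):
        print(len(pattern), pattern)
```

Each returned string could then go into one exclude filter, whether it is pasted in by hand or sent through an API.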

how can i invite nico to join this discussion? is there anything like @Nico Knoll ? or do i have to PM him?

Link to comment
Share on other sites

But what if they don't actually visit your website pwired? They steal your analytics ID in a "normal" visit and begin to simulate clicks. They will show up in the GA results unless you filter them out.

Link to comment
Share on other sites

Hi Arjen,

You are right, I browsed through the help and FAQ base of my hoster.

Bots that don't really visit your site or shop need to be handled differently. In that case editing your .htaccess will not be sufficient.

But before a bot can hijack your GA ID, doesn't it have to visit in the normal way at least once? How would the bot hijack your GA ID without a single normal visit?

Anyway, here is what I found in the FAQ base of my hoster, maybe it is helpful in some way.

begin post:

----------------------------------

# Answer 403 Forbidden to every client whose User-Agent contains one of these names.
# The list also contains regular search engine crawlers (msnbot, bingbot, Yandex,
# Baiduspider); remove the ones that should still be allowed to index the site.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} msnbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MJ12bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} BLEXBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SolomonoBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Yandex [NC,OR]
RewriteCond %{HTTP_USER_AGENT} bingbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Yeti [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mail\.Ru [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Ezooms [NC,OR]
RewriteCond %{HTTP_USER_AGENT} AhrefsBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} exabot [NC]
RewriteRule .* - [F]

-------------------------------------------------------------

end post

You will just need to add/edit .htaccess for the bot that you are having issues with

Link to comment
Share on other sites


I thought I made my point clear, obviously not.

If a bot has to visit your website at least 1 time in a normal way to be able to hijack your GA ID after that, then the .htaccess block will work. Because of the .htaccess edit, the bot will never be able to make a first normal visit. Unless the bot is not listed in the .htaccess file.

There are many lists you can find on the internet with many well known bots.

Link to comment
Share on other sites

I still don't get how come Google doesn't filter them. It would be so easy to add a "report spam" button and do exactly what they do with email. Isn't filtering emails more risky than filtering visits for analytics purposes?

Link to comment
Share on other sites

Hi Bernhard,

is it possible to set google analytics filters via API? that would be great. you could then automate this process using piwiks referrer spam list and update all your analytics IDs on the fly. maybe a very useful update for nicos wanzes :P analytics module?

In its current state, the Google Analytics module displays a useful subset of the available analytics data. In my opinion, it's not the job of the module to filter out data. I would suggest creating another module that does this job. Or maybe I'm misunderstanding something? Can we filter out those spam entries when querying data with the API?

Link to comment
Share on other sites

you can filter out unwanted entries BEFORE they are stored in your view or AFTER that (before the data is presented). if you filter them out before storing, all your reports are free of those unwanted entries and they don't distort indicators like bounce rate, session duration and so on. if you filter them before presentation, you have to apply all those filters to every single report. of course you could do that, and you could even save your reports as shortcuts so that you don't have to do this work over and over again, but imagine what happens if you have several shortcut reports and you notice that a new spider has come up...

a kind of best practice is to filter data before it gets stored and keep one unfiltered view as a backup.
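To illustrate how much spam entries can distort an indicator, here is a small sketch with made-up session data. The sources, the numbers and the helper `bounce_rate` are invented for this example and do not come from a real report. It computes the bounce rate once over everything (the unfiltered backup view) and once with the spam sources removed.

```python
def bounce_rate(sessions):
    """Share of sessions with exactly one pageview; 0.0 for an empty list."""
    if not sessions:
        return 0.0
    bounces = sum(1 for s in sessions if s["pageviews"] == 1)
    return bounces / len(sessions)


# Hypothetical spam sources and sessions, for illustration only.
SPAM_SOURCES = {"semalt.com", "darodar.com", "buttons-for-website.com"}

sessions = [
    {"source": "google", "pageviews": 4},
    {"source": "semalt.com", "pageviews": 1},
    {"source": "darodar.com", "pageviews": 1},
    {"source": "(direct)", "pageviews": 1},
    {"source": "buttons-for-website.com", "pageviews": 1},
]

# Unfiltered view: every session counts.
unfiltered = bounce_rate(sessions)
# Filtered view: spam sessions are dropped before anything is computed.
filtered = bounce_rate([s for s in sessions if s["source"] not in SPAM_SOURCES])

print(f"unfiltered: {unfiltered:.0%}, filtered: {filtered:.0%}")
```

With these invented numbers the unfiltered bounce rate is 80% and the filtered one is 50%, which is the kind of gap that makes the unfiltered reports hard to trust.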

@wanze

did some research on this but for the moment it's beyond my scope. it doesn't seem that easy with all the authentication and so on, so i thought it might be a simple and very useful addition to your module. sorry for crediting nico instead of you :D but you are right - it's not really what your module was made for, so a separate module would be good... maybe i'll find the time some day :) or anybody else :) what puzzles me is that i can't find any howtos or code samples for this... seems i'm the only one on the web who is concerned about it ^^

Link to comment
Share on other sites

that's what diogo said - totally agree! but as you can see i have LOTS of spam traffic on my sites and definitely switched ON googles "remove known bots and spiders" option.

what's your experience with spam referrals in analytics? do you have any? what do you do against it?

Link to comment
Share on other sites

I'm not up to date on this topic, but I quickly checked some sites. I'm also seeing spam entries like "buttons-for-your-website" and "best-seo-offer".

Not sure what to do about this though... I'm waiting until someone posts an easy solution :P Btw I don't like the new analytics interface, I really had trouble finding the relevant stuff.

Link to comment
Share on other sites

I thought I made my point clear, obviously not.

If a bot has to visit your website at least 1 time in a normal way to be able to hijack your GA ID after that, then the .htaccess block will work. Because of the .htaccess edit, the bot will never be able to make a first normal visit. Unless the bot is not listed in the .htaccess file.

There are many lists you can find on the internet with many well known bots.

The bots are also known not to visit your website at all, since they guess the Analytics ID. I've read a lot about this, and sometimes adding a second ID (xxxxxx-2 instead of xxxxxx-1) will stop them for about a month or so before they try the -2 or -3 or -4 version of your ID. It's all automated, and the only way they can be stopped is if Google blocks them, or if you use a filter or an Advanced Segment to filter or segment them out. On language-specific websites I tend to segment by country and/or language only, to view relevant data. Of course I will lose some legitimate data that way, but the harm coming from these bots is greater. Google should fix this mess.

Link to comment
Share on other sites

hy DaveP,

thanks. you are / he is right, but that's not what i want:

only way that seems to work is to manually create filters in analytics. but that's a lot of stupid work if you manage more websites... and with every single new spambot you have to update ALL your websites one by one. that feels so ancient!

Link to comment
Share on other sites

the best blog-post i found so far: https://megalytic.com/blog/how-to-filter-out-fake-referrals-and-other-google-analytics-spam

i found out that segments are REALLY helpful! you can define filters and apply them to your historical data as well. it also takes very few clicks if you use a REGEX like this (taken from https://github.com/piwik/referrer-spam-blacklist/blob/master/spammers.txt):

4webmasters.org|7makemoneyonline.com|acads.net|anal-acrobats.hol.es|anticrawler.org|best-seo-offer.com|best-seo-solution.com|bestwebsitesawards.com|blackhatworth.com|brakehawk.com|buttons-for-website.com|buttons-for-your-website.com|buy-cheap-online.info|darodar.com|econom.co|forum69.info|forum20.smailik.org|free-share-buttons.com|get-free-traffic-now.com|googlsucks.com|hulfingtonpost.com|humanorightswatch.org|ilovevitaly.com|iminent.com|kabbalah-red-bracelets.com|kambasoft.com|makemoneyonline.com|masterseek.com|o-o-6-o-o.com|ok.ru|priceg.com|ranksonic.info|ranksonic.org|savetubevideo.com|semalt.com|sexyteens.hol.es|social-buttons.com|theguardlan.com|webmaster-traffic.com

you can then analyze what all the spam-bots are doing on your site:

[screenshot: segment showing only the spam-bot traffic]

and you can easily switch your filter to EXCLUDE all spam-bots and compare your data:

[screenshot: data compared with spam-bots excluded]
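For anyone who wants to rebuild the pattern when the list changes, here is a minimal sketch. It assumes a list with one domain per line, and the function names are made up for this example. Unlike the hand-made pattern above, it escapes the dots (an unescaped "." matches any character) and anchors the match, so only a listed domain or one of its subdomains counts as spam.

```python
import re


def build_spam_regex(domains):
    """Compile a case-insensitive regex matching a listed domain or any subdomain of it."""
    escaped = [re.escape(d.strip()) for d in domains if d.strip()]
    if not escaped:
        raise ValueError("empty domain list")
    # (^|\.) means: the domain starts the string or follows a dot (subdomain).
    return re.compile(r"(^|\.)(" + "|".join(escaped) + r")$", re.IGNORECASE)


def split_sources(sources, spam_regex):
    """Return (spam, clean) lists, like an INCLUDE and an EXCLUDE segment."""
    spam = [s for s in sources if spam_regex.search(s)]
    clean = [s for s in sources if not spam_regex.search(s)]
    return spam, clean


if __name__ == "__main__":
    regex = build_spam_regex(["semalt.com", "buttons-for-website.com"])
    spam, clean = split_sources(
        ["semalt.com", "forum.semalt.com", "google.com", "notsemalt.com"], regex
    )
    print("spam:", spam)
    print("clean:", clean)
```

The same compiled pattern can be used for both directions, which mirrors switching the segment between INCLUDE and EXCLUDE.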

i've not found out how to deal with segments via ga-api. maybe some day i find the time. as a note for myself: https://developers.google.com/analytics/solutions/articles/hello-analytics-api

i'll give a more detailed insight on my blog, when it is finished :)

EDIT: it gets even better!!

you can share segments and have it available in ALL your properties for ALL your data (also historical)! here is my segment: https://www.google.com/analytics/web/template?uid=ns25vIZpSj2NpRFk371g3Q

just visit the link and enjoy spam-free analytics :) does it work for you?

Edited by BernhardB
Link to comment
Share on other sites
