bernhard Posted April 23, 2015

hi everyone! my google analytics for my personal website looks like this: [screenshot]

you see two things: i have very few visitors, and 90% of them are spam!

on this site, i tried .htaccess blocking like this: http://blog.raventools.com/stop-referrer-spam/ but as you can see: no success. of course i have also activated the analytics built-in option to filter known bots and spiders: also useless.

the only way that seems to work is to manually create filters in analytics. but that's a lot of stupid work if you manage several websites... and with every single new spambot you have to update ALL your websites one by one. that feels so ancient!

how do you guys handle this? wouldn't a module be great that has a global library of known bots that we could update with one click? maybe with an option to use the global list of bots and explicitly allow some of them, if there is a need?

thanks

PS: for search indexing some other keywords: semalt, darodar, buttons-for-website
arjen Posted April 23, 2015

Blocking via .htaccess only stops the bots that actually visit your site. Most of these bots hijack your Analytics ID and never visit your website at all, so they still show up. The only way I know to block them completely is to create a filter in GA *and* use .htaccess, so that your data is more reliable.

It seems to be getting worse and worse. New ones show up every week. Google should update their list more often.

Piwik is sharing a list of these spammers, so you can easily add these to your .htaccess and files.
bernhard Posted April 23, 2015 Author

thanks arjen! is it possible to set google analytics filters via the API? that would be great. you could then automate this process using piwik's referrer spam list and update all your analytics IDs on the fly. maybe a very useful update for nico's analytics module?

how can i invite nico to join this discussion? is there anything like @Nico Knoll? or do i have to PM him?
arjen Posted April 23, 2015

Sent a PM to Nico. But does he have an Analytics module? Do you mean this one by wanze? I have thought about a module too, but this might be broader than ProcessWire. There are several ways to do this and it also depends on the setup of your GA. Anyhow it seems to be possible.
bernhard Posted April 23, 2015 Author

of course i mean wanze! i'll get wanze in here... thanks for the link, makes me confident
pwired Posted April 23, 2015

Depending on your hoster, their cPanel comes with good options to block spam visitors.
arjen Posted April 23, 2015

But what if they don't actually visit your website pwired? They steal your analytics ID in a "normal" visit and begin to simulate clicks. They will show up in the GA results unless you filter them out.
pwired Posted April 23, 2015

Hi Arjen,

You are right, I browsed through the help and FAQ base of my hoster. Bots that do not really visit your site or shop need to be handled differently. In that case editing your .htaccess will not be sufficient. But before a bot can hijack your GA ID, does the bot not have to visit in a normal way at least 1 time? How would the bot hijack your GA ID without a single normal visit?

Anyway, I post here what I found in the FAQ base of my hoster, maybe this is helpful in some way.

begin post:
----------------------------------
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^(.*)msnbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)MJ12bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)BLEXBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)SolomonoBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)Yandex [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)bingbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)Yeti [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)Mail.Ru [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)Ezooms [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)AhrefsBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)exabot [NC]
RewriteRule .* - [F]
----------------------------------
end post

You will just need to add/edit the .htaccess entries for the bot that you are having issues with.
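The rules quoted above match on the user agent, while the spammers named in this thread (semalt, darodar, buttons-for-website) show up as referrers. As a rough sketch only, the same kind of block could be generated for the `Referer` header from a list of domains. The function name `referer_block_rules` is made up for this illustration, and as arjen notes, rules like these can only affect bots that actually request a page from your server.

```python
def referer_block_rules(domains):
    """Build mod_rewrite lines that answer 403 when the Referer matches a listed domain.

    Hypothetical helper for illustration; the domain list is supplied by the caller.
    """
    domains = [d.strip() for d in domains if d.strip()]
    if not domains:
        return ""
    lines = ["RewriteEngine On"]
    for i, domain in enumerate(domains):
        # Escape dots so "semalt.com" matches a literal dot, not any character.
        pattern = domain.replace(".", r"\.")
        # Every condition except the last is chained with OR.
        flags = "[NC]" if i == len(domains) - 1 else "[NC,OR]"
        lines.append(f"RewriteCond %{{HTTP_REFERER}} {pattern} {flags}")
    # "-" leaves the URL untouched, [F] returns 403 Forbidden.
    lines.append("RewriteRule .* - [F]")
    return "\n".join(lines)


if __name__ == "__main__":
    print(referer_block_rules(["semalt.com", "darodar.com"]))
```

The output can be pasted into an existing .htaccess file; regenerating it from one shared list avoids editing each site by hand.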
kongondo Posted April 23, 2015

arjen wrote: "But what if they don't actually visit your website pwired? They steal your analytics ID in a "normal" visit and begin to simulate clicks. They will show up in the GA results unless you filter them out."

pwired wrote: "But before a bot can hijack your GA ID, does the bot not have to visit in a normal way at least 1 time ? How would the bot hijack your GA ID without a single normal visit ?"
pwired Posted April 23, 2015

I thought I made my point clear, obviously not. If a bot has to visit your website at least 1 time in a normal way to be able to hijack your GA ID after that, then the .htaccess block will work. Because of the .htaccess edit, the bot will never be able to make a first normal visit. Unless the bot is not listed in the .htaccess file. There are many lists you can find on the internet with many well known bots.
diogo Posted April 23, 2015

I still don't get how come Google doesn't filter them. It would be so easy to add a "report spam" button and do exactly the same as they do with email. Isn't filtering emails more risky than filtering visits for analytics purposes?
Wanze Posted April 23, 2015

Hi Bernhard,

bernhard wrote: "is it possible to set google analytics filters via API? that would be great. you could then automate this process using piwiks referrer spam list and update all your analytics IDs on the fly. maybe a very useful update for wanzes analytics module?"

In its current state, the Google Analytics module displays a useful subset of the available analytics data. In my opinion, it's not the job of the module to filter out data. I would suggest creating another module which does this job.

Or maybe I'm misunderstanding something? Can we filter out those spam entries when querying data with the API?
bernhard Posted April 24, 2015 Author

you can filter out unwanted entries BEFORE they are stored in your view or AFTER that (before presentation of the data).

if you filter them out before storing, all your reports are free of those unwanted entries and they don't distort indicators like bounce rate, session duration and so on.

if you filter them before presentation you have to apply all those filters to every single report. of course you could do that, and you could even save your reports as shortcuts so that you do not have to do this work over and over again, but imagine what happens if you have several shortcut reports and you notice that a new spider has come up...

kind of a best practice is to filter data before it gets stored and keep one unfiltered view as a backup.

@wanze: i did some research on this, but for the moment it's beyond my scope. it seems to be not that easy with all the authentication and so on, so i thought it might be a simple and very useful addition to your module. sorry for honoring nico instead of you, but you are right: it's not directly what your module was made for, so a separate module would be good... maybe i'll find the time some day, or anybody else will.

what puzzles me somehow is that i can't find any howtos or code samples for this... seems i'm the only one on the web concerned about it ^^
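To make the "filter AFTER storing" option concrete, here is a small sketch of dropping spam rows from report data that has already been fetched. It does not call any Google API: the row shape (`source`, `sessions`) and the function names are assumptions made up for this illustration, and the pattern only contains the three spammers named at the start of the thread.

```python
import re

# Illustrative pattern only; a real list would be longer and maintained centrally.
SPAM_PATTERN = re.compile(
    r"semalt\.com|darodar\.com|buttons-for-website\.com",
    re.IGNORECASE,
)


def drop_spam_rows(rows, pattern=SPAM_PATTERN):
    """Return only the rows whose referrer source does not match the spam pattern."""
    return [row for row in rows if not pattern.search(row["source"])]


def total_sessions(rows):
    """Sum the session counts of the given rows."""
    return sum(row["sessions"] for row in rows)


if __name__ == "__main__":
    # Hypothetical report rows, not real data.
    report = [
        {"source": "google", "sessions": 12},
        {"source": "semalt.semalt.com", "sessions": 40},
        {"source": "Darodar.com", "sessions": 8},
    ]
    clean = drop_spam_rows(report)
    print(total_sessions(report), total_sessions(clean))
```

This shows the drawback described above: the same filtering step has to be repeated for every report, whereas a filter applied before storage only needs to be set up once per view.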
Wanze Posted April 24, 2015

I have the feeling that this should be handled by Google itself. Who's interested in spam visitors?
bernhard Posted April 24, 2015 Author

that's what diogo said - totally agree! but as you can see i have LOTS of spam traffic on my sites and definitely switched ON google's "remove known bots and spiders" option.

what's your experience with spam referrals in analytics? do you have any? what do you do against it?
Wanze Posted April 24, 2015

I'm not up to date with this topic, but I quickly checked some sites.. I'm also seeing those spam entries like "buttons-for-your-website" and "best-seo-offer". Not sure what to do against this though... I'm waiting until someone posts an easy solution.

Btw I don't like the new analytics interface, I really had trouble finding the relevant stuff..
arjen Posted April 24, 2015

pwired wrote: "I thought I made my point clear, obviously not. If a bot has to visit your website at least 1 time in a normal way to be able to hijack your GA ID after that, then the .htaccess block will work. Because of the .htaccess edit, the bot will never be able to make a first normal visit. Unless the bot is not listed in the .htaccess file. There are many lists you can find on the internet with many well known bots."

The bots are also known not to visit your website at all, since they will guess the Analytics ID. I've read a lot about this, and sometimes adding a second ID (xxxxxx-2 instead of xxxxxx-1) will stop them for about a month or so before they try the -2 or -3 or -4 version of your ID. It's all automated and the only way they can be stopped is if Google blocks them. Or if you use a filter or an Advanced Segment to either filter them out or segment your data.

On language-specific websites I tend to segment on the country and/or language only, to view relevant data. Of course I will block out potential data, but the harm coming from these bots is greater.

Google should fix this mess.
DaveP Posted April 29, 2015

Just spotted a blog post about this subject. May be of interest/help.
bernhard Posted April 30, 2015 Author

hi DaveP, thanks. you are / he is right, but that's not what i want. as i wrote above: "only way that seems to work is to manually create filters in analytics. but that's a lot of stupid work if you manage more websites... and with every single new spambot you have to update ALL your websites one by one. that feels so ancient!"
bernhard Posted May 6, 2015 Author (edited)

the best blog post i found so far: https://megalytic.com/blog/how-to-filter-out-fake-referrals-and-other-google-analytics-spam

i found out that segments are REALLY helpful! you can define filters and apply them to your historical data as well. it also takes very few clicks if you use a REGEX like this (taken from https://github.com/piwik/referrer-spam-blacklist/blob/master/spammers.txt):

4webmasters.org|7makemoneyonline.com|acads.net|anal-acrobats.hol.es|anticrawler.org|best-seo-offer.com|best-seo-solution.com|bestwebsitesawards.com|blackhatworth.com|brakehawk.com|buttons-for-website.com|buttons-for-your-website.com|buy-cheap-online.info|darodar.com|econom.co|forum69.info|forum20.smailik.org|free-share-buttons.com|get-free-traffic-now.com|googlsucks.com|hulfingtonpost.com|humanorightswatch.org|ilovevitaly.com|iminent.com|kabbalah-red-bracelets.com|kambasoft.com|makemoneyonline.com|masterseek.com|o-o-6-o-o.com|ok.ru|priceg.com|ranksonic.info|ranksonic.org|savetubevideo.com|semalt.com|sexyteens.hol.es|social-buttons.com|theguardlan.com|webmaster-traffic.com

you can then analyze what all the spam-bots are doing on your site: [screenshot]

and you can easily switch your filter to EXCLUDE all spam-bots and compare your data: [screenshot]

i've not found out how to deal with segments via the ga-api. maybe some day i'll find the time. as a note for myself: https://developers.google.com/analytics/solutions/articles/hello-analytics-api

i'll give a more detailed insight on my blog, when it is finished

EDIT: it gets even better!! you can share segments and have them available in ALL your properties for ALL your data (also historical)! here is my segment: https://www.google.com/analytics/web/template?uid=ns25vIZpSj2NpRFk371g3Q

just visit the link and enjoy spam-free analytics. does it work for you?

Edited May 6, 2015 by BernhardB
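Since the spammer list changes often, the regex above could be rebuilt from the list file instead of being edited by hand. The following is a minimal sketch under a few assumptions: the input is one domain per line as in spammers.txt, the helper name `build_spam_regexes` is invented here, and the 255-character default is an assumed per-expression limit that should be checked against the Google Analytics documentation before relying on it.

```python
import re


def build_spam_regexes(domains, max_length=255):
    """Join domains into '|'-separated regexes, starting a new one when max_length would be exceeded.

    Hypothetical helper; a single domain longer than max_length still gets its own chunk.
    """
    chunks = []
    current = ""
    for domain in domains:
        domain = domain.strip().lower()
        if not domain:
            continue  # skip blank lines in the list
        # Escape dots so they match a literal "." instead of any character.
        escaped = domain.replace(".", r"\.")
        candidate = escaped if not current else current + "|" + escaped
        if len(candidate) > max_length and current:
            chunks.append(current)
            current = escaped
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = ["semalt.com", "darodar.com", "", "ok.ru"]
    for expression in build_spam_regexes(sample, max_length=25):
        print(expression)
        assert re.search(expression, "semalt.com") or re.search(expression, "ok.ru")
```

Each returned expression would go into its own filter or segment condition. Escaping the dots is a small tightening compared with the hand-written pattern, where an unescaped `.` matches any character.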