pwired, posted May 8, 2014:
Hi,
One of my client's websites gets a lot of traffic from a Chinese crawler, consistently up to 1000 visits every day. It's the aggressive Baidu crawler, coming from these IP ranges: 180.76.5.x and 180.76.6.x.
Besides the page crawling, it is unnerving to see how many attempts this crawler makes on login.php. This client has no business and no customers in China, and he wants me to block the crawler.
I tried robots.txt, but the crawler does not seem to respect what is configured there. I need something more powerful, using a .htaccess file. Does anyone have an example .htaccess file that blocks these IP ranges, and do I put it in the root public_html folder?
Craig, posted May 8, 2014:
I think something like this would work in your .htaccess:
    Order Allow,Deny
    Deny from 180.76.5.0/24 180.76.6.0/24
    Allow from all
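A note on the rule above: `Order`, `Allow` and `Deny` are the Apache 2.2 access directives, and on Apache 2.4 they only work when mod_access_compat is loaded. The following is a minimal sketch of the 2.4 equivalent, not something posted in the thread. It assumes the server runs Apache 2.4 with mod_authz_core and mod_authz_host, that the .htaccess file sits in the site's document root, and that the host's `AllowOverride` setting permits authorization directives there.

```apache
# Apache 2.4 syntax (mod_authz_core / mod_authz_host).
# Grant access to everyone except the two ranges named in the original post.
<RequireAll>
    Require all granted
    Require not ip 180.76.5.0/24
    Require not ip 180.76.6.0/24
</RequireAll>
```

Requests from the two /24 ranges should then receive a 403 Forbidden response, while all other visitors are unaffected. It is safer not to mix this with the older `Order`/`Deny` lines in the same file; pick whichever syntax matches the server version.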
Joss, posted May 8, 2014:
This might help: https://support.krystal.co.uk/entries/24933152-How-to-block-bad-spiders-from-wasting-bandwidth-
Also, if your client's site goes through Cloudflare, you can block bots there before they get anywhere near the server.
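Blocking by user agent is a common complement to IP rules, since a crawler's IP ranges can change over time. The sketch below illustrates that general technique and is not taken from the linked article. It assumes mod_rewrite is enabled and that the crawler identifies itself with the `Baiduspider` token in its User-Agent header; a crawler that sends a different or forged header will not be caught by it.

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Match any request whose User-Agent contains "Baiduspider" (case-insensitive).
    RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]
    # Answer with 403 Forbidden; the F flag also stops further rule processing.
    RewriteRule ^ - [F]
</IfModule>
```

If the site's .htaccess already contains rewrite rules, this block probably belongs before them so that the request is rejected before any other rule rewrites it.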
pwired (author), posted May 8, 2014:
Thanks for your replies.
pwired (author), posted May 8, 2014:
After some quick but intensive digging, I found this: http://searchenginewatch.com/article/2067357/Bye-bye-Crawler-Blocking-the-Parasites
OrganizedFellow, posted May 8, 2014:
@pwired Please let us know which solution works best for you. This will make a good reference post.
Jonathan Lahijani, posted April 18, 2021:
I'm going to start using this: https://github.com/mitchellkrogza/apache-ultimate-bad-bot-blocker
wbmnfktr, posted April 19, 2021:
If you use tools like Xenu, Screaming Frog, and some others, you might want to remove them from the block lists.
Jonathan Lahijani, posted April 19, 2021:
wbmnfktr said: "In case you are using tools like Xenu, Screamingfrog and some others you might want to remove those from the lists."
Indeed. I use Ahrefs, for example, to crawl my own site. The tool I posted blocks it by default.
dab, posted April 19, 2021:
I've been using the 7G Firewall with my ProcessWire sites: https://perishablepress.com/7g-firewall/
It seems to be working effectively.