pwired Posted May 8, 2014
Hi,

One of my clients' websites is getting a lot of traffic from a Chinese crawler, up to 1000 visits every day. It's the aggressive Baidu crawler, coming from these IP ranges:

180.76.5.x
180.76.6.x

Besides the page crawling, it is unnerving to see how many attempts this crawler makes on login.php. The client has no business and no customers in China, and he wants me to block this crawler. I tried robots.txt, but the crawler does not seem to respect what is configured there. I need something more powerful, using a .htaccess file.

Does anyone have an example .htaccess file that blocks these IP ranges, and do I put it in the root public_html folder?
Craig Posted May 8, 2014
I think something like this would work in your .htaccess:

    Order Allow,Deny
    Deny from 180.76.5.0/24 180.76.6.0/24
    Allow from all
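For reference, Order, Allow and Deny are the Apache 2.2 access-control directives; on Apache 2.4 they only work through mod_access_compat. A minimal sketch of the same block in 2.4 syntax follows, assuming mod_authz_core and mod_authz_host are loaded and that the host's AllowOverride setting permits AuthConfig in .htaccess files (both are assumptions about the server, not something stated in the thread). Placed in the site's document root, it would apply to the whole site.

```apache
# Apache 2.4 sketch: allow everyone except the two /24 ranges from the question.
# Assumes mod_authz_core and mod_authz_host are available.
<RequireAll>
    # Baseline: grant access to all clients
    Require all granted
    # Reject 180.76.5.0 - 180.76.5.255
    Require not ip 180.76.5.0/24
    # Reject 180.76.6.0 - 180.76.6.255
    Require not ip 180.76.6.0/24
</RequireAll>
```

Blocked clients should receive a 403 Forbidden response, which can be confirmed in the access log after the change.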
Joss Posted May 8, 2014
This might help:
https://support.krystal.co.uk/entries/24933152-How-to-block-bad-spiders-from-wasting-bandwidth-

Also, if your client is going via Cloudflare, you can block bots there before they get anywhere near the server.
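Blocking by user agent is another common approach for bandwidth-hungry spiders. The sketch below is not taken from the linked article; it assumes mod_rewrite is enabled and that the crawler identifies itself with "Baiduspider" in its User-Agent header. A crawler that ignores robots.txt may also send a different user agent, so this is probably best used alongside an IP-range block rather than instead of one.

```apache
# Sketch: return 403 Forbidden to any request whose User-Agent contains "Baiduspider".
# Assumes mod_rewrite is enabled; the match string is an assumption about the crawler.
<IfModule mod_rewrite.c>
    RewriteEngine On
    # [NC] makes the user-agent match case-insensitive
    RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]
    # "-" leaves the URL untouched; [F] sends 403, [L] stops further rule processing
    RewriteRule ^ - [F,L]
</IfModule>
```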
pwired Posted May 8, 2014 (Author)
After some quick digging, I found this:
http://searchenginewatch.com/article/2067357/Bye-bye-Crawler-Blocking-the-Parasites
OrganizedFellow Posted May 8, 2014
@pwired Please let us know which solution works best for you. This will make a good reference post.
Jonathan Lahijani Posted April 18, 2021
I'm going to start using this:
https://github.com/mitchellkrogza/apache-ultimate-bad-bot-blocker
wbmnfktr Posted April 19, 2021
If you use tools like Xenu, Screaming Frog, and some others, you might want to remove them from the lists.
Jonathan Lahijani Posted April 19, 2021
wbmnfktr said: "In case you are using tools like Xenu, Screamingfrog and some others you might want to remove those from the lists."

Indeed. I use Ahrefs, for example, to crawl my own site. The tool I posted blocks it by default.
dab Posted April 19, 2021
I've been using the 7G Firewall with my ProcessWire sites:
https://perishablepress.com/7g-firewall/

It seems to be working effectively.