
Presentation
Originally developed by Jeff Starr, Blackhole is a security plugin that traps bad bots, crawlers and spiders in a virtual black hole.
Once the bots (or any virtual user!) visit the black hole page, they are blocked and denied access to your entire site.
This helps to keep nonsense spammers, scrapers, scanners, and other malicious hacking tools away from your site, so you can save precious server resources and bandwidth for your good visitors.

 

How It Works
You add a rule to your robots.txt that instructs bots to stay away. Good bots will obey the rule, but bad bots will ignore it and follow the link... right into the black hole trap. Once trapped, bad bots are blocked and denied access to your entire site.
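As a minimal sketch, the robots.txt rule could look like this. It assumes the trap page lives at /blackhole/ (the URL used for the redirect later in this thread); adjust it to the URL of your own blackhole page.

```
User-agent: *
Disallow: /blackhole/
```

Well-behaved crawlers read this and never request the page, so only clients that ignore robots.txt end up in the trap.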


The main benefits of Blackhole include:

Quote

Stops leeches, scanners, and spammers
Saves server resources for humans and good bots
Improves traffic quality and overall site security

 Bots have one chance to obey your site’s robots.txt rules. Failure to comply results in immediate banishment.

 

Features

  • Disable Blackhole for logged-in users
  • Optionally redirect all logged-in users
  • Send alert email message
  • Customize email message
  • Choose a custom warning message for bad bots
  • Show WHOIS lookup information
  • Choose a custom blocked message for bad bots
  • Choose a custom HTTP Status Code for blocked bots
  • Choose which bots are whitelisted or not

 
Instructions

  1. Install the module
  2. Create a new page and assign it the template "blackhole"
  3. Create a new template file "blackhole.php" and call the module: $modules->get('Blackhole')->blackhole();
  4. Add the rule to your robots.txt
  5. Call the module from your home.php template: $modules->get('Blackhole')->blackhole();

 Bye bye bad bots!
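
The steps above do not show how a bot reaches the blackhole page. One common approach, also mentioned further down in this thread, is a hidden link that humans never see and that robots.txt forbids. Here is a minimal sketch, assuming the page from step 2 lives at /blackhole/. The helper name blackholeTrapLink() is hypothetical and not part of the module.

```php
<?php
/**
 * Build a link to the trap page that human visitors will not see.
 * The default '/blackhole/' URL is an assumption: pass the URL of the page
 * created in step 2 instead.
 */
function blackholeTrapLink(string $trapUrl = '/blackhole/'): string
{
    // Escape the URL so it is safe inside an HTML attribute
    $href = htmlspecialchars($trapUrl, ENT_QUOTES, 'UTF-8');

    return '<a href="' . $href . '" rel="nofollow" style="display:none" '
        . 'aria-hidden="true" tabindex="-1">Do not follow this link</a>';
}

// In a ProcessWire template such as home.php this could sit next to the module call:
//   $modules->get('Blackhole')->blackhole();
//   echo blackholeTrapLink($pages->get('/blackhole/')->url);
echo blackholeTrapLink(), PHP_EOL;
```

The rel="nofollow" attribute and the robots.txt rule both tell polite crawlers to stay away, so only clients that ignore them should ever request the page.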


Downloads

 

Screen

[Animated screenshot: blackhole.gif]

 


 Enjoy :neckbeard:

Edited by flydev
module directory link

Nice module, thanks for sharing.

I wonder, though, how effective it really is after reading the last two sections, "caveat emptor" and "blackhole whitelist":

https://perishablepress.com/blackhole-bad-bots/#blackhole-whitelist

Quote

Whitelisting these user agents ensures that anything claiming to be a major search engine is allowed open access. The downside is that user-agent strings are easily spoofed, so a bad bot could crawl along and say, “Hey look, I’m teh Googlebot!” and the whitelist would grant access. It is possible to verify the true identity of each bot, but doing so consumes significant resources and could overload the server. Avoiding that scenario, the Blackhole errs on the side of caution: it’s better to allow a few spoofs than to block any of the major search engines.
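
To make the trade-off in that quote concrete, here is a rough sketch of both checks: the cheap user-agent match and the more expensive forward-confirmed reverse DNS lookup. The function names and the hostname suffixes are my own assumptions for illustration, not the module's code. The resolvers can be swapped out so the logic can be tried without network access.

```php
<?php
/**
 * Cheap check: does the user-agent string contain a whitelisted token?
 * This is trivially spoofable, as the quote above points out.
 */
function isWhitelistedAgent(string $userAgent, array $whitelist): bool
{
    foreach ($whitelist as $token) {
        if ($token !== '' && stripos($userAgent, $token) !== false) {
            return true;
        }
    }
    return false;
}

/**
 * Expensive check: forward-confirmed reverse DNS.
 * 1. Resolve the IP to a hostname (PTR record).
 * 2. Require the hostname to end with an allowed suffix, e.g. '.googlebot.com'.
 * 3. Resolve the hostname back and require the original IP among the results.
 */
function isVerifiedCrawler(
    string $ip,
    array $allowedSuffixes,
    ?callable $reverse = null,
    ?callable $forward = null
): bool {
    $reverse = $reverse ?? 'gethostbyaddr';   // returns the IP unchanged on failure
    $forward = $forward ?? 'gethostbynamel';  // returns a list of IPv4 addresses or false

    $host = $reverse($ip);
    if (!is_string($host) || $host === $ip) {
        return false; // no usable PTR record
    }

    $host = strtolower($host);
    $suffixMatches = false;
    foreach ($allowedSuffixes as $suffix) {
        $suffix = strtolower($suffix);
        if ($suffix !== '' && substr($host, -strlen($suffix)) === $suffix) {
            $suffixMatches = true;
            break;
        }
    }
    if (!$suffixMatches) {
        return false;
    }

    $ips = $forward($host);
    return is_array($ips) && in_array($ip, $ips, true);
}
```

The second function costs two DNS lookups per request, which is the overhead the quote warns about. Caching the result per IP would be the obvious way to soften that.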

 


To get an idea of how useful it may be for a specific site, log all user agents (search bots included) for a while.
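
If the server already writes an access log, that log can serve as the record. A minimal sketch, assuming the Apache/Nginx "combined" log format where the user agent is the last double-quoted field; the function name and the sample lines are made up for illustration.

```php
<?php
/**
 * Count how often each user agent appears in a set of access-log lines.
 * Assumes the "combined" log format, where the user agent is the last
 * double-quoted field on each line.
 */
function countUserAgents(iterable $lines): array
{
    $counts = [];
    foreach ($lines as $line) {
        if (preg_match('/"([^"]*)"\s*$/', $line, $m)) {
            $agent = $m[1] === '' ? '(empty)' : $m[1];
            $counts[$agent] = ($counts[$agent] ?? 0) + 1;
        }
    }
    arsort($counts); // most frequent first
    return $counts;
}

// Made-up sample lines, only to show the expected input shape
$sample = [
    '203.0.113.7 - - [10/Oct/2018:13:55:36 +0200] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.4 - - [10/Oct/2018:13:56:01 +0200] "GET /blackhole/ HTTP/1.1" 200 512 "-" "EvilScraper/1.0"',
    '198.51.100.4 - - [10/Oct/2018:13:56:02 +0200] "GET /wp-login.php HTTP/1.1" 404 196 "-" "EvilScraper/1.0"',
];
print_r(countUserAgents($sample));
```

For a real log file, passing an SplFileObject for the log path as $lines should work, since it iterates line by line.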


@dragan  As @horst said, you can check your logs for bots. There are a few small tips in the module admin for that:

 

[Screenshot: help.thumb.png]

 

@Juergen  what does <?php echo ini_get('allow_url_fopen'); ?> say?

Sorry, I don't understand what the warning in German is saying :lol:
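
A side note on that check: ini_get() returns the raw setting as a string, so echo may print "1", "On" or nothing at all. A small sketch that turns it into a clear yes/no; the helper name iniFlagEnabled() is hypothetical.

```php
<?php
/**
 * Report whether a boolean php.ini directive is switched on.
 * ini_get() returns a string such as "1", "On" or "", or false when the
 * directive does not exist; filter_var() maps all of these to a real bool.
 */
function iniFlagEnabled(string $directive): bool
{
    return filter_var(ini_get($directive), FILTER_VALIDATE_BOOLEAN);
}

var_dump(iniFlagEnabled('allow_url_fopen'));
```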


OK thanks, then it's probably a firewall issue. Which type of web hosting are you trying the module on?


It's a shared host.

9 hours ago, flydev said:

what say <?php echo ini_get('allow_url_fopen'); ?> ?

It says true.

For the moment I have disabled this module because the loading time of the page increases significantly.

1 hour ago, Juergen said:

For the moment I have disabled this module because the loading time of the page increases significantly.

You can disable the WHOIS lookup in the module's config.

 

[Screenshot: whois.thumb.png]


I have installed it again, but this time I have only included the module in blackhole.php (not in home.php or any other template), just to see if it works. It works now, but the page takes approx. 21 seconds to load!!!!

I have added a hidden link in my site to blackhole.php, and if I click on it my IP is stored in the DAT file, which works well. In the mail that I got afterwards there was a hint about a port problem:

Whois Lookup:

Timed-out connecting to $server (port 43).

I am on a shared host, so it seems that this port is not open. The strange thing is that I have disabled the WHOIS lookup in the module settings.

[Screenshot: Screenshot(8).png]

Best regards Jürgen


Thank you @Juergen.

About port 43: it's common for this port to be blocked by default and, depending on the hosting provider, it can be configured through the control panel provided.
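
To find out quickly whether outbound connections on port 43 are possible from a given host, something like the following sketch can be run there. The helper name canConnect() is hypothetical, and the short timeout is there so that a blocked port does not stall the page for long.

```php
<?php
/**
 * Try to open a TCP connection and report whether it succeeded.
 * A short timeout keeps a blocked port from stalling the request.
 */
function canConnect(string $host, int $port, float $timeoutSeconds = 3.0): bool
{
    $errno = 0;
    $errstr = '';
    // @ suppresses the warning fsockopen() raises when the connection fails
    $socket = @fsockopen($host, $port, $errno, $errstr, $timeoutSeconds);
    if ($socket === false) {
        return false;
    }
    fclose($socket);
    return true;
}

// Example call (whois.iana.org is used here only as an example WHOIS server):
//   var_dump(canConnect('whois.iana.org', 43));
```

If this returns false on the shared host but true on a local machine, the port is most likely filtered by the provider.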

59 minutes ago, Juergen said:

The strange thing is that I have disabled the Who is Lookup in my settings of the module

Will look at it this afternoon as I am deploying this module on a production site. Stay tuned, thanks again mate.


Module updated to version 1.0.2.

  • The WHOIS information request is now triggered according to the module's option

 

Thanks for the bug report @Juergen :)

 


Works like a charm now! It would be great if the hard-coded URL of the "contact the administrator" page could be selected from the PW pages.

Thanks for the update!!!

Edit: It would be better if you add multilanguage support to the custom message textareas :)

 

 

Edited by Juergen
2 hours ago, Juergen said:

It would be better if you add multilanguage support to the custom message textareas :)

I will try to do it; I have never played with modules and multilanguage ;)

3 hours ago, flydev said:

I will try to do it, I never played with modules and multilanguage

It's not so important, because only bad bots will see it and probably no humans (I hope so). By the way, 2 bots from China were caught in the trap. It works!!! :)


Good and funny !

 

13 hours ago, Juergen said:

because only bad bots will see it and probably no humans

For example, the site where I deployed the module is a custom dashboard with sensitive information, and I had to guard against hand-crafted requests that could retrieve data from other users. When this behavior is detected, the login-disabled role is assigned, the user is logged out and then redirected into the blackhole to be banned.

 

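public function SecureParks() {
    // Only act when a "park" value was posted
    if ($this->input->post->park) {
        // The sanitized value is split on "-"; the third segment is used as the park ID
        $ids = explode('-', $this->sanitizer->pageName($this->input->post->park));
        $parkId = $ids[2] ?? null; // a malformed value has no third segment
        $userroles = $this->getParkRoles();
        // A null result means none of the user's roles grants access to this park
        $userhaveright = $parkId === null ? null : $this->searchForParkId($parkId, $userroles);
        if ($userhaveright === null) {
            // Disable the account, end the session and send the visitor into the trap
            $this->user->addRole('login-disabled');
            $this->user->save();
            $this->session->logout();
            $this->session->redirect($this->pages->get('/blackhole/')->url); // :)
        }
    }
}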

 


Just a thought:

I think it would be nice to also store the banned IPs in a log file, so you have them in one place with the other logs.

For example:

$log->save('blackhole', 'Banned IP');

You could also add a checkbox in the module settings to enable or disable this feature.

What do you think? Might this be useful for others too?


Hi @Juergen

I completely agree.  Even better, there will be a Process module to manage/view the blackhole data.


I was also thinking of adding a feature that monitors 302/404 HTTP codes and redirects the "guest" into the blackhole.

For example, all of these attempts:

  • /phpMyAdmin/scripts/_setup.php
  • /w00tw00t.at.ISC.SANS.DFind:)
  • /blog/wp-login.php
  • /wp-login.php
  • etc.

will be banned.
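
A rough sketch of how such a check could look, independent of how it ends up being wired into ProcessWire. The function name and the pattern list are hypothetical; the patterns simply cover the example paths above.

```php
<?php
/**
 * Return true when a request path matches one of the known probe patterns.
 * $patterns is a list of PCRE patterns; the query string is ignored.
 */
function isProbePath(string $requestUri, array $patterns): bool
{
    // Strip the query string, if any
    $path = explode('?', $requestUri, 2)[0];

    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $path) === 1) {
            return true;
        }
    }
    return false;
}

// Illustrative patterns covering the example requests listed above
$probePatterns = [
    '#^/phpMyAdmin/scripts/_setup\.php$#i',
    '#^/w00tw00t\.at\.ISC\.SANS#i',
    '#/wp-login\.php$#i',
];
```

When a 404 is about to be served and isProbePath() returns true, the visitor could be redirected to the blackhole page instead.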

I still don't know whether I should code the whole feature myself or hook into Jumplinks from @Mike Rockett.

3 minutes ago, flydev said:
  • /blog/wp-login.php
  • /wp-login.php

I also have a lot of these requests in my 404 log :(.

I think if there is a module that can handle it, use it. Check whether the module is installed first. If not, output a message that this feature is only available if Jumplinks is installed.

I don't have Jumplinks installed and I don't know how well it works, but before starting to code from scratch I would try an existing solution first.


I use Redirect gone ... in .htaccess

Redirect gone /wp-login.php

for all that stuff. (First I log 404s for a period, then I add those candidates to the .htaccess, before ProcessWire's entries!!)

I think it is better not to invoke PW for this stuff (less overhead on the server!) and to use Apache custom error page(s) instead.

[Screenshot: 410_wp-login_php.thumb.jpg]

47ms is fast! :)

 

PS: 410 is better than 404, as I also use this for search engine requests that try to reach URLs that have not existed for 10 years or so. Normally the search engines should flush their cache on 410 responses.

Edited by horst

In all honesty, I think that Jumplinks is better suited to site migrations. Black holes should either be covered by a specifically-built module, or by htaccess/vhost config...


OK guys, I get what you mean, so what about a module with this flow?

  1. monitor and log HTTP error codes for a period
  2. if the number of requests for an entry exceeds N, then
  3. back up the .htaccess file (versioning it)
  4. add new entries to the .htaccess file

 

Does it make sense, or should I let users manage their .htaccess file manually with a FAQ or something?
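
To make the flow more concrete, here is a rough sketch of steps 2 to 4 in plain PHP. The function names are hypothetical, and the hit counts are assumed to come from whatever does the logging in step 1. The rules are written before the existing content, following the advice above to place them before ProcessWire's entries.

```php
<?php
/**
 * Step 2: keep only the paths requested more than $threshold times.
 * $hitsPerPath maps a request path to its hit count.
 */
function selectCandidates(array $hitsPerPath, int $threshold): array
{
    $paths = [];
    foreach ($hitsPerPath as $path => $hits) {
        // Accept plain paths only, so nothing odd is written into .htaccess
        if ($hits > $threshold && preg_match('#^/[A-Za-z0-9._~/-]+\z#', (string) $path)) {
            $paths[] = (string) $path;
        }
    }
    return $paths;
}

/**
 * Steps 3 and 4: back up the file, then put "Redirect gone" rules
 * before the existing content. Returns the name of the backup file.
 */
function prependGoneRules(string $htaccessFile, array $paths): string
{
    $backup = $htaccessFile . '.' . date('Ymd-His') . '.bak';
    if (!copy($htaccessFile, $backup)) {
        throw new RuntimeException("Could not back up $htaccessFile");
    }

    $current = file_get_contents($htaccessFile);
    if ($current === false) {
        throw new RuntimeException("Could not read $htaccessFile");
    }
    $existingLines = array_map('trim', explode("\n", $current));

    $rules = '';
    foreach ($paths as $path) {
        $line = "Redirect gone $path";
        if (!in_array($line, $existingLines, true)) { // skip rules that are already present
            $rules .= $line . "\n";
        }
    }

    if ($rules !== '' && file_put_contents($htaccessFile, $rules . $current, LOCK_EX) === false) {
        throw new RuntimeException("Could not write $htaccessFile");
    }
    return $backup;
}
```

Since a broken .htaccess takes the whole site down, the timestamped backup from step 3 is the part I would not skip, whichever way the rest is implemented.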

