
Posted

Hi @ryan,

This new version of WireRequestBlocker has a breaking change relative to the previous version in that it now requires PHP >= v8, due to the use of str_starts_with().

Because pro modules are not upgradable via the PW admin, users don't see notices about requirements before upgrading (and the PHP 8 requirement isn't stated in getModuleInfo() in any case). Could you please highlight the PHP 8 requirement somehow, or change the code so it has the same requirements as previous versions of the module?
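For reference, getModuleInfo() supports a 'requires' entry that can declare a PHP version, which would at least surface the requirement in the admin. A rough PHP sketch (placeholder class name and values, not the module's actual code):

    <?php namespace ProcessWire;

    // Illustrative sketch only, not WireRequestBlocker's actual code:
    // declaring version requirements in getModuleInfo() so they can be
    // checked and shown before the module is installed or upgraded.
    class ExampleRequestBlocker extends WireData implements Module {

        public static function getModuleInfo() {
            return array(
                'title'    => 'Example Request Blocker',
                'version'  => 1,
                'summary'  => 'Example of declaring version requirements.',
                'requires' => array('ProcessWire>=3.0.0', 'PHP>=8.0.0'), // hypothetical values
            );
        }
    }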

Thanks.

  • Like 1
Posted
Quote

[...] especially from the latest breed of AI bots that have an endless appetite for collecting training data.

Hi @ryan,

Maybe I'm misreading this, but actually you would want bots to collect training data for PW, especially for the API reference part. This website does not publish content that is protected IP. It offers information that aims to attract developers and decision makers to use PW for their business.

IMHO, blocking these bots is counterproductive. You are cutting yourself off from a growing number of developers who build projects with AI tools to boost their productivity. In the near future we probably will not be able to compete if we do not use these tools.

The more accurate training data and context these AI assistants have for PW, the better they can perform and produce actually usable, production-ready code.

I would give the current approach of blocking these bots a second thought.

 

  • Like 1
Posted

Hi @gebeer

While I understand your concern about blocking AI bots, what I get from Ryan's post is that he doesn't completely cut off AI bots; it's that they come too often. He just wants to limit their visit rate. I think that's OK, because I don't think the documentation part changes every few seconds.

Gideon

  • Like 4
Posted

@gebeer Throttling is what enables us to allow the AI bots, rather than having to block them for taking over the site's resources. So long as the bots adhere to the rules established in the robots.txt they'll never get throttled. But if they ignore the crawl delay, then those requests get throttled with a 429 error. We even include a Retry-After header telling them when they can try again. I used to have to block these bots outright in order to preserve the resources for you and me. Now they can crawl as much as they like, so long as they follow the speed limit. The throttle feature provides a way to enforce the speed limit.
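In rough PHP terms, the idea looks something like this (a simplified sketch for illustration, not the module's actual code; it assumes a per-user-agent timestamp store and a configured delay):

    <?php
    // Simplified sketch of crawl-delay enforcement (illustration only).
    $crawlDelay = 10; // seconds, e.g. matching the Crawl-delay in robots.txt
    $userAgent  = $_SERVER['HTTP_USER_AGENT'] ?? 'unknown';
    $stampFile  = sys_get_temp_dir() . '/last-hit-' . md5($userAgent);

    $lastHit = is_file($stampFile) ? (int) file_get_contents($stampFile) : 0;
    $elapsed = time() - $lastHit;

    if($elapsed < $crawlDelay) {
        // Too soon: reject with 429 and tell the bot when it may try again
        http_response_code(429);
        header('Retry-After: ' . ($crawlDelay - $elapsed));
        exit('429 Too Many Requests');
    }

    file_put_contents($stampFile, (string) time()); // record this request
    // ...continue serving the page as normal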

  • Like 5
Posted
2 hours ago, ryan said:

@gebeer Throttling is what enables us to allow the AI bots, rather than having to block them for taking over the site's resources. So long as the bots adhere to the rules established in the robots.txt they'll never get throttled. But if they ignore the crawl delay, then those requests get throttled with a 429 error. We even include a Retry-After header telling them when they can try again. I used to have to block these bots outright in order to preserve the resources for you and me. Now they can crawl as much as they like, so long as they follow the speed limit. The throttle feature provides a way to enforce the speed limit.

Ryan, thank you for clarifying. This totally makes sense now :-)

  • Like 4
Posted

This is awesome timing. Our hosting service only allots a set number of processes per customer, and due to bots we have been getting throttled: web requests were being delayed or outright refused because too many were being handled at once. Our overall traffic is, as reported by our host, about 55% bot requests!

  • Like 1
Posted

@BrendonKoz Great! Please let me know how it works for you. Any sense of which bots are causing the most trouble? The next thing I plan to build for WireRequestBlocker is a user agent counter/profiler, so that it's easier to identify problematic bots. That way you can throttle them specifically rather than as general traffic.
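In the meantime, the general idea is just tallying user agents from the server's access log. A rough PHP sketch (not the planned feature, and the log path is an assumption you'd adjust for your host):

    <?php
    // Rough sketch: count user agents in a combined-format access log
    // to see which bots hit the site most often. Illustration only.
    $logFile = '/var/log/apache2/access.log'; // assumed path
    $counts  = [];

    foreach(new SplFileObject($logFile) as $line) {
        // the user agent is the last double-quoted field in combined log format
        if(preg_match('/"([^"]*)"\s*$/', (string) $line, $m)) {
            $ua = $m[1];
            $counts[$ua] = ($counts[$ua] ?? 0) + 1;
        }
    }

    arsort($counts);
    foreach(array_slice($counts, 0, 20, true) as $ua => $n) {
        echo str_pad((string) $n, 8) . $ua . PHP_EOL;
    }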

  • Thanks 1
Posted
2 hours ago, ryan said:

@BrendonKoz Great! Please let me know how it works for you. Any sense of which bots are causing the most trouble? The next thing I plan to build for WireRequestBlocker is a user agent counter/profiler, so that it's easier to identify problematic bots. That way you can throttle them specifically rather than as general traffic.

I hope to give it a try tomorrow, but if I can't get to it, the first chance I'll have is next week. That said, I will definitely let you know!

From a cursory search with recent logs, the following bots were problematic:

  • Bingbot (Microsoft, USA)
  • Bytespider (ByteDance, so TikTok, China)
  • MJ12bot (Majestic, SEO Tool, UK)
  • AhrefsBot (Ahrefs, SEO Tool, USA)
  • PetalBot (Petal Search Engine; China)
  • CensysInspect (Internet Vulnerability Scanner, USA -- I think this is being abused and used as an attempted attack vector on our site, but they say it abides by crawl delay)

I honestly did not realize there was/is a crawl speed directive for robots.txt (that some bots follow). I would've implemented that a long time ago. I do intend to implement ProCache at some point as well, but this will be a very nice intermediary.
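For anyone else who didn't know about it, the directive is Crawl-delay and goes in robots.txt like this (10 seconds is just an example value, treated by well-behaved bots as the minimum seconds between requests; not every bot honors it, e.g. Googlebot ignores it):

    User-agent: *
    Crawl-delay: 10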

  • Like 2
Posted

@BrendonKoz I've got all those bots in our list as well, except for Bingbot. As far as I can tell, Bingbot follows the crawl delay, so it's one of the good ones.

  • Like 2
Posted

As the module name has changed, is there any recommended way to upgrade from the prior module? The ProcessWireUpgrade module doesn't seem to notice there's an update to WireRequestBlocker. I'm thinking they'd share the same folder name on the physical server, but if they have a different database record, would any custom settings fail to transfer?

Posted

@BrendonKoz it should just be a matter of replacing the module files with the new ones. Then do a modules refresh. Then go to the module config page to set up the throttling features. It should install the new ProcessRequestBlocker module automatically, which will appear on the Setup top nav menu.

  • Like 2
Posted

@Robin S I didn't intend for it to require PHP 8. I was mistakenly thinking str_contains and str_starts_with came in PHP 7.x. I've updated the download so that it replaces those function usages with strpos().
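The replacements are simple drop-ins along these lines (illustrative; the actual code may differ slightly):

    <?php
    // PHP 7 compatible equivalents of the PHP 8 string functions
    $haystack = 'Mozilla/5.0 ExampleBot/1.0';
    $needle   = 'Mozilla';

    // str_starts_with($haystack, $needle)
    $startsWith = strpos($haystack, $needle) === 0;

    // str_contains($haystack, $needle)
    $contains = strpos($haystack, $needle) !== false;

    // (note: behavior may differ from the PHP 8 functions for an empty needle)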

  • Like 2
Posted

The module is installed and running. I'll report back on statistical findings after letting it run for about a week. The upgrade via folder replacement went fine and picked up the settings from the database, so I didn't have to merge anything. The version info from the prior version is still being reported as the most up-to-date in ProcessWire Upgrade though (so searching for new versions won't show this version as available).

Since I didn't capture the column headers in the cropped screenshot below: 0.0.4 is currently installed, 0.0.2 is the latest version (as reported).

[Screenshot: module versions as listed in ProcessWire Upgrade]

Posted

Anthropic seems to be a little extra greedy! The throttler appears to be working. We've still had some outages of our website as reported by UptimeMonitor, but since making a few changes and adding this module, it's only happened once; we were previously experiencing it multiple times per day, a few days per week. It can't all be attributed to this module, but I'm certain it's helped!

[Screenshot: throttle stats, with Anthropic's bot prominent]

It would be nice eye candy if there were a small animation to "compress" (or scroll up) the grouping of active throttles when they are removed from the list. It's quite jarring when the updates occur in rapid succession and the prior entries simply disappear and get replaced. I thought the time-to-display was coordinated with the time-to-block, but I just witnessed (via Firefox) a fairly rapid succession of updates, and each update caused the prior list/time report to disappear. The animation below, although it repeats, is in realtime; the one I witnessed was a little faster. (Maybe a memory leak? I've had this open for about 4 hours now.)

[Animation: the active throttles list refreshing in rapid succession]

Posted
On 9/24/2025 at 6:40 AM, adrian said:

I can no longer log into the modules directory. I don't get any notification as to why either, but I do see this JS error (which might be unrelated).

I started experiencing the same thing a week or so ago and emailed @ryan about it. He wasn't able to reproduce the problem, but the only way I was able to log in to the modules directory was via the backend login form (best if I don't post the URL for that publicly).

Now I'm getting reports of a similar issue on one of my own sites that uses LoginRegisterPro, but I'm not able to reproduce it when I try the LRP login form. My suspicion is that it's a problem with CSRF validation, because if I deliberately tamper with the CSRF token I get the same behaviour of the form reloading with no failure notification. This makes me think a couple of things:

1. LoginRegisterPro should give the user some feedback in this situation, if only to help the developer diagnose the cause.

2. PW should log CSRF validation failures, or at least have an option to log them.
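In the meantime, a manual workaround could be to validate the token yourself and log failures. A rough, untested PHP sketch (assuming $session->CSRF->validate() throws WireCSRFException when the token is bad, which is my understanding):

    <?php namespace ProcessWire;

    // Rough, untested sketch: validate the CSRF token on form POST and log
    // failures so they at least show up in the PW logs instead of a silent reload.
    if($input->post('submit')) {
        try {
            $session->CSRF->validate();
            // ...proceed with normal form processing
        } catch(WireCSRFException $e) {
            $log->save('csrf', 'CSRF validation failed for ' . $session->getIP());
            // give the user some feedback rather than a silent reload
            $session->redirect('./?csrf=failed');
        }
    }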

Posted

Thanks @Robin S - I tried logging in via the backend login form. I couldn't, but I managed to use the password reset, which worked, but then I still couldn't log in with that new password via either the backend or frontend forms.

Posted
2 minutes ago, adrian said:

but then I still couldn't log in with that new password via either the backend or frontend forms.

If the issue is an incorrect password then LRP normally shows a failure message, but I'm not seeing any notice at all; the form just reloads. FYI, for the backend login form you have to substitute the @ in your email address with an underscore.

Posted

I don't get a notice from LRP either. For the backend, I tried both a dash and now an underscore, and neither works.

I did manage to log in via the backend after a few attempts.
