
New blog: Throttling AI bot traffic in ProcessWire


ryan

Hi @ryan,

This new version of WireRequestBlocker has a breaking change relative to the previous version in that it now requires PHP >= 8.0, due to the use of str_starts_with().

Because Pro modules are not upgradable via the PW admin, users don't see notices about requirements before upgrading (and the PHP 8 requirement isn't stated in getModuleInfo() in any case). Could you please highlight the PHP 8 requirement somehow, or change the code so it has the same requirements as previous versions of the module?
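
If it helps, here is a minimal sketch of a polyfill that would restore PHP 7 compatibility, assuming str_starts_with() is the only PHP 8 dependency (I haven't audited the rest of the module):

    // Polyfill for PHP < 8.0, where str_starts_with() doesn't exist yet
    if(!function_exists('str_starts_with')) {
        function str_starts_with($haystack, $needle) {
            // True when $haystack begins with $needle (always true for an empty needle)
            return strncmp($haystack, $needle, strlen($needle)) === 0;
        }
    }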

Thanks.


Quote

[...] especially from the latest breed of AI bots that have an endless appetite for collecting training data.

Hi @ryan,

Maybe I'm misreading this, but actually you would want bots to collect training data for PW, especially for the API reference part. This website does not publish content that is protected IP. It offers information that aims to attract developers and decision makers to use PW for their business.

IMHO, blocking these bots is counterproductive. You are cutting yourself off from a growing number of developers who build projects with AI tools to boost their productivity. In the near future we probably will not be able to compete if we do not use these tools.

The more accurate training data and context these AI assistants have for PW, the better they can perform and produce actually usable, production-ready code.

I would give the current approach of blocking these bots a second thought.

 


@gebeer Throttling is what enables us to allow the AI bots, rather than having to block them for taking over the site's resources. So long as the bots adhere to the rules established in robots.txt, they'll never get throttled. But if they ignore the crawl-delay, those requests get throttled with a 429 error. We even include a Retry-After header telling them when they can try again. I used to have to block these bots outright in order to preserve the resources for you and me. Now they can crawl as much as they like, so long as they follow the speed limit. The throttle feature provides a way to enforce the speed limit.
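
For anyone curious what that looks like at the HTTP level, here's a rough sketch of the idea in plain PHP. This is illustrative only, not the actual WireRequestBlocker code; the delay value and where the per-bot timestamp lives are assumptions for the example:

    // Enforce the crawl-delay from robots.txt (e.g. "Crawl-delay: 10")
    $crawlDelay = 10; // seconds a bot must wait between requests
    $lastRequestTime = time() - 3; // this bot's previous request (tracked per IP/user-agent in practice)

    $elapsed = time() - $lastRequestTime;
    if($elapsed < $crawlDelay) {
        http_response_code(429); // 429 Too Many Requests
        header('Retry-After: ' . ($crawlDelay - $elapsed)); // tells the bot when it may try again
        exit; // throttled: serve nothing further
    }
    // ...otherwise serve the page normally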


2 hours ago, ryan said:

@gebeer Throttling is what enables us to allow the AI bots, rather than having to block them for taking over the site's resources. [...]

Ryan, thank you for clarifying. This totally makes sense now :-)

