...
- "Whack-a-mole". AI bots that are slowing down a service or site are identified when service begins to degrade, and their source IP ranges are blocked. This measure is only temporarily effective, as many harvesters simply spawn new bots on new source subnets. This approach also requires substantial manual intervention to identify the bots and subnets, implement the block, and verify the results.
- Geographical blacklisting. Regions that have been identified as the source of abnormally high bot traffic are blocked entirely, if their IP subnets can be identified. This carries the risk of blocking legitimate traffic, is generally only temporarily effective, and does not stop all harvesting.
- Throttling. Harvesters are not blocked, but limits are put in place to reduce the number of requests that can be made in a given time frame. Throttling is often implemented at the level of subnets, in order to target bot networks specifically. This measure also has the risk of slowing down or temporarily blocking legitimate users.
- Web Application Firewall (WAF) products. Institutions contract with vendors to provide request filtering in front of their websites.
- Increase hardware/infrastructure resources. This approach welcomes AI harvesting, and scales up server CPU, memory, and bandwidth to reduce strain on the systems from increased traffic. This approach is mostly useful where the institution's policy encourages unfettered access to its collections, and there is the money, infrastructure, and expertise to quickly scale up computing resources to meet demand. There is the risk that increasing resources only increases demand, leading to perpetual resource starvation.
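As an illustration of the subnet-level throttling described in the list above, the following is a minimal sketch of a token-bucket limiter keyed by client subnet. It is not taken from any of the products or projects named on this page. The class name `SubnetThrottle`, its parameters, and the default values (a /24 grouping for IPv4, /48 for IPv6, 5 requests per second with a burst of 20) are all assumptions chosen for the example; real deployments would tune these and would normally apply the limit in a reverse proxy or WAF rather than in application code.

```python
import ipaddress
import time


class SubnetThrottle:
    """Token-bucket rate limiter keyed by client subnet (hypothetical helper)."""

    def __init__(self, rate=5.0, burst=20, v4_prefix=24, v6_prefix=48,
                 clock=time.monotonic):
        self.rate = rate            # tokens refilled per second, per subnet (assumed value)
        self.burst = burst          # maximum tokens a subnet can accumulate (assumed value)
        self.v4_prefix = v4_prefix  # IPv4 addresses are grouped into /24 by default
        self.v6_prefix = v6_prefix  # IPv6 addresses are grouped into /48 by default
        self.clock = clock          # injectable clock, so the limiter can be tested
        self.buckets = {}           # subnet -> (tokens_remaining, time_of_last_request)

    def subnet_of(self, ip):
        """Return the network that the given address string belongs to."""
        addr = ipaddress.ip_address(ip)
        prefix = self.v4_prefix if addr.version == 4 else self.v6_prefix
        # strict=False masks off the host bits instead of raising an error.
        return ipaddress.ip_network(f"{addr}/{prefix}", strict=False)

    def allow(self, ip):
        """Return True if a request from this address may proceed right now."""
        key = self.subnet_of(ip)
        now = self.clock()
        tokens, last = self.buckets.get(key, (float(self.burst), now))
        # Refill in proportion to the time elapsed, capped at the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[key] = (tokens - 1.0, now)
            return True
        self.buckets[key] = (tokens, now)
        return False
```

A request handler would call `allow(client_ip)` for each request and, when it returns `False`, respond with HTTP 429 (Too Many Requests) instead of serving the page. Because every address in a subnet shares one bucket, this sketch also shows the risk noted above: a legitimate user who shares a subnet with a bot is throttled along with it. The sketch never evicts idle buckets, so a long-running service would need to add cleanup.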
Useful Links
Service providers and products that filter AI harvester traffic
- Cisco AI Defense: a suite of applications designed to manage both internal and external AI traffic. Mostly suitable for entities deploying other Cisco products in their infrastructure.
- AWS Web Application Firewall
- Cloudflare
- F5 Big-IP Application Security Manager
- Imperva Web Application Firewall
- Siteimprove
- Traefik (with traffic analysis plugin)
Community projects
- BotChallengePage (Ruby-on-Rails plugin, wrapping Cloudflare Turnstile): https://bibwild.wordpress.com/2025/01/16/using-cloudflare-turnstile-to-protect-certain-pages-on-a-rails-app/
- Code4Lib Blocking Bots wiki page: https://wiki.code4lib.org/Blocking_Bots
- Anubis Proof-of-work gateway (now also at https://anubis.techaro.lol/)
Other Resources
Good description of the problem and attempts at remediation from a technical perspective: https://go-to-hellman.blogspot.com/2025/03/ai-bots-are-destroying-open-access.html
Code4Lib Slack channel on bots: https://code4lib.slack.com/archives/C074PDZQX4G
...
https://shelf.io/blog/metadata-unlocks-ais-superpowers/
https://thelibre.news/foss-infrastructure-is-under-attack-by-ai-companies/
A general guide to preventing web scraping, with a good discussion of do's and don'ts: https://github.com/JonasCz/How-To-Prevent-Scraping/blob/master/README.md