AI bots are targeting memory institutions (galleries, libraries, archives, and museums), crawling their digital collections and retrieving their content for large language model training. In many cases, the combination of the number of bots making requests, the volume of requests each bot makes in a short period, and the quantity of information returned in response overwhelms the networking, CPU, and memory capacity of the platforms hosting the digital resources. The result is, in effect, a denial-of-service attack: user experience degrades, query and response times become unacceptably slow, or the systems crash altogether.
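Part of why the problem is hard to solve politely: the standard mechanism for expressing crawl preferences, robots.txt, is purely advisory. A minimal sketch is below; the user-agent tokens shown (GPTBot, CCBot, Google-Extended) are real ones published by major AI crawlers, but well-behaved bots honoring them is voluntary, and the bots causing the overload described above often ignore the file entirely.

    # robots.txt — advisory opt-out signals for known AI training crawlers.
    # Compliance is voluntary; misbehaving bots can ignore this file.

    User-agent: GPTBot            # OpenAI's training crawler
    Disallow: /

    User-agent: CCBot             # Common Crawl, widely used as LLM training data
    Disallow: /

    User-agent: Google-Extended   # opt-out token for Google AI training use
    Disallow: /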
Some considerations that complicate the problem:
These links were harvested from discussions on Zoom and may find a better home elsewhere in the document as the page develops; for now, this is a dumping ground for useful links on the subject (a server-configuration sketch follows the list):
https://creativecommons.org/2024/08/23/six-insights-on-preference-signals-for-ai-training/
https://github.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/blob/master/MANUAL-CONFIGURATION.md
https://shelf.io/blog/metadata-unlocks-ais-superpowers/
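For bots that ignore robots.txt, server-side throttling is the usual next step, which is what the nginx-ultimate-bad-bot-blocker link above is about. The fragment below is a minimal sketch using nginx's stock limit_req module, not that project's full configuration; the zone name, rate, burst value, and the collection_backend upstream are illustrative assumptions, not tuned recommendations.

    # nginx.conf fragment — minimal rate-limiting sketch (illustrative values).
    http {
        # Shared-memory zone keyed on client IP: 10 MB of state, 5 requests/second.
        limit_req_zone $binary_remote_addr zone=crawlers:10m rate=5r/s;

        server {
            listen 80;

            # Turn away a few known AI-training user agents outright.
            if ($http_user_agent ~* (GPTBot|CCBot|ClaudeBot)) {
                return 403;
            }

            location / {
                # Permit short bursts, then answer 429 once the rate is exceeded.
                limit_req zone=crawlers burst=10 nodelay;
                limit_req_status 429;
                proxy_pass http://collection_backend;  # hypothetical upstream name
            }
        }
    }

Rate limiting by IP is a blunt instrument: distributed crawlers rotate addresses, and it can also slow legitimate heavy users such as researchers, so institutions typically combine it with user-agent blocking and upstream caching rather than relying on it alone.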