Go To Hellman: AI bots are destroying Open Access
peter.suber's bookmarks 2025-03-23
Summary:
"Bots on the internet are nothing new, but a sea change has occurred over the past year. For the past 25 years, anyone running a web server knew that the bulk of traffic was one sort of bot or another....The old style bots were rarely a problem. They respected robot exclusions and "nofollow" warnings. The warning helped bots avoid volatile resources and infinite parameter spaces. Even when they ignored exclusions they seemed to be careful about it. They declared their identity in "user-agent" headers. They limited the request rate and number of simultaneous requests to any particular server. Occasionally there would be a malicious bot like a card-tester or a registration spammer. You'd often have to block these based on IP address. It was part of the landscape, not the dominant feature.
The current generation of bots is mindless. They use as many connections as you have room for. If you add capacity, they just ramp up their requests. They use randomly generated user-agent strings. They come from large blocks of IP addresses. They get trapped in endless hallways. I observed one bot asking for 200,000 nofollow redirect links pointing at OneDrive, Google Drive and Dropbox (which of course didn't work, but OneDrive decided to stop serving our Canadian human users). They use up server resources - one speaker at Code4lib described a bug where software they were running was using 32-bit integers for session identifiers, and it ran out!...
The thing that gets me REALLY mad is how unnecessary this carnage is. Project Gutenberg makes all its content available with one click on a file in its feeds directory. OAPEN makes all its books available via an API. There's no need to make a million requests to get this stuff!! Who (or what) is programming these idiot scraping bots? Have they never heard of a sitemap??? Are they summer interns using ChatGPT to write all their code? Who gave them infinite memory, CPUs and bandwidth to run these monstrosities? (Don't answer.)..."
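For readers wondering what the "old style" behavior in the excerpt looks like in practice, here is a minimal Python sketch. It is not code from Hellman's post. It assumes a harvester that declares an identity, reads the site's robot exclusions, takes its list of pages from a sitemap instead of following every link, and honors a crawl delay. The user-agent string, the example.org URLs, and the sample robots.txt and sitemap contents are all made up for illustration, and nothing here touches the network.

```python
import urllib.robotparser
import xml.etree.ElementTree as ET

# Hypothetical identity string; a polite bot says who it is.
USER_AGENT = "example-polite-bot/0.1 (+https://example.org/bot-info)"

# Made-up robots.txt and sitemap content, inlined so the sketch runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Crawl-delay: 5
"""

SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/books/1</loc><lastmod>2025-01-15</lastmod></url>
  <url><loc>https://example.org/search?q=anything</loc></url>
</urlset>
"""

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def load_robots(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def sitemap_urls(sitemap_xml):
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def plan_fetches(robots_txt, sitemap_xml, default_delay=1.0):
    """Return (URLs the bot may fetch, seconds to wait between requests)."""
    robots = load_robots(robots_txt)
    delay = robots.crawl_delay(USER_AGENT)
    # Fall back to an assumed default when the site sets no Crawl-delay.
    delay = float(delay) if delay is not None else default_delay
    allowed = [
        url for url in sitemap_urls(sitemap_xml)
        if robots.can_fetch(USER_AGENT, url)
    ]
    return allowed, delay


if __name__ == "__main__":
    urls, delay = plan_fetches(SAMPLE_ROBOTS_TXT, SAMPLE_SITEMAP)
    for url in urls:
        print(f"would fetch {url}, then sleep {delay}s")
```

A real harvester would fetch one URL at a time with that pause between requests. Where a site offers a bulk feed or an API, as the excerpt says Project Gutenberg and OAPEN do, using that would avoid the crawl entirely.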