Publishers Target Common Crawl In Fight Over AI Training Data | WIRED
peter.suber's bookmarks 2024-06-15
Summary:
"Danish media outlets have demanded that the nonprofit web archive Common Crawl remove copies of their articles from past data sets and stop crawling their websites immediately. This request was issued amid growing outrage over how artificial intelligence companies like OpenAI are using copyrighted materials.
Common Crawl plans to comply with the request, first issued on Monday. Executive director Rich Skrenta says the organization is “not equipped” to fight media companies and publishers in court...."