Here’s Proof You Can Train an AI Model Without Slurping Copyrighted Content | WIRED
peter.suber's bookmarks 2024-03-21
Summary:
"IN 2023, OPENAI told the UK parliament that it was “impossible” to train leading AI models without using copyrighted materials. It’s a popular stance in the AI world, where OpenAI and other leading players have used materials slurped up online to train the models powering chatbots and image generators, triggering a wave of lawsuits alleging copyright infringement.
Two announcements Wednesday offer evidence that large language models can in fact be trained without the permissionless use of copyrighted materials.
A group of researchers backed by the French government have released what is thought to be the largest AI training dataset composed entirely of text that is in the public domain...."