News and Views: How much content can AI legally exploit?
peter.suber's bookmarks 2024-12-14
Summary:
"Open access was envisioned as a way to make scholarly content more portable and adaptable in the digital age. Yet, its application in AI training faces practical challenges.
Most OA licenses, even permissive ones like CC BY, require attribution. However, generative AI models inherently strip attribution from the data they process, making compliance nearly impossible. Specialist AIs might be trained to circumvent this, but the bulk of big-name gen AI tools aren't. Compliance with the most basic OA requirement of attribution is unworkable.
Additionally, while traditional licenses clearly delineate permissible use, OA licenses often depend on interpretations of “non-commercial” or “derivative” use that may vary by jurisdiction.
In contrast, traditional copyright-protected works – often controlled by publishers – can be directly licensed for AI use. Publishers and AI companies are already striking deals, bypassing the complexities of OA compliance....
Whatever the legal details, can AI companies simply license content from publishers?
For copyrighted content where the publisher holds the copyright, yes. Reuse is in the gift of the license holder, and licensing deals are an established part of publishing. Scholarly publishers are now licensing content to tech companies. Once agreed, the licensee can push on with the agreed use. The only challenge here is one of optics, in cases where authors do not support their work being used to train AI.
But the rise of generative AI exposes a digital irony: the very openness that defines open access may hinder its use in one of the most transformative technologies of our time. Meanwhile, traditional “closed” licensing remains a smoother path for AI developers, albeit at a cost. The challenge for publishers and authors is to navigate this paradox, ensuring their work is both protected and impactful in the AI-driven future."