The Public Interest Corpus: Final Report and Path Forward

peter.suber's bookmarks 2026-04-30

Summary:

"We are releasing today the final report of the Public Interest Corpus project. A stable, citable version is here: The Public Interest Corpus: A Framework for Implementation. Because we want to encourage additional feedback, we’re also sharing a version here that allows you to comment.

The report is the product of more than a year of work supported by the Mellon Foundation, in which we asked how research libraries can make books data available for AI training and computational research in ways that serve the public interest, rather than reinforcing the existing concentration of access to texts among a small number of well-resourced commercial actors.

We’ve talked about the starting point of this project before: books are now widely recognized as among the highest-quality training data available for AI systems—they reflect sustained editorial processes and capture historically deep records of human thought across many disciplines and languages. Research libraries hold them at scale, often already in digital form. But access to that data for AI development has so far been shaped largely by which commercial actors have the resources to license, scrape, or otherwise acquire it. Academic researchers and smaller public interest organizations have been mostly outside that conversation."

Link:

https://authorsalliance.substack.com/p/the-public-interest-corpus-final

From feeds:

Open Access Tracking Project (OATP) » peter.suber's bookmarks

Tags:

oa.new oa.ai oa.books oa.libraries oa.authors_alliance oa.mellon_foundation

Date tagged:

04/30/2026, 09:14

Date published:

04/30/2026, 05:14