Data-mining reveals that 80% of books published 1924-63 never had their copyrights renewed and are now in the public domain / Boing Boing
peter.suber's bookmarks 2019-08-02
"But there's another source of public domain works: until the 1976 Copyright Act, US works were not copyrighted unless they were registered, and then they quickly became public domain unless that registration was renewed. The problem has been to figure out which of these works were in the public domain, because the US Copyright Office's records were not organized in a way that made it possible to easily cross-check a work with its registration and renewal.
For many years, the Internet Archive has hosted an archive of registration records, which were partially machine-readable.
Enter the New York Public Library, which employed a group of people to encode all these records in XML, making them amenable to automated data-mining.
Now, Leonard Richardson has done the magic data-mining work to affirmatively determine which of the 1924-63 books are in the public domain, which turns out to be 80% of those books; what's more, many of these books have already been scanned by the Hathi Trust (which uses a limitation in copyright to scan university library holdings for use by educational institutions, regardless of copyright status)...."
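The quote describes a cross-check: does a registered book have a matching renewal record? Here is a minimal Python sketch of that idea. It is an illustration under stated assumptions, not Richardson's actual code. The `Record` type and the `normalize`, `match_key`, and `unrenewed` functions are hypothetical names. Each record is reduced to a title and an author. Two records count as the same work only when their normalized title and author match exactly. Real catalog entries are messier than this, so a practical pipeline would need fuzzier matching and human review.

```python
import re
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical simplified catalog entry; real registration and
    renewal records carry many more fields (dates, claimants, numbers)."""
    title: str
    author: str


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace so that minor
    typographic differences between catalogs do not block a match."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def match_key(rec: Record) -> tuple[str, str]:
    # Assumption: normalized (title, author) is enough to identify a work.
    return (normalize(rec.title), normalize(rec.author))


def unrenewed(registrations: Iterable[Record],
              renewals: Iterable[Record]) -> list[Record]:
    """Return registrations with no renewal sharing the same match key.
    A result is only a candidate for the public domain, not proof."""
    renewed = {match_key(r) for r in renewals}
    return [reg for reg in registrations if match_key(reg) not in renewed]


if __name__ == "__main__":
    regs = [Record("The Example Book", "Doe, Jane"),
            Record("Another Title", "Roe, Richard")]
    rens = [Record("The example book.", "Doe, Jane")]
    print([r.title for r in unrenewed(regs, rens)])  # ['Another Title']
```

Exact-key matching keeps the sketch short and makes each lookup a constant-time set check. The trade-off is that a renewal filed under a variant spelling of the title or author would be missed, which would wrongly flag a renewed book as unrenewed.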