The Million Book Digital Library Project | December 1, 2001

peter.suber's bookmarks 2020-11-30



Raj Reddy and Gloriana StClair, Carnegie Mellon University, Pittsburgh, Pa. 15213


"Objective: The objective of this project is to create a free-to-read, searchable collection of one million books, primarily in the English language, available to everyone over the Internet. This task is accomplished by scanning the books and indexing their full text. The text file is created, where possible, through optical character recognition. The result will be a unique resource accessible to anyone in the world 24x7x365, without regard to nationality or socioeconomic background. Typical large high-school libraries house fewer than 30,000 volumes. One million volumes is the approximate size of the combined libraries at Carnegie Mellon University. The total number of different titles indexed in OCLC’s WorldCat is about 48 million. One million books, therefore, is more than the holdings of any high-school, equivalent to the library at a substantial university and a significant fraction of all available books.

Executive Summary: Creating a universal free to read, digital library containing over one million scanned books, with optical character recognition when possible to support full text searching, is the goal of the million book digital library project. Such a resource will lead to the democratization of knowledge by making available on the web, a unique library resource to scholars, students, and citizens around the world. The availability of online search allows users to locate relevant information quickly and reliably thus enhancing student willingness and success in their research endeavors. This 24x7x365 resource would also provide an excellent testbed for language processing research in areas such as machine translation, summarization, intelligent indexing, and information mining. A portion of the content would include out of copyright, pre-1920 materials. A “best books” feature of the project would involve requesting permission to scan titles in the core collection development tool Books for College Libraries. A preliminary Carnegie Mellon University Libraries pilot suggests that 22% of the 80,000 titles might become available. Further, when 80% of the million books are finished, scholars will be recruited to review collections in their disciplines and to select remaining books of importance. ..."



11/29/2020, 03:39

From feeds:

Open Access Tracking Project (OATP) » peter.suber's bookmarks
Open Access Tracking Project (OATP) » ab1630's bookmarks


oa.books oa.digitization oa.libraries oa.nsf

Date tagged:

11/30/2020, 02:54

Date published:

12/01/2001, 03:39