HathiTrust Research Center Releases Massive Dataset of Features ... | HathiTrust Digital Library
ab1630's bookmarks 2015-05-13
Summary:
"The HathiTrust Research Center is pleased to announce the release of the Extracted Features Dataset (v.0.2), a dataset dervied from 4.8 million public domain volumes, totaling over 1.8 billion pages currently available in the HathiTrust Digital Library collection. The dataset includes over 734 billion words, dozens of languages, and spans multiple centuries. Features are informative, quantified characteristics of a text, and include ..."