Hathi1M: Introducing a Million Page Historical Prose Dataset in English from the Hathi Trust

.txtLAB @ McGill 2022-03-17

Summary:

Really pleased to announce the release of a new data set that I’ve been working on with my collaborator Sunyam Bagga. In it we build on the prior work of Ted Underwood and his team to develop parallel corpora of fiction and non-fiction writing over …

Link:

https://txtlab.org/2022/03/hathi1m-introducing-a-million-page-historical-prose-dataset-in-english-from-the-hathi-trust/

Tags:

data humanities dh academy

Authors:

Andrew Piper

Date published:

03/17/2022, 11:49