Hathi1M: Introducing a Million Page Historical Prose Dataset in English from the Hathi Trust
.txtLAB @ McGill 2022-03-17
Summary:
Really pleased to announce the release of a new dataset that I’ve been working on with my collaborator Sunyam Bagga. In it, we build on the prior work of Ted Underwood and his team to develop parallel corpora of fiction and non-fiction writing over …