Open Patrologia Graeca 1.0 » Perseus Digital Library Updates

abernard102@gmail.com 2015-08-08

Summary:

"A first stab at producing OCR-generated Greek and Latin for the complete Patrologia Graeca (PG) is now available on GitHub at https://github.com/OGL-PatrologiaGraecaDev. This release provides raw textual data that will be of service to those with programming expertise and to developers with an interest in Ancient Greek and Latin. The Patrologia Graeca contains as many as 50 million words of Ancient Greek produced over more than 1,000 years, along with an even larger amount of scholarship and accompanying translations in Latin. Matt Munson started a new organization for this data because it is simply too large to put into the existing OGL organization. Each volume can contain 250MB or more of .txt and .hocr files, so it is impossible to put everything in one repository or even several dozen repositories. He therefore decided to create a new organization in which the OCR results for each volume are contained in their own repository. This will also allow us to add more OCR data as necessary (e.g., from Bruce Robertson of Mt. Allison University, or from nidaba, our own OCR pipeline) at the volume level. The repositories are being created and populated automatically by a Python script, so if you notice any problems or strange happenings, please let us know either by opening an issue on the individual volume repository or by sending us an email. This is our first attempt at pushing this data out. Please let us know what you think ..."

Link:

https://sites.tufts.edu/perseusupdates/2015/08/07/open-patrologia-graeca-1-0/

From feeds:

Open Access Tracking Project (OATP) » abernard102@gmail.com

Tags:

oa.new oa.comment oa.data oa.github oa.languages oa.greek oa.latin oa.classics oa.tools oa.floss oa.ssh

Date tagged:

08/08/2015, 16:01

Date published:

08/08/2015, 12:01