Getting to open data for Classical Greek and Latin: breaking old habits and undoing the damage — a call for comment! » Perseus Digital Library Updates 2015-03-06


"Philologists must for at least two reasons open up the textual data upon which they base their work. First, researchers need to be able to download, modify and redistribute their textual data if they are to fully exploit both new methods that center around algorithmic analysis (e.g., corpus linguistics, computational linguistics, text mining, and various applications of machine learning) and new scholarly products and practices that computational methods enable (e.g., on-going and decentralized production of micro-publications by scholars from around the world, as well as scalable evaluation systems to facilitate contributions from, and learning by, citizen scientists). In some cases, issues of privacy may come into play (e.g., where we study Greek and Latin data produced by our students) but our textual editions of, and associated annotations on, long-dead authors do not fall into this category. Second, open data is essential if researchers working with historical languages such as Classical Greek and Latin are to realize either their obligation to conduct the most effective (as well as transparent) research and or their obligation to advance the role that those languages can play in the intellectual life of society as a whole. It is not enough to make our 100 EUR monographs available under an Open Access license. We must also make as accessible as possible the primary sources upon which those monographs depend. This blog post addresses two barriers that prevent students of historical languages such as Classical Greek and Latin from shifting to a fully open intellectual ecosystem: (1) the practice of giving control of scholarly work to commercial entities that then use their monopoly rights to generate revenue and (2) the legacy rights over critical editions that scholars have already handed over to commercial entities. The field has the rights, the skills, and the labor so that it can immediately and permanently address the first challenge. 
The second challenge is much less tractable. We may never be able to place recent work in a form where it can fully support new scholarship. That form includes not only the rights that restrict its distribution but also, often, the digital format in which textual editions have been produced (e.g., where editors used word processing files rather than best practices such as well-implemented Text Encoding Initiative XML markup). Both the rights and the format together make it unlikely that we will be able in the immediate future (if ever) to make recent critical editions fully available (under a CC-BY-SA license, with TEI XML markup representing the logical structure of both the reconstructed text and the textual notes). The question before us is to determine how much we can in the immediate future recover for the full range of scholarly use and public discourse ..."


