Digitizing Printed Arabic Journals: Is a Scalable Solution Possible?

peter.suber's bookmarks 2019-11-06

Summary:

"In 2017, JSTOR received a grant from the National Endowment for the Humanities to investigate processes for digitizing Arabic-language scholarly content. Our goal in the project was to develop a workflow for scanning Arabic materials -- especially journals -- that is reasonably cost-efficient, feasible to implement at scale, and likely to produce high-quality images and metadata, including fully searchable text....

Through this investigation, we concluded that, using new metadata guidelines and OpenITI’s software, and leveraging specific workflows created jointly with Apex, it is possible for JSTOR to digitize Arabic-language journals with the high degree of accuracy needed to support search and discovery at a cost of approximately $3 per page, with the promise that this per-page cost could be reduced further through continuous improvements in the OCR software engine. In this white paper, we contextualize our investigation in the broader landscape of digital scholarly literature in Arabic. We then document our approach and findings from this project, which took place over 20 months from April 2017 through December 2018. And finally, we lay out some areas we identified for potential further research...."

Link:

https://about.jstor.org/wp-content/uploads/2019/08/NehAward_PW-253861-17_JstorArabicDigitizationInvestigation_WhitePaper_20190329.pdf

From feeds:

Open Access Tracking Project (OATP) » peter.suber's bookmarks

Tags:

oa.new oa.digitization oa.arabic oa.journals oa.jstor oa.funding oa.neh oa.ch oa.costs

Date tagged:

11/06/2019, 12:34

Date published:

11/06/2019, 07:34