Challenges Faced by Institutional Repositories in Managing Indian Language Content | A | Informatics Studies

peter.suber's bookmarks 2026-03-27

Summary:

Abstract: Institutional Repositories (IRs) play a critical role in preserving and disseminating scholarly output generated within academic institutions. In India’s multilingual environment, managing content in Indian languages presents significant technical and institutional challenges. This paper examines issues related to Optical Character Recognition (OCR), font encoding, metadata creation, user access, and standardization in languages such as Hindi, Malayalam, Sanskrit, Tamil, and Bengali. Using the Institutional Repository of the University of Calicut as a case study, the paper analyzes practical difficulties encountered in archiving theses and scholarly documents in Indic scripts. Many born-digital theses in regional languages rely on non-Unicode fonts, limiting discoverability, indexing, and retrieval. The study outlines strategies adopted to address these problems, including scanning, OCR processing, document cleaning, rasterization, and conversion into Unicode-compliant formats using open-source tools. It also highlights best practices in metadata standardization, copyright management, and repository workflows to improve long-term accessibility and discoverability of Indian language scholarly resources.

Link:

https://www.informaticsstudies.org/index.php/informatics/article/view/743

From feeds:

Open Access Tracking Project (OATP) » peter.suber's bookmarks

Tags:

oa.new oa.ir oa.repositories oa.green oa.india oa.ocr oa.multilingualism oa.case oa.case.repositories oa.south

Date tagged:

03/27/2026, 10:29

Date published:

03/27/2026, 06:29