PDF accessibility in open repositories: A large-scale automated assessment - Kajetan Rożej, Łukasz Skonieczny, Jakub Koperwas, 2026
peter.suber's bookmarks 2026-01-05
Summary:
Abstract: This study systematically evaluates the accessibility of PDF documents in open digital repositories, focusing primarily on compliance with the WCAG and PDF/UA standards. A large-scale automated analysis was conducted on a dataset of 100,000 PDF documents collected from 1,000 repositories listed in the OpenDOAR registry. The objectives were to assess overall accessibility and to identify patterns and disparities related to academic discipline and country of origin. Only 0.3% of the examined documents passed all automated accessibility tests, indicating a critically low overall level of compliance. Accessibility also varied substantially between repositories. A large proportion of documents exhibited major violations, including missing alternative text, improper structural tagging and incomplete metadata. Although some repositories demonstrated high accessibility compliance, others failed to meet even the most basic standards. Regional differences were also observed, with repositories from the United States and Canada showing notably higher levels of compliance, probably influenced by stronger national accessibility legislation and institutional policies. Beyond evaluating document accessibility, the study also uncovered limitations in the OpenDOAR registry itself: only about 50% of the tested OAI-PMH endpoints were fully functional, while the remainder failed because of invalid addresses, authorisation errors or timeouts, highlighting the need for more consistent maintenance and verification of repository infrastructure. These findings underscore the urgent need for broader awareness and more consistent implementation of accessibility standards across the digital repository landscape.
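The endpoint failures described above (invalid addresses, authorisation errors, timeouts) can be illustrated with a minimal Python sketch that probes a repository's OAI-PMH base URL with the protocol's `Identify` verb and sorts the outcome into those categories. This is not the authors' method: the function names, the mapping of HTTP 401/403 to authorisation errors and of 404/410 and unreachable hosts to invalid addresses, and the 10-second timeout are all assumptions for illustration. "Functional" here only means a well-formed `Identify` response, which is probably a weaker criterion than the one used in the study.

```python
import socket
import urllib.error
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def classify_identify_response(body: bytes) -> str:
    """Return 'functional' if body is a well-formed OAI-PMH Identify response."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return "other_error"  # e.g. plain text instead of XML
    if root.tag != OAI_NS + "OAI-PMH":
        return "other_error"  # XML (or XHTML), but not an OAI-PMH envelope
    if root.find(OAI_NS + "Identify") is not None:
        return "functional"
    return "other_error"  # e.g. an OAI-PMH <error> element instead of <Identify>


def classify_oai_endpoint(base_url: str, timeout: float = 10.0) -> str:
    """Probe base_url with verb=Identify and bucket the outcome.

    Hypothetical labels used by this sketch: 'functional', 'invalid_address',
    'authorisation_error', 'timeout', 'other_error'.
    """
    sep = "&" if "?" in base_url else "?"
    url = base_url + sep + urllib.parse.urlencode({"verb": "Identify"})
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read()
    except urllib.error.HTTPError as e:  # must precede URLError (it is a subclass)
        if e.code in (401, 403):
            return "authorisation_error"
        if e.code in (404, 410):
            return "invalid_address"
        return "other_error"
    except urllib.error.URLError as e:
        if isinstance(e.reason, (socket.timeout, TimeoutError)):
            return "timeout"  # timed out while connecting
        # DNS failure, refused connection, TLS failure, ...: treated as an
        # invalid address (an assumption of this sketch).
        return "invalid_address"
    except (socket.timeout, TimeoutError):
        return "timeout"  # timed out while reading the response
    except ValueError:
        return "invalid_address"  # malformed URL, e.g. missing scheme
    except OSError:
        return "other_error"  # e.g. connection reset mid-response
    return classify_identify_response(body)
```

For example, `classify_oai_endpoint("https://example.org/oai")` (a placeholder URL) returns one of the five labels; tallying the labels over every registry entry would give a breakdown comparable in spirit to the roughly 50% functional rate reported in the abstract.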