OpenAlex in focus: Metadata quality of publication type and language fields in an open peer review corpus | Information Research an international electronic journal

peter.suber's bookmarks 2026-03-24

Summary:

Abstract: Introduction. OpenAlex is widely used as a free bibliographic database for bibliometric and scholarly communication research. Despite its openness and coverage, its metadata contains inconsistencies that require systematic cleaning. This study examines metadata quality in OpenAlex, focusing on publication type and language.

Method. Publications on open peer review were retrieved from OpenAlex. After filtering and deduplication, 6,640 records were manually checked. Document type and language fields were cross-verified with publisher sources, while Crossref publication types were also collected for comparison.

Analysis. Manual classification was harmonised across categories to ensure comparability. The main focus was to evaluate the agreement between OpenAlex and manual classifications of type and language, and to assess the consistency of Crossref publication types with both.
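The harmonise-then-compare step described above can be sketched roughly as follows. This is an illustration only, not the authors' procedure: the mapping table, the `harmonise` and `agreement` helpers, and the four sample records are all hypothetical, and the study's actual category scheme is not given in the abstract.

```python
from collections import Counter

# Hypothetical mapping from source-specific type labels to shared categories.
# The study's real harmonisation scheme is not stated in the abstract.
HARMONISED = {
    "article": "article",
    "journal-article": "article",
    "review": "review",
    "editorial": "editorial",
    "letter": "letter",
    "preprint": "preprint",
    "posted-content": "preprint",
}


def harmonise(label):
    """Map a raw type label to a shared category; unknown labels become 'other'."""
    return HARMONISED.get(label.strip().lower(), "other")


def agreement(records, field_a, field_b):
    """Return the share of records whose harmonised labels match,
    plus a Counter of the mismatched (label_a, label_b) pairs."""
    mismatches = Counter()
    matches = 0
    for rec in records:
        a, b = harmonise(rec[field_a]), harmonise(rec[field_b])
        if a == b:
            matches += 1
        else:
            mismatches[(a, b)] += 1
    rate = matches / len(records) if records else 0.0
    return rate, mismatches


# Made-up records for illustration; these are not data from the study.
sample = [
    {"openalex": "article", "crossref": "journal-article", "manual": "editorial"},
    {"openalex": "article", "crossref": "journal-article", "manual": "article"},
    {"openalex": "preprint", "crossref": "posted-content", "manual": "preprint"},
    {"openalex": "article", "crossref": "journal-article", "manual": "letter"},
]

oa_vs_manual, oa_manual_mismatches = agreement(sample, "openalex", "manual")
oa_vs_crossref, _ = agreement(sample, "openalex", "crossref")
```

In this toy sample the two databases agree with each other on every record while agreeing with the manual labels on only half of them, which is the kind of pattern the abstract reports in broad terms.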

Results. Of 6,640 records, 2,878 (43%) showed publication type discrepancies, with ‘Article’ the most frequently misapplied label. Crossref aligned more closely with OpenAlex in broad categories but diverged from manual verification. Additionally, 222 records (3.3%) had language mismatches, often an English label wrongly assigned to a non-English work.
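As a quick arithmetic check, the reported percentages follow directly from the counts in the abstract. The variable names below are illustrative; only the three counts come from the source.

```python
# Counts as reported in the abstract.
total_records = 6640
type_discrepancies = 2878
language_mismatches = 222

type_rate = type_discrepancies / total_records
language_rate = language_mismatches / total_records

print(f"Publication type discrepancies: {type_rate:.1%}")  # 43.3%
print(f"Language mismatches: {language_rate:.1%}")         # 3.3%
```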

Conclusion(s). OpenAlex is a valuable infrastructure, yet its metadata for publication type and language shows notable inconsistencies. Researchers should apply systematic cleaning and validation before using OpenAlex or similar databases.

Link:

https://publicera.kb.se/ir/article/view/64207

From feeds:

Open Access Tracking Project (OATP) » peter.suber's bookmarks

Tags:

oa.new oa.openalex oa.metadata oa.quality

Date tagged:

03/24/2026, 14:29

Date published:

03/24/2026, 10:29