Unlocking Author-Affiliation Metadata for All of arXiv — COMET

peter.suber's bookmarks 2026-02-18

Summary:

Abstract: The COMET team is pleased to share results from an exciting line of work we have recently completed, focused on unlocking author-affiliation metadata from preprints. Specifically, we have trained a small, open-weight large language model (LLM) that achieves state-of-the-art performance on author-affiliation extraction for arXiv works. With this approach, we have for the first time produced open author-affiliation metadata for the full arXiv corpus as of December 2025, enabling community use and allowing for direct improvements to persistent identifier metadata. The trained model and dataset are openly available and free to use. Please read on to learn more!

Link:

https://www.cometadata.org/blog/unlocking-author-affiliation-metadata-for-all-of-arxiv

From feeds:

Open Access Tracking Project (OATP) » peter.suber's bookmarks

Tags:

oa.new oa.comet oa.authors oa.metadata oa.arxiv oa.ai oa.scholcomm

Date tagged:

02/18/2026, 09:27

Date published:

02/18/2026, 04:27