[2507.22391] Knowledge engineering for open science: Building and deploying knowledge bases for metadata standards

peter.suber's bookmarks 2025-08-01

Summary:

Abstract: Scientists strive to make their datasets available in open repositories, with the goal that they be findable, accessible, interoperable, and reusable (FAIR). Although it is hard for most investigators to remember all the guiding principles associated with FAIR data, there is one overarching requirement: The data need to be annotated with rich, discipline-specific, standardized metadata. The Center for Expanded Data Annotation and Retrieval (CEDAR) builds technology that enables scientists to encode metadata standards as templates that enumerate the attributes of different kinds of experiments. These metadata templates capture preferences regarding how data should be described and what a third party needs to know to make sense of the datasets. CEDAR templates describing community metadata preferences have been used to standardize metadata for a variety of scientific consortia. They have been used as the basis for data-annotation systems that acquire metadata through Web forms or through spreadsheets, and they can help correct metadata to ensure adherence to standards. Like the declarative knowledge bases that underpinned intelligent systems decades ago, CEDAR templates capture the knowledge in symbolic form, and they allow that knowledge to be applied in a variety of settings. They provide a mechanism for scientific communities to create shared metadata standards and to encode their preferences for the application of those standards, and for deploying those standards in a range of intelligent systems to promote open science.

Link:

https://arxiv.org/abs/2507.22391

From feeds:

Open Access Tracking Project (OATP) » peter.suber's bookmarks

Tags:

oa.new oa.metadata oa.standards oa.fair oa.data

Date tagged:

08/01/2025, 09:39

Date published:

08/01/2025, 05:39