Terminology supported archiving and publication of environmental science data in PANGAEA (OPEN ACCESS) - ScienceDirect

lkfitz's bookmarks 2017-07-31

Summary:

Abstract: "Exemplified on the information system PANGAEA, we describe the application of terminologies for archiving and publishing environmental science data. A Terminology Catalogue (TC) was embedded into the system, with interfaces allowing to replicate and to manually work on terminologies. For data ingest and archiving, we show how the TC can improve structuring and harmonizing lineage and content descriptions of data sets. Key is the conceptualization of measurement and observation types (parameters) and methods, for which we have implemented a basic syntax and rule set. For data access and dissemination, we have improved findability of data through enrichment of metadata with TC terms. Semantic annotations, e.g. adding term concepts (including synonyms and hierarchies) or mapped terms of different terminologies, facilitate comprehensive data retrievals. The PANGAEA thesaurus of classifying terms, which is part of the TC is used as an umbrella vocabulary that links the various domains and allows drill downs and side drills with various facets. Furthermore, we describe how TC terms can be linked to nominal data values. This improves data harmonization and facilitates structural transformation of heterogeneous data sets to a common schema. Technical developments are complemented by work on the metadata content. Over the last 20 years, more than 100 new parameters have been defined on average per week. Recently, PANGAEA has increasingly been submitting new terms to various terminology services. Matching terms from terminology services with our parameter or method strings is supported programmatically. However, the process ultimately needs manual input by domain experts. The quality of terminology services is an additional limiting factor, and varies with respect to content, editorial, interoperability, and sustainability. Good quality terminology services are the building blocks for the conceptualization of parameters and methods. In our view, they are essential for data interoperability and arguably the most difficult hurdle for data integration. In summary, the application of terminologies has a mutual positive effect for terminology services and information systems such as PANGAEA. On both sides, the application of terminologies improves content, reliability and interoperability."

Link:

https://doi.org/10.1016/j.jbiotec.2017.07.016

From feeds:

Open Access Tracking Project (OATP) » lkfitz's bookmarks

Tags:

oa.new oa.environment oa.data oa.discoverability oa.metadata oa.interoperability oa.infrastructure

Date tagged:

07/31/2017, 15:26

Date published:

07/31/2017, 11:26