TRIPLE and the use of vocabularies in Social Sciences and Humanities research infrastructures

OPERAS 2023-04-21

On 27 and 28 March 2023, DARIAH organised one of the last events of the TRIPLE project (which concluded a few days later) at its premises in Berlin: a training event on the "Use of vocabularies for metadata curation and quality assessment in Social Sciences and Humanities". Aimed at expert participants, the event shared the vocabulary-related results of the TRIPLE project and put them in perspective with other relevant initiatives in the Social Sciences and Humanities (SSH) research infrastructures.

Nineteen speakers and participants came together for two days to share their knowledge and best practices on topical vocabularies in the Social Sciences and Humanities. The event, led by Matej Durco (ACDH-CH, DARIAH-EU), focused on how the concepts used to describe the subjects of SSH objects and research are created, structured, used and reused.

Photo: Ceri Binding presenting ARIADNEplus project results

The first day set the scene for topical vocabularies used in SSH through a series of presentations covering various services and infrastructures. The first talk addressed vocabulary visibility and discovery, as well as the interoperability challenges experienced within the SSH Vocabulary Commons, an initiative bringing together several research infrastructures from the Social Sciences and Humanities. Prepared by Daan Broeder (CLARIN) and presented by Matej Durco, it was a fitting introduction before the programme turned to the state-of-the-art technologies used to speed up and optimise the creation and publishing of multilingual vocabularies, illustrated by two case studies from the SSH Open Cluster activities presented by Cesare Concordia (CNR-ISTI).

Focusing on GoTriple results, Julien Homo (FoxCub) explained how vocabularies are used in the enrichment pipeline of this discovery portal, for both classification and annotation. His talk was followed by a presentation from Tomasz Umerle (IBL-PAN) on some of the results of TRIPLE deliverable 8.5, Guidelines on the Research Data in the Humanities. Drawing on an analysis of subject description in academic repositories, metadata aggregators and digital libraries, he highlighted how controlled vocabularies can be used to organise keywords in SSH.

During the afternoon, Antoine Isaac (Europeana) presented how Europeana and its partners work in a collaborative and flexible fashion to address quality issues in the millions of metadata records that the platform gathers from over 3500 libraries, archives and museums. Ceri Binding (ARIADNE) then explained how the ARIADNEplus project aggregated over 3.5 million metadata records describing archaeological sites. This work involved 59 local vocabularies containing over 19,000 subject terms in 16 different languages, and relied on a vocabulary matching tool. Following that, Klaus Illmayer (ACDH-CH) explained how vocabularies are used in the SSH Open Marketplace, focusing on how users are involved in the creation of concepts and how the curation workflow has been designed to support that. The next speaker, Alessia Bardi (CNR-ISTI, OpenAIRE), showed how controlled vocabularies are used in the OpenAIRE Graph, especially for subject classification.

To conclude the day, Peter Kiraly (GWDG) presented his work on quality assessment and showed some results from the QA catalogue, a metadata quality assessment tool developed for library catalogues based on the MARC21 and PICA metadata schemas.

The second day was kicked off by Nina C. Rastinger and Massimiliano Carloni (ACDH-CH), who ran a hands-on session on comparing concepts across various vocabularies used in the SSH. Their study was first developed in an Austrian context and has since been extended to cover a wider range of vocabularies; they shared its results during this event. After being guided through a Colab notebook, the participants were able to experiment by selecting the vocabularies they wanted to compare.

Finally, following the training part of the event, a meeting was held to discuss how to build on the successful GoTriple vocabulary experience. Two initiatives were extensively discussed: the SSH Vocabulary Commons, created during the SSHOC project, and the idea of starting an RDA Working Group “Multilingual Vocabulary Alignment” as part of the RDA Tiger project that Arnaud Gingold (OPERAS) and Najla Rettberg (RDA) introduced. 

Opportunities for collaboration in the area of SSH vocabularies are promising, and the stakeholders present at the meeting will definitely continue to work together towards an SSH vocabulary federation.

The training part of this event has been “captured” for publication on DARIAH Campus. Stay tuned: all the materials, only some of which are mentioned in this blog post, will soon be openly available!