An open-source tool for merging data from multiple citation databases | Scientometrics

Hanna_S's bookmarks 2024-06-11

Summary:

Abstract: A bibliometric analysis based on records from a single citation database may be limited in its comprehensiveness and, therefore, in the reliability of its results. Combining and deduplicating records from multiple citation index databases for a bibliometric analysis is often a manual process that requires significant effort, especially for larger amounts of data. This paper presents an open-source tool for automatically preprocessing and deduplicating records based on similarity and user-configurable strategies. To validate the capabilities of the tool, the authors first manually deduplicated records from Scopus and Web of Science (WoS) in a use-case analysis of 11,307 records. The performance of the tool was then evaluated against the manually deduplicated results. With its best-performing similarity configuration on this deduplication use case, the tool achieved up to 99% precision and a 98% F-measure, minimizing the time researchers would spend on data wrangling when combining Scopus and WoS records. The tool has practical implications for bibliometric studies. For instance, we conducted a bibliometric analysis of the most productive researchers at a university using a single citation database, as well as merged data from multiple citation databases. The study used the VOSviewer tool and showed that utilizing merged data may produce different outcomes compared to those obtained from a study based on a single citation database.
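
To make the idea of similarity-based deduplication more concrete, here is a minimal Python sketch. It is not the tool's actual implementation, which this entry does not describe. The record fields (`title`, `year`, `doi`), the 0.9 threshold, and all function names are assumptions chosen for illustration, and the title comparison uses `difflib.SequenceMatcher` from the standard library as a stand-in for whatever similarity measures the tool really offers. The last function shows how precision and F-measure could be computed against a manually deduplicated reference set.

```python
import re
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]+", " ", title.lower()).split())


def is_duplicate(a: dict, b: dict, threshold: float = 0.9) -> bool:
    """Hypothetical matching strategy: DOI first, then year plus title similarity."""
    doi_a, doi_b = a.get("doi"), b.get("doi")
    if doi_a and doi_b:
        # Assumption: when both records carry a DOI, the DOI alone decides.
        return doi_a.lower() == doi_b.lower()
    if a.get("year") != b.get("year"):
        return False
    ratio = SequenceMatcher(None, normalize(a["title"]), normalize(b["title"])).ratio()
    return ratio >= threshold


def merge_records(scopus: list, wos: list, threshold: float = 0.9):
    """Keep every Scopus record; add a WoS record only if nothing kept matches it."""
    merged = list(scopus)
    duplicates = []
    for rec in wos:
        match = next((kept for kept in merged if is_duplicate(kept, rec, threshold)), None)
        if match is None:
            merged.append(rec)
        else:
            duplicates.append((match["id"], rec["id"]))
    return merged, duplicates


def precision_recall_f1(predicted: set, truth: set):
    """Score predicted duplicate pairs against manually identified pairs."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

The pairwise loop is quadratic in the number of records, so a sketch like this would need blocking (for example, comparing only records from the same year) before it could be applied to a set the size of the 11,307-record use case.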

Link:

https://link.springer.com/article/10.1007/s11192-024-05076-2

From feeds:

Open Access Tracking Project (OATP) » peter.suber's bookmarks

Tags:

oa.new oa.floss oa.tools oa.data oa.citations

Date tagged:

06/11/2024, 13:49

Date published:

06/11/2024, 09:49