A simplified survey on the use of open source data in research activities 2024 [survey material-342] has been published (9/25) | National Institute of Science and Technology Policy (NISTEP)
Hanna_S's bookmarks 2024-09-27
"This paper presents the results of a survey conducted to understand “changes in research activities due to DX(digital transformation)”. The specific purpose of this survey is to investigate the use of open source data in research activities. For this purpose, we investigated the number of mentions of open source and open data in manuscripts on arXiv, a major preprint server in the fields of physics and information sciences. In the survey, github was set as a proxy variable for open source, and Zenodo and figshare as proxy variables for open data. The DOI was also investigated as basic data for comparison. Using the email address given in the text as a clue, each manuscript was assigned a nationality (where assignable) and organized based on the year and month of first publication. In terms of years, the survey covered 24 time points from 2010 to 2023. Compared to the previous survey, which covered data up to September 2022, the proportion of manuscripts mentioning Zenodo, figshare, and github did not change relatively significantly compared to the pre-2022 period, although the number of manuscripts covered itself increased. The survey further comparedeacharXivmanuscriptbydiscipline. Wealsoexaminedmanuscriptsfrom2023onthetwomajor preprint servers, bioRxiv and medRxiv, in the fields of biology and medicine respectively, to investigate larger disciplinary differences. As a result, we observed that the number of disseminated DOIs was almost equal in units of arXiv, bioRxiv, and medRxiv, around 20% of the total, while there were differences in github mentions, with nearly 20% of manuscripts in bioRxiv and nearly 10% in medRxiv having mentions, showing some characteristics between the fields. It was also confirmed that even within the arXiv, there were differences among physics, mathematics, and information sciences. 
The lack of progress in the use of open data for data types such as numerical values and charts needs to be interpreted in conjunction with the results of a separate questionnaire survey. On the other hand, the increase in the number of references to github, which is used as a proxy variable for open source, indicates “changes in research activities due to DX”, which could be taken into account in performance evaluation."
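As a rough illustration of the counting approach the abstract describes, the sketch below tallies the share of manuscripts that mention each proxy and guesses a country from the first email address found in the text. It is not the survey's actual code: the regular expressions, the function names (`mentioned_proxies`, `country_from_email`, `mention_shares`), and the rule of reading a two-letter top-level domain as a country code are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical matching rules; the survey's real rules are not given in the abstract.
PROXIES = {
    "github": re.compile(r"github", re.IGNORECASE),      # proxy for open source
    "zenodo": re.compile(r"zenodo", re.IGNORECASE),      # proxy for open data
    "figshare": re.compile(r"figshare", re.IGNORECASE),  # proxy for open data
    "doi": re.compile(r"\b10\.\d{4,9}/\S+"),             # baseline for comparison
}
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mentioned_proxies(text):
    """Return the set of proxy names that appear at least once in the text."""
    return {name for name, pattern in PROXIES.items() if pattern.search(text)}


def country_from_email(text):
    """Guess a country from the first email address in the text.

    Assumption: a two-letter top-level domain is treated as a country code;
    anything else (for example .edu or .com) is left unassigned (None).
    """
    match = EMAIL.search(text)
    if match is None:
        return None
    tld = match.group(0).rsplit(".", 1)[-1].lower()
    return tld if len(tld) == 2 else None


def mention_shares(manuscripts):
    """Return, per proxy, the fraction of manuscripts mentioning it."""
    total = len(manuscripts)
    if total == 0:
        return {}
    counts = Counter()
    for text in manuscripts:
        counts.update(mentioned_proxies(text))
    return {name: counts[name] / total for name in PROXIES}
```

Each manuscript counts at most once per proxy, so the result is a share of manuscripts and not a raw count of mentions, which matches how the abstract reports its percentages. Grouping by year and month of first publication would need submission metadata that is not modeled here.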