Automatic Classification of Software Repositories: a Systematic Mapping Study - Archive ouverte HAL

peter.suber's bookmarks 2025-05-03

Summary:

Abstract: The rapid growth of software repositories on development platforms such as GitHub, as well as archives like Software Heritage, prompts the need for better repository classification. Machine learning is increasingly used to automate this classification, but there are no secondary studies analyzing this research landscape. We present a systematic mapping study of 43 primary sources published between 2002 and 2023, where we examine the goals, inputs, outputs, training, and evaluation processes involved in automatic repository classification. Our findings reveal a growing interest in automatic classification, particularly to enhance the discoverability and recommendation of relevant repositories. Other applications, such as classification for mining studies, were surprisingly underrepresented. We also observe that a lack of standardized datasets, classification tasks, and evaluation metrics makes it difficult to compare the performance of different techniques.

Link:

https://hal.science/hal-05049757v1

From feeds:

Open Access Tracking Project (OATP) » peter.suber's bookmarks

Tags:

oa.new oa.repositories oa.repositories.code oa.github

Date tagged:

05/03/2025, 14:16

Date published:

05/03/2025, 10:16