Images of the arXiv: Reconfiguring large scientific image datasets | Published in Journal of Cultural Analytics
Abstract: In an ongoing research project on the ascendancy of statistical visual forms, we have been concerned with the transformations wrought by such images and their organisation as datasets in ‘redrawing’ knowledge about empirical phenomena. Historians and science studies researchers have long established the generative rather than simply illustrative role of images and figures within scientific practice. More recently, the deployment and generation of images by scientific research and its communication via publication has been impacted by the tools, techniques, and practices of working with large (image) datasets. Against this background, we built a dataset of 10 million-plus images drawn from all preprint articles deposited in the open access repository arXiv from 1991 (its inception) until the end of 2018. In this article, we suggest ways – including algorithms drawn from machine learning that facilitate visually ‘slicing’ through the image data and metadata – for exploring large datasets of statistical scientific images. By treating all forms of visual material found in scientific publications – whether diagrams, photographs, or instrument data – as bare images, we developed methods for tracking their movements across a range of scientific research. We suggest that such methods allow us different entry points into large scientific image datasets and that they initiate a new set of questions about how scientific representation might be operating at more-than-human scale.