Bayesian and frequentist statistical models to predict publishing output and article processing charge totals - Dixon & Schares - Journal of the Association for Information Science and Technology - Wiley Online Library
peter.suber's bookmarks 2025-04-02
Summary:
Abstract: Academic libraries, institutions, and publishers are interested in predicting future publishing output to help evaluate publishing agreements. Current predictive models are overly simplistic and provide inaccurate predictions. This paper presents Bayesian and frequentist statistical models to predict future article counts and costs. These models use the past year's counts of corresponding-authored peer-reviewed articles to predict the distribution of the number of articles in a future year. Article counts for each journal and year are modeled as a log-linear function of year with journal-specific coefficients. Journal-specific predictions are summed to predict the distribution of total paper count and combined with journal-specific costs to predict the distribution of total cost. We fit models to three datasets: 366 Wiley journals from 2016 to 2020, 376 Springer-Nature journals from 2017 to 2021, and 313 Wiley journals from 2017 to 2021. For each dataset, we compared predictions for the subsequent year to actual counts. The model predicts two datasets better than either the annual mean count or a linear trend regression does. For the third, no method predicts output well. A Bayesian model provides prediction uncertainties that account for all modeled sources of uncertainty. Better estimates of future publishing activity and costs provide critical, independent information for open publishing negotiations.
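As an illustration of the approach the abstract outlines (journal-specific log-linear trends, summed into a total count and combined with per-journal costs), here is a minimal Python sketch. It is not the authors' implementation. The abstract does not name a likelihood, so the Poisson count model is an assumption, as are the Newton-Raphson fit and the function names. The journal names, counts, and charges are invented for the example. The simulation plugs in the fitted means and therefore ignores parameter uncertainty, which the Bayesian model described in the abstract is said to account for.

```python
import math
import random


def fit_loglinear_poisson(years, counts, iters=50):
    """Fit log(mu_t) = a + b * (t - t0) by Newton-Raphson; t0 is the mean year.

    Assumption: counts are Poisson distributed (not stated in the abstract).
    """
    t0 = sum(years) / len(years)
    x = [t - t0 for t in years]
    a = math.log(max(sum(counts) / len(counts), 0.5))  # start at the log mean count
    b = 0.0
    for _ in range(iters):
        mu = [math.exp(a + b * xi) for xi in x]
        # Score vector (gradient of the Poisson log-likelihood).
        g0 = sum(y - m for y, m in zip(counts, mu))
        g1 = sum((y - m) * xi for y, m, xi in zip(counts, mu, x))
        # Fisher information matrix entries.
        h00 = sum(mu)
        h01 = sum(m * xi for m, xi in zip(mu, x))
        h11 = sum(m * xi * xi for m, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        if det <= 0:
            break
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-12:
            break
    return a, b, t0


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def predict_totals(history, apc, target_year, draws=2000, seed=1):
    """Simulate total article count and total cost for target_year."""
    rng = random.Random(seed)
    means = {}
    for journal, (years, counts) in history.items():
        a, b, t0 = fit_loglinear_poisson(years, counts)
        means[journal] = math.exp(a + b * (target_year - t0))
    totals, costs = [], []
    for _ in range(draws):
        n = {j: sample_poisson(m, rng) for j, m in means.items()}
        totals.append(sum(n.values()))
        costs.append(sum(n[j] * apc[j] for j in n))
    return means, sorted(totals), sorted(costs)


def quantile(sorted_vals, q):
    """Empirical quantile of an already sorted list."""
    return sorted_vals[min(int(q * len(sorted_vals)), len(sorted_vals) - 1)]


# Hypothetical data: three invented journals, counts for 2017 to 2021.
years = [2017, 2018, 2019, 2020, 2021]
history = {
    "Journal A": (years, [12, 14, 15, 19, 22]),
    "Journal B": (years, [30, 28, 27, 25, 24]),
    "Journal C": (years, [5, 6, 6, 8, 9]),
}
apc = {"Journal A": 3000.0, "Journal B": 2500.0, "Journal C": 4000.0}  # invented charges

means, totals, costs = predict_totals(history, apc, 2022)
print("predicted mean counts:", {j: round(m, 1) for j, m in means.items()})
print("total articles, 5%/50%/95%:", [quantile(totals, q) for q in (0.05, 0.5, 0.95)])
print("total cost, 5%/50%/95%:", [quantile(costs, q) for q in (0.05, 0.5, 0.95)])
```

Summing simulated journal-level draws, rather than summing point predictions, is what turns the per-journal fits into a distribution for the total count and total cost. A fuller treatment would also propagate uncertainty in the fitted coefficients.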