Just got a paper on conformal prediction REJECTED by International Journal of Forecasting despite evidence on 30,000 time series (and more). What’s going on? Part2: 1311 time series from the Tourism competition

R-bloggers 2025-01-20

[This article was first published on T. Moudiki's Webpage - R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In #182, I presented a benchmark based on 30,000 time series, comparing the sequential split conformal method to the state of the art in Conformal prediction for Forecast.

In this post, I’ll present a benchmark based on 1311 time series from the Tourism competition, comparing the sequential split conformal method to the state of the art. Theta method from the forecast package is used as the base model, along with my R package ahead for conformalizing the base model. 2 target coverage levels are considered: 80% and 95%. The benchmarking errors are measured by:

Achieved test set coverage rate (percentage of future values that are within the prediction intervals) for 80% and 95% prediction intervals.
Test set Winkler score (see https://www.otexts.com/fpp3/distaccuracy.html#winkler-score for more details).

0 – Required packages

utils::install.packages(c('foreach', 'forecast', 'fpp', 'fpp2', 'remotes', 'Tcomp'),                        repos="https://cran.r-project.org")remotes::install_github("Techtonique/ahead")remotes::install_github("thierrymoudiki/simulatetimeseries")remotes::install_github("herbps10/AdaptiveConformal", force=TRUE)remotes::install_github("thierrymoudiki/misc")suppressWarnings(library(datasets))suppressWarnings(library(forecast))suppressWarnings(library(foreach))suppressWarnings(library(fpp2))suppressWarnings(library(ahead))suppressWarnings(library(AdaptiveConformal))suppressWarnings(library(misc))suppressWarnings(library(Tcomp))

1 – Useful functions

coverage_score <- function(obj, actual) {  if (is.null(obj$lower))  {    return(mean((obj$intervals[, 1] <= actual)*(actual <= obj$intervals[, 2]))*100)  }  return(mean((obj$lower <= actual)*(actual <= obj$upper))*100)}winkler_score <- function(obj, actual, level = 95) {  alpha <- 1 - level / 100  lt <- try(obj$lower, silent = TRUE)  ut <- try(obj$upper, silent = TRUE)  actual <- as.numeric(actual)  if (is.null(lt) || is.null(ut))  {    lt <- as.numeric(obj$intervals[, 1])    ut <- as.numeric(obj$intervals[, 2])  }  n_points <- length(actual)  stopifnot((n_points == length(lt)) && (n_points == length(ut)))  diff_lt <- lt - actual  diff_bounds <- ut - lt  diff_ut <- actual - ut  score <- diff_bounds  score <- score + (2 / alpha) * (pmax(diff_lt, 0) + pmax(diff_ut, 0))  return(mean(score))}get_error <- function(obj, actual, level = 95){  actual <- as.numeric(actual)  mean_prediction <- as.numeric(obj$mean)  me <- mean(mean_prediction - actual)  rmse <- sqrt(mean((mean_prediction - actual)**2))  mae <- mean(abs(mean_prediction - actual))  mpe <- mean(mean_prediction/actual-1)  mape <- mean(abs(mean_prediction/actual-1))  coverage <- as.numeric(coverage_score(obj, actual))  winkler <- winkler_score(obj, actual, level = level)  res <- c(me, rmse, mae, mpe,            mape, coverage, winkler)  names(res) <- c("me", "rmse", "mae", "mpe",                   "mape", "coverage", "winkler")  return(res)}

2 - Benchmarking loop

ahead_methods <- c("block-bootstrap", "surrogate",                    "kde", "bootstrap")aci_methods <- c('ACI', 'SCP', 'AgACI',                  'DtACI', 'SF-OGD', 'SAOCP')i <- 3level <- 95nsim <- 250ahead_method <- ahead_methods[1]aci_method <- aci_methods[1]# base model obj <- forecast::thetaf(y=Tcomp::tourism[[i]]$x,                         h=Tcomp::tourism[[i]]$h,                         level=level) print(get_error(obj, Tcomp::tourism[[i]]$xx))# conformalized ahead                             obj_ahead <- ahead::conformalize(FUN=forecast::thetaf,                            y=Tcomp::tourism[[i]]$x,                            h=Tcomp::tourism[[i]]$h,                            level=level,                            nsim = nsim,                            method=ahead_method)print(get_error(obj_ahead, Tcomp::tourism[[i]]$xx))# AdaptiveConformal(obj_aci <- AdaptiveConformal::aci(Y = as.vector(Tcomp::tourism[[i]]$xx),                                   predictions = as.vector(obj$mean),                                   method = "ACI",                                   alpha = level/100))obj_aci$mean <- as.vector(obj$mean)print(get_error(obj_aci, Tcomp::tourism[[i]]$xx))## -----------------------------------------------------------------------------------ahead_methods <- c("block-bootstrap", "surrogate",                    "kde", "bootstrap")aci_methods <- c('ACI', 'SCP', 'AgACI',                  'DtACI', 'SF-OGD', 'SAOCP')n_series <- length(tourism)for (level in c(80, 95)){  pb <- utils::txtProgressBar(min=0, max=n_series, style = 3)  benchmark <- foreach::foreach(i=1:n_series, .combine = rbind)%do%  {    results <- matrix(NA,                       nrow= 1 + length(ahead_methods) + length(aci_methods),                       ncol=9)    colnames(results) <- c("series", "method",                            "me", "rmse", "mae", "mpe", "mape",                            "coverage", "winkler")    results_index <- 1    # base model     obj <- forecast::thetaf(y=Tcomp::tourism[[i]]$x,                             h=Tcomp::tourism[[i]]$h,                             level=level)     results[results_index, ] <- c(i, "none",                                   get_error(obj, Tcomp::tourism[[i]]$xx))    results_index <- results_index + 1    # conformalized ahead    for (j in 1:length(ahead_methods))    {      obj_ahead <- try(ahead::conformalize(FUN=forecast::thetaf,                                y=Tcomp::tourism[[i]]$x,                                h=Tcomp::tourism[[i]]$h,                                level=level,                                nsim = nsim,                                method=ahead_methods[j]), silent = TRUE)      if (inherits(obj_ahead, "try-error"))      {        results[results_index, ] <- c(i, paste0("conformal-", ahead_methods[j]), rep(NA, 7))      } else {       results[results_index, ] <- c(i, paste0("conformal-", ahead_methods[j]),                                     get_error(obj_ahead, Tcomp::tourism[[i]]$xx))       }      results_index <- results_index + 1    }    # AdaptiveConformal    for (j in 1:length(aci_methods))    {      obj_aci <- try(AdaptiveConformal::aci(Y = as.vector(Tcomp::tourism[[i]]$xx),                                         predictions = as.vector(obj$mean),                                         method = aci_methods[j],                                         alpha = level/100), silent = TRUE)      if (inherits(obj_ahead, "try-error"))      {        results[results_index, ] <- c(i, paste0("conformal-", aci_methods[j]),                                       rep(NA, 7))      } else {        obj_aci$mean <- as.vector(obj$mean)        results[results_index, ] <- c(i, paste0("conformal-", aci_methods[j]),                                     get_error(obj_aci, Tcomp::tourism[[i]]$xx))      }      results_index <- results_index + 1    }        utils::setTxtProgressBar(pb, i)    results   }  close(pb)    benchmark <- cbind.data.frame(benchmark[, c(1, 2)],                                 apply(benchmark[, -c(1, 2)], c(1, 2), as.numeric))    benchmark$method <- sapply(1:length(benchmark$method),                              function(i) gsub(pattern = "conformal-",                                              replacement = "",                               x=benchmark$method[i]))    saveRDS(benchmark, paste0("2025-01-20-tourism-benchmark", level, ".rds"))}

3 - Plot Results

Coverage rates and Winkler scores for the 80% and 95% prediction intervals are presented, along with boxplots of the log-error rates (the lower the better).

tourism_benchmark80 <- readRDS("2025-01-20-tourism-benchmark80.rds")tourism_benchmark95 <- readRDS("2025-01-20-tourism-benchmark95.rds")benchmark_medians80 <- cbind.data.frame(tapply(tourism_benchmark80$coverage,                                              tourism_benchmark80$method, median),                                    tapply(tourism_benchmark80$winkler,                                            tourism_benchmark80$method, median))colnames(benchmark_medians80) <- c("coverage", "winkler_score")misc::sort_df(benchmark_medians80, by="winkler_score")benchmark_medians95 <- cbind.data.frame(tapply(tourism_benchmark95$coverage,                                                tourism_benchmark95$method, median),                                        tapply(tourism_benchmark95$winkler,                                                tourism_benchmark95$method, median))colnames(benchmark_medians95) <- c("coverage", "winkler_score")misc::sort_df(benchmark_medians95, by="winkler_score")par(mfrow=c(2, 1))boxplot(log(100-coverage) ~ method, data = tourism_benchmark80,         main="log-error rates")boxplot(log(100-coverage) ~ method, data = tourism_benchmark95,         main="log-error rates")                coverage winkler_scoresurrogate       83.33333      9339.212block-bootstrap 83.33333      9372.207kde             87.50000      9419.847bootstrap       79.16667     10110.819none            75.00000     11582.445AgACI           62.50000     17010.164DtACI           62.50000     17031.145ACI             62.50000     17165.020SCP             50.00000     18008.067SAOCP            0.00000     47472.648SF-OGD           0.00000     47535.205                 coverage winkler_scoresurrogate        95.83333      9453.619block-bootstrap  95.83333      9625.191kde             100.00000      9705.674none             87.50000      9845.460bootstrap        91.66667     10042.757AgACI            62.50000     16564.417ACI              62.50000     16649.422DtACI            62.50000     16649.422SCP              62.50000     16649.422SAOCP             0.00000     47472.648SF-OGD            0.00000     47535.205

Conclusion

So, based on these extensive experiments against the state of the art (and assuming the implementations of the state of the art methods are correct, which I’m sure they are, see https://computo.sfds.asso.fr/published-202407-susmann-adaptive-conformal/, and assuming I’m using them well), how cool is this contribution to the science of forecasting?

To leave a comment for the author, please follow the link and comment on their blog: T. Moudiki's Webpage - R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Continue reading: Just got a paper on conformal prediction REJECTED by International Journal of Forecasting despite evidence on 30,000 time series (and more). What’s going on? Part2: 1311 time series from the Tourism competition