Simulating A Simple Response Adaptive Randomization – I Have To See It To Believe It

R-bloggers 2025-05-06

[This article was first published on r on Everyday Is A School Day, and kindly contributed to R-bloggers.]

In my simulations of Response Adaptive Randomization, I discovered it performs comparably to fixed 50-50 allocation in identifying treatment effects. The adaptive approach does appear to work! However, with only 10 trials, I’ve merely scratched the surface. Important limitations exist – temporal bias risks, statistical inefficiency, and complex multiplicity adjustments in Bayesian frameworks.

Objectives

What Is Response Adaptive Randomization?

Response-Adaptive Randomization (RAR) is a technique used in clinical trials where the allocation of patients to different treatment arms changes based on interim results collected during the trial. Unlike traditional fixed randomization where treatment allocation ratios remain constant throughout the study, RAR adjusts the probability of assignment to treatment groups as data accumulates, typically to assign more patients to treatments that appear to be performing better. This approach is designed to achieve two main goals: to maximize the information gained about superior treatments while minimizing the number of patients exposed to inferior treatments during the trial itself.

The primary motivation for using RAR is ethical – it aims to treat more trial participants effectively while still gathering sufficient data for scientific conclusions. It’s particularly considered in trials involving serious conditions with high mortality rates or in multi-arm trials where several experimental treatments are being compared. RAR has received considerable theoretical attention since the 1930s but remains controversial in practice, with proponents citing its potential ethical advantages while critics point to concerns about statistical validity, potential bias from temporal trends, and increased complexity in trial design and analysis.

REMAP-CAP was the first study I came across that introduced me to this design, and I found the theory quite intriguing! It does appear almost too good to be true, and I couldn't quite get the intuition for how adaptive randomization is able to identify the better treatment. Fortunately, there is another way we can convince ourselves that this works: simulate it for ourselves, Bayesian style!

Adaptive Formula for Response Adaptive Randomization

Thall and Wathen approach

This is a commonly used formula that adjusts randomization probabilities based on posterior probabilities. For a two-arm trial, the probability of assigning a patient to treatment 1 is:

\begin{gather} \pi_{1,i} = \frac{[P(p_1 > p_0 \mid \text{data})]^c}{[P(p_1 > p_0 \mid \text{data})]^c + [1 - P(p_1 > p_0 \mid \text{data})]^c} \end{gather}

Where P(p₁ > p₀|data) is the posterior probability that treatment 1 is better than treatment 0, and c is a tuning parameter that controls adaptation speed. When c=0, this reduces to equal randomization; when c=1, it becomes Thompson Sampling.
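To build intuition for this formula, here is a minimal R sketch (the function name thall_wathen is mine; it is not from any package):

thall_wathen <- function(prob_superior, c) {
  # prob_superior = P(p1 > p0 | data), estimated from posterior draws
  prob_superior^c / (prob_superior^c + (1 - prob_superior)^c)
}

thall_wathen(0.8, c = 0)    # 0.5   -> equal randomization
thall_wathen(0.8, c = 1)    # 0.8   -> Thompson Sampling
thall_wathen(0.8, c = 0.5)  # ~0.67 -> a compromise between the two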

There may be other approaches to RAR, but this is the one that makes sense to me and is easy to implement. Dive deeper

Simulation Plan

Hopefully, simulating this multiple times will give me a better idea of how it works in practice. How do we do that?

Adaptive vs 50-50 Allocation

  1. Set up a world where we have the entire population and we know how they respond with and without treatment.
  2. We’re going to set the treatment effect to -1.09 on the log-odds scale (an odds ratio of about 0.34), which means the treatment is better than control at reducing the proportion of some event. The formula is log odds = b0 + b1 * treatment, where b0 is 0 and b1 is our treatment effect.
  3. We will then write code to simulate both Response Adaptive Randomization and 50-50 allocation over 10 sets. Each set has 4 sequential analyses and samplings. The first analysis includes 50 patients randomized with 50% allocation. For subsequent interim analyses, the Response Adaptive Randomization group's allocation depends on the responses observed so far, whereas the 50-50 allocation group stays at 50%. For adaptive randomization, we will use the Thall and Wathen approach with c between 0 and 0.5, updating c with their formula n / (2*N), where n is the number of patients sampled thus far plus the ones to be sampled at this interim analysis, and N is the maximum number of patients in the trial (see the short sketch after this list). As you can see, once a total of 200 patients has been sampled, c will be 0.5 at the last interim analysis.
  4. Each trial for both groups is set with the same seed to ensure reproducibility.
  5. After each trial, we extract the coefficient of interest (in our case beta1) and assess the probability that beta1 is less than 0, which would mean the treatment is helpful at reducing whatever outcome we want to reduce.
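As a quick sanity check of steps 2 and 3, here is a small sketch of the outcome model and the c schedule, mirroring the full script below:

treatment_effect <- -1.09        # b1, a log odds ratio (odds ratio ~ 0.34)
plogis(0 + treatment_effect * 0) # 0.50 event probability without treatment
plogis(0 + treatment_effect * 1) # ~0.25 event probability with treatment

max_n <- 200                     # maximum trial size N
c_param <- function(n_sampled) n_sampled / (2 * max_n)
c_param(c(50, 100, 150, 200))    # 0.125 0.250 0.375 0.500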

Questions I Have For Myself

  1. What is the difference between Response Adaptive Randomization and 50-50 allocation in terms of treatment effect estimates? Will RAR have the same accuracy at identifying the treatment effect? Will RAR identify the treatment effect faster than 50-50 allocation?
  2. If there is no effect, will RAR be able to tease that out too? And what would that look like?

Code

library(tidyverse)
library(cmdstanr)
library(glue)

## comparing 50-50 allocation vs Adaptive Randomization
method_list <- c("50_50", "adaptive")
seeds <- sample(1:100000, size = 10, replace = F)

## set up empty dataframe
result <- tibble(
  b0 = numeric(), b0_lower = numeric(), b0_upper = numeric(),
  b1 = numeric(), b1_lower = numeric(), b1_upper = numeric(),
  prob = numeric(), treatment_num = numeric(), num_sample = numeric(),
  treatment_num_cum = numeric(), num = numeric(),
  method = character(), seed = numeric()
)

for (seed in seeds) {
  for (method in method_list) {
    treatment_effect <- -1.09

    ### Set up entire population, knowing each patient's outcome with and without treatment
    set.seed(seed)
    N <- 100000
    x0 <- replicate(N, 0)
    y0 <- rbinom(N, 1, plogis(treatment_effect * x0))
    x1 <- replicate(N, 1)
    y1 <- rbinom(N, 1, plogis(treatment_effect * x1))
    df <- tibble(y0, y1) |>
      mutate(id = row_number())

    ### Max sample
    max_n <- 200

    ### How many sampling trials
    n <- 50
    sample_number <- max_n / n

    b0 = b0_lower = b0_upper = b1 = b1_lower = b1_upper = prob_vec =
      treatment_num = num_vec = vector(mode = "numeric", length = sample_number)
    x_vec <- c()

    ### Changing c parameter
    n_sum <- 0
    c_param <- function(x) {
      value <- x / (2 * max_n)
      return(value)
    }

    for (i in 0:sample_number) {
      if (i == 0) {
        ## Set initial parameters and don't log it
        b0mu <- 0
        b0sd <- 10
        b1mu <- 0
        b1sd <- 2.5
        diff <- 0.5
        x <- rbinom(n, 1, 0.5)
        num_vec[i + 1] <- n
        n_sum <- n
      } else {
        ## Update prior with the previous posterior
        b0mu <- fit$summary("beta0")[["median"]]
        b0sd <- fit$summary("beta0")[["sd"]]
        b1mu <- fit$summary("beta1")[["median"]]
        b1sd <- fit$summary("beta1")[["sd"]]
        b0[i] <- b0mu
        b0_lower[i] <- fit$summary("beta0")[["q5"]]
        b0_upper[i] <- fit$summary("beta0")[["q95"]]
        b1[i] <- b1mu
        b1_lower[i] <- b1_lower_i
        b1_upper[i] <- b1_upper_i
        n_sum <- n_sum + n

        ## Assignment of sampling proportion; 50% allocation at all times vs adaptive
        if (method == "50_50") {
          diff <- 0.5
        } else {
          prob <- beta1 |>
            mutate(treatment_benefit = ifelse(beta1 < 0, 1, 0)) |>
            summarize(prop = mean(treatment_benefit)) |>
            pull()
          c <- c_param(n_sum)
          # diff <- min(max(prob, 0.1), 0.9)
          diff <- prob^c / (prob^c + (1 - prob)^c)
        }
        prob_vec[i] <- diff
        treatment_num[i] <- sum(x)

        ## Each sampling is 50 patients
        num_vec[i + 1] <- n
        x <- rbinom(n, 1, diff)
      }

      ## Sampling from population
      df_list <- df |>
        slice_sample(n = n) |>
        bind_cols(x = x) |>
        mutate(y = case_when(
          x == 1 ~ y1,
          x == 0 ~ y0
        ))

      ## Bayesian model
      ## main model
      stan_model <- glue("
      data {{
        int<lower=0> N;
        array[N] int x;
        array[N] int y;
      }}
      parameters {{
        real beta0;
        real beta1;
      }}
      model {{
        beta0 ~ normal({b0mu},{b0sd});
        beta1 ~ normal({b1mu},{b1sd});
        y ~ bernoulli_logit(beta0 + beta1*to_vector(x));
      }}")

      ## compile model
      mod <- write_stan_file(stan_model)
      model <- cmdstan_model(mod)
      fit <- model$sample(
        data = list(N = nrow(df_list), x = df_list$x, y = df_list$y),
        chains = 4,
        iter_sampling = 2000,
        iter_warmup = 1000,
        seed = 1,
        parallel_chains = 4
      )

      ## Remove patients who were already sampled from the population
      sample <- df_list |> pull(id)
      df <- df |> filter(!id %in% sample)
      x_vec <- c(x_vec, x)
      print(i)

      ## Extract MCMC
      mcmc <- as.data.frame(fit$draws(inc_warmup = F))

      ## Assess probability that treatment effect is < 0
      beta1 <- mcmc |>
        select(contains("beta1")) |>
        pivot_longer(cols = everything(), names_to = "column", values_to = "beta1")
      b1_lower_i <- beta1 |>
        summarize(lower = quantile(beta1, 0.025)) |>
        pull(lower)
      b1_upper_i <- beta1 |>
        summarize(upper = quantile(beta1, 0.975)) |>
        pull(upper)
    }

    ## log results
    result <- result |>
      bind_rows(
        tibble(
          b0 = b0, b0_lower = b0_lower, b0_upper = b0_upper,
          b1 = b1, b1_lower = b1_lower, b1_upper = b1_upper,
          prob = prob_vec, treatment_num = treatment_num,
          num_sample = num_vec[1:(length(num_vec) - 1)]
        ) |>
          mutate(
            treatment_num_cum = cumsum(treatment_num),
            num = cumsum(num_sample),
            method = method,
            seed = seed
          ) |>
          slice_head(n = 30)
      )
  }
}

Interpretation

Visualization

## Visualize the result
result |>
  ggplot(aes(x = num, y = b1)) +
  geom_point() +
  geom_line() +
  geom_ribbon(aes(ymin = b1_lower, ymax = b1_upper), alpha = 0.5) +
  geom_hline(yintercept = 0) +
  geom_hline(yintercept = treatment_effect, color = "blue") +
  geom_label(aes(x = num, y = b1, label = treatment_num_cum), size = 3) +
  ggtitle("Response Adaptive Randomization vs 50-50 Fixed Randomization",
          subtitle = "Blue line = True value, Black line = No difference between treatment & placebo, Numbers labeled = Number of treatment") +
  xlab("Patients") +
  theme_bw() +
  xlim(40, 210) +
  facet_grid(method ~ seed)

The above plot shows the beta1 coefficient from the posterior distribution. The 10 columns are the seeds, one per set of trials. The rows represent the method of allocation, either 50-50 or adaptive randomization. The blue line is the true treatment effect, which is -1.09. The black line marks no effect (0), and the shaded grey band is the 95% credible interval. A simple heuristic: if the grey band sits entirely below the black line, there is at least a 95% probability of a treatment effect. Now let's see if we can answer the questions we had.
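For reference, the "probability of a treatment effect" behind that heuristic is just the proportion of posterior draws of beta1 that fall below zero. A stand-in sketch (simulated normal draws, not the actual Stan output):

draws <- rnorm(8000, mean = -1.09, sd = 0.45) # stand-in for MCMC draws of beta1
mean(draws < 0)                               # posterior probability the treatment helps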

Will RAR need fewer patients to show a positive treatment effect?

It does not seem like it; both 50-50 and RAR appear to need a similar number of patients to show a positive treatment effect.

Will RAR have the same accuracy at identifying treatment effect?

Yes. The patterns for both methods appear quite similar.

If there is no effect, will RAR be able to tease that out? And what would that look like?

Maybe? Both 50-50 and adaptive randomization seem to be able to identify that there is no treatment effect. However, the 50-50 allocation seems more stable than the adaptive randomization. With only 10 trials, it's hard to say. We also didn't set a stopping criterion for futility.

On a side note, notice how with seeds 26561 and 87907 we would falsely conclude that there is a treatment effect when there is none.

Limitations

According to Resist the Temptation of Response-Adaptive Randomization, Response-Adaptive Randomization (RAR) causes many problems, including:

  1. Bias from temporal trends in clinical trials when patients enrolled at different times face systematically different conditions.
  2. Inefficiency in treatment effect estimation that often requires larger sample sizes to maintain statistical power.
  3. Volatility in sample-size distributions that can paradoxically assign more patients to inferior treatments due to random variation.
  4. Difficulty of validly analyzing results when sample sizes themselves contain information about treatment efficacy.
  5. The potential for selection bias when researchers become unintentionally unblinded to interim results as allocation probabilities shift.

What About Multiplicity Adjustment and Type 1 Error?

Bayesian adaptive designs face a practical dilemma regarding multiplicity adjustment. While Bayesian inference theoretically doesn't require corrections for multiple looks at the data, regulatory requirements typically demand Type I error control for confirmatory trials regardless of the statistical approach. Case studies show that unadjusted interim analyses with early efficacy stopping inflate the Type I error rate, while futility-only stopping decreases Type I error but reduces power. For researchers implementing Bayesian adaptive designs with early efficacy stopping, stopping boundaries must become stricter as the number of interim analyses increases, creating tension between Bayesian philosophy and regulatory demands, especially in trials aiming to change clinical practice.
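To see the inflation mechanism for ourselves, here is an illustrative sketch of my own (not from the linked article): it swaps the Stan model for a conjugate Beta(1,1) posterior so it runs in seconds, simulates two-arm trials with no true effect and a 0.5 event rate (matching b0 = 0), looks at the data four times, and declares efficacy at any look where P(p1 < p0 | data) > 0.975.

set.seed(42)
looks <- c(50, 100, 150, 200) # cumulative patients per arm at each look
n_sims <- 1000
hit <- replicate(n_sims, {
  y0 <- cumsum(rbinom(200, 1, 0.5))[looks] # cumulative control events
  y1 <- cumsum(rbinom(200, 1, 0.5))[looks] # cumulative treatment events
  any(sapply(seq_along(looks), function(j) {
    d0 <- rbeta(2000, 1 + y0[j], 1 + looks[j] - y0[j]) # posterior draws of p0
    d1 <- rbeta(2000, 1 + y1[j], 1 + looks[j] - y1[j]) # posterior draws of p1
    mean(d1 < d0) > 0.975 # efficacy declared at this look?
  }))
})
mean(hit) # familywise false-positive rate, noticeably above the nominal one-sided 2.5%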

Read more: Do we need to adjust for interim analyses in a Bayesian adaptive trial design?

Opportunity for Improvement

  • We only did 10 trials total, which is not enough to draw concrete conclusions.
  • Cauchy distribution for logistic regression coefficient prior read more
  • Set a ROPE (Region of Practical Equivalence) for the treatment effect, and see if we can get a better idea of how many patients we need to sample before we can conclude that there is no treatment effect (a minimal sketch follows this list).
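For the ROPE idea, a minimal sketch (the interval and the draws here are placeholders, not output from the simulation above):

rope <- c(-0.1, 0.1)                        # practical equivalence on the log-odds scale
draws <- rnorm(8000, mean = 0.02, sd = 0.3) # stand-in posterior draws of beta1 under no effect
mean(draws > rope[1] & draws < rope[2])     # proportion of the posterior inside the ROPE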

I sent the script to my friend Alec Wong earlier to proofread. Here are his comments, kept for future reference.

  • Extract functionality into dedicated functions, ideally small, concise, and highly descriptive of what they do. Write them to avoid global variables as much as possible; instead, be explicit about what variables the function depends on and pass them in. (A toy example follows this list.)
  • A lot of lines are independent of the “method” loop but nevertheless show up inside the loop. Anything that does not depend on the loop does not belong in the loop. You modify the df of simulated data during the loop, yes, but that need not be the case; you could write to a new variable instead of overwriting df.
  • The “method” loop is actually quite small and doesn’t do very much. There appears little reason to write the script so that everything is run twice.
  • You re-use n many times in this script, making it difficult to reason about what the value of n is at any given time. If you follow the advice in (1), you can re-use n inside functions without ambiguity because they are scoped to the function.
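As a toy illustration of the first point, the allocation update could live in one small function whose inputs are all explicit (a hypothetical refactor, my naming):

update_allocation <- function(prob_superior, n_sampled, max_n) {
  c <- n_sampled / (2 * max_n) # Thall-Wathen tuning parameter
  prob_superior^c / (prob_superior^c + (1 - prob_superior)^c)
}
update_allocation(prob_superior = 0.9, n_sampled = 100, max_n = 200) # ~0.63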

Thanks Alec! I’ll work on these!

Lessons Learnt

  • Refreshed our Bayesian statistics and Stan code
  • Learnt about RAR and what it looks like compared to 50-50 allocation
  • Did a literature search on the question of multiplicity and Bayesian statistics
  • Cauchy distribution for logistic regression coefficient prior read more

