Effective sample size
Statistical Modeling, Causal Inference, and Social Science 2025-11-27
This post is by Aki
Richard McElreath had a bsky post with an MCMC convergence diagnostic plot in which one axis label said “number of effective samples”. I commented that this is wrong and misleading, and that it would be better to write “effective sample size”. Frank Harrell asked for elaboration. Before I had time to answer, some other people had posted perfectly fine answers. I wanted to write a bit more, but not having the time to write something short, I decided to write a blog post.
The problem with “number of effective samples” is that it sounds like some samples are effective and some are not, but effectiveness is a property not of individual samples but of the whole sample.
Before I continue, I want to switch to different words. Each individual posterior draw is a sample, and the collection of all posterior draws is also a sample. Technically this is correct, but it can lead to confusion about whether we are referring to an individual (sample) or to the group of individuals (sample). This is why I prefer to call the individual items posterior draws and the collection of posterior draws a posterior sample. This is also why the posterior R package uses the term draw.
Every single MCMC draw has effective sample size 1, and the number of effective draws is the same as the total number of draws. However, when we use a collection of MCMC draws to estimate some expectation, the Markovian dependency changes how the Monte Carlo error behaves, so it makes sense to compare the estimation efficiency to that of independent draws: the effective sample size is the number of independent draws that would give the same Monte Carlo error.
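To make the comparison with independent draws concrete, here is a minimal simulation sketch. It is not how the posterior package computes ESS. As a stand-in for MCMC output it assumes a stationary AR(1) chain with standard normal marginals, and the function names and settings (phi = 0.9, 1000 draws per chain, 2000 replicated chains) are made up for illustration. Because the chain can be simulated many times, the Monte Carlo variance of the mean can be measured directly and divided into the variance of a single draw.

```python
import numpy as np


def simulate_ar1_chains(n_chains, n_draws, phi, rng):
    """Simulate stationary AR(1) chains with N(0, 1) marginals.

    Illustrative stand-in for MCMC output. Returns shape (n_chains, n_draws).
    """
    draws = np.empty((n_chains, n_draws))
    draws[:, 0] = rng.standard_normal(n_chains)
    innovation_sd = np.sqrt(1.0 - phi**2)  # keeps the marginal variance at 1
    for t in range(1, n_draws):
        noise = rng.standard_normal(n_chains)
        draws[:, t] = phi * draws[:, t - 1] + innovation_sd * noise
    return draws


def ess_from_replicates(values):
    """ESS for estimating a mean, measured from replicated chains.

    For S independent draws, Var(mean) = Var(draw) / S, so the sample size of
    independent draws with the same Monte Carlo error is Var(draw) / Var(mean).
    """
    var_single_draw = values.var()
    var_of_chain_means = values.mean(axis=1).var(ddof=1)
    return var_single_draw / var_of_chain_means


rng = np.random.default_rng(1)
phi, n_draws = 0.9, 1000
chains = simulate_ar1_chains(n_chains=2000, n_draws=n_draws, phi=phi, rng=rng)

ess = ess_from_replicates(chains)
# Large-sample approximation for an AR(1) chain: S * (1 - phi) / (1 + phi)
ess_theory = n_draws * (1 - phi) / (1 + phi)
print(f"total draws per chain: {n_draws}")
print(f"simulated ESS for E[theta]: {ess:.1f} (approximation: {ess_theory:.1f})")
```

Every one of the 1000 draws in a chain contributes to the estimate, yet the chain as a whole is only about as informative about E[theta] as some 50 independent draws would be. The number describes the sample, not any individual draw.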
Effective sample size used to be denoted by n_eff. It might be that “number of effective samples” comes from some people reading n_eff as “number of effective”. n_eff has another problem, as n often denotes the number of observations. That’s why we recommend abbreviating effective sample size as ESS. This is also what the posterior package uses.
A further important point is that the effective sample size also depends on which expectation is being estimated. Most commonly, effective sample size has been reported for the estimation of E[theta], but as the effective sample size can be very different for, say, E[theta^2], it would be better to state explicitly what is being estimated. By default, the posterior package reports Bulk-ESS and Tail-ESS, and while neither is the ESS for a simple expectation, they are more informative than just ESS.
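The dependence on the estimand can be seen in a small simulation sketch. It again assumes, purely for illustration, a stationary Gaussian AR(1) chain in place of real MCMC output, with made-up settings (phi = 0.9, 1000 draws per chain, 2000 replicated chains) and a hypothetical helper name. It is not the estimator used by the posterior package, and it does not compute Bulk-ESS or Tail-ESS. The same draws are used for both estimands; only the function whose expectation is estimated changes.

```python
import numpy as np


def ess_from_replicates(values):
    """Var(single draw) / Var(chain mean): the size of an independent sample
    with the same Monte Carlo error for the mean of `values`."""
    return values.var() / values.mean(axis=1).var(ddof=1)


rng = np.random.default_rng(2)
n_chains, n_draws, phi = 2000, 1000, 0.9

# Stationary AR(1) chains with N(0, 1) marginals, one chain per row.
theta = np.empty((n_chains, n_draws))
theta[:, 0] = rng.standard_normal(n_chains)
innovation_sd = np.sqrt(1.0 - phi**2)
for t in range(1, n_draws):
    noise = rng.standard_normal(n_chains)
    theta[:, t] = phi * theta[:, t - 1] + innovation_sd * noise

ess_theta = ess_from_replicates(theta)        # estimand: E[theta]
ess_theta_sq = ess_from_replicates(theta**2)  # estimand: E[theta^2]

print(f"ESS for E[theta]:   {ess_theta:.0f}")
print(f"ESS for E[theta^2]: {ess_theta_sq:.0f}")
```

For this Gaussian chain the lag-t autocorrelation of theta is phi^t, while that of theta^2 is phi^(2t), so the ESS for E[theta^2] comes out roughly twice the ESS for E[theta] even though both are computed from exactly the same draws. Reporting a single unlabeled ESS would hide that difference.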
It’s been a long time since I wrote a blog post that was not a job ad, and coincidentally I’m also now looking for postdocs who have a strong background in Bayesian methods and are interested in working on Bayesian cross-validation, model checking and comparison.