Condition numbers for HMC and the funnel

Statistical Modeling, Causal Inference, and Social Science 2025-09-18

This post is by Bob.

Back to some technical statistical computing.

Condition numbers for random walks

The usual notion of condition number is the ratio of the largest to the smallest eigenvalue of the negative Hessian. Large eigenvalues correspond to high curvature and small eigenvalues to low curvature. Condition numbers matter because the step size has to be small enough to handle the regions of high curvature, so many steps are needed to traverse the flatter regions of low curvature. Eigenvalues of the negative Hessian act like inverse variances (they are exactly inverse variances in a multivariate normal with a diagonal covariance matrix), and are thus inverse squared scales. If you set the step size to match the direction of highest curvature, the distance you have to cover in the direction of lowest curvature is larger by the square root of the eigenvalue ratio, and because a random walk takes a number of steps that grows as the square of that distance over the step size, the total works out to the condition number itself. That bounds how many steps are going to be required to get roughly independent draws.
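To make that concrete, here is a minimal sketch (Python with NumPy is my choice here, not anything from the post) that computes the classical condition number for a two-dimensional normal target and the implied random-walk step count.

import numpy as np

# Hypothetical two-dimensional normal target with very different scales.
# For a multivariate normal, the negative Hessian of the log density is the
# precision matrix (inverse covariance), so its eigenvalues are inverse variances.
Sigma = np.diag([1.0, 100.0])             # variances 1 and 100
neg_hessian = np.linalg.inv(Sigma)        # precision matrix

eigvals = np.linalg.eigvalsh(neg_hessian)
kappa = eigvals.max() / eigvals.min()     # classical condition number
print("condition number:", kappa)         # 100 for this example

# A step size tuned to the highest-curvature direction is about the smallest
# scale, 1 / sqrt(lambda_max); the longest distance to traverse is about the
# largest scale, 1 / sqrt(lambda_min).  A random walk needs on the order of
# (distance / step)^2 steps, which works out to the condition number itself.
step = 1.0 / np.sqrt(eigvals.max())
distance = 1.0 / np.sqrt(eigvals.min())
print("random-walk steps:", (distance / step) ** 2)   # equals kappa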

Neal’s funnel

Radford Neal introduced a funnel density in his slice sampling paper, and I assume he was well aware of just how nasty an example it is. The funnel in N dimensions is a centered parameterization of a hierarchical model with no data:

y ~ normal(0, 3)
x[1:N - 1] ~ normal(0, exp(y / 2))

Here’s a density plot of y versus x[1] from the Stan User’s Guide chapter on reparameterization.
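Because the funnel factors into a normal marginal for y and conditionally independent normals for x given y, you can generate exact independent draws directly. Here is a rough sketch in Python with NumPy (my choice of language; the dimension N = 10 and the function name draw_funnel are just for illustration) of the kind of draws behind a plot like that.

import numpy as np

def draw_funnel(rng, num_draws, N=10):
    # Exact independent draws from Neal's funnel in N dimensions:
    #   y ~ normal(0, 3);  x[n] ~ normal(0, exp(y / 2)) for n in 1:N-1.
    y = rng.normal(0.0, 3.0, size=num_draws)
    x = rng.normal(0.0, 1.0, size=(num_draws, N - 1)) * np.exp(y / 2)[:, None]
    return y, x

rng = np.random.default_rng(1234)
y, x = draw_funnel(rng, 10_000)
# A scatter plot of (x[:, 0], y) shows the familiar funnel shape: wide in x
# for large y (the mouth) and pinched for very negative y (the neck).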

As you move along the y axis between +6 and -6, the condition number goes from 1000 down to roughly 1 at the origin and back up to 1000. In terms of conditioning, both the mouth and the neck of the funnel are tricky. And that range is only +/- two standard deviations in y, which covers only about 95% of the probability mass. One of the things that makes the funnel nasty is that as you move from y = -6 to y = 6, the eigenstructure changes: the principal eigenvector (the one with the largest eigenvalue) rotates from pointing along the x axes to pointing along the y axis.
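Here is a sketch of that eigenstructure flip, computing the negative Hessian of the funnel's log density analytically and evaluating it along the y axis (Python with NumPy; N = 10 is an arbitrary choice, and the exact condition numbers depend on N and on where in x you evaluate, so don't read too much into the specific values it prints).

import numpy as np

def neg_hessian_funnel(y, x, prior_scale=3.0):
    # Negative Hessian of log p(y, x) for the funnel, where
    #   log p = -y^2 / (2 prior_scale^2) - 0.5 * sum(x^2) * exp(-y)
    #           - (N - 1) * y / 2 + const.
    # Coordinate 0 is y; coordinates 1..N-1 are the x's.
    M = len(x) + 1
    H = np.zeros((M, M))
    H[0, 0] = 1.0 / prior_scale**2 + 0.5 * np.sum(x**2) * np.exp(-y)
    for n in range(len(x)):
        H[n + 1, n + 1] = np.exp(-y)
        H[0, n + 1] = H[n + 1, 0] = -x[n] * np.exp(-y)
    return H

for y in (-6.0, 0.0, 6.0):
    H = neg_hessian_funnel(y, np.zeros(9))          # along the y axis, N = 10
    vals, vecs = np.linalg.eigh(H)
    principal = vecs[:, np.argmax(vals)]            # direction of highest curvature
    print(y, vals.max() / vals.min(), np.round(principal, 2))
# For very negative y the principal eigenvector lies along the x axes;
# for large positive y it lies along the y axis.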

It is very hard to estimate the uncertainty in the funnel using sampling, even independent sampling. The problem is that x[n]^2 has a mean of roughly 100, but x[n]^4 has a mean of roughly 2 x 10^8 (!), so x[n]^2 itself has a standard deviation of about 1.4 x 10^4 (I’m using the fact that var[X^2] = E[X^4] - E[X^2]^2). The distribution of x[n]^2 has to be enormously skewed to the right because its values are bounded below by 0. Even with 10 billion independent draws from the funnel, the estimates of the expectation and variance of the x coordinates are all over the place.
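Here is a quick sketch of that instability (Python with NumPy, my choice; one million exact draws per replication rather than the billions mentioned above, so the wobble shows up even faster).

import numpy as np

# Estimate E[x^2] and sd(x^2) from exact independent draws, replicated
# across random seeds.  The true values are E[x^2] = exp(9/2), about 90,
# and sd(x^2) about 1.4e4, but the plug-in estimates bounce around wildly
# because x^2 is so heavy tailed and right skewed.
for seed in range(5):
    rng = np.random.default_rng(seed)
    y = rng.normal(0.0, 3.0, size=1_000_000)
    x = rng.normal(0.0, 1.0, size=1_000_000) * np.exp(y / 2)
    x2 = x ** 2
    print(f"seed {seed}: mean(x^2) = {x2.mean():10.1f}  sd(x^2) = {x2.std():12.1f}")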

Condition numbers for HMC

HMC is so effective precisely because it overcomes the random walk behavior of Metropolis. Where random-walk Metropolis requires on the order of N^2 work to produce a roughly independent draw in N dimensions, HMC requires only on the order of N^(5/4). But there’s still a nasty constant from conditioning lurking in that asymptotic complexity result.

I don’t know how I missed it before, but I only learned about this paper at the MCM conference in Chicago last month:

Langmore et al. introduce an appropriate notion of condition number for HMC,

kappa = [ SUM_{n=1}^N (lambdaMax / lambda[n])^4 ]^(1/4)

where lambda[1:N] are the eigenvalues of the negative Hessian and lambdaMax = max(lambda[1:N]). This tells us that it’s worse to have one big eigenvalue (one highly curved dimension) and many small eigenvalues (flat dimensions) than the other way around. Therefore, the funnel is actually more poorly conditioned for HMC in the mouth than in the neck. In the mouth, the largest eigenvalue corresponds to the relatively slowly moving y direction, while the x directions all have much lower curvature.

The reason the neck is usually considered the source of the problem is that the leapfrog integrator in HMC approximates the Hamiltonian trajectory using only first-order (i.e., gradient) information, and it can diverge pretty quickly in regions of high curvature. It turns out that if you run HMC or NUTS with a fixed step size, you cannot explore the tails of either the neck or the mouth of the funnel very well.
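To see the asymmetry in this definition, here is a small sketch (Python with NumPy; the eigenvalue configurations are made up for illustration) comparing one stiff direction among many flat ones with the reverse. The classical condition number is 100 in both cases, but the HMC condition number quoted above is noticeably larger in the first.

import numpy as np

def hmc_condition_number(eigvals):
    # kappa = (sum_n (lambdaMax / lambda[n])^4)^(1/4), as quoted above.
    lam = np.asarray(eigvals, dtype=float)
    return np.sum((lam.max() / lam) ** 4) ** 0.25

N = 100
one_big = np.concatenate(([100.0], np.ones(N - 1)))    # one high-curvature direction, many flat
one_small = np.concatenate(([0.01], np.ones(N - 1)))   # one flat direction, many high-curvature

print(hmc_condition_number(one_big))    # roughly (N - 1)^(1/4) * 100, i.e., about 315
print(hmc_condition_number(one_small))  # roughly 100, essentially the classical ratio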