The Steep Price of Sparsity
Normal Deviate 2013-07-27
We all love sparse estimators these days. I am referring to things like the lasso and related variable selection methods.
But there is a dark side to sparsity. It’s what Hannes Leeb and Benedikt Potscher call the “return of the Hodges’ estimator.” Basically, any estimator that is capable of producing sparse estimates has infinitely bad maximum risk.
1. Hodges Estimator
Let’s start by recalling Hodges’ famous example.
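Suppose that $X_1,\ldots,X_n \sim N(\theta,1)$. Define $\hat\theta$ as follows:
$\hat\theta = \overline{X}$ if $|\overline{X}| \geq n^{-1/4}$
$\hat\theta = 0$ if $|\overline{X}| < n^{-1/4}$.
If $\theta \neq 0$, then $\hat\theta$ will equal $\overline{X}$ for all large $n$. But if $\theta = 0$, then eventually $\hat\theta = 0$. The estimator discovers that $\theta$ is 0.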
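Here is a minimal sketch of the estimator in Python, assuming the usual form of Hodges' example: keep the sample mean unless it lands within $n^{-1/4}$ of zero, in which case report exactly 0. The function name `hodges_estimate`, the seed, and the two test values of $\theta$ are my own choices for illustration.

```python
import random
from statistics import fmean

def hodges_estimate(x):
    """Sample mean, replaced by 0 when it is within n^(-1/4) of 0."""
    n = len(x)
    xbar = fmean(x)
    threshold = n ** -0.25  # assumed threshold n^(-1/4)
    return xbar if abs(xbar) >= threshold else 0.0

rng = random.Random(0)  # arbitrary seed, only for reproducibility
n = 10_000

# theta = 0: the sample mean is tiny compared with the threshold 0.1
est_zero = hodges_estimate([rng.gauss(0.0, 1.0) for _ in range(n)])

# theta = 0.5: the sample mean is far above the threshold, so it is kept
est_nonzero = hodges_estimate([rng.gauss(0.5, 1.0) for _ in range(n)])

print(est_zero, est_nonzero)
```

With $n$ this large, the first call should return exactly 0 and the second should return the sample mean, which is close to 0.5.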
This seems like a good thing. This is what we want whenever we do model selection. We want to discover that some coefficients are 0. That’s the nature of using sparse methods.
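But there is a price to pay for sparsity. The Hodges estimator has the unfortunate property that the maximum risk is terrible. Indeed,
$$\sup_\theta \mathbb{E}_\theta\left[n(\hat\theta-\theta)^2\right] \to \infty.$$
Contrast this with the sample mean:
$$\sup_\theta \mathbb{E}_\theta\left[n(\overline{X}-\theta)^2\right] = 1.$$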
The reason for the poor behavior of the Hodges estimator (or any sparse estimator) is that there is a zone near 0 where the estimator will be very unstable. The estimator has to decide when to switch from $\overline{X}$ to 0, which creates a zone of instability.
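A short calculation makes the zone of instability concrete. This is only a sketch, assuming the threshold $n^{-1/4}$ in the definition of the Hodges estimator. Take $\theta_n = n^{-1/4}$, right at the threshold, and write $Z = \sqrt{n}(\overline{X}-\theta_n) \sim N(0,1)$. The estimator is set to 0 about half the time, and each time that happens the scaled squared error is $n\theta_n^2 = n^{1/2}$:

```latex
\begin{align*}
P_{\theta_n}(\hat\theta = 0)
  &= P\bigl(|\overline{X}| < n^{-1/4}\bigr)
   = P\bigl(-2n^{1/4} < Z < 0\bigr) \to \tfrac{1}{2}, \\
\mathbb{E}_{\theta_n}\bigl[n(\hat\theta-\theta_n)^2\bigr]
  &\geq n\,\theta_n^2\, P_{\theta_n}(\hat\theta = 0)
   = n^{1/2}\, P_{\theta_n}(\hat\theta = 0) \to \infty .
\end{align*}
```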
I plotted the risk function of the Hodges estimator here. The risk of the mle is flat. The large peaks in the risk function of the Hodges estimator are very clear (and very disturbing).
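If you want to reproduce the shape of that risk function, the scaled risk $n\,\mathbb{E}_\theta[(\hat\theta-\theta)^2]$ can be computed exactly with the standard library. The sketch below assumes the threshold $n^{-1/4}$ and uses the local parameter $h = \sqrt{n}\,\theta$. The helper names and the grid are my own choices, not anything from the original plot.

```python
import math

def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def hodges_scaled_risk(theta, n):
    """Exact n * E[(theta_hat - theta)^2] for the Hodges estimator."""
    h = math.sqrt(n) * theta   # local parameter
    c = n ** 0.25              # threshold n^(-1/4) on the sqrt(n) scale
    a, b = -c - h, c - h       # theta_hat = 0 exactly when a < Z < b
    p_zero = std_normal_cdf(b) - std_normal_cdf(a)
    # E[Z^2 ; a < Z < b], using the antiderivative Phi(z) - z * phi(z)
    ez2_inside = p_zero - (b * std_normal_pdf(b) - a * std_normal_pdf(a))
    # Outside (a, b) the scaled loss is Z^2; inside it is h^2.
    return (1.0 - ez2_inside) + h * h * p_zero

def max_scaled_risk(n, grid_size=601):
    """Maximum of the scaled risk over a grid of theta in [0, 3 n^(-1/4)]."""
    top = 3.0 * n ** -0.25
    thetas = [top * i / (grid_size - 1) for i in range(grid_size)]
    return max(hodges_scaled_risk(t, n) for t in thetas)

for n in (100, 10_000, 1_000_000):
    print(n, hodges_scaled_risk(0.0, n), max_scaled_risk(n))
```

The risk at $\theta = 0$ is below 1 (the estimator beats the mle there), while the maximum over the grid grows with $n$, roughly like $\sqrt{n}/2$. The risk is symmetric in $\theta$, so the grid only covers $\theta \geq 0$.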
2. The Steep Price of Sparsity
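Leeb and Potscher (2008) proved that this poor behavior holds for all sparse estimators and all loss functions. Suppose that
$$Y_i = x_i^T \beta + \epsilon_i, \quad i = 1,\ldots,n.$$
Here, $\beta \in \mathbb{R}^k$. Let $s(\beta)$ be the support of $\beta$: $s_j(\beta) = 1$ if $\beta_j \neq 0$ and $s_j(\beta) = 0$ if $\beta_j = 0$. Say that $\hat\beta$ is sparsistent (a term invented by Pradeep Ravikumar) if, for each $\beta$,
$$P^n_\beta\bigl(s(\hat\beta) \leq s(\beta)\bigr) \to 1$$
as $n \to \infty$, where the inequality is componentwise.
Leeb and Potscher (2008) showed that if $\hat\beta$ is sparsistent, then
$$\sup_\beta \mathbb{E}_\beta\left[n\,\|\hat\beta-\beta\|^2\right] \to \infty$$
as $n \to \infty$. More generally, for any non-negative loss function $\ell$, we have
$$\sup_\beta \mathbb{E}_\beta\left[\ell\bigl(n^{1/2}(\hat\beta-\beta)\bigr)\right] \to \sup_s \ell(s).$$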
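To get a feel for the general-loss statement (the maximum risk tends to the largest value the loss can take), here is a small sketch in the simplest setting I can think of: the Hodges estimator, viewed as a one-coefficient problem, with the bounded loss $\ell(s) = \min(s^2, 1)$. Both the loss and the evaluation point $\sqrt{n}\,\theta = n^{1/4}/2$ are my own illustrative choices, not taken from the paper. On the event $\hat\theta = 0$ the loss equals $\min(n\theta^2, 1)$, so the risk at $\theta$ is at least $\min(n\theta^2, 1)\, P_\theta(\hat\theta = 0)$.

```python
import math

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bounded_risk_lower_bound(n):
    """Lower bound on sup_theta E[min(n (theta_hat - theta)^2, 1)] for the
    Hodges estimator with threshold n^(-1/4) (assumed form)."""
    c = n ** 0.25   # threshold on the sqrt(n) scale
    h = c / 2.0     # sqrt(n) * theta, halfway to the threshold
    # P(theta_hat = 0) = P(|Z + h| < c) with Z standard normal
    p_zero = std_normal_cdf(c - h) - std_normal_cdf(-c - h)
    # On that event the scaled error is -h, so the loss is min(h^2, 1).
    return min(h * h, 1.0) * p_zero

for n in (16, 256, 10_000):
    print(n, bounded_risk_lower_bound(n))
```

The bound climbs toward 1, which is $\sup_s \ell(s)$ for this loss. In other words, the maximum risk becomes as bad as the loss function allows.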
3. How Should We Interpret This Result?
One might object that the maximum risk is too conservative and includes extreme cases. But in this case, that is not true. The high values of the risk occur in a small neighborhood of 0. (Recall the picture of the risk of the Hodges estimator.) This is far from pathological.
Another objection is that the proof assumes $n$ grows while $k$ stays fixed. But in many applications that we are interested in, $k$ grows and is even possibly larger than $n$. This is a valid objection. On the other hand, if there is unfavorable behavior in the ideal case of fixed $k$, we should not be sanguine about the high-dimensional case.
I am not suggesting we should give up variable selection. I use variable selection all the time. But we should keep in mind that there is a steep price to pay for sparsistency.
References
Leeb, Hannes and Potscher, Benedikt M. (2008). Sparse estimators and the oracle property, or the return of Hodges’ estimator. Journal of Econometrics, 142, 201-211.
Leeb, Hannes and Potscher, Benedikt M. (2005). Model selection and inference: Facts and fiction. Econometric Theory, 21, 21-59.