Estimating Undirected Graphs Under Weak Assumptions

Normal Deviate 2013-12-20

Mladen Kolar, Alessandro Rinaldo and I have uploaded a paper to arXiv entitled “Estimating Undirected Graphs Under Weak Assumptions.”

As the name implies, the goal is to estimate an undirected graph {G} from random vectors {Y_1,\ldots, Y_n \sim P}. Here, each {Y_i = (Y_i(1),\ldots, Y_i(D))\in\mathbb{R}^D} is a vector with {D} coordinates, or features.

The graph {G} has {D} nodes, one for each feature. We put an edge between nodes {j} and {k} if the partial correlation {\theta_{jk}\neq 0}. The partial correlation {\theta_{jk}} is

\displaystyle  \theta_{jk} = - \frac{\Omega_{jk}}{\sqrt{\Omega_{jj}\Omega_{kk}}}

where {\Omega = \Sigma^{-1}} and {\Sigma} is the {D\times D} covariance matrix for {Y_i}.

At first sight, the problem is easy. We estimate {\Sigma} with the sample covariance matrix

\displaystyle  S = \frac{1}{n}\sum_{i=1}^n (Y_i - \overline{Y})(Y_i - \overline{Y})^T.

Then we estimate {\Omega} with {\hat\Omega = S^{-1}}. We can then use the bootstrap to get confidence intervals for each {\theta_{jk}} and then we put an edge between nodes {j} and {k} if the confidence interval excludes 0.

But how close is the bootstrap distribution {\hat F} to the true distribution {F} of {\hat\theta_{jk}}? Our paper provides a finite sample bound on {\sup_t | \hat F(t) - F(t)|}. Not surprisingly, the bounds are reasonable when {D < n}.

What happens when {D>n}? In that case, estimating the distribution of {\hat\theta_{jk}} is not feasible unless one imposes strong assumptions. With these extra assumptions, one can use lasso-style technology. The problem is that, the validity of the inferences then depends heavily on strong assumptions such as sparsity and eigenvalues restrictions, which are not testable if {D_n > n}. Instead, we take an atavistic approach: we first perform some sort of dimension reduction followed by the bootstrap. We basically give up on the original graph and instead estimate the graph for a dimension-reduced version of the problem.

If we were in a pure prediction framework I would be happy to use lasso-style technology. But, since we are engaged in inference, we take this more cautious approach.

One of the interesting parts of our analysis is that it leverages recent work on high dimensional Berry-Esseen theorems namely the results by Victor Chernozhukov, Denis Chetverikov and Kengo Kato which can be found here.

The whole issue of what assumptions are reasonable in high-dimensional inference is quite interesting. I’ll have more to say about the role of assumptions in high dimensional inference shortly. Stay tuned. In the meantime, if I have managed to spark your interest, please have a look at our paper.