Heller, Heller, and Gorfine on univariate and multivariate information measures

Statistical Modeling, Causal Inference, and Social Science 2014-05-01

Malka Gorfine writes:

We noticed that the important topic of association measures and tests came up again in your blog, and we have a few comments in this regard.

It is useful to distinguish between the univariate and multivariate methods. A consistent multivariate method can recognise dependence between two vectors of random variables, while a univariate method can only loop over pairs of components and check for dependency between them.
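As a concrete illustration of why looping over pairs is not enough, take X1 and X2 to be independent random signs and set Y = X1*X2: Y is then independent of X1 alone and of X2 alone, yet it is a deterministic function of the vector (X1, X2). A minimal NumPy sketch of this toy example:

```python
import numpy as np

# X1, X2: independent random signs; Y = X1 * X2.
# Y is independent of each of X1 and X2 separately,
# but fully determined by the pair (X1, X2).
rng = np.random.default_rng(0)
n = 100_000
x1 = rng.choice([-1, 1], size=n)
x2 = rng.choice([-1, 1], size=n)
y = x1 * x2

print(np.corrcoef(x1, y)[0, 1])       # ~0: no pairwise signal
print(np.corrcoef(x2, y)[0, 1])       # ~0: no pairwise signal
print(np.corrcoef(x1 * x2, y)[0, 1])  # 1.0: perfect dependence on the vector
```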

There are very few consistent multivariate methods. To the best of our knowledge, there are three practical ones:

1) HSIC by Gretton et al. (http://www.gatsby.ucl.ac.uk/~gretton/papers/GreBouSmoSch05.pdf)

2) dcov by Szekely et al. (http://projecteuclid.org/euclid.aoas/1267453933)

3) A method we introduced in Heller et al. (Biometrika, 2013, 503-510, http://biomet.oxfordjournals.org/content/early/2012/12/04/biomet.ass070.full.pdf+html); an R package, HHG, is also available (http://cran.r-project.org/web/packages/HHG/index.html).

As proved in Sejdinovic et al. (Annals of Statistics, 2013, 2263-2291, http://projecteuclid.org/euclid.aos/1383661264), the first two methods are essentially equivalent: distance covariance can be viewed as HSIC with a particular choice of kernel.
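For readers who want to experiment with these ideas, here is a rough Python/NumPy sketch (not the HHG package, and not the reference implementations of HSIC or dcov; the function names are only for illustration) of the squared sample distance covariance combined with a simple permutation test:

```python
import numpy as np

def _double_center(d):
    """Double-center a pairwise distance matrix."""
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def dcov2(x, y):
    """Squared sample distance covariance between samples x (n x p) and y (n x q)."""
    a = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    b = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)
    return (_double_center(a) * _double_center(b)).mean()

def dcov_permutation_test(x, y, n_perm=999, seed=0):
    """Permutation p-value for independence based on distance covariance."""
    rng = np.random.default_rng(seed)
    observed = dcov2(x, y)
    exceed = sum(dcov2(x, y[rng.permutation(len(y))]) >= observed for _ in range(n_perm))
    return observed, (1 + exceed) / (1 + n_perm)

# Example: y depends on x only through a nonlinear (radial) relationship.
rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=(n, 2))
y = (x ** 2).sum(axis=1, keepdims=True) + 0.1 * rng.normal(size=(n, 1))
stat, pval = dcov_permutation_test(x, y)
print(f"dCov^2 = {stat:.4f}, permutation p-value = {pval:.3f}")
```

The permutation step is what turns the statistic into a test: under independence, permuting the rows of y leaves the joint distribution unchanged, so the observed statistic should not stand out among the permuted ones.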

As for univariate methods, there are many consistent ones; some of them are:

1) Hoeffding's classical test (http://www.jstor.org/stable/2236021).

2) Various methods based on mutual information estimation (a rough sketch of one such test appears after this list).

3) Any of the multivariate methods mentioned above.

4) A new class of methods we recently developed, currently available at http://arxiv.org/abs/1308.1559.
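To give a flavour of item 2, here is a rough sketch of a permutation test built on a k-nearest-neighbour mutual information estimate; it relies on scikit-learn's mutual_info_regression, and the wrapper function and its parameters are only for illustration:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def mi_permutation_test(x, y, n_perm=199, seed=0):
    """Permutation test of independence for scalar x and y,
    using a k-NN mutual information estimate."""
    rng = np.random.default_rng(seed)
    observed = mutual_info_regression(x.reshape(-1, 1), y, random_state=seed)[0]
    exceed = 0
    for _ in range(n_perm):
        mi = mutual_info_regression(x.reshape(-1, 1), rng.permutation(y), random_state=seed)[0]
        exceed += mi >= observed
    return observed, (1 + exceed) / (1 + n_perm)

# Example: a noisy sinusoidal relationship that a linear correlation test would miss.
rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, size=500)
y = np.sin(2 * x) + 0.3 * rng.normal(size=500)
mi, pval = mi_permutation_test(x, y)
print(f"estimated MI = {mi:.3f} nats, permutation p-value = {pval:.3f}")
```

Note that the quality of such a test hinges on the underlying mutual information estimator, which is not easy to get right in practice.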

Regarding MIC, we fully agree with the criticism of Professor Kinney that “there is no good reason to use MIC”. We would also like to add that, since MIC requires exponential time to compute, what is actually used in practice is an approximation. However, this approximation might not be consistent even in the limited cases for which MIC itself was proven to be consistent. MIC is therefore not included in the list of consistent univariate methods above.

Furthermore, in multiple independent power analyses, MIC has been found to have lower power than other methods (Simon and Tibshirani, http://arxiv.org/abs/1401.7645; Gorfine et al., http://ie.technion.ac.il/~gorfinm/files/science6.pdf; and de Siqueira Santos et al., http://www.ncbi.nlm.nih.gov/pubmed/23962479).

Regarding equitability, we again concur with Kinney and Atwal that, contrary to the original claim, MIC is not equitable, while mutual information is an equitable measure (in the sense defined by Kinney and Atwal). However, we agree with Professor Gelman (if we understood him correctly) that being equitable is not necessarily a good thing, and therefore this does not mean that mutual information should be the only method used to test dependence (especially as it is hard to estimate). In fact, perhaps a bias towards “simpler” relationships is a good thing. Of course, one needs to find a good definition of “simpler”, and we hope to contribute to that research direction in the future.

On behalf of Ruth Heller, Yair Heller and Malka Gorfine

I have nothing to add here. This is an important topic I don’t know much about, and I’m happy to circulate the ideas of researchers in this area.
