Fishing for cherries

Statistical Modeling, Causal Inference, and Social Science 2013-03-15

Someone writes:

I’m currently trying to make sense of the Army’s preliminary figures on their Comprehensive Soldier Fitness programme, which I found here. That report (see for example table 4 on p.15) has only a few very small “effect sizes” with p < .05, which reminded me of this entry in your blog. My question is, does that imply that when one has a large N – and, thus, presumably, large statistical power – one should systematically reduce alpha as well? Is there any literature on this? Does one always/sometimes/never need to take Lindley’s “paradox” into account?
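
To see the Lindley point concretely: hold the z-statistic fixed at a conventionally “significant” value and let N grow, and the p-value stays put while the Bayes factor swings toward the null. Here is a minimal sketch (illustrative numbers, not the report’s):

```python
# A minimal sketch (not from the report) of Lindley's "paradox":
# hold the z-statistic fixed at a "significant" value and let N grow.
# Model: xbar ~ N(theta, sigma^2/n); H0: theta = 0; H1: theta ~ N(0, tau^2).
from scipy.stats import norm

sigma, tau, z = 1.0, 1.0, 2.5          # z = 2.5 gives two-sided p ~ 0.012

for n in [100, 10_000, 1_000_000]:
    se = sigma / n**0.5
    xbar = z * se                       # observed mean implied by the fixed z
    p = 2 * norm.sf(z)                  # two-sided p-value, same for every n
    # Marginal likelihoods of xbar under H0 and H1 (both Gaussian):
    m0 = norm.pdf(xbar, 0, se)
    m1 = norm.pdf(xbar, 0, (tau**2 + se**2) ** 0.5)
    print(f"n={n:>9,}  p={p:.4f}  BF01={m0/m1:8.2f}")
# As n grows, p stays "significant" while the Bayes factor
# increasingly favors the point null -- the Lindley phenomenon.
```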

And a supplementary question: can it ever be legitimate to quote a result as significant for one DV (“Social fitness” in table 4) when it is simply (cf. p.10) an amalgam of the four other DVs listed immediately underneath it, of which one (“Friendliness”) has a significance of p < .05?
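
For illustration, here is a quick simulation (made-up numbers, not the CSF data) of how an averaged composite can come out “significant” even when only one of its components carries any true effect:

```python
# Illustrative simulation (not the CSF data): a composite of four DVs
# can test "significant" even if only one component has any true effect.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n = 5_000                                  # per group; a big-N setting
effects = [0.10, 0.0, 0.0, 0.0]            # only DV 1 ("Friendliness", say) moves

treat = np.column_stack([rng.normal(d, 1, n) for d in effects])
ctrl  = np.column_stack([rng.normal(0, 1, n) for _ in effects])

for j in range(4):
    p = ttest_ind(treat[:, j], ctrl[:, j]).pvalue
    print(f"DV {j+1}: p = {p:.4f}")

composite_p = ttest_ind(treat.mean(axis=1), ctrl.mean(axis=1)).pvalue
print(f"composite ('Social fitness'-style average): p = {composite_p:.4f}")
```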

PS: If you find this interesting, I wonder if you might want to make a blog post out of it. CSF is a $140 million programme that has been controversial for all sorts of reasons. There’s a whole bunch of other stuff about this process, such as their use of MANOVA at T1 and “ANOVA with blocking” at T2, that makes me think they are on a fishing expedition for cherries to pick. For example, the means in some of the tables are “estimated marginal means” (MANOVA output), the SD values are in fact SEMs, and I have no idea why they are expressing effect sizes as partial eta squared when they only have one independent variable. But I’m a complete newbie to stats, so I’m probably missing a lot of stuff.
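
On the partial-eta-squared point: with a single independent variable, partial eta squared is just eta squared, because SS_total = SS_effect + SS_error in a one-way ANOVA. A quick check on toy data:

```python
# Quick check (toy data, not the report's): with a single independent
# variable, partial eta squared and plain eta squared are the same number,
# because SS_total = SS_effect + SS_error in a one-way ANOVA.
import numpy as np

rng = np.random.default_rng(1)
groups = [rng.normal(mu, 1, 200) for mu in (0.0, 0.1, 0.2)]  # one factor, 3 levels

grand = np.concatenate(groups).mean()
ss_effect = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
ss_error  = sum(((g - g.mean()) ** 2).sum() for g in groups)
ss_total  = ss_effect + ss_error

eta_sq         = ss_effect / ss_total
partial_eta_sq = ss_effect / (ss_effect + ss_error)
print(eta_sq, partial_eta_sq)   # identical, as expected with one IV
```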

My reply: I followed the link. That report is almost a parody of military bureaucracy! But the issues you raise are important. The people doing this research have real problems for which there are no easy solutions. In short: none of the effects is zero and there’s gotta be a lot of variation across people and across subgroups of people. Also, there are multiple outcomes. It’s a classic multiple comparisons situation, but the null hypothesis of zero effects (which is standard in multiple-comparisons analyses) is clearly inappropriate. Multilevel modeling seems like a good idea but it requires real modeling and real thought, not simply plugging the data into an 8-schools program.
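
For concreteness, here is a minimal sketch of the partial pooling a hierarchical model provides, using the classic 8-schools numbers (Rubin 1981) rather than the CSF data: each noisy estimate gets shrunk toward the common mean in proportion to its standard error.

```python
# A minimal sketch of partial pooling (classic 8-schools data, Rubin 1981;
# not the CSF data): noisy estimates are shrunk toward the common mean
# instead of being tested one at a time against a zero null.
import numpy as np

y     = np.array([28., 8., -3., 7., -1., 1., 18., 12.])    # school effects
sigma = np.array([15., 10., 16., 11., 9., 11., 10., 18.])  # their std. errors

taus = np.linspace(0.01, 30, 600)            # grid over between-school sd
post_means = np.zeros((len(taus), len(y)))
log_w = np.zeros(len(taus))

for i, tau in enumerate(taus):
    V = sigma**2 + tau**2
    mu_hat = np.sum(y / V) / np.sum(1 / V)   # pooled mean given tau
    # log marginal likelihood p(y | tau), mu integrated out (flat prior on mu)
    log_w[i] = (-0.5 * np.sum(np.log(V))
                - 0.5 * np.sum((y - mu_hat)**2 / V)
                - 0.5 * np.log(np.sum(1 / V)))
    # shrinkage: precision-weighted compromise between y_j and mu_hat
    lam = (1 / sigma**2) / (1 / sigma**2 + 1 / tau**2)
    post_means[i] = lam * y + (1 - lam) * mu_hat

w = np.exp(log_w - log_w.max())
w /= w.sum()
print("raw estimates:   ", y)
print("partially pooled:", np.round(w @ post_means, 1))
```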

We have seen the same issues arising in education research, another area with multiple outcomes, treatment effects that vary across subgroups, and small aggregate effects.