Why big effects are more important than small effects
Statistical Modeling, Causal Inference, and Social Science 2013-03-15
The title of this post is silly but I have an important point to make, regarding an implicit model which I think many people assume even though it does not really make sense.
Following a link from Sanjay Srivastava, I came across a post from David Funder saying that it’s useful to talk about the sizes of effects (I actually prefer the term “comparisons” so as to avoid the causal baggage) rather than just their signs. I agree, and I wanted to elaborate a bit on a point that comes up in Funder’s discussion. He quotes an (unnamed) prominent social psychologist as writing:
The key to our research . . . [is not] to accurately estimate effect size. If I were testing an advertisement for a marketing research firm and wanted to be sure that the cost of the ad would produce enough sales to make it worthwhile, effect size would be crucial. But when I am testing a theory about whether, say, positive mood reduces information processing in comparison with negative mood, I am worried about the direction of the effect, not the size (*indeed, I could likely change the size by using a different manipulation of mood, a different set of informational stimuli, a different contextual setting for the research — such as field versus lab*). But if the results of such studies consistently produce a direction of effect where positive mood reduces processing in comparison with negative mood, I would not at all worry about whether the effect sizes are the same across studies or not, and I would not worry about the sheer size of the effects across studies. . . .
I’ve added the emphasis in the quote above to point to what I see as its key mistake, which is an implicit model in which effects are additive and interactions are multiplicative. My impression is that people think this way all the time: an effect is positive, negative, or zero, and if it’s positive, it will have different degrees of positivity depending on conditions (with a “pure” measurement having larger effects than an “attenuated” measurement). You can see this attitude in the above quote. There seems to be an idea, when considering true effects or population comparisons (that is, forgetting for a moment about sampling or estimation uncertainty), that there is a high fence at zero, stopping positive effects from becoming negative or vice versa.
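To see what the fence-at-zero picture assumes, it helps to write the two models down (my notation here, not anything from the quote or from Funder). If interactions only rescale a common effect, the sign is locked in; if they are additive, it is not:

$$\text{multiplicative interactions:}\quad \theta_j = \lambda_j\,\theta,\;\; \lambda_j > 0 \;\Rightarrow\; \operatorname{sign}(\theta_j) = \operatorname{sign}(\theta) \text{ for every setting } j$$

$$\text{additive interactions:}\quad \theta_j = \theta + \delta_j \;\Rightarrow\; \theta_j < 0 \text{ whenever } \delta_j < -\theta$$

Here $\theta_j$ is the true effect in setting $j$ (a particular manipulation, stimulus set, and context), and nothing in the usual additive modeling framework forces the $\delta_j$ to be small relative to $\theta$.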
This high fence doesn’t make sense to me. If main effects are additive, interactions can be additive too. If “a different manipulation of mood, a different set of informational stimuli, a different contextual setting for the research” can change the magnitude of an effect, I think it can shift the sign as well. One reason not to trust effects of magnitude 0.001 is that they can be fragile; there’s no guarantee the effect won’t be -0.002 next time around. And I’m not talking about sampling variability here; I’m talking about interactions, that is, real variability in the underlying effect or comparison. This idea is familiar to those of us who use multilevel models, but it can be missing in some standard presentations of statistics in which parameters are estimated one at a time without interest in their variation.
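A toy simulation makes the fragility point concrete (my example, with made-up numbers, not anything from the post): give the effect an average of 0.001 across settings and let it vary additively with standard deviation 0.002, as in a simple multilevel model, and the sign flips in a substantial fraction of settings.

```python
import numpy as np

rng = np.random.default_rng(0)

theta = 0.001        # average underlying effect: small and positive
sigma_delta = 0.002  # sd of the additive interaction across settings

# setting-specific true effects under the additive model: theta_j = theta + delta_j
theta_j = theta + rng.normal(0.0, sigma_delta, size=10_000)

# how often the "high fence at zero" gets crossed
print(f"share of settings with a negative true effect: {(theta_j < 0).mean():.2f}")
# about 0.31, since Phi(-theta/sigma_delta) = Phi(-0.5)
```

Under the multiplicative version, where theta_j = lambda_j * theta with every lambda_j > 0, that share would be exactly zero no matter how wildly the magnitudes varied; that is the fence the quoted reasoning leans on.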
P.S. Funder’s post is fine too; he focuses on a different point, which is how to assess the relevance of correlations such as 0.3, which are too large to be noise but too small to be overwhelming.