The harm done by tests of significance
Statistical Modeling, Causal Inference, and Social Science 2013-03-25
After seeing this recent discussion, Ezra Hauer sent along an article of his from the journal Accident Analysis and Prevention, describing three examples from accident research in which null hypothesis significance testing (NHST) led researchers astray. Hauer writes:
The problem is clear. Researchers obtain real data which, while noisy, time and again point in a certain direction. However, instead of saying: “here is my estimate of the safety effect, here is its precision, and this is how what I found relates to previous findings”, the data is processed by NHST, and the researcher says, correctly but pointlessly: “I cannot be sure that the safety effect is not zero”. Occasionally, the researcher adds, this time incorrectly and unjustifiably, a statement to the effect that: “since the result is not statistically significant, it is best to assume the safety effect to be zero”. In this manner, good data are drained of real content, the direction of empirical conclusions reversed, and ordinary human and scientific reasoning is turned on its head for the sake of a venerable ritual. As to the habit of subjecting the data from each study to the NHST separately, as if no previous knowledge existed, Edwards (1976, p. 180) notes that “it is like trying to sink a battleship by firing lead shot at it for a long time”.
Indeed, when I say that a Bayesian wants other researchers to be non-Bayesian, what I mean is that I want people to give me their data or their summary statistics, unpolluted by any prior distributions. But I certainly don’t want them to discard all their numbers in exchange for a simple yes/no statement on statistical significance.
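To make Edwards’s battleship point concrete, here is a minimal sketch (all the numbers are made up for illustration): five small studies each estimate a positive effect but, taken one at a time, fail the 0.05 significance test; a simple fixed-effect (inverse-variance) pooling of the same estimates and standard errors gives a precise, clearly nonzero combined estimate. The yes/no verdicts throw that information away; the estimates and standard errors preserve it.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard-normal z statistic."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

# Hypothetical summary statistics from five small studies:
# each reports an estimated effect and its standard error.
estimates = [0.12, 0.18, 0.10, 0.16, 0.14]
std_errs  = [0.10, 0.11, 0.09, 0.12, 0.10]

print("Individual studies:")
for est, se in zip(estimates, std_errs):
    p = two_sided_p(est / se)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"  estimate = {est:.2f}, se = {se:.2f}, p = {p:.3f}  ({verdict} at 0.05)")

# Fixed-effect (inverse-variance) pooling of the same summary statistics.
weights = [1.0 / se ** 2 for se in std_errs]
pooled_est = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
pooled_se = 1.0 / math.sqrt(sum(weights))
pooled_p = two_sided_p(pooled_est / pooled_se)

print("Pooled across studies:")
print(f"  estimate = {pooled_est:.3f}, se = {pooled_se:.3f}, p = {pooled_p:.4f}")
```

This is just the textbook fixed-effect meta-analysis formula, not Hauer’s analysis; the point is only that estimates and standard errors can be combined across studies, while significance labels cannot.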
P-values as data summaries can be really misleading, and unfortunately this practice is often encouraged (explicitly or implicitly) by standard statistics books.
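One familiar way this misleads (again with invented numbers): two studies of the same effect can land on opposite sides of the 0.05 line even though the difference between their estimates is nowhere near significant. Reading off just the two verdicts suggests the studies conflict; the estimates and standard errors say otherwise.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard-normal z statistic."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

# Two hypothetical studies of the same effect (numbers invented for illustration).
est_a, se_a = 25.0, 10.0   # study A: p ~ 0.012 ("significant")
est_b, se_b = 10.0, 10.0   # study B: p ~ 0.317 ("not significant")

print(f"Study A: estimate = {est_a}, se = {se_a}, p = {two_sided_p(est_a / se_a):.3f}")
print(f"Study B: estimate = {est_b}, se = {se_b}, p = {two_sided_p(est_b / se_b):.3f}")

# The significance labels alone suggest the studies disagree, but the
# estimated difference between them is itself far from significant.
diff = est_a - est_b
se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
print(f"Difference: {diff}, se = {se_diff:.1f}, p = {two_sided_p(diff / se_diff):.3f}")
```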