What to make of reported statistical analysis summaries: Hear no distinction, see no ensembles, speak of no non-random error.
Statistical Modeling, Causal Inference, and Social Science 2017-08-31
Recently there has been a lot of fuss about inappropriate interpretations and uses of p-values, significance tests, Bayes factors, confidence intervals, credible intervals and almost anything else anyone has ever thought of. That is, about desperately trying to discern what to make of reported statistical analysis summaries of individual studies – largely on their own – including a credible quantification of the uncertainties involved. This is usually done immediately after a study has been completed, or soon after, by the very experimenters who carried it out, perhaps along with consultants or collaborators with (hopefully) somewhat more statistical experience. So creators, perpetrators, evaluators, jurors and judges are all biased toward a hopeful sentence of many citations and continued career progression.
Three things that do not seem to be getting adequate emphasis in these discussions of what to make of reported statistical analysis summaries are: (1) failing to distinguish what something is versus what to make of it, (2) ignoring the ensemble of similar studies (completed, ongoing and future) and (3) neglecting important non-random errors. This does seem to be driven by academic culture, and so it won't be easy to change. As the quip often attributed to Nazi Reich Marshal Hermann Göring goes: "Whenever I hear the word culture, I reach for my pistol!"
What is meant by "what to make of" a reported statistical analysis summary – its upshot, or how it should affect our future actions and thinking – as opposed to simply what it is? C.S. Peirce called this the pragmatic grade of clarity of a concept. To him it was the third grade, which needed to be preceded by two other grades: the ability to recognise instances of a concept and the ability to define it. For instance, with regard to p-values: the ability to recognise what is or is not a p-value, the ability to define a p-value, and the ability to know what to make of a p-value in a given study. It is the third that is primary and paramount to "enabling researchers to be less misled by the observations" and thereby discern what to make, for instance, of a p-value. Importantly, it also always remains open ended.
A helpful quote from Peirce might be “. . . there are three grades of clearness in our apprehensions of the meanings of words. The first consists in the connexion of the word with familiar experience. . . . The second grade consists in the abstract definition, depending upon an analysis of just what it is that makes the word applicable. . . . The third grade of clearness consists in such a representation of the idea that fruitful reasoning can be made to turn upon it, and that it can be applied to the resolution of difficult practical problems.” (CP 3.457, 1897)
Now almost all the teaching in statistics is about the first two grades, and much (most) of the practice of statistics skips over the third with the usual "this is the p-value in your study, and don't forget its actual definition – if you do, people will have the right to laugh at you." But all the fuss here is, or should be, about what should be made of this p-value or other statistical analysis summary. How should it affect our future actions and thinking? Again, that will always remain open ended.
Additionally, ignoring the ensemble of similar studies makes that task unduly hazardous (except in emergency or ethical situations where multiple studies are not possible or can't be waited for). So why are most statistical discussions framed with reference to a single solitary study, with the expectation that, if done adequately, one should be able to discern what to make of it and adequately quantify the uncertainties involved? Why, why, why? As Mosteller and Tukey put it in their chapter "Hunting Out the Real Uncertainty" in Data Analysis and Regression, way back in 1977 – you don't even have access to the real uncertainty with just a single study.
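A minimal simulation sketch of the Mosteller and Tukey point (the numbers mu, tau and se are my illustrative assumptions, not from their book): a single study's standard error captures only within-study random error, so its nominal 95% interval understates the real uncertainty whenever true effects vary across the ensemble of studies.

```python
import numpy as np

rng = np.random.default_rng(2017)
n_sim = 10_000
mu, tau, se = 0.3, 0.2, 0.1   # assumed overall mean, between-study sd, within-study se

theta = rng.normal(mu, tau, n_sim)   # each study's own true effect
est = rng.normal(theta, se)          # each study's estimate

# Nominal 95% interval built from the within-study standard error alone
lo, hi = est - 1.96 * se, est + 1.96 * se
print("covers the study's own effect:", np.mean((lo < theta) & (theta < hi)))  # ~0.95
print("covers the overall mean effect:", np.mean((lo < mu) & (mu < hi)))       # ~0.62
```

Under these assumptions the interval does exactly what it advertises for the one study at hand, yet covers the ensemble-level effect far less often than 95% – the real uncertainty is simply not visible from a single study.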
Unfortunately, when many do consider the ensemble (i.e. do a meta-analysis) they almost exclusively obsess about combining studies to get more power, paying not much more than lip service to assessing the real uncertainty (e.g. doing a horribly underpowered test of heterogeneity, or thinking a random effect will adequately soak up all the real differences). Initially, the first or second sentence of the Wikipedia entry on meta-analysis was roughly "meta-analysis has the capacity to contrast results from different studies and identify patterns among study results, sources of disagreement among those results, or other interesting relationships that may come to light in the context of multiple studies." That progressively got moved further and further down and prefaced with "In addition to providing an estimate of the unknown common truth" (i.e., in addition to this amazing offer you will also receive…). Why in the world would you want an estimate of the unknown common truth without some credible assessment that things are common?
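To see how "horribly underpowered" such a heterogeneity test can be, here is a rough sketch using Cochran's Q with made-up numbers (five studies, between-study sd equal to the within-study se): even with real between-study differences as large as the sampling error itself, Q rarely rejects.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1977)
k, se, tau = 5, 0.1, 0.1   # assumed: 5 studies, within-study se, between-study sd

def q_rejects():
    theta = rng.normal(0.3, tau, k)          # genuinely different true effects
    est = rng.normal(theta, se)              # observed study estimates
    w = np.full(k, 1 / se**2)                # inverse-variance weights
    pooled = np.sum(w * est) / np.sum(w)
    Q = np.sum(w * (est - pooled) ** 2)      # Cochran's Q statistic
    return Q > stats.chi2.ppf(0.95, df=k - 1)

power = np.mean([q_rejects() for _ in range(5000)])
print(f"power to detect heterogeneity: {power:.2f}")   # roughly 0.3, nowhere near 0.8
```

So a non-significant Q here is close to uninformative: failing to reject says almost nothing about whether things are actually common.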
Though perhaps most critical of all, not considering important non-random error when discerning what to make of a p-value or other statistical analysis summary makes no sense. Perhaps the systematic error (e.g. confounding) is much larger than the random error. Or maybe the random error is so small relative to the systematic error that the random error can be safely ignored (i.e. there is no need to even calculate a p-value).
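A toy illustration of that last point (my numbers, not the post's): the standard error shrinks with the square root of the sample size, but a fixed bias from, say, confounding does not, so past some sample size the total error is essentially all systematic and the p-value machinery is beside the point.

```python
import numpy as np

sigma = 1.0   # assumed outcome sd
bias = 0.2    # assumed systematic error, e.g. from confounding

for n in (25, 100, 10_000, 1_000_000):
    se = sigma / np.sqrt(n)             # random error: shrinks with n
    rmse = np.sqrt(bias**2 + se**2)     # total error: the bias term never shrinks
    print(f"n={n:>9,}  se={se:.4f}  rmse={rmse:.4f}  bias share={bias**2 / rmse**2:.0%}")
```

By n = 1,000,000 the random error is negligible and nearly all of the total error is bias – a very small p-value there tells you nothing about whether the estimate is anywhere near the truth.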
Earlier I admitted these oversights are culturally driven in academia, and reaching for one's pistol is almost never a good idea. Academics really want (or even feel they absolutely need) to make something out of an individual study on its own (especially if it's theirs). Systematic errors are just too hard for most statisticians to deal with adequately, and usually require domain knowledge that statisticians and even study authors won't have. Publicly discerning what to make of a p-value or other statistical analysis summary is simply too risky. It is open ended, and in some sense you will always fall short, and others might laugh at you.
Too bad always being wrong (in some sense) seems so wrong.