Getting a pass on evaluating ways to improve science
Statistical Modeling, Causal Inference, and Social Science 2024-09-19
This is Jessica. I was thinking recently about how doing research on certain topics related to helping people improve their statistical practice (like data visualization, or open science) can seem to earn researchers a free pass where we might otherwise expect to see rigorous evaluation. For example, I’m sometimes surprised when I see researchers from outside the field getting excited about studies on visualization that I personally wouldn’t trust. It’s like there’s a rosy glow effect when they realize that there is actually research being done on such topics. Then there is open science research, which proposes interventions like preregistration or registered reports, but has been criticized for failing to rigorously motivate and evaluate its claims.
Some of it is undoubtedly selective attention: we're less inclined to be critical when the goals of the research align with something we want to believe. Maybe there's also an implicit tendency to trust that if researchers are working on improving data analysis practices and eliminating sources of bias, they must understand data and statistics well enough themselves not to make dumb mistakes. (Turns out this is not true.)
But on the more extreme end, there's a belief that the goals of these procedures, whether it's "improving science" in the open science case or "improving learning and decision-making from data" in the visualization case, are too hard to evaluate in the usual ways. In visualization research, for example, this sometimes manifests as pushback against anything perceived as too logical-positivist. Some argue that to really understand the impacts of the visualization or data analysis tools we're developing, we need to use ethnographic methods, like embedding ourselves in the domain as participant observers.
Arguments against controlled evaluation also pop up in meta-science discussions. For example, Daniel Lakens recently published a blog post arguing that science reforms like preregistration are beyond empirical evidence, because running the sort of long-term randomized controlled experiments that would produce causal evidence of their effects is prohibitive. He references Paul Meehl's idea of cliometric metatheory, the long-term study of how theories affect scientific progress.
Lakens, however, is not suggesting a more ethnographic or interpretivist approach to understanding the implications of reforms like preregistration. He argues instead that, rather than seeking empirical evidence, we should recognize the distinction between empirical and logical justification:
An empirical justification requires evidence. A logical justification requires agreement with a principle. If we want to justify preregistration empirically, we need to provide evidence that it improved science. If you want to disagree with the claim that preregistration is a good idea, you need to disagree with the evidence. If we want to justify preregistration logically, we need people to agree with the principle that researchers should be able to transparently evaluate how coherently their peers are acting (e.g., they are not saying they are making an error controlled claim, when in actuality they did not control their error rate).
In other words, if we think it’s important to evaluate the severity of published claims, then needing to preregister is a logical conclusion.
Logic is obviously an important part of rigor, and I can certainly relate to being annoyed by the undervaluing of logic in fields where evidence is conventionally empirical (I am often frustrated with this aspect of research on interfaces!). But the "if we think it's important" is critical here, as it points to some buried assumptions. It's worth noting that the argument that preregistration enables evaluating whether researchers are making error-controlled claims depends on a specific philosophy of science grounded in Mayo's view of severe testing. While Lakens may have chosen a philosophy of science to embrace as complete, this is not a universally agreed-upon approach to how best to do science (see, e.g., discussions on the blog). And so the simple logical argument Lakens appears to be going for depends on a much larger scaffold of logic, inferential goals, assumptions, epistemic commitments, values, beliefs, etc.
All this points to a problem with trying to make a logical argument for preregistration, which is that ultimately it's not really all about "logic." One might find it useful to adopt in one's own practice for various reasons, but when it comes to establishing its value for science writ large, we end up firmly rooted in the realm of values. Beyond your philosophy of scientific progress, it comes down to the extent to which you think scientists owe it to others to "prove" that they followed the method they said they did. It's about how much transparency (versus trust) we feel we owe our fellow scientists, not to mention how committed we are to the idea that lying or bad behavior on the part of scientists is the big limiter of scientific progress. As someone who considers themselves highly logical, I don't expect logic alone to get me very far on these questions.
Overall, Lakens' post leaves me with more questions than answers. I find his argument unsatisfying because it's not quite clear what exactly he is proposing. It reads a bit as if it's a defense of preregistration, delivered with an assurance that this logical argument could not possibly be paralleled by empirical evidence: "A little bit of logic is worth more than two centuries of cliometric metatheory." He argues that all rational individuals who agree with the premise (i.e., share his philosophical commitments) should accept the logical view, whereas empirical evidence has to be "strong enough" to convince and may still be critiqued. And so, while he seems to start out by admitting that we'll never know whether science would be better if preregistration were ubiquitous, he ends up concluding that if one shares his views on science, it's logically necessary to preregister for science to improve. I'm not sure what to do with this. For example, is the implication that logical justification should be enough for journals to require preregistration to publish, or that lack of preregistration should be valid grounds for rejecting a paper that makes claims requiring error control?
Elsewhere in his post, Lakens also suggests that empirical evidence is sometimes worth pursuing:
At this time, I do not believe there will ever be sufficiently conclusive empirical evidence for causal claims that a change in scientific practice makes science better. You might argue that my bar for evidence is too high. That conclusive empirical evidence in science is rarely possible, but that we can provide evidence from observational studies – perhaps by attempting to control for the most important confounds, measuring decent proxies of ‘better science’ on a shorter time scale. I think this work can be valuable, and it might convince some people, and it might even lead to a sufficient evidence base to warrant policy change by some organizations. After all, policies need to be set anyway, and the evidence base for most of the policies in science are based on weak evidence, at best.
It strikes me as contradictory to say that it is a flaw that “Psychologists are empirically inclined creatures, and to their detriment, they often trust empirical data more than logical arguments” while at the same time saying it’s ok to produce weak empirical evidence to convince some people.
Reading this, I can't help but think of the recent NHB paper, "High replicability of newly discovered social-behavioural findings is achievable," which, as we previously discussed on the blog, had some flaws, including a missing preregistration. I bring it up here because one could question whether the paper's titular claim really required an empirical study (and previous reviewers like Tal Yarkoni did bring this up). If we do high-powered replications of high-powered original studies, then of course we should be able to find some effects that replicate, unless we take the extreme position that there are no real effects being studied in psychology. This seems like an example of a logical justification that is less tied to a particular philosophy of science than Lakens' preregistration argument (though it still requires some consensus, e.g., on what we mean by replicate).
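To make that point concrete, here's a minimal simulation sketch of my own (it is not from the NHB paper, and the effect size, sample sizes, and alpha are arbitrary numbers I picked for illustration): if the effects being studied are real and both the original and the replication studies are run at high power, a high replication rate follows almost mechanically.

```python
# Minimal sketch, my own illustration (not from the NHB paper).
# Assumptions: a real standardized effect of d = 0.4, n = 200 per group,
# alpha = 0.05 -- all arbitrary, chosen only to give power well above 0.9.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_d = 0.4        # assumed real effect
n_per_group = 200   # large n => high power for d = 0.4
alpha = 0.05
n_sims = 10_000

def significant(d, n, rng):
    """Simulate one two-group study and return True if p < alpha."""
    a = rng.normal(d, 1, n)
    b = rng.normal(0, 1, n)
    return stats.ttest_ind(a, b).pvalue < alpha

# Keep only "discoveries" (significant originals), then replicate each one
# with the same design.
originals = [significant(true_d, n_per_group, rng) for _ in range(n_sims)]
replications = [significant(true_d, n_per_group, rng)
                for orig in originals if orig]

print(f"original discovery rate: {np.mean(originals):.2f}")
print(f"replication rate among discoveries: {np.mean(replications):.2f}")
```

With these assumed numbers, both rates come out around 0.98; the only way to drive the replication rate down is to make the true effects zero (or the studies underpowered), which is exactly why one can ask whether the empirical demonstration was needed at all.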
I’m reminded in particular of a social media discussion between Tal Yarkoni and Brian Nosek after the criticism of the NHB paper surfaced, on the question of when it’s ok to produce empirical evidence to justify reforms. Yarkoni argued that it’s wrong to use empirical evidence to try to convince someone who doesn’t understand statistics well that a higher n study is more likely to replicate, while Nosek seemed to be arguing that sometimes it’s appropriate because we should be meeting people where they are at. My personal view aligns with the former: why would you set out to show something that you personally don’t believe is necessary to show? What happens to the “scientific long game” when scientists operate out of a perceived need to persuade with data? Anyway, Lakens has defended the NHB paper on social media, so maybe his post is related to his views on that case.