The both both of science reform
Statistical Modeling, Causal Inference, and Social Science 2021-04-01
This is Jessica. I pay some attention to what gets discussed in methodological/statistical reform research and discussions, and I’m probably not the only one who’s watched as the movement (at least in psychology) seems to be getting more self-aware recently. The other day I jotted down what strike me as some yet-unresolved tensions worth reflecting on:
The need for more rigor in specifying and evaluating proposed reforms, versus the simultaneous need to consider aspects of doing good science that seem harder to formalize and tangential to the dominant narratives. For example, developing the ability to be honest with oneself about how flawed one’s work is, and to value one’s progress in getting better at doing science over the external outcomes. These types of concerns seem very non-scientific, but in my experience are just as crucial. They might correlate with interest in methodological reform, but how do we instill them without assuming an intrinsic interest in reforming one’s practices?
The fact that science reform seems to inevitably imply stretching or growing our tolerance for uncertainty in consuming results, versus the fact that many of the core challenges purported to have led to the current “crisis” (e.g., heavy reliance on NHST, researcher degrees of freedom, dichotomization in reporting) seem pretty clearly related to bounded cognition. By which I mean all the evidence suggesting that to make progress in thinking and communicating, we can only handle so many shades of gray at once. There’s been a fair amount of focus on finding the best (i.e., most complete, transparent) representations of evidence, but the strategies that might be applied in consuming or creating these are not discussed much. I personally find this one interesting, and see it a lot in the kinds of topics I work on, like uncertainty visualization, and in other proposals about how to do better science. Some examples: how concerned should we be that something like a multiverse analysis might overwhelm a reviewer such that judgment calls are made using heuristics, perhaps even more so than they would have been without it? How much mental NHSTing happens even when the researchers have intentionally decided not to NHST, and are there ways to keep that NHSTing in check?
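To make the combinatorics concrete, here’s a minimal sketch of the multiverse idea (my own toy illustration, not drawn from any particular multiverse paper): a few individually defensible analysis choices multiply into a pile of results a reviewer has to digest. The variables, cutoffs, and subgroups here are all hypothetical:

```python
# Toy multiverse: enumerate a grid of defensible analysis choices and
# report the effect estimate from each "universe". All choice points
# below are hypothetical illustrations.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 0.2 * x + rng.normal(size=n)      # simulated data, true slope = 0.2
age = rng.integers(18, 65, size=n)

# Three choice points a reasonable analyst might face.
outlier_cutoffs = [2.0, 2.5, 3.0]     # drop |y| beyond k standard deviations
transforms = [lambda v: v, np.tanh]   # raw outcome vs. a shrinking transform
subsets = ["all", "under_40"]         # full sample vs. a subgroup

estimates = []
for k, f, s in itertools.product(outlier_cutoffs, transforms, subsets):
    keep = np.abs(y) < k * y.std()
    if s == "under_40":
        keep &= age < 40
    # Least-squares slope of the transformed outcome on x for this universe.
    slope = np.polyfit(x[keep], f(y[keep]), 1)[0]
    estimates.append(slope)

# 3 * 2 * 2 = 12 estimates from one small grid of choices.
print(f"{len(estimates)} universes; slopes range "
      f"{min(estimates):.2f} to {max(estimates):.2f}")
```

Twelve estimates is still digestible; the worry above is what happens when real multiverses run to hundreds of universes and a reader’s heuristics quietly take over.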
Somewhat related to that last point, I previously wrote something about signaling in science reform, where under certain conditions the simple fact of whether one has used certain practices may be used to make value judgments about the work. This isn’t entirely bad, of course (transparency is good and undoubtedly correlated with more careful research, at least at this point in time!). But I think there’s room for more focus on how people use and consume some of the new practices being espoused, since one thing we can be pretty sure of is that we all like to mentally substitute simple things for hard things.
The need to acknowledge the various ways in which a diversity of methods and intellectual perspectives is naturally good for scientific progress, while coming to some agreement about which methods or mindsets for doing science are better or worse, or which areas of research are more or less replicable. There have been several simulation-based studies showing how different types of methodological and ideological diversity can have positive effects. But many common science reform narratives imply there’s a certain set of things that are definitively best.
Similarly, the need to agree upon and advocate for practical recommendations (like transparency and preregistration) that are well motivated in the current situation, while not forgetting that critical evaluation of these proposals will be crucial if the movement is to stay intellectually malleable as evidence or theory about their limitations grows.
The need for oversight through, for instance, stricter “auditing” of materials in paper reviewing, and for blunt, outside criticism to become more accepted, versus the facts that a) the centralization of knowledge and resources needed to enact audits has been difficult to achieve, and b) in the absence of a trusted, centralized force to audit, retract, etc., these attempts may be more likely to come from those already in power and more likely to target those who, well, seem like easy targets: scientists working on problems that are already marginalized, taking perspectives unfavored (at least within academia), or coming from backgrounds that are already marginalized.
Finally, the need to instill awareness of analysis degrees of freedom, and along with that to continue theorizing about the underspecified distinction between EDA and CDA, versus the need to redirect some focus to “design freedom,” which seems harder to theorize about and create guidelines around. Degrees of freedom in experiment design came up in a comment on a recent post, and I think of it as all the kinds of tweaking one can do of the experimental setup, prompts, etc. to get desired “evidence” for a hypothesis. With enough experience and ingenuity, I tend to think a good experimenter could probably design a compelling experiment to demonstrate many things of questionable real-world importance. True, an experimental paper has to clearly describe the tasks and interfaces, but when you look at the process of getting to whatever the final set of tasks and interfaces is, the researchers have many, many degrees of freedom. Combined with a hypothesis that isn’t clearly testable to begin with, this can look a lot like an even bigger garden of forking paths. It reminds me of how, back when I learned experiment design by taking courses and working with economists, the more I came to understand their experiments, the more I found myself thinking, wow, this is not what I thought it was. These were researchers doing what I would call solid empirical work, but there was a certain cleverness to getting the design to work that struck me as more art than science.
What am I missing?
P.S. The weird title comes from a phrase Ezra Pound used to refer to situations having a dual nature, e.g., the need to be both the poet and the warrior, both the actor and the critic. The phrase has stayed with me as a reference to duality as the solution. So I guess the message of this post is about simultaneously accepting and rejecting what we know about how to improve science.