People have needed rituals to turn data into truth for many years. Why would we be surprised if many people now need procedural reforms to work?

Statistical Modeling, Causal Inference, and Social Science 2024-04-08

This is Jessica. How to weigh metascience or statistical reform proposals has been on my mind more than usual lately as a result of looking into and blogging about the Protzko et al. paper on rigor-enhancing practices. Seems it’s also been on Andrew’s mind.

Like Andrew, I have been feeling “bothered by the focus on procedural/statistical ‘rigor-enhancing practices’ of ‘confirmatory tests, large sample sizes, preregistration, and methodological transparency’” because I suspect researchers are taking it to heart that these steps will be enough to help them produce highly informative experiments. 

Yesterday I started thinking about it via an analogy to writing. I heard once that if you’re trying to help someone become a better writer, you should point out no more than three classes of things they’re doing wrong at one time, because too much new information can be self-defeating. 

Imagine you’re a writing consultant: people bring you their writing and you advise them on how to make it better. Keeping in mind the need not to overwhelm them, initially maybe you focus on the simple things that won’t make them a great writer but are easy to teach: “You’re doing comma splices, your transitions between paragraphs suck, you’re doing citation wrong.” You talk up how important it is to get these things right so they stay motivated, and you bite your tongue about all the other stuff they need help with, to avoid discouraging them.

Say the person goes away and fixes the three things, then comes back to you with a new version of what they’ve written. What do you do now? Naturally, you give them three more things. Over time, as this process repeats, you eventually get to the more nuanced stuff that is harder to get right but ultimately more important to their success as a writer.

But this approach only works if your audience either intrinsically cares enough about improving to keep coming back, or has some outside reason they must keep coming back, like maybe they are a Ph.D. student and their advisor is forcing their hand. What if you can never be sure, when someone walks in the door, that they will come back a second time after you give your advice? In fact, what if the people who really need your help with their writing are bad writers because they fixated on the superficial advice they got in middle school or high school that boils good writing down to a formula, and considered themselves done? And now they’re looking for the next quick fix, so they can go back to focusing on whatever they are actually interested in and treating writing as a necessary evil?

Probably they will latch onto the next three heuristics you give them and consider themselves done. So if we suspect the people we are advising will be looking for easy answers, it seems unlikely that we are going to make them good writers using the approach above, where we give them three simple things and talk up the power of those things. Yet some would say this is what mainstream open science is doing, by giving people simple procedural reforms (just preregister, just use big samples, etc.) and talking up how these help eliminate replication problems.

I like writing as an analogy for doing experimental social science because both are a kind of wicked problem where there are many possible solutions, and the criteria for selecting between them are nuanced. There are simple procedural things that are easier to point out, like the comma splices or missing transitions between paragraphs in writing, or not having a big enough sample or invalidating your test by choosing it post hoc in experimental science. But avoiding mistakes at this level is not going to make you a good writer, just like enacting simple procedural heuristics is not going to make you a good experimentalist or modeler. For that you need to adopt an orientation that acknowledges the inherent difficulty of the task and prompts you to take a more holistic approach.

Figuring out how to encourage that is obviously not easy. But one reason that starting with the simple procedural stuff (or broadly applicable stuff, as Nosek implies the “rigor-enhancing practices” are) seems insufficient to me is that I don’t necessarily think there’s a clear pathway from the simple formulaic stuff to the deeper stuff, like the connection between your theory and what you are measuring and how you are measuring it, and how you specify and select among competing models. I actually think it makes more sense to go the opposite way: from why inference from experimental data is necessarily very hard as a result of model misspecification, effect heterogeneity, measurement error, etc., to the ingredients that have to be in place for us to even have a chance, like sufficient sample size and valid confirmatory tests. The problem is that one can understand the concept of preregistration or large sample size while still having a relatively simple mental model of effects as real or fake and questionable research practices as the main source of issues.
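To make the heterogeneity and measurement-error point a bit more concrete, here is a minimal simulation sketch (mine, not from anyone's paper; the sample size, effect sizes, and noise levels are made up for illustration). A large randomized comparison with a single prespecified test of the average effect comes back highly significant, yet the average is a blend of subgroup effects with opposite signs, and measurement error in the outcome further dilutes it in standardized terms.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical setup (illustrative numbers only): a large two-arm experiment
# with one prespecified "confirmatory" test of the average treatment effect.
n = 20_000
subgroup = rng.integers(0, 2, size=n)             # unobserved moderator
true_effect = np.where(subgroup == 0, 0.4, -0.2)  # heterogeneous latent effect (SD units)
treat = rng.integers(0, 2, size=n)                # random assignment

latent = treat * true_effect + rng.normal(0.0, 1.0, size=n)
measured = latent + rng.normal(0.0, 1.0, size=n)  # outcome measured with error, which
                                                  # inflates the SD and shrinks the
                                                  # standardized effect

# The preregistered analysis: difference in means between arms.
avg_effect = measured[treat == 1].mean() - measured[treat == 0].mean()
t, p = stats.ttest_ind(measured[treat == 1], measured[treat == 0])
print(f"estimated average effect = {avg_effect:.3f}, p = {p:.2g}")
# typically a very small p-value despite a modest average effect (~0.1)

# The subgroup-specific effects that the single average estimate glosses over.
for g in (0, 1):
    m = subgroup == g
    d = measured[m & (treat == 1)].mean() - measured[m & (treat == 0)].mean()
    print(f"subgroup {g}: estimated effect = {d:.3f}")
```

Nothing in the procedural checklist here is violated: the sample is large and the test was fixed in advance. You only notice the problem if your mental model of the effect is richer than real-versus-fake.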

In my own experience, the more I’ve thought about statistical inference from experiments over the years, the more seriously I take heterogeneity and underspecification/misspecification, to the point that I’ve largely given up doing experimental work. This is an extreme outcome, of course, but I think we should expect that the more one recognizes how hard the job really is, the less likely one is to firehose the literature in one’s field with a bunch of careless, dead-in-the-water studies. As work by Berna Devezer and colleagues has pointed out, open science proposals are often subject to the same kinds of problems (overconfident claims, reliance on heuristics) that contributed to the replication crisis in the first place. This solutionism (a mindset I’m all too familiar with as a computer scientist) can be counterproductive.