"Researchers are trying to fix the problem. They’re encouraging more sharing of data sets and urging each other to preregister their hypotheses—declaring what they intend to find and how they intend to find it. The idea is to cut down on the statistical shenanigans and memory-holing of negative results that got the field into this mess. No more collecting a giant blob of data and then combing through it for a publishable outcome, a practice known as “HARKing”—hypothesizing after results are known.

And self-appointed teams are even going back through old work, manually, to see what holds up and what doesn’t. That means doing the same experiment again, or trying to expand it to see if the effect generalizes. It’s a slog—boring, expensive, and time-consuming. To the Defense Advanced Research Projects Agency, the Pentagon’s mad-science wing, the problem demands an obvious solution: Robots.

A Darpa program called Systematizing Confidence in Open Research and Evidence—yes, SCORE—aims to assign a “credibility score” (see what they did there) to research findings in the social and behavioral sciences, a set of related fields to which the reproducibility crisis has been particularly unkind. In 2017, I called the project a bullshit detector for science, somewhat to the project director’s chagrin. Well, now it’s game on: Darpa has promised $7.6 million to the Center for Open Science, a nonprofit organization that’s leading the charge for reproducibility. COS is going to aggregate a database of 30,000 claims from the social sciences. For 3,000 of those claims, the Center will either attempt to replicate them or subject them to a prediction market—asking human beings to essentially bet on whether the claims would replicate or not. (Prediction markets are pretty good at this; in a study of reproducibility in the social sciences last summer, for example, a betting market and a survey of other researchers performed about as well as actual do-overs of the studies.)..."
