The worst research papers I’ve ever published
Statistical Modeling, Causal Inference, and Social Science 2025-10-09
Following up on this recent post, I’m preparing something on weak research produced by Nobel prize winners.
Just to be fair, I thought I should lead this off with a post on weak research produced by . . . me!
Putting together this list wasn’t as easy as I’d thought. I’ve published hundreds of papers and I like almost all of them! But I found a few that I think it’s fair to say are pretty bad.
These papers have coauthors, but I blame me, not them, for the bad stuff.
The absolute worst
[1993] Characterizing a joint probability distribution by conditionals. Journal of the Royal Statistical Society B 55, 185-188. (Andrew Gelman and T. P. Speed)
What can I say? The entire contribution of this paper is a “theorem” that turned out to be false. Take away the theorem, and there’s nothing there! I wouldn’t call this “research misconduct”; we just made a mistake. I guess my coauthor and I share the blame on this one: the result was my idea, but I ran it by my coauthor and he didn’t notice the problem.
Not horrible, exactly, but ultimately not useful contributions to the literature
[2015] Simulation-efficient shortest probability intervals. Statistics and Computing 25, 809-819. (Ying Liu, Andrew Gelman, and Tian Zheng)
I liked the idea behind this paper, but we never had a clean implementation. Really it was a half-done project that we should not have pushed out the door. I don’t blame my collaborators here: I was pushing to get this project done.
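For readers who haven’t seen the paper: the problem it addresses is summarizing a posterior by the shortest interval containing a given probability, computed from simulation draws. Below is a minimal sketch of the naive version (just a sliding window over the sorted draws), written in Python; the function name and example are mine, and the paper’s actual contribution, getting stable endpoints from many fewer draws, is not attempted here.

```python
import numpy as np

def shortest_interval(draws, prob=0.95):
    """Naive empirical shortest interval containing `prob` of the draws:
    sort the draws and slide a window of fixed coverage, keeping the
    narrowest one."""
    x = np.sort(np.asarray(draws, dtype=float))
    n = len(x)
    k = int(np.ceil(prob * n))           # number of draws the interval must cover
    widths = x[k - 1:] - x[:n - k + 1]   # widths of all windows of k consecutive draws
    i = int(np.argmin(widths))           # narrowest window
    return x[i], x[i + k - 1]

# For a right-skewed posterior, the shortest interval sits to the left of
# the usual central (equal-tailed) interval.
rng = np.random.default_rng(1)
draws = rng.gamma(shape=2.0, scale=1.0, size=10_000)
print(shortest_interval(draws, 0.95))
print(np.percentile(draws, [2.5, 97.5]))  # central interval, for comparison
```

The payoff over the central interval shows up with skewed or bounded posteriors, where the shortest interval shifts toward the bulk of the distribution.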
[2014] Multiple imputation for continuous and categorical data: Comparing joint and conditional approaches. Political Analysis 22, 497-519. (Jonathan Kropko, Ben Goodrich, Andrew Gelman, and Jennifer Hill)
[2014] On the stationary distribution of iterative imputations. Biometrika 101, 155-173. (Jingchen Liu, Andrew Gelman, Jennifer Hill, Yu-Sung Su, and Jonathan Kropko)
These two papers are the product of a research project with Jennifer on missing-data imputation. I still think we had a good perspective on the topic, but we got stuck on various details and never ended up with something as general as we wanted. We created an R package called “mi” but it never quite got over the hump. And we published these two articles. The topics are important–theoretical conditions for the performance of iterative imputation, and empirical comparison of different methods of multivariate imputation–but, ultimately, neither paper delivered much. Again, I blame myself as much as anyone else. This was one of those projects where we leaned on each other, nobody really took the lead, and none of us did the work to really push things through. This can happen with collaboration.
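For context, iterative (“chained”) imputation, the approach behind mi and packages like it, cycles through the variables with missingness, imputing each one from a model given the current values of all the others; the Biometrika paper asks when that iteration has a sensible stationary distribution. Here is a minimal sketch of the loop under strong simplifying assumptions (all-numeric data, plain linear regressions, a fixed number of sweeps); the function is illustrative only and is not the mi implementation.

```python
import numpy as np

def iterative_impute(X, n_sweeps=20, seed=None):
    """Toy iterative ("chained") imputation for an all-numeric matrix X
    with NaNs: start missing cells at column means, then repeatedly
    re-impute each column from a linear regression on the other columns,
    adding residual noise so the imputations are draws rather than point
    predictions. Real packages (mi, mice) handle categorical variables,
    model checking, multiple chains, and more."""
    rng = np.random.default_rng(seed)
    X = np.array(X, dtype=float)
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[miss] = np.take(col_means, np.where(miss)[1])   # crude starting values

    for _ in range(n_sweeps):
        for j in range(X.shape[1]):
            if not miss[:, j].any():
                continue
            # regress column j on the current values of all other columns
            A = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
            obs = ~miss[:, j]
            beta, *_ = np.linalg.lstsq(A[obs], X[obs, j], rcond=None)
            resid_sd = np.std(X[obs, j] - A[obs] @ beta)
            X[miss[:, j], j] = A[miss[:, j]] @ beta + rng.normal(0.0, resid_sd, miss[:, j].sum())
    return X
```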
[2007] Evaluation of multilevel decision trees. Journal of Statistical Planning and Inference 137, 1151-1160. (Erwann Rogard, Andrew Gelman, and Hao Lu)
This was another one where I had a cool idea but we ended up with a kind of half-assed treatment. As with the other papers above, this one isn’t completely bad–it’s readable and it addresses a real statistics problem–but it deservedly sank without a trace.
[2006] Weighted classical variogram estimation for data with clustering. Technometrics 49, 184-194. (Cavan Reilly and Andrew Gelman)
Yet another example of a cool idea that’s kind of orphaned in this paper, in that we developed a method that nobody’s ever gonna use. OK, not “nobody,” exactly–according to Google Scholar the paper has 21 citations–but not a lot of use. If you’re gonna do classical variogram estimation, then I think some weighting is a good idea, and the project has an interesting overlap of ideas from experimental design, sampling, and spatial statistics, but I just don’t know how much classical variogram estimation is done anymore, and we weren’t connected enough to that world for this work to have much impact.
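To make the weighting idea concrete, here is a rough sketch of a classical (Matheron-style) variogram estimate in which each pair of observations carries a weight, so the binned estimate is a weighted rather than a simple average of squared differences. The weights are left as an input supplied by the user; the paper derives specific weights from the clustered design, which this sketch does not reproduce.

```python
import numpy as np

def weighted_variogram(coords, z, pair_weights, bins):
    """Classical (Matheron-style) empirical variogram with weighted pairs:
    within each distance bin, 0.5 * sum(w * (z_i - z_j)^2) / sum(w).
    pair_weights is an (n, n) matrix of nonnegative weights chosen by the
    analyst; setting all weights to 1 recovers the unweighted estimator."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(z), k=1)                 # all distinct pairs
    d = np.linalg.norm(coords[i] - coords[j], axis=1)   # pairwise distances
    sqdiff = (z[i] - z[j]) ** 2
    w = np.asarray(pair_weights, dtype=float)[i, j]

    gamma = np.full(len(bins) - 1, np.nan)
    for b in range(len(bins) - 1):
        in_bin = (d >= bins[b]) & (d < bins[b + 1])
        if w[in_bin].sum() > 0:
            gamma[b] = 0.5 * np.sum(w[in_bin] * sqdiff[in_bin]) / np.sum(w[in_bin])
    return gamma
```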
[2006] Output assessment for Monte Carlo simulations via the score statistic. Journal of Computational and Graphical Statistics 15, 178-206. (Yanan Fan, Stephen Brooks, and Andrew Gelman)
Maybe there’s something useful in this paper, but I kinda doubt it. It’s not a bad paper; it’s just sitting there doing nothing.
[2003] A method for estimating design-based sampling variances for surveys with weighting, poststratification, and raking. Journal of Official Statistics 19, 133-151. (Hao Lu and Andrew Gelman)
Another one that we just pushed out for no good reason. Again, there’s something there, but it’s not clear who the intended audience is. It’s possible that at some point I’ll return to the topic, in which case this article could retroactively become useful.
Some papers not included in the above list
[2010] Adaptively scaling the Metropolis algorithm using expected squared jumped distance. Statistica Sinica 20, 343-364. (Cristian Pasarica and Andrew Gelman)
Sometimes we write a paper that isn’t itself useful but it includes insights that get used in more important later work. An example is the above paper, which we wrote when we were trying to get better adaptive Metropolis algorithms. We and others were tuning based on acceptance rate, but we realized that it made more sense to just directly target expected squared jumped distance. Shortly after this paper appeared, we (and, eventually, everybody else) switched to Hamiltonian Monte Carlo, but then the ESJD idea inspired the NUTS algorithm. So this paper was indirectly very important, even if it was not useful at the time.
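To make the ESJD idea concrete, here is a rough sketch (not the paper’s adaptive algorithm): run random-walk Metropolis and score a proposal scale by the realized average squared jumped distance, instead of by how close the acceptance rate is to some nominal target. The paper optimizes this objective adaptively within a single run; the crude grid comparison below is only meant to show the objective itself.

```python
import numpy as np

def metropolis_esjd(logpost, x0, scale, n_iter=5000, seed=None):
    """Random-walk Metropolis; returns the draws and the realized average
    squared jumped distance (an ESJD estimate) for the given proposal scale.
    Rejected proposals contribute a jump of zero."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    lp = logpost(x)
    draws, sq_jumps = [], []
    for _ in range(n_iter):
        prop = x + scale * rng.normal(size=x.shape)
        lp_prop = logpost(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            sq_jumps.append(np.sum((prop - x) ** 2))
            x, lp = prop, lp_prop
        else:
            sq_jumps.append(0.0)
        draws.append(x.copy())
    return np.array(draws), float(np.mean(sq_jumps))

# Compare proposal scales directly by ESJD (rather than by acceptance rate)
# on a 5-dimensional standard normal target; keep whichever scale jumps
# farthest on average.
logpost = lambda x: -0.5 * np.sum(x ** 2)
for s in [0.1, 0.5, 1.0, 2.4, 5.0]:
    _, esjd = metropolis_esjd(logpost, np.zeros(5), s, seed=0)
    print(f"scale {s:.1f}: ESJD = {esjd:.3f}")
```

Roughly, higher ESJD corresponds to lower lag-one autocorrelation in the chain, which is closer to what you actually care about for mixing than hitting any particular acceptance rate.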
[2015] Bayesian nonparametric weighted sampling inference. Bayesian Analysis 10, 605-625. (Yajuan Si, Natesh Pillai, and Andrew Gelman)
This one represented a lot of work, and the result is a method that neither we nor anyone else will ever use–but it was a helpful step along the way to our more recent work on model-based analysis of sampling weights.
[2008] Should the Democrats move to the left on economic policy? Annals of Applied Statistics 2, 536-549. (Andrew Gelman and Cexun Jeffrey Cai)
[2006] Validation of software for Bayesian models using posterior quantiles. Journal of Computational and Graphical Statistics 15, 675-692. (Samantha Cook, Andrew Gelman, and Donald B. Rubin)
[1996] Efficient Metropolis jumping rules. In Bayesian Statistics 5, ed. J. Bernardo et al., 599-607. Oxford University Press. (Andrew Gelman, Gareth O. Roberts, and Walter R. Gilks)
The above three had mistakes that were serious enough that I issued correction notices–but I don’t actually think they were bad papers! In each case, the errors were real and the corrections were important, but the parts that were not in error were fine. And two of those papers had a lot of influence, in a good way.
[1998] Improving upon probability weighting for household size. Public Opinion Quarterly 62, 398-404. (Andrew Gelman and Thomas C. Little)
I like this paper a lot, and I would not consider including it in any list of bad papers or worst papers. I mention it here only because it’s unashamedly minor work, a small paper on a small, well-defined problem. That’s fine. It’s good to do minor work too, both for its own sake and as part of building up a bigger picture of the world. Our books and our statistical understanding are built upon thousands of minor problems we’ve tackled over the years, and we could not have solved the big problems without experience on the small ones.
Summary
Perhaps unsurprisingly, I didn’t find anything that I’d call “research misconduct” in my published papers. As noted, many of my publications are minor–that’s just the way research goes, and I think it’s just fine to publish a short paper making a small but interesting point. But I did my best above to find some of my papers that I don’t like very much.
P.S. Also relevant is this post, “I love this paper but it’s barely been noticed,” which lists a whole bunch of my published articles that I absolutely love, even if they’ve had close to zero influence.