The stories behind our published research from last year

Statistical Modeling, Causal Inference, and Social Science 2026-01-06

It’s January, so it’s time to look back on what we’ve done in the past year.  I thought this time I’d give a little background story for each of our published papers.

First, here’s the list of recently published papers:

Also we completed some new work that’s not yet been published:

We have a lot on deck for 2026, including two new books (Bayesian Workflow and the second edition of the edited Handbook of Monte Carlo) and a bunch of research articles on different topics in statistical modeling, causal inference, and social science.

And you can expect another 600 or so blog posts.

The stories behind the papers

It’s hard for me to pick my favorites among all the recently published papers, so let me just say something about each of them, in the same order they were listed above (roughly inverse chronological order of publication):

  • Adaptive sequential Monte Carlo for structured cross validation in Bayesian hierarchical models:  GH took a couple of my classes and had ideas for a couple of papers, including this one.  This was his idea; I just helped out a small amount.
  • Reanalysis of “Competition and innovation: An inverted-U relationship”:  This was originally a blog post.  The editor of the Journal of Robustness Reports asked me to submit it to them.  It took a couple rounds–the reviewers made some good points!–and a fun thing about this journal is that you can go to the link and see the entire review process.
  • The ladder of abstraction in statistical graphics:  I absolutely love this paper.  It originated in a talk I gave to Ron Yurko’s statistical graphics class at CMU.  I sent it to the journal and they had some good suggestions for improvement, which my friend and colleague Kaiser Fung was able to implement.
  • Statistical workflow:  As many of youall know, we’ve been writing a book on Bayesian Workflow–it will appear very soon!  I felt that the workflow concept would be useful in non-Bayesian statistics too, so my colleagues and I organized a special issue of a journal, where we solicited a bunch of articles from theoretical and applied researchers, mostly not Bayesian, to get different perspectives on workflow.  The journal issue is looking good–I guess it will be out soon–and we wrote this short article to lead off that issue.  It’s a short paper and I recommend you take a look!
  • Adjusting for underreporting of child protective services involvement in the Future of Families and Child Wellbeing Study and assessing its empirical implications through illustrative analyses of young adult disconnection:  OK, I don’t have much to say about this one.  It’s by my colleagues at the school of social work at Columbia; I was involved in the survey weighting for the study.
  • A multilevel Bayesian approach to climate-fueled migration and conflict:  Hey, I don’t remember much about this at all!  But, yeah, multilevel modeling, I guess I did something useful here!
  • Artificial intelligence and aesthetic judgment:  This one’s mostly by Jessica and Ari, but I made some contributions throughout, some of which you might recognize from earlier appearances of these ideas on the blog.  It’s published in Sankhya because I think they asked me to submit something for a special issue, and we had this cool paper that we couldn’t figure out what to do with.
  • Discussion of “Statistical exploration of the manifold hypothesis”:  This journal sometimes runs papers with discussions (they did a couple of mine in the past decade), and sometimes I contributed something.  Here I saw a good opportunity to remind people of my thoughts on Tibshirani’s “bet on sparsity” principle and where it can go wrong.
  • Meta-analysis with a single study:  What can I say?  This paper has an awesome title.  Erik, Witold, and I have been meeting weekly and will be coming out with more articles soon on science and meta-science.
  • Normative scientific conflict is unavoidable and should be welcomed:  I can’t remember how, but I came across an announcement of a special issue of the journal Theory and Society on the topic of normative scientific conflict.  I had some things to say on the topic, and this seemed like a good outlet.  I like this paper!  You should read it.
  • Russian roulette:  The need for stochastic potential outcomes when utilities depend on counterfactuals:  This paper has a funny story behind it.  I was contacted by economist Amanda Kowalski about a paper she and her colleagues had written about causal inference.  That paper got me thinking about stochastic potential outcomes and asymmetric utility functions, and I had this idea of demonstrating these ideas in a simple example of Russian roulette.  Jonas joined as a collaborator and clarified a bunch of issues that I’d been sloppy with.  We asked Amanda if she wanted to join in, but she was too busy on her own stuff.  Anyway, the final paper is cool–it’s really clean, and it’s timely because lots of people are interested in going beyond the stable unit treatment value assumption.
  • Multilevel regression and poststratification using margins of poststratifiers:  Improving inference for HIV health outcomes during the COVID-19 pandemic:  Qixuan has been taking the lead on a bunch of papers we’ve been doing, generalizing MRP in various ways.  I think we’re gradually moving toward a bright future of generalizing from sample to population.
  • Statistical graphics and comics:  Parallel histories of visual storytelling:  This is an idea that I’ve had for a while.  I mentioned it in class offhandedly one day, and one of the students told me she was interested in the topic too, so we wrote this article.  It was a true collaboration.  It’s kind of a specialized topic, but I think it could reach a wide audience, because lots of people love comics and lots of people love statistical graphics.  We focus on the fascinating question of how it is that these two modes of communication have developed only in the past few centuries, even though they could have been invented much earlier.  This is a sister paper to the “ladder of abstraction” paper mentioned above.
  • Letter to the editor:  Long story here.  Back in 2017, a bigshot professor lied about me in a published article in the journal Perspectives on Psychological Science.  It was the kind of crap article that should never have been accepted, but at the time that journal was run by a corrupt cabal and they were publishing their friends’ articles essentially without peer review.  At the time I complained to the journal but only got rude responses from the cabal.  But things change.  The journal is now run by civilized people and they published my letter.  Better 8 years late than never at all.  And, no, the people who wrote and published the lies never apologized.  Of course not!  Apologies are for losers, not for members of the prestigious National Academy of Sciences.
  • Rethinking approaches to analysis of global randomised controlled trials:  Epidemiologist Jay Brophy wrote this one.  I made some minor contribution; I can’t remember what.
  • Simulation-based calibration checking for Bayesian computation:  The choice of test quantities shapes sensitivity:  This is the latest version of a long series of papers on SBC, starting with Samantha Cook’s Ph.D. thesis, which we turned into a paper that was published twenty years ago.  I continue to be interested in the idea of accompanying inferences with simulations that check the computations.
  • Visualizing distributions of covariance matrices:  This paper is nearly 20 years old!  At the time we had difficulty getting it published and we moved on to other things.  Then a couple years ago a journal asked me for an article and I sent them this one.  Unfortunately it was a so-called predatory journal, and one of my coauthors didn’t want our article appearing there.  Fair enough!  But then we thought we might as well get it published, so we sent it off.  I like the paper, and I also like that it’s on the relatively understudied topic of visualizing models (as opposed to visualizing data).
  • Interrogating the “cargo cult science” metaphor:  This topic had been bugging me for a while, and Megan and I wrote this paper which got rejected by a couple of places.  Neither of us really knows how to communicate with researchers in the field of science studies, so it was a hard paper to place, even though it makes a clean point.  Then I happened to hear about the journal Theory and Society, which seemed like the perfect place.  I don’t know if anyone read our article, but I’d like to think that, in the future, people will think twice before talking about cargo cult science.
  • A calibrated BISG for inferring race from surname and geolocation:  This is Philip’s project.  I did help out a bit, but I remain frustrated in that we haven’t been able to frame this in a fully Bayesian or generative way.  We’re continuing to work on the problem, and we have a new method, supercaliBISG, which does even better than caliBISG, which is an improvement on BISG, which itself has the word “improved” in its title (and also calls itself Bayesian, but it’s not fully so).
  • Hierarchical Bayesian models to mitigate systematic disparities in prediction with proxy outcomes:  I can’t remember exactly where this paper came from, but it was somehow associated with some conversations we had with Sharad Goel and others on statistical measures of disparity.  As is often the case, I think much is gained by framing the problem within a generative model.
  • The piranha problem:  Large effects swimming in a small pond:  This one’s important!  The basic idea–there are probabilistic or statistical constraints regarding patterns of dependence in high dimensions, and this has implications for our understanding of patterns in complex structures–was mine, but the coauthors did most of the rest, collecting the relevant mathematical results.  As I like to say, I think there’s more to be said in this area, maybe some connections to random matrix theory.  Also, the paper has an unusual publication story.  What happened was that a student from the statistics club at San Diego State University asked me to do a remote meeting with them.  I did so–it was a fun conversation–and it turned out that their faculty adviser, Richard Levine, was editor of the Notices of the American Mathematical Society, and was looking for general-interest math papers with applied or statistical relevance.  So I sent him the piranha paper.  Articles in this journal have a strict limit of no more than 10 pages and no more than 20 references.  It was hard for me to keep the references under 20 while demonstrating the applied relevance of the topic, so I cheated and wrote a blog post entitled, “Here are just some of the factors that have been published in the social priming and related literatures as having large effects on behavior,” so that just counted as 1 reference in our paper.  Kind of like if the genie gives you 3 wishes and you spend one of them on more wishes.
  • For how many iterations should we run Markov chain Monte Carlo?:  This is an update of my paper with Kenny Shirley for the new edition of the MCMC handbook.  Charles took the lead on this chapter.
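P.S.  For readers curious about the simulation-based calibration checking mentioned above, here’s a minimal sketch of the basic SBC idea, using a toy conjugate-normal model where the posterior is known exactly.  (The model and numbers are my illustration, not from the paper.)  If the computation is correct, the rank of each simulated parameter among its posterior draws should be uniformly distributed:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy conjugate model: theta ~ N(0, 1), y | theta ~ N(theta, 1).
# The posterior is then theta | y ~ N(y/2, 1/2), so we can draw from it exactly.
n_sims, n_draws = 1000, 99
ranks = np.empty(n_sims, dtype=int)
for s in range(n_sims):
    theta = rng.normal(0.0, 1.0)                          # draw from the prior
    y = rng.normal(theta, 1.0)                            # simulate data given theta
    post = rng.normal(y / 2, np.sqrt(0.5), size=n_draws)  # exact posterior draws
    ranks[s] = np.sum(post < theta)                       # rank of theta: 0..n_draws

# If the computation is correct, the ranks are uniform on {0, ..., n_draws}.
counts, _ = np.histogram(ranks, bins=10, range=(0, n_draws + 1))
print(counts)  # each bin should be near n_sims / 10 = 100
```

In a real application, the posterior draws would come from the algorithm being checked (MCMC, variational inference, etc.) rather than the exact posterior, and systematic departures from uniformity in the rank histogram flag a computational problem.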
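P.P.S.  The piranha-style constraint on dependence mentioned above can be seen in a small calculation: a correlation matrix must be positive semidefinite, so a bunch of mutually uncorrelated predictors can’t all have large correlations with a single outcome.  (The function below is my toy illustration, not from the paper.)

```python
import numpy as np

def valid_corr(k, r):
    """Check whether a correlation matrix with k mutually uncorrelated
    predictors, each correlated r with one outcome, is positive semidefinite."""
    m = np.eye(k + 1)
    m[-1, :k] = r  # correlations between the outcome and each predictor
    m[:k, -1] = r
    return np.linalg.eigvalsh(m).min() >= -1e-9

print(valid_corr(3, 0.5))  # True: min eigenvalue 1 - 0.5*sqrt(3) > 0
print(valid_corr(5, 0.5))  # False: min eigenvalue 1 - 0.5*sqrt(5) < 0
```

In this special case the smallest eigenvalue works out to 1 - r*sqrt(k), so the matrix stops being a valid correlation matrix once r exceeds 1/sqrt(k): many large independent effects can’t all coexist.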