Update on that politically-loaded paper published in Demography that I characterized as a “hack job”: Further post-publication review
Statistical Modeling, Causal Inference, and Social Science 2024-10-10
Joel Schwartz writes:
After reading your recent post on Langer et al. (2024) (the Demography article that concluded Trump’s election caused an increase in rates of preterm birth and low birth weight among infants of Black, Hispanic, and Asian/Pacific Islander mothers, relative to White mothers), I downloaded the birth data to see if I could reproduce their study. I did everything in R and put together the results in an HTML document produced with Quarto (I provide a link to that at the end of this email). Below is a summary of key results. The full document has additional analyses, plots, tables, and discussion.
I tried to follow Langer et al.’s description of how they selected and cleaned the data, but ended up with a substantially larger set of observations than they reported. After limiting the data to their four maternal race and two maternal nativity groups, and removing missing values for all of the variables they included in their modeling, I ended up with 19.68 million observations, as compared with 15.57 million observations reported by Langer et al. This is out of 23.89 million total observations to start with for the period November 2012 through November 2018, which is the time range that Langer et al. focused on. I was a bit surprised that Langer et al. ended up removing 35% of the data. I can’t see how their data selection process could remove that much data, but maybe I’m missing something.
I ran into some other issues as well. For example, the public use data includes mother’s nativity only starting in 2014, so I don’t have mother’s nativity for November 2012 through December 2013 (it can be obtained by requesting restricted-use data, which I haven’t attempted yet). I excluded that time period in the rest of my analysis.
Below is a table comparing some key means and percentages for U.S.-born White and Black mothers in the period after Trump’s election, as reported for the data Langer et al. used (from Table 1 of their paper) and as computed from my version of the data.
I had access to the same variables as Langer et al. for the after-election period. As you can see, for the two outcomes, Low Birth Weight and Preterm Birth, my version of the cleaned data had lower rates than reported by Langer et al. (I confirmed that I used the same cutoff values as they did for defining these binary outcomes). For the covariates, most are similar to the values reported by Langer et al. while a few are different. And, as I mentioned above, I ended up with a lot more observations than Langer et al.
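The gap in sample sizes can be checked with simple arithmetic. The short Python sketch below takes the three counts exactly as stated in the email and computes the share of the starting observations that each cleaning process dropped. It is not part of the R analysis described above, and the function name `share_removed` is made up for illustration.

```python
# Observation counts in millions, taken as stated in the email.
TOTAL = 23.89             # all births, November 2012 through November 2018
KEPT_LANGER = 15.57       # reported by Langer et al.
KEPT_REPLICATION = 19.68  # the attempted replication


def share_removed(kept, total):
    """Fraction of the starting observations dropped during cleaning."""
    return 1 - kept / total


removed_langer = share_removed(KEPT_LANGER, TOTAL)
removed_replication = share_removed(KEPT_REPLICATION, TOTAL)

print(f"Langer et al. removed {removed_langer:.1%} of the data")
print(f"The replication removed {removed_replication:.1%} of the data")
```

Under these numbers, the published analysis dropped roughly twice the share of observations that the replication did, which is the discrepancy the email flags.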
Below are Langer et al.’s figures 1 and 2, along with my attempted replications. My versions of the graphs also include data through December 2023 (the most recent year available in the public use files) so you can see what was happening after November 2018. As you can see, my attempted replications of the Langer et al. regression lines often have a similar pattern, but not always (and we wouldn’t expect the pre-Trump-election period to necessarily have the same pattern, since I’m missing the first 14 months of data). Also, as already shown in the table above, the rates of low birth weight and preterm birth are lower in my versions than in Langer et al. Since the data go through December 2023, you can also see that rates of low birth weight continued to increase after the election of Biden, as shown by the generalized additive model (GAM) fit to the full data range.
I generated graphs like the ones above, but with Biden’s election as the cutoff date for the linear regression lines (using the exact same data, but limiting the date ranges to the four years before Biden’s election and all of the months available after Biden’s election). After Biden’s election, the probability of low birth weight continued to increase for all sub-groups.
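The before-and-after comparison described here amounts to fitting separate straight lines on each side of a cutoff date. The Python sketch below illustrates that idea on a synthetic monthly series with a steady upward trend and no break at all. It is not the R code behind the graphs, it uses no birth data, and the helper names `ols_line` and `fit_segments` are hypothetical.

```python
import random


def ols_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_segments(months, rates, cutoff):
    """Fit one line to points before `cutoff` and another to the rest."""
    pre = [(m, r) for m, r in zip(months, rates) if m < cutoff]
    post = [(m, r) for m, r in zip(months, rates) if m >= cutoff]
    return ols_line(*zip(*pre)), ols_line(*zip(*post))


# Synthetic data (an assumption for illustration): a rate that rises by
# 0.01 percentage points per month with small noise and no break anywhere.
random.seed(0)
months = list(range(96))
rates = [8.0 + 0.01 * m + random.gauss(0, 0.05) for m in months]

(slope_pre, _), (slope_post, _) = fit_segments(months, rates, cutoff=48)
print(f"slope before cutoff: {slope_pre:.4f}")
print(f"slope after cutoff:  {slope_post:.4f}")
```

In a series like this, the fitted line rises on both sides of any cutoff one picks, which is why rerunning the same plot with a different election date is a useful check on what a post-cutoff increase can show.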