Books to Read While the Algae Grow in Your Fur, April 2020
Three-Toed Sloth 2020-05-06
Attention conservation notice: I have no taste, and no qualifications to opine on American civil rights law, literary criticism, the psychology of reading, the literary ambitions of Karl Marx, Islamic history, or even, really, mathematical models of epidemic disease.
- Lisa Sattenspiel with Alun Lloyd, The Geographic Spread of Infectious Diseases: Models and Applications [JSTOR]
- This is a fine shorter book (~300 pp.) on mathematical descriptions and models of how contagious diseases spread over space and time. It alternates between chapters which lay out classes of models and mathematical tools, and more empirical chapters which cover specific diseases. The models start with the basic, a-spatial SIR model and its variants (ch. 2), then expansions which run an SIR model at each spatial location and couple the locations (ch. 4; see the sketch after this entry), network models (ch. 6), and approaches from geographers, emphasizing map-making and regression-style modeling (ch. 8), which is where I learned the most. The introductory chapters do a good job of laying out the basic concepts and behavior of epidemic models, so in principle no previous background is required to read this, beyond upper-level-undergrad or beginning-grad mathematical competence. (There are no proofs, and only really elementary derivations, but the reader is expected to be familiar with differential equations, basic probability, and the idea of an eigenvalue of a matrix.)
- The applications chapters cover, in order, influenza (especially seasonal influenza but also the 20th century pandemics), measles, foot-and-mouth disease, and SARS. A nice feature of the latter two chapters is a careful, somewhat skeptical look at how much policy-makers relied on mathematical models, and the extent to which such reliance helped.
- This was on my stack to read as preparation for revising the "Data Over Space and Time" course, but it got bumped up the queue by recent events; recommended.
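Since the core construction of ch. 4 is just running an SIR model at each location and letting the locations infect one another, here is a minimal two-patch sketch of that idea in Python. The parameter values, the two-by-two mixing matrix, and the use of scipy's `solve_ivp` are my own illustrative choices, not taken from the book.

```python
# Minimal sketch of a two-patch ("metapopulation") SIR model of the kind
# discussed in ch. 4: an SIR model runs in each patch, and the patches are
# coupled through a mixing matrix. All numbers here are illustrative, not
# taken from Sattenspiel and Lloyd.
import numpy as np
from scipy.integrate import solve_ivp

beta, gamma = 0.5, 0.2          # transmission and recovery rates (per day)
N = np.array([1e5, 5e4])        # patch population sizes
# C[i, j] = fraction of patch i's contacts that occur with patch j
C = np.array([[0.95, 0.05],
              [0.10, 0.90]])

def coupled_sir(t, y):
    S, I, R = y.reshape(3, -1)          # unpack per-patch compartments
    force = beta * (C @ (I / N))        # force of infection felt in each patch
    dS = -force * S
    dI = force * S - gamma * I
    dR = gamma * I
    return np.concatenate([dS, dI, dR])

# start with 10 infectives in patch 0, none in patch 1
I0 = np.array([10.0, 0.0])
y0 = np.concatenate([N - I0, I0, np.zeros(2)])
sol = solve_ivp(coupled_sir, (0, 200), y0, dense_output=True)
print(sol.y[4:, -1] / N)                # final attack rate in each patch
```

The qualitative point is already visible here: seeding an outbreak in patch 0 eventually produces one in patch 1, with its timing and final size governed by the coupling matrix.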
- Peter Diggle, Kung-Yee Liang and Scott L. Zeger, Analysis of Longitudinal Data
- A standard text on longitudinal data analysis for lo these many years now, and deservedly so. The following notes are based on (finally) reading my copy of the first, 1994 edition all the way through; I have not had the chance to read the second edition (2002, paperback 2013).
- What statisticians (especially biostatisticians) call "longitudinal" data is basically what econometricians call "panel" data: there are a bunch of people (or animals or factories, generically "units"), and we collect the same information for each of them, but we do it repeatedly, over time. (Different units may be measured at different times, and not every variable may be measured for every unit at every time.) We typically assume no interaction between the units. In symbols, then, for unit $i$ at time $t_{ij}$ we measure $Y_{ij}$ (which may be a vector, even a vector with some missing coordinates), there are covariates $X_{ij}$ (which may be constant over time within a unit or not), and we assume independence between $Y_{i}$ and $Y_{i'}$ for distinct units $i \neq i'$.
- From the point of view of someone trained to approach everything as a regression of dependent variables on independent variables, here a regression of $Y_{ij}$ on $X_{ij}$, longitudinal data is a tremendous headache, because all the observations within a unit are dependent, even when we condition on the independent variables. In the simple linear-model situation, we'd write $Y_{ij} = \beta_0 + \beta \cdot X_{ij} + \epsilon_{ij}$, but with the caveat that $\epsilon_{ij}$ and $\epsilon_{ik}$ are not independent, but have some covariance. (Generalized linear models are also treated extensively here, to handle binary and count data.) This means that the usual formulas for standard errors, confidence intervals, etc., are all wrong. This point of view, which the book calls that of "marginal modeling", is the implied starting point for the reader, and I think it's fair to say it gets most of the attention. The key is to come up with estimates of the covariance of the $\epsilon$s, perhaps starting from a very rough "working" covariance model, or even from looking at the residuals of a model which ignores covariance in the first place, and then using weighted least squares and robust standard errors to improve the inferences (sketched in code below). There's a chapter on why ANOVA is not enough, which I suspect arose from students wanting to know
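To make the working-covariance-plus-robust-standard-errors recipe concrete, here is a minimal sketch using the GEE implementation in Python's statsmodels; the tool, the simulated data (a shared random intercept inducing within-unit correlation), and all parameter values are my own illustrative choices, not the book's.

```python
# Minimal sketch of the "marginal modeling" strategy: fit a regression with a
# rough "working" covariance for the within-unit errors, then rely on robust
# (sandwich) standard errors. Simulated data and parameters are illustrative.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_units, n_times = 100, 5
unit = np.repeat(np.arange(n_units), n_times)    # unit index i
t = np.tile(np.arange(n_times), n_units)         # time index within unit
x = rng.normal(size=unit.size)
# within-unit dependence: a shared random intercept per unit plus noise
unit_effect = rng.normal(scale=1.0, size=n_units)[unit]
y = 1.0 + 2.0 * x + unit_effect + rng.normal(scale=0.5, size=unit.size)
df = pd.DataFrame({"y": y, "x": x, "unit": unit, "t": t})

# Exchangeable "working" covariance within each unit; GEE reports robust
# standard errors for the coefficients by default.
model = smf.gee("y ~ x", groups="unit", data=df,
                cov_struct=sm.cov_struct.Exchangeable(),
                family=sm.families.Gaussian())
res = model.fit()
print(res.summary())
```

The exchangeable structure is only a working model; GEE's sandwich standard errors for the regression coefficients remain valid (asymptotically) even when that working covariance is misspecified, which is what makes the marginal-modeling recipe usable.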