Regression, Thermostats, Causal Inference: Some Finger Exercises
Three-Toed Sloth 2021-03-27
Attention conservation notice: An 800-word, literally academic exercise about an issue in causal inference. Its point is familiar to those in the field, and deservedly obscure to everyone else. Also, too cutesy and pleased with itself by at least half.
I wrote the first version of this for the class where we do causal inference long enough ago that I actually don't remember when --- 2011? 2013? (In retrospect I had probably read Milton Friedman's thermostat analogy but didn't consciously remember it at the time.) Posted now because I've gone over the point with two different people in the last month.
The temperature outside \( (X) \) is a direct cause of the temperature inside my house \( (Y) \). But every morning I measure the temperature, and adjust my heating/cooling system \( (C) \) to try to maintain a constant temperature \( y_0 \). For simplicity, we'll say that all the relations are linear, so \[ \begin{eqnarray} X & \sim & \mathrm{whatever}\\ C|X & \leftarrow & a+bX + \epsilon_1\\ Y|X,C & \leftarrow & X-C + \epsilon_2 \end{eqnarray} \] where \( \epsilon_1 \) and \( \epsilon_2 \) are exogenous, independent, mean-zero noise terms. We can think of \( \epsilon_1 \) as a combination of my sloppiness in measuring the temperature and in tuning the heating/cooling system; \( \epsilon_2 \) is sheer fluctuations.
Exercise: Draw the DAG.
To ensure that the expectation of \( Y \) remains at \( y_0 \), no matter the external temperature, we need \[ \begin{eqnarray} y_0 & = & \Expect{Y|X=x}\\ & = & \Expect{X - (a + bX + \epsilon_1) + \epsilon_2|X=x}\\ & = & (1-b)x -a \end{eqnarray} \] Since this must hold for all \( x \), we need \( b=1, a=-y_0 \).
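As a sanity check on that tuning, here is a minimal Monte Carlo sketch. It holds \( X \) at a few fixed values and averages \( Y \) over the noise. The Gaussian noise, the noise scales, the set-point of 20 and the function name `simulate_indoor_mean` are all assumptions made for illustration; the model above only asks for independent, mean-zero noise.

```python
import random


def simulate_indoor_mean(x, a, b, n=20_000, sd1=1.0, sd2=1.0, seed=0):
    """Monte Carlo estimate of E[Y | X = x] when
    C = a + b*X + eps1 and Y = X - C + eps2.
    Gaussian noise is an assumption; any mean-zero noise would do."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        c = a + b * x + rng.gauss(0.0, sd1)   # sloppy control signal
        y = x - c + rng.gauss(0.0, sd2)       # resulting indoor temperature
        total += y
    return total / n


y0 = 20.0                       # hypothetical set-point
outside = (-10.0, 5.0, 35.0)    # hypothetical outside temperatures

# Correct tuning: b = 1, a = -y0, so the average should sit near y0 for every x.
tuned = {x: simulate_indoor_mean(x, a=-y0, b=1.0) for x in outside}

# Mistuned: b = 0.5, so the average should drift as (1 - b)*x - a = 0.5*x + y0.
mistuned = {x: simulate_indoor_mean(x, a=-y0, b=0.5) for x in outside}

print("tuned:   ", tuned)
print("mistuned:", mistuned)
```

With \( b=1, a=-y_0 \) the averages should all land near the set-point whatever the outside temperature; with \( b=0.5 \) they should track \( (1-b)x - a \) instead.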
What follows from this?
- Internal temperature \( Y \) is uncorrelated with external temperature \( X \): \[ \begin{eqnarray} \Cov{X,Y} & = & \Expect{XY} - \Expect{X}\Expect{Y}\\ & = & \Expect{X\Expect{Y|X}} - \Expect{X}\Expect{Y}\\ & = & \Expect{X}y_0 - \Expect{X}y_0 = 0 \end{eqnarray} \] The internal temperature will fluctuate around the set-point \( y_0 \), but those fluctuations will not correlate with the external temperature.
- Internal temperature \( Y \) is correlated with the control signal \( C \) only through my sloppiness: \[ \begin{eqnarray} \Cov{C,Y} & = & \Expect{CY} - \Expect{C}\Expect{Y}\\ & = & \Expect{(-y_0 + X + \epsilon_1)(X+y_0-X-\epsilon_1+\epsilon_2)} - (\Expect{X}-y_0)y_0\\ & = & -y_0^2 - \Expect{\epsilon_1^2} + \Expect{X}y_0 -\Expect{X \epsilon_1} + \Expect{X\epsilon_2} + \Expect{\epsilon_1 \epsilon_2} - \Expect{X}y_0 + y_0^2\\ & = & -\Var{\epsilon_1} \end{eqnarray} \] since all the cross-expectations are zero, and \( \Expect{\epsilon_1}=0 \), so that \( \Expect{\epsilon_1^2} = \Var{\epsilon_1} \).
- The control signal \( C \) is correlated with the external temperature: \[ \begin{eqnarray} \Cov{C,X} & = & \Expect{CX} - \Expect{C}\Expect{X}\\ & = & \Expect{(-y_0 + X+\epsilon_1)X} - (-y_0 +\Expect{X})\Expect{X}\\ & = & \Expect{X^2} - \left(\Expect{X}\right)^2\\ & = & \Var{X} \end{eqnarray} \]
- A linear regression of \( Y \) on \( X \) and \( C \) will consistently recover the correct coefficients, namely \( +1 \) and \( -1 \). To see this, recall (e.g., from [[here]]) that the OLS estimates will tend towards the coefficients of the optimal linear predictor. Those coefficients, in turn, are the solution to \[ \beta = {\left[ \begin{array}{cc} \Var{X} & \Cov{C,X}\\ \Cov{X,C} & \Var{C} \end{array}\right]}^{-1} \left[ \begin{array}{c} \Cov{Y,X}\\ \Cov{Y,C} \end{array}\right] \] Plugging in our previous results, along with \( \Var{C} = \Var{X} + \Var{\epsilon_1} \), \[ \beta = {\left[ \begin{array}{cc} \Var{X} & \Var{X}\\ \Var{X} & \Var{X}+\Var{\epsilon_1} \end{array}\right]}^{-1} \left[ \begin{array}{c} 0\\ -\Var{\epsilon_1} \end{array}\right] \] After some character-building algebra, you can confirm that the covariance matrix is invertible as long as \( \Var{\epsilon_1} > 0 \) (and \( \Var{X} > 0 \)), and then, as promised, \( \beta = (1,-1) \).
Exercise: Build your character by doing the algebra.
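For those who would rather check the conclusions numerically than build character, here is a rough simulation sketch of the whole system under the tuning \( b=1, a=-y_0 \). It estimates the three covariances from the list above and then solves the two-by-two system for \( \beta \) from sample moments. The Gaussian distributions, the particular means and standard deviations, and the helper name `cov` are assumptions chosen for illustration, not anything fixed by the model.

```python
import random


def cov(u, v):
    """Sample covariance of two equal-length sequences (divides by n)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


rng = random.Random(1)
n = 100_000
# Hypothetical values: set-point, spread of outside temperature, noise scales.
y0, sd_x, sd1, sd2 = 20.0, 8.0, 1.5, 0.5

X = [rng.gauss(10.0, sd_x) for _ in range(n)]            # X ~ "whatever"; Gaussian assumed
C = [x - y0 + rng.gauss(0.0, sd1) for x in X]            # C = -y0 + X + eps1
Y = [x - c + rng.gauss(0.0, sd2) for x, c in zip(X, C)]  # Y = X - C + eps2

cov_xy, cov_cy, cov_cx = cov(X, Y), cov(C, Y), cov(C, X)
var_x, var_c = cov(X, X), cov(C, C)

# Solve [[var_x, cov_cx], [cov_cx, var_c]] beta = [cov_xy, cov_cy] via the 2x2 inverse.
det = var_x * var_c - cov_cx ** 2
beta_x = (var_c * cov_xy - cov_cx * cov_cy) / det
beta_c = (var_x * cov_cy - cov_cx * cov_xy) / det

print("Cov[X,Y]:", cov_xy, "(theory: 0)")
print("Cov[C,Y]:", cov_cy, "(theory:", -sd1 ** 2, ")")
print("Cov[C,X]:", cov_cx, "(theory:", sd_x ** 2, ")")
print("beta:", (beta_x, beta_c), "(theory: (1, -1))")
```

Because the coefficients are computed from centered sample moments, they match what OLS with an intercept would give on the same data. Shrinking `sd1` towards zero makes `det` shrink too, which is the numerical face of the invertibility condition.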
So, as long as control isn't perfect, the naive statistician (or experienced econometrician...) who just does a kitchen-sink regression will actually get the relationship between \( Y \), \( X \) and \( C \) right, concluding that external temperature and the climate control have equal and opposite effects on internal temperature. Sure, there will be sampling noise, but with enough data they'll approach t