The Frisch–Waugh–Lovell Theorem for Both OLS and 2SLS
R-bloggers 2013-06-05
The Frisch–Waugh–Lovell (FWL) theorem is of great practical importance in econometrics. FWL establishes that a linear regression model can be re-specified in terms of orthogonal complements; in other words, it permits econometricians to partial out right-hand-side, or control, variables. This is useful in a variety of settings. For example, a researcher may want the coefficient and cluster-robust standard error for a single variable of interest from a model with so many regressors that computing the full variance-covariance matrix is infeasible.
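Before turning to the examples, here is a minimal matrix-form sketch of the idea (my addition, with illustrative variable names not taken from the post): partialling out the controls amounts to pre-multiplying by the annihilator matrix M2 = I - X2(X2'X2)^(-1)X2', and the coefficient on the variable of interest is unchanged.

# Illustrative sketch: partialling out via the annihilator matrix M2
n  = 100
x1 = rnorm(n)
x2 = rnorm(n)
y  = 1 + x1 - x2 + rnorm(n)
X2 = cbind(1, x2)                                   # controls: constant and x2
M2 = diag(n) - X2 %*% solve(crossprod(X2)) %*% t(X2)
coef(lm(y ~ x1 + x2))["x1"]                         # coefficient on x1, full model
coef(lm(drop(M2 %*% y) ~ -1 + drop(M2 %*% x1)))     # identical coefficient after partialling out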
Here are a number of practical examples. The first takes a simple linear regression model with two regressors, x1 and x2. To partial out the constant term and x2, we first regress y1 on x2 and save the residuals. We then regress x1 on x2 and save the residuals. The final stage regresses the first set of residuals on the second. The following code illustrates how applying the FWL theorem in this way yields a coefficient on x1 identical to the one from the full regression.
x1 = rnorm(100)
x2 = rnorm(100)
y1 = 1 + x1 - x2 + rnorm(100)
r1 = residuals(lm(y1 ~ x2))
r2 = residuals(lm(x1 ~ x2))
# ols
coef(lm(y1 ~ x1 + x2))
# fwl ols
coef(lm(r1 ~ -1 + r2))
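As a quick check (my addition, reusing the objects created above), the two estimates of the x1 coefficient can be compared directly; they should agree up to numerical tolerance.

# compare the full-model and FWL estimates of the x1 coefficient
b_full = coef(lm(y1 ~ x1 + x2))["x1"]
b_fwl  = coef(lm(r1 ~ -1 + r2))["r2"]
all.equal(unname(b_full), unname(b_fwl))   # should be TRUE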
In instrumental variable (IV) settings, slightly more work is required. Here we have a matrix of instruments (Z), exogenous variables (X), an endogenous regressor (y1), and an outcome (y2), and we want the coefficient on the endogenous variable y1. In this case we can apply FWL as follows. Regress each instrument in Z on X in a separate regression, saving the residuals. Then regress y1 on X and y2 on X, saving the residuals from both. In the last stage, run a two-stage least squares regression of the y2-on-X residuals on the y1-on-X residuals, using the residuals of each instrument on X as instruments. An example is shown in the code below.
library(sem)
ov = rnorm(100)
z1 = rnorm(100)
z2 = rnorm(100)
y1 = rnorm(100) + z1 + z2 + 1.5*ov
x1 = rnorm(100) + 0.5*z1 - z2
x2 = rnorm(100)
y2 = 1 + y1 - x1 + 0.3*x2 + ov + rnorm(100)
r1 = residuals(lm(z1 ~ x1 + x2))
r2 = residuals(lm(z2 ~ x1 + x2))
r3 = residuals(lm(y1 ~ x1 + x2))
r4 = residuals(lm(y2 ~ x1 + x2))
# biased coef on y1 as expected for ols
coef(lm(y2 ~ y1 + x1 + x2))
# 2sls
coef(tsls(y2 ~ y1 + x1 + x2, ~ z1 + z2 + x1 + x2))
# fwl 2sls
coef(tsls(r4 ~ -1 + r3, ~ -1 + r1 + r2))
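Again as a check (my addition, assuming sem's tsls() returns named coefficients, as the coef() calls above suggest), the coefficient on y1 from the full 2SLS fit should match the FWL version.

# compare the full 2SLS and FWL 2SLS estimates of the y1 coefficient
b_2sls = coef(tsls(y2 ~ y1 + x1 + x2, ~ z1 + z2 + x1 + x2))["y1"]
b_fwl  = coef(tsls(r4 ~ -1 + r3, ~ -1 + r1 + r2))["r3"]
all.equal(unname(b_2sls), unname(b_fwl))   # should be TRUE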
The FWL theorem also extends to cases with multiple endogenous variables. I demonstrate this by extending the above example to treat x1 as an endogenous variable.
# 2 endogenous variables
r5 = residuals(lm(z1 ~ x2))
r6 = residuals(lm(z2 ~ x2))
r7 = residuals(lm(y1 ~ x2))
r8 = residuals(lm(x1 ~ x2))
r9 = residuals(lm(y2 ~ x2))
# 2sls coefficients
p1 = fitted.values(lm(y1 ~ z1 + z2 + x2))
p2 = fitted.values(lm(x1 ~ z1 + z2 + x2))
lm(y2 ~ p1 + p2 + x2)
# 2sls fwl coefficients
p3 = fitted.values(lm(r7 ~ -1 + r5 + r6))
p4 = fitted.values(lm(r8 ~ -1 + r5 + r6))
lm(r9 ~ p3 + p4)
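Finally, the same comparison for the two-endogenous-variable case (my addition, reusing the fitted values and residuals from the snippet above): the coefficients on the endogenous regressors should be identical across the two specifications.

# compare coefficients on the endogenous regressors (y1 and x1)
b_2sls = coef(lm(y2 ~ p1 + p2 + x2))[c("p1", "p2")]
b_fwl  = coef(lm(r9 ~ p3 + p4))[c("p3", "p4")]
all.equal(unname(b_2sls), unname(b_fwl))   # should be TRUE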
