All else is equal except that one is more equal than the other

Numbers Rule Your World 2014-02-07

The mis-use of "all else being equal" in statistics and much of the social sciences is a disease. The key to diagnosing this abuse is to recognize that there are two modes of "all else being equal".

I call it the "design" mode and the "pretend" mode.

An example of using "all else being equal" in the design mode is the classic randomization principle. If we run an A/B test (test of two means or proportions), by virtue of the random assignment, we are assured that groups A and B are statistically indistinguishable except for the known treatment. There is little danger of abuse in this mode. Unless we draw a bad lot, we know all else is truly equal.

Most of the time, though, when a researcher invokes "all else being equal", he or she is using the "pretend" mode. A classic example is a regression analysis run on observational data, otherwise known as "controlling for all other factors". First of all, you can only control for the known factors that you include in the regression equation. More seriously, in the "pretend" mode, all else being equal is an assumption. The effect given for any factor is under the assumption that the unit is average on all the other known factors. When you have observational data, you are unable to verify that all else is truly equal!

So next time someone mentions all else is equal, first figure out if this is used in design or pretend mode. Adjust your expectation accordingly.