Cutting data won't make cutting weight easier

Numbers Rule Your World 2013-06-05

In my new book, I have a chapter on interpreting the statistics of obesity. Andrew Sullivan (link) recently pointed to a Nature article discussing an aspect of the controversy around these numbers.

The bone of contention is the shape of the mortality curve. It has long been thought that the curve is monotonically increasing, meaning that the higher your BMI, the higher the mortality rate. But survey data in the U.S. now show that the curve is probably U-shaped: mortality rates are high for both obese and thin people. Overweight (but not obese) people, paradoxically, seemed to live longer than those with "normal" weight. This last observation has driven some people nuts.

The article focused on two Harvard researchers who organized a conference specifically to attack a CDC paper demonstrating the U-shaped curve. This is the crux of their argument:

When the researchers excluded women who had ever smoked and those who died during the first four years of the study (reasoning that these women may have had disease-related weight loss), they found a direct linear relationship between BMI and death, with the lowest mortality at BMIs below 19.

Excluding portions of a sample from analysis is a dangerous game, and should be heavily discouraged. It's one thing to adjust the data; it's another thing to remove data completely. Notice that the data removed were not outliers, that is, data that might be incorrect and so extreme as to dominate the outcomes. They removed data specifically to conform to their model of the world.

First, they removed smokers because "smokers tend to be leaner and die earlier than non-smokers". This suggests that the smokers who die earlier sit on the thin side of the curve; removing them has the effect of straightening the curve.

The second cut is even more egregious. How can there be any justification for removing people who died during the first four years when the study's primary metric is death rate? They claimed reverse causality: the idea that existing disease caused the weight loss, rather than low weight causing the deaths.

The most important reason why you should never drop large chunks of data in a systematic way is that your conclusions are now limited to the group that hasn't been dropped. Since there are no smokers in your sample, you cannot make a statement that applies to the general population. And yet, these researchers seem to have done so.
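
A toy sketch in Python makes the point concrete. The counts below are invented purely for illustration; they do not come from the CDC paper, the Harvard study, or any real survey, and the helper names (`mortality_by_group`, `is_increasing`) are hypothetical. The sketch only shows the mechanism: dropping a subgroup can change the shape of the curve, and whatever shape remains describes only the people left in the sample.

```python
# Hypothetical toy counts, made up for illustration only (not real data).
# Each row: (bmi_group, is_smoker, people, deaths)
ROWS = [
    ("thin",       True,  400, 60),
    ("thin",       False, 600, 30),
    ("normal",     True,  300, 30),
    ("normal",     False, 700, 42),
    ("overweight", True,  200, 16),
    ("overweight", False, 800, 52),
    ("obese",      True,  100, 12),
    ("obese",      False, 900, 90),
]
ORDER = ["thin", "normal", "overweight", "obese"]


def mortality_by_group(rows, keep=lambda row: True):
    """Death rate per BMI group, using only the rows for which keep(row) is true."""
    totals = {}
    for row in rows:
        if not keep(row):
            continue
        group, _smoker, people, deaths = row
        n, d = totals.get(group, (0, 0))
        totals[group] = (n + people, d + deaths)
    return {group: d / n for group, (n, d) in totals.items()}


def is_increasing(rates, order):
    """True if the rate rises strictly from each BMI group to the next."""
    values = [rates[group] for group in order]
    return all(a < b for a, b in zip(values, values[1:]))


# Everyone in the toy sample.
full = mortality_by_group(ROWS)
# Same sample after excluding every smoker.
nonsmokers = mortality_by_group(ROWS, keep=lambda row: not row[1])

for label, rates in [("full sample", full), ("non-smokers only", nonsmokers)]:
    shape = "increasing" if is_increasing(rates, ORDER) else "not increasing"
    print(label, {g: round(rates[g], 3) for g in ORDER}, shape)
```

In this made-up sample, the full population has its lowest death rate in the overweight group, while the non-smoker subset shows rates rising steadily with BMI. Neither curve is "wrong", but the second one is a statement about non-smokers only. It says nothing about the general population, which includes the smokers who were cut.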

***

Later on in the article, the journalist repeats the nonsense about how using BMI is a problem. I have previously written about this topic here.

On a related note, a visiting professor at NYU has been making the news, having made insulting comments about "fat PhD applicants". Somehow, the field of evolutionary psychology has attracted many crazies.