Binge Reading Gelman

Numbers Rule Your World 2014-06-23

As others binge watch Netflix TV, I binge read Gelman posts, while riding a train with no wifi and a dying laptop battery. (This entry was written two weeks ago.)

Andrew Gelman is statistics’ most prolific blogger. Gelman-binging has become a necessity since I have not managed to keep up with his accelerated posting schedule. Earlier this year, he began publishing previews of future posts, one week in advance, and one month in advance.

Also, I have been stubbornly waiting for the developers of my former favorite RSS reader to work out an endless parade of the most elementary bugs, after they launched a new site in response to Google Reader shutting down. Not having settled on a new RSS tool has definitely shrunk the volume of my reading.

I only managed to go through about a week’s worth of posts because the recent pieces interest me a lot.

***

Debunking the cannabis causes brain abnormalities paper (link)

Gelman links to Lior Pachter's review of what he calls "quite possibly the worst paper I've read all year".

This bit deserves further mocking: when the researchers fail to achieve conventional 5% significance, they draw conclusions based on a "trend towards significance". This sleight of hand happens frequently in practice as well, where the phrase "directional result" is used.

When an observed effect, as in this case, is not statistically significant, the implication is that the signal is not large enough to distinguish from background noise. When the researcher then says “but I still see a signal”, said researcher is now ignoring the uncertainty around the point estimate, pretending that the noise doesn’t exist. The researcher is in effect making a decision using the point estimate. Anyone who has taken Stats 101 should know not to use a point estimate.

One great tenet of statistical thinking is the recognition that the observed data sample is merely one of many possible things that could have happened. The confidence interval is an attempt to capture the range of possibilities, and the much-maligned tests of significance represent an attempt to reduce such analysis to one statistic. That reduction achieves simplicity at the expense of nuance.
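To make the point estimate versus interval distinction concrete, here is a minimal sketch. The measurements are invented for illustration and have nothing to do with the cannabis paper, the helper name `mean_diff_ci` is hypothetical, and the interval uses a rough normal approximation (z = 1.96) rather than whatever method a real study would choose.

```python
import math
import statistics


def mean_diff_ci(a, b, z=1.96):
    """Return (difference in means, lower bound, upper bound).

    Uses a normal approximation for a roughly 95% interval; this is an
    assumption made for the sketch, not a recommendation for small samples.
    """
    diff = statistics.mean(a) - statistics.mean(b)
    # Standard error of the difference between two independent sample means.
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return diff, diff - z * se, diff + z * se


# Hypothetical measurements, invented purely for illustration.
users = [2.9, 3.4, 3.1, 3.8, 2.7, 3.6, 3.0, 3.5]
controls = [2.8, 3.2, 3.0, 3.3, 2.9, 3.1, 2.6, 3.4]

diff, low, high = mean_diff_ci(users, controls)
print(f"point estimate of the difference: {diff:.2f}")
print(f"approximate 95% interval: ({low:.2f}, {high:.2f})")
print("interval includes zero:", low <= 0 <= high)
```

In this made-up sample the point estimate is positive, yet the interval straddles zero. Reading the point estimate alone as "a signal" is exactly the move criticized above.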

This cannabis study is also a great example of what I’ve been calling “causation creep”. The authors are well aware that they have merely found an instance of correlation (not even that, but just for the sake of argument), but when they start narrating their finding, they cannot help but use causal language.

The title of the paper is "Cannabis use is quantitatively associated with...", and yet the lead author told USA Today: "Just casual use appears to create changes in the brain in areas you don't want to change."

Causation creep is actually endemic in academic publishing of observational studies, and I don't want to single these authors out.

***

Debunking the Himmicanes study (link)

Gelman has been on this one for a while. The offending paper looked at the correlation between hurricane damage and the gender of the names we give these hurricanes. I didn’t find it worth spending my time studying this line of research, but I’m assuming that the problem is considered interesting because the authors claim to have found a “natural experiment”, in that the gender is effectively “randomly assigned” to the hurricanes as they appear.

I have been quite irritated over the years by this type of research, encouraged by the fad of Freakonomics. Even if they did find a natural experiment, what is that experiment about? Instead of spending research hours on correlating damage with naming conventions, why not spend the precious time looking for real causes of hurricane damage? You know, like weather patterns, currents, physical phenomena, human-induced climate changes, human decisions to live in high-risk areas, etc.?

I should note that much of Steven Levitt’s original work that launched this field deals with real problems, like crime rates. It’s just that many of his followers have gone astray.

***

Debunking the technology adoption curves (link)

Matt Novak debunks an article in Vox which repeats the assertion by the tech industry that new technologies have been adopted much more quickly in recent years than in the past. Vox is not the only place where you see this assertion. We have all seen variations of the chart shown on the right.

Novak puts on a statistician's hat and asks how the data came about. This type of chart is particularly prone to errors since many different studies across different eras are needed.

What Novak found: the starting dates of older technologies (like TV and radio) were defined by their invention in the laboratory, while those of recent technologies (such as Internet, mobile) were defined by their date of commercialization. Needless to say, adoption is expected to be slow when the technologies were not yet available to consumers!

Anyone who cites this chart or its conclusion from here on out should be publicly shamed.
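Novak's point about start dates is simple arithmetic, sketched below. All three years are hypothetical placeholders rather than figures from the Vox chart or from Novak's post, and `years_to_adopt` is an illustrative helper name.

```python
def years_to_adopt(start_year, threshold_year):
    """Years elapsed between a chosen start date and reaching an adoption threshold."""
    return threshold_year - start_year


# Hypothetical years for one made-up technology (not real data).
lab_invention = 1900       # first demonstrated in a laboratory
commercial_launch = 1925   # first sold to consumers
reached_threshold = 1950   # year it reached some adoption threshold

from_invention = years_to_adopt(lab_invention, reached_threshold)
from_launch = years_to_adopt(commercial_launch, reached_threshold)

print("measured from lab invention:", from_invention, "years")
print("measured from commercial launch:", from_launch, "years")
```

The same technology looks twice as slow to catch on when the clock starts at the laboratory date. Mixing the two definitions across eras builds the "faster adoption" conclusion into the chart.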

***

All the assumptions that are my life… (link)

Gelman nicely distills one of the central messages in my Numbersense book (Get it here). All data analyses require assumptions; assumptions are subjective; making assumptions is not a sin; clarifying one’s assumptions and vigorously testing them is what makes good analyses. Go read this post.

***

Did you buy detergent on your most recent trip to the store? (link)

Gelman was surprised by a recent paper in which the researchers found that 42% of their sample purchased detergent on their most recent trip to the store. This reminds me of the section of Numbersense (Get it here) describing a study in which some marketing professors had mystery shoppers track people in a supermarket and, within seconds of the shoppers placing groceries in their trolleys, ask them how much the items cost. The error rate was quite shocking.

There is another big problem with this research design. People's memory of what they purchased depends on how long ago that "most recent" trip was. I also wonder how online purchasing affects this sort of study as I typically don't count going to a website as "a trip to the supermarket". It seems like some sort of prequalification is needed but prequalification always restricts the generalizability of any finding.

***

Stepwise Regression and Outlier Detection (link)

Andrew gently mocks both of these commonly used procedures. The discussion of outlier detection is buried in the comments section, so if you are interested, you should scroll below the fold. Gelman’s annoyance with outlier detection is semantic, but these are important semantics, which align with my own practice. Like Gelman, I don't consider every extreme value an outlier.

Stepwise is a suboptimal procedure and Gelman prefers modern techniques like lasso. But lots of practitioners use stepwise because the procedure is “intuitive”, that is to say, one can explain it to a non-technical person without that person rolling their eyes. The discussion below the post is worth reading.