Reading Statistical Analysis Illustrated

Numbers Rule Your World 2022-05-16

A few months ago, Prof. Kottemann sent me a copy of his new book, "Statistical Analysis Illustrated: Foundations You Should Know". On a recent flight, I finally found time to give it a quick read.

The book covers a central section of frequentist statistical theory, which forms the core of most traditional Statistics 101 courses. This is the sequence that runs through random sampling, sampling distributions, the law of large numbers, the central limit theorem, confidence intervals, and hypothesis testing (tests of significance).

Kottemann adopts an approach that deemphasizes mathematical derivation and leaves out formal theorems, favoring verbal and visual explanation instead. This is an approach that I also prefer in introductory statistics courses, so I flipped the pages eagerly.

With the stated goal of helping readers appreciate a "way of thinking" about data, Kottemann improves upon most textbooks in the following manner:

He concentrates on the case of sample proportions (e.g. x% of respondents felt this way), which can be easily appreciated given their common usage in surveys. This use case avoids one of the tricky turns that confuse statistics students, namely, learning the difference between the population standard deviation, the sample standard deviation and the standard error.
He smartly abandons the usual bottom-up presentation of the central sequence. In the first chapter, he goes straight into sampling distributions, confidence intervals and significance testing without having defined a random variable, a normal distribution, the law of large numbers, the standard error or the null hypothesis (all of which appear in later chapters).
The visuals in the book are improved versions of typical textbook diagrams because he has removed unnecessary and confusing details.
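
To make the point about proportions concrete, here is a minimal Python sketch (not taken from the book) of the usual normal-approximation interval for a sample proportion. The poll numbers and the function name proportion_interval are made up for illustration. Notice that the standard error comes straight from the sample proportion and the sample size, so there is no separate standard deviation to keep track of.

```python
import math

def proportion_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) interval for a sample proportion."""
    p_hat = successes / n
    # For a proportion, the standard error follows from p_hat and n alone;
    # no separate sample standard deviation has to be estimated.
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se, (p_hat - z * se, p_hat + z * se)

# Hypothetical poll (made-up numbers): 540 of 1,000 respondents in favor.
p_hat, se, (low, high) = proportion_interval(540, 1000)
print(f"estimate={p_hat:.3f}  se={se:.4f}  95% CI=({low:.3f}, {high:.3f})")
```

The default z=1.96 corresponds to a 95% confidence level; the approximation is reasonable for large samples with proportions not too close to 0 or 1.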

This central sequence of frequentist statistics has proven challenging to generations of students because it's not a simple, straightforward argument. It's not easy because generalizing from samples of data is a complex undertaking. Kottemann's scope is narrower than the typical Statistics 101 textbook but what he covers is essential.

Most of the writing by Kottemann is lucid. The complexity of the material - when explained in words - only got the better of him on a few occasions. Above Figure 1.1, he writes "[This histogram] shows the frequency (out of 1,000) with which we should expect to get random samples that yield the various possible values for the percent-in-favor sample statistic." I quote it only to illustrate the type of tongue-twisters statistics teachers run into when covering this material (and so he may rephrase it in the next edition); the sentence is an outlier in a very readable book.
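
What that sentence describes is easier to see by simulation. The sketch below is not a reconstruction of Figure 1.1: the true share in favor (50%) and the sample size (100) are my assumptions, and only the count of 1,000 samples comes from the quoted sentence. It draws 1,000 random samples and tallies how often each percent-in-favor value turns up.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the run is repeatable

TRUE_SHARE = 0.5    # assumed population share in favor
SAMPLE_SIZE = 100   # assumed number of respondents per sample
NUM_SAMPLES = 1000  # repeated samples, matching "out of 1,000"

def percent_in_favor(n, share):
    """Draw one random sample of size n and return the percent in favor."""
    in_favor = sum(random.random() < share for _ in range(n))
    return round(100 * in_favor / n)

counts = Counter(
    percent_in_favor(SAMPLE_SIZE, TRUE_SHARE) for _ in range(NUM_SAMPLES)
)

# Text histogram: one row per observed percent-in-favor value.
for pct in sorted(counts):
    print(f"{pct:3d}% {'#' * counts[pct]}")
```

Each row's bar length is the frequency, out of 1,000, with which a random sample produced that percent-in-favor value.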

The hypothesis testing framework has been under constant attack from other "ways of thinking". In the last chapter of the book, Kottemann briefly covers the Bayesian way of thinking. I hope readers appreciate that everything is different, from which question is answered to what assumptions are made to how decisions are framed.

The frequentist viewpoint has been widely abused. Journal editors reject any studies that do not reach a 95% confidence level, effectively ordaining publication bias, and motivating scholars to "p-hack," a general term for engineering the p-value to fall below the 5% threshold (that is, to hit the 95% confidence level). In practice, people may "d-hack," which is to review a large array of outcomes and select the one that happens to reach 95%; or they may "n-hack," which is to make the sample size so big that even tiny signals are declared statistically significant. I might even add "ps-hack," which is to pre-specify a large array of outcomes to cover up "d-hacking".
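
A small calculation shows how "n-hacking" works. This is a sketch under assumed numbers: two groups with observed proportions of 50.5% and 50.0% (a half-point gap I made up), compared with a standard two-proportion z-test at increasing sample sizes. The function name two_sided_p_value is hypothetical.

```python
import math

def two_sided_p_value(p1, p2, n):
    """Two-proportion z-test p-value, with n observations in each group."""
    pooled = (p1 + p2) / 2  # equal group sizes, so the pooled rate is the average
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    z = (p1 - p2) / se
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area

# Same tiny observed gap, ever larger samples (all numbers are made up).
for n in (1_000, 100_000, 1_000_000):
    p = two_sided_p_value(0.505, 0.500, n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n per group = {n:>9,}  p-value = {p:.3g}  -> {verdict} at the 5% level")
```

The gap between the groups never changes; only the sample size does. The standard error shrinks with the square root of n, so a large enough sample will eventually push any nonzero gap past the threshold.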

If there is one gap in Kottemann's presentation, it's in the transition from the case of proportions to "the rest of the (frequentist) iceberg". If the reader is asked to do a test of difference in means at the 10% significance level, I'm not sure s/he can figure it out.
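
For reference, here is roughly what such a test looks like. This is a sketch, not anything from the book: it uses a large-sample z-test (a normal approximation, where a textbook would often use a t-test), and both groups are simulated, made-up data. The function name difference_in_means_test is hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance

def difference_in_means_test(a, b, alpha=0.10):
    """Two-sided large-sample z-test for a difference in means."""
    # Unlike the proportion case, each sample's variance must be
    # estimated from the data before the standard error can be formed.
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (mean(a) - mean(b)) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha

random.seed(1)  # fixed seed so the run is repeatable
group_a = [random.gauss(100, 15) for _ in range(200)]  # made-up data
group_b = [random.gauss(103, 15) for _ in range(200)]  # made-up data

z, p_value, reject = difference_in_means_test(group_a, group_b)
print(f"z={z:.2f}  p={p_value:.3f}  reject at the 10% level: {reject}")
```

The extra step compared with proportions is the variance estimate for each group, which is exactly where the population standard deviation, the sample standard deviation and the standard error have to be told apart.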

I'd also like to see a few worked-out examples of analyzing real-life opinion polls or market research surveys, and some exercises.

If you're learning statistics, and having trouble with the central sequence from random sampling to statistical testing, or if your sole interest is understanding polls or surveys, give Kottemann's book a try. It may help unblock you.