Make a hypothesis about what you expect to see, every step of the way. A manifesto:

Statistical Modeling, Causal Inference, and Social Science 2024-11-13

We learn from surprise.

Surprise is when something unexpected happens.

The unexpected is defined relative to the expected.

To learn from surprise, it is good practice to specify the expected in as detailed a form as possible.

OK, here it is again, from a slightly different angle:

“The greatest value of a picture is when it forces us to notice what we never expected to see.” — John Tukey, from his classic book, Exploratory Data Analysis.

We will be most prepared to learn from the unexpected if we think clearly about what we are expecting.

Nathan Yau offers a good take:

“Data exploration with visualization is good, but when someone describes their project as an exploration tool, it often means it lacks focus or direction. Instead it looks like generic graphs that don’t answer anything particular and leave all interpretation to the reader.”

In doing research I find it useful to pause frequently before getting results to predict what I expect to see. In that way, I learn much more than if I just fumble forward, seeing what comes up. Research is a much more active process if I put in the work to formulate my expectations.

Similarly, when conducting a computer demonstration in class, I’ll pause before hitting Return and ask the students to discuss in pairs what they expect the output to be. Making that commitment is a valuable step toward learning.
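
When working at the computer, one way to make this habit concrete is to write the prediction into the code before running it. Here is a minimal sketch in Python. The function name `check_expectation`, the data values, and the predicted range are all made up for illustration and do not come from any particular analysis.

```python
import statistics


def check_expectation(label, expected_low, expected_high, compute):
    """Commit to a predicted range first, then run the computation and compare."""
    observed = compute()  # the result is only computed after the range is written down
    surprised = not (expected_low <= observed <= expected_high)
    return {
        "label": label,
        "expected": (expected_low, expected_high),
        "observed": observed,
        "surprised": surprised,
    }


# Hypothetical measurements; the last value is deliberately unusual.
data = [2.1, 2.4, 1.9, 2.2, 9.7]

# Prediction written down before looking: the mean should be around 2.
result = check_expectation(
    "mean of measurements", 1.5, 2.5, lambda: statistics.mean(data)
)
print(result)
```

Here the last value pulls the mean outside the predicted range, so the check flags a surprise, and that is the cue to go back and look at the data. The point is not the code itself but the commitment: the expected range has to be typed in before the result appears.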

All this is related to the idea that when you do applied statistics, you’re acting like a scientist. It also came up in the comment thread on statistical practice as scientific exploration.

One good thing about NIH research proposals is that they typically include statements of hypotheses: not “null hypotheses,” which I hate, but scientific hypotheses representing theories and expectations of what might happen. I think that’s much better than just jumping in and gathering data. Your hypotheses might well be wrong—we learn from our mistakes, we learn from the unexpected. But, again, this is all so much more effective when we write down these expectations, as explicitly as possible.