There is no empirical optimal
Statistical Modeling, Causal Inference, and Social Science 2025-04-09
This is Jessica. Like “guarantees,” “optimal” is one of those words that gets thrown around all the time in computer science, but which can be used in cringier and less cringy ways. Today’s post is about using “optimal” to describe empirically estimated quantities, especially those that depend on human behavior.
For example, imagine we employ some model (e.g., an LLM) to mimic some human judgments. We try various ways of doing this and determine that we get the best alignment between the model’s behavior and the human behavior under some particular configuration of the method, so we call that configuration optimal. Or maybe we are measuring human performance given some model’s predictions, and we do a grid search over different parameter combinations and call the parameterization that coincides with the best observed performance the optimal one.
Using “optimal” in such cases bugs me because the specific estimated value is not a consistent property of the process that produces it. Rerunning the same data collection process in either of the above cases will result in different “optimal” values. Calling something “optimal” because it was best for some bespoke sample, where often we can’t really describe the sampling process or even define the population, implies auxiliary assumptions we can’t back up.
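To make this concrete, here is a minimal simulation sketch of the grid search scenario (the performance curve, noise level, and grid of configurations are all invented for illustration): each candidate configuration is scored with noisy measurements of human performance, and the argmax is crowned “optimal.” Rerunning the identical procedure crowns different winners.

```python
import numpy as np

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 21)  # hypothetical candidate configurations

def true_performance(theta):
    # Invented smooth relationship between the configuration and
    # average human performance, peaking at theta = 0.6.
    return 1.0 - (theta - 0.6) ** 2

def run_study(n_participants=30):
    # One run of the data collection: a noisy estimate of mean
    # performance per cell, then a grid search over the cells.
    observed = [true_performance(t) + rng.normal(0, 0.1, n_participants).mean()
                for t in grid]
    return grid[int(np.argmax(observed))]

winners = [run_study() for _ in range(1000)]
print("distinct 'optimal' configurations:", sorted(set(winners)))
print("share of runs picking the true argmax (0.6):",
      float(np.mean(np.isclose(winners, 0.6))))
```

The point is not that grid search is bad; it’s that the winning cell is a noisy statistic of one sample, so the label “optimal” attaches to sampling error as much as to the method.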
Instead, “optimal” should be reserved for cases where we are talking about idealized processes, e.g., where we can imagine hypothetical replications and talk about how the solution a method returns is consistent in some way. For example, when we fit a regression to some data, we can talk about the optimality of our estimator in the sense that it promises a certain relationship with the true values (e.g., under the Gauss-Markov conditions, OLS is the minimum-variance linear unbiased estimator), but we should not call the specific fit that this process returns optimal, unless maybe we know we are dealing with the entire population.
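Here is a small sketch of that distinction (the true slope, noise model, and sample size are invented for the example): the optimality claim is about the estimator across hypothetical replications, which average out to the truth, not about any one fit, each of which misses by some sampling error.

```python
import numpy as np

rng = np.random.default_rng(1)
beta = 2.0  # true slope, a property of the idealized data-generating process

def fit_once(n=50):
    # One hypothetical replication: draw a sample, fit OLS, return the slope.
    x = rng.normal(size=n)
    y = beta * x + rng.normal(size=n)
    slope, _intercept = np.polyfit(x, y, 1)
    return slope

fits = np.array([fit_once() for _ in range(10_000)])
# The estimator's guarantee is a statement about replications:
# it is unbiased, so the fitted slopes average out to beta...
print("mean of fitted slopes:", fits.mean())  # close to 2.0
# ...but each specific fit deviates from beta by sampling error.
print("std of fitted slopes:", fits.std())    # clearly nonzero
```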
My impression is that treating empirically learned quantities as optimal tends to happen more often in machine learning than in statistics. At the extreme, you get the Breiman-esque view that optimizing for prediction is all you need. I heard a version of this while participating in a panel on AI safety recently, where one of the other panelists announced that statistics would ultimately be replaced entirely by machine learning. Regardless of what the speaker might have meant by this, it’s easy to interpret such comments as saying that the output of the learning process is sufficiently “optimal” not to need much theory.