Survey Statistics: divine probabilities

Statistical Modeling, Causal Inference, and Social Science 2025-12-09

Last week Anon pointed us to Meng 2022, which clarified some of my confusion about “non-probability” samples:

the phrase non-probability samples should be understood as a short hand for “samples without an identified design probability construct”.

Without (human) design probability, we can still have “divine probability”:

we typically conceptualize that the data at hand is a realization of a generative probabilistic mechanism given by nature or God.

But to bring us back down to earth, Meng introduces “device probability”:

By far, most probabilities used in statistical modeling are devices for expressing our belief, prior knowledge, assumptions, idealizations, compromises, or even desperation.

In math notation:

  • Responders may be selected by human design probability P(R_i = 1 | X_i), where X_i can include, for example, stratification variables.
  • Without fully controlled human design, responders follow the laws of nature, with divine probability pi_i = P(R_i = 1) that can differ across people arbitrarily, perhaps depending on the outcome of interest y_i.
  • To estimate the divine probability, we might assume a device probability, e.g. a generalized linear model P(R_i = 1 | X_i) = g^-1(b*X_i) that doesn’t include y_i.
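
To make the three probabilities concrete, here is a small simulation sketch. Everything in it is my own assumption for illustration, not something from Meng's paper: a single covariate x, an outcome y = x + noise, a divine probability pi_i = sigmoid(0.5*x_i + 0.5*y_i) that depends on y_i, and a device model that is a logistic regression on x_i alone (so g is the logit link). The helper names (fit_logistic, compare_estimators) are hypothetical. It compares the naive responder mean, the mean weighted by the fitted device probability, and an "oracle" mean weighted by the true divine probability, which we never get to see in practice.

```python
import math
import random


def sigmoid(z):
    """Inverse logit link, playing the role of g^-1 in the notation above."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, rs, iters=25):
    """Fit the device model P(R=1 | x) = sigmoid(b0 + b1*x) by Newton's method."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, r in zip(xs, rs):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += r - p            # gradient of the log-likelihood
            g1 += (r - p) * x
            h00 += w               # negative Hessian (2x2, symmetric)
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def weighted_mean(ys, ws):
    """Self-normalized (Hajek-style) weighted mean."""
    return sum(y * w for y, w in zip(ys, ws)) / sum(ws)


def compare_estimators(n=20000, seed=0):
    rng = random.Random(seed)
    xs, ys, pis, rs = [], [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = x + rng.gauss(0.0, 1.0)
        # ASSUMED divine probability: depends on the outcome y, not just x.
        pi = sigmoid(0.5 * x + 0.5 * y)
        xs.append(x)
        ys.append(y)
        pis.append(pi)
        rs.append(1 if rng.random() < pi else 0)

    # Device probability: a logistic model in x only, which omits y.
    b0, b1 = fit_logistic(xs, rs)

    resp = [i for i in range(n) if rs[i] == 1]
    y_resp = [ys[i] for i in resp]
    device_w = [1.0 / sigmoid(b0 + b1 * xs[i]) for i in resp]
    oracle_w = [1.0 / pis[i] for i in resp]
    return {
        "truth": sum(ys) / n,                       # full-population mean of y
        "naive": sum(y_resp) / len(y_resp),         # unweighted responder mean
        "device": weighted_mean(y_resp, device_w),  # weights from the device model
        "oracle": weighted_mean(y_resp, oracle_w),  # weights from the divine pi_i
    }


if __name__ == "__main__":
    for name, value in compare_estimators().items():
        print(f"{name:>6}: {value: .3f}")
```

Under these assumptions, one would expect the device-weighted mean to land between the naive mean and the truth: weighting by P(R_i = 1 | X_i) repairs the imbalance in x, but the part of the selection that runs through y_i is left uncorrected because the device model never sees it.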