Sampling from the tail end of the normal distribution

Wildon's Weblog 2025-02-04

The purpose of this post is to present two small observations on the normal distribution, with more speculative applications to academic recruitment. The mathematical part at least is rigorous.

Tails of the normal distribution and the exponential approximation

We can sample from the tail end of the standard normal distribution $N(0,1)$ by sampling from $N(0,1)$ as usual, but then rejecting all samples that are not at least some threshold value $t$. From my mental picture of the bell curve, I think it is intuitive that the mean of the resulting sample will be not much more than $t$. In fact the mean is

$\displaystyle t + \frac{1}{t} - \frac{2}{t^3} + \mathrm{O}(\frac{1}{t^4}).$

Please pause for a moment and ask yourself: do you expect the variance of the tail-end sample to increase with $t$ or instead to decrease with $t$?

The answer, less intuitively I think, is that the variance decreases as $1/t^2$. More precisely, it is

$\displaystyle \frac{1}{t^2} - \frac{6}{t^4} + \mathrm{O}(\frac{1}{t^6}).$

Thus, the larger $t$ is, the more concentrated is the sample. Given that one expects tail events to be rare, and so perhaps more variable, this is maybe a surprise. One can derive the series expansions above by routine but fiddly calculations using the probability density function $\frac{1}{\sqrt{2\pi}} \exp(-x^2/2)$ of the standard normal distribution.
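As a numerical check on the two series, here is a minimal Python sketch using only the standard library. It relies on the standard closed form for the truncated normal: the conditional mean is $\lambda(t) = \phi(t)/\mathrm{P}(X \ge t)$ and the conditional variance is $1 + t\lambda(t) - \lambda(t)^2$. The function names, the seed and the sample size are my own choices for illustration. Since the series are asymptotic, the agreement should only be expected to become close as $t$ grows.

```python
import math
import random
import statistics


def tail_mean_var(t):
    """Exact mean and variance of N(0,1) conditioned on X >= t."""
    pdf = math.exp(-t * t / 2) / math.sqrt(2 * math.pi)
    tail = 0.5 * math.erfc(t / math.sqrt(2))  # P(X >= t)
    lam = pdf / tail  # this ratio is the conditional mean
    return lam, 1 + t * lam - lam * lam


def series_mean_var(t):
    """Leading terms of the two asymptotic series quoted in the text."""
    return t + 1 / t - 2 / t**3, 1 / t**2 - 6 / t**4


def rejection_sample(t, n_draws, seed=0):
    """Draw n_draws standard normals and keep those that are at least t."""
    rng = random.Random(seed)
    draws = (rng.gauss(0, 1) for _ in range(n_draws))
    return [x for x in draws if x >= t]


# Exact values against the truncated series for a few thresholds.
for t in (3, 5, 10):
    print(t, tail_mean_var(t), series_mean_var(t))

# A small rejection-sampling run at a modest threshold (assumed t = 2,
# so that enough samples survive to estimate the mean and variance).
kept = rejection_sample(2.0, 200_000)
print(len(kept), statistics.mean(kept), statistics.variance(kept))
print(tail_mean_var(2.0))
```

Rejection sampling becomes very wasteful for large $t$ (at $t = 4$ only about three draws in 100,000 survive), which is why the simulated part uses a small threshold and the comparison with the series uses the exact formula.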

Alternatively, and I like that this can be done entirely in one’s head, observe that for large $t$ and small $h$, we have

$\displaystyle \frac{1}{\sqrt{2\pi}} \exp(-(t+h)^2/2 ) \approx \frac{1}{\sqrt{2\pi}} \exp(-t^2/2) \exp(-th)$

and so the normal distribution conditioned on $x \ge t$ should be well approximated by an exponential distribution with parameter $t$, shifted by $t$. The probability density function of the unshifted exponential distribution with parameter $t$ is $t \exp(-th)$, and, correspondingly, as is well known, its mean and standard deviation are both $1/t$. Thus this approximation suggests that the mean of the tail distribution should be $t + 1/t$ and its variance $1/t^2$, agreeing with the leading terms of the two asymptotic series above.
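One way to see how good the exponential approximation is: compare the exact conditional survival probability $\mathrm{P}(X \ge t + h \mid X \ge t)$ with the exponential survival probability $\exp(-th)$. The sketch below is only illustrative; the function names and the choice $h = 1/t$ (one exponential mean beyond the threshold) are assumptions of mine, not part of the argument above.

```python
import math


def normal_tail(x):
    """P(X >= x) for X ~ N(0,1)."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def conditional_survival(t, h):
    """P(X >= t + h | X >= t) for X ~ N(0,1)."""
    return normal_tail(t + h) / normal_tail(t)


def exponential_survival(t, h):
    """P(Y >= h) for Y exponential with parameter (rate) t."""
    return math.exp(-t * h)


for t in (2, 5, 10):
    h = 1 / t  # one exponential mean past the threshold
    print(t, conditional_survival(t, h), exponential_survival(t, h))
```

The exact conditional survival is always slightly below the exponential one, because the factor $\exp(-h^2/2)$ dropped in the approximation is less than 1; the gap narrows as $t$ increases.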

The University of Erewhon

In the selection process for the University of Erewhon, applicants are invited to sit a fiendishly difficult exam. The mean score is zero (these candidates are of course rejected) and the faculty recruits the tiny minority who exceed the pass mark. The result, of course, is that the faculty recruits not the best mathematicians, but those best at passing their absurd test. (Nonetheless there is a long queue at the examination centre each year: people want jobs!) But more deleteriously, the effect is to create a monoculture: modelling exam-taking ability as an $N(0,1)$ random variable, the analysis above shows that sampling with threshold $t$ gives a distribution in which almost everyone scores about $t + 1/t$. (We saw the standard deviation was $1/t$ up to terms of order $1/t^3$.) And of course $t$ is chosen so large that only those able to devote years of their life to preparing for the exam, to the exclusion of all other activities, have any chance of passing. And thus the dispiriting result is that all candidates are not only rather alike in their test-taking ability, but the selection by this one statistic creates a culture entirely dominated by the branches of mathematics and the personality traits favoured by the test.

I have gone through selection processes in my life that had something in common with this obvious parody (sometimes unjustly benefitting, other times, in retrospect, being glad to fail), and believe I can write from experience that the result has never been good. The alternative of course is to use multiple criteria when recruiting, allowing that some successful candidates may be weak in some areas, while very strong in others. Rather obvious, I know, but I think it’s interesting that a very simple mathematical argument backs up this conclusion.

In the next part of the post we’ll see that the effect of luck can mean that Erewhonian-style recruitment not only creates a monoculture, it doesn’t even successfully select the most talented candidates.

The effect of luck

If our intuition about tail events is perhaps suspect, our intuition for conditional probability is surely worse. When the two come together, by conditioning on rare events, the effects can be catastrophic. Selection bias is frequently overlooked even by those who should know better. It seems quite possible to me that we’re here today only because in the many worlds in which the delicate balance of nuclear deterrence failed, there is no-one left in the charred embers to read blog posts on selection bias. Or to pick a less extreme example, would you spend £1000 to attend a seminar given by a panel of lottery winners in which they promise to reveal the detail of their gambling strategies?

Returning to the normal distribution, and the University of Erewhon, suppose that with probability $1-p$ we sample from $N(0,1)$ (people with average luck), and with probability $p$ we sample from $N(0,1) + \ell$ (lucky people). Suppose that $\ell$ is small, say $\ell = 0.5$, so the effect of luck is relatively modest, boosting one’s performance by just $1/2$ a standard deviation. As before, we then reject all samples below the threshold $t$. Does the effect of luck become more or less significant as $t$ increases?

I think here most people would intuitively guess the right answer, but the magnitude of the effect is striking. The table below shows the number of ‘average’ and ‘lucky’ people when we take a million samples according to the scheme above, keeping only the samples that are at least $t$. We suppose that there is a probability $p = 1/5$ of being lucky. The final column is the ratio of lucky to average people.

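| $t$ | average | lucky | ratio |
|---|---|---|---|
| 1 | 127064 | 61573 | 0.48 |
| 1.5 | 53375 | 31774 | 0.60 |
| 2 | 18361 | 13070 | 0.71 |
| 2.5 | 4924 | 4562 | 0.93 |
| 3 | 1054 | 1176 | 1.12 |
| 3.5 | 187 | 250 | 1.34 |
| 4 | 24 | 51 | 2.13 |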
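The counts in the table come from a random simulation, so it may help to set them beside the expected counts, which can be computed exactly. A lucky person passes when $N(0,1) + \ell \ge t$, that is, when the underlying $N(0,1)$ value is at least $t - \ell$. The sketch below is a hedged illustration: the function names and default arguments are mine, with $p = 1/5$, $\ell = 0.5$ and one million samples taken from the text.

```python
import math


def normal_tail(x):
    """P(X >= x) for X ~ N(0,1)."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def expected_counts(t, n=1_000_000, p=0.2, luck=0.5):
    """Expected numbers of average and lucky people scoring at least t."""
    average = n * (1 - p) * normal_tail(t)
    # N(0,1) + luck >= t exactly when N(0,1) >= t - luck
    lucky = n * p * normal_tail(t - luck)
    return average, lucky


for t in (1, 1.5, 2, 2.5, 3, 3.5, 4):
    average, lucky = expected_counts(t)
    print(f"{t:>4} {average:>10.1f} {lucky:>10.1f} {lucky / average:>6.2f}")
```

For large $t$ the expected counts are small, so the simulated ratios in the last rows of the table are subject to noticeable sampling noise; the trend, however, is the same.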

Thus even though average people outnumber lucky people four to one, when we sample from the $t \ge 3$ tail of the distribution (with a boost of 0.5 for lucky people), lucky people predominate. The effect is that the University of Erewhon fails in its goal of recruiting people three standard deviations above the mean: of the 2230 successful candidates, 1176 are people who are merely about 2.5 standard deviations above the mean, but happened to get lucky. Again the moral of the story is clear.