Intransitive dice VI: sketch proof of the main conjecture for the balanced-sequences model
Gowers's Weblog 2018-03-10
I have now completed a draft of a write-up of a proof of the following statement. Recall that a random $n$-sided die (in the balanced-sequences model) is a sequence $(a_1,\dots,a_n)$ of length $n$ of integers between 1 and $n$ that add up to $n(n+1)/2$, chosen uniformly from all such sequences. A die $A$ beats a die $B$ if the number of pairs $(i,j)$ such that $a_i>b_j$ exceeds the number of pairs $(i,j)$ such that $b_j>a_i$. If the two numbers are the same, we say that $A$ ties with $B$.
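In case it helps anyone experiment, here is a quick Python sketch (an illustration only, not part of the write-up) of the model: it samples a balanced die by rejection and compares two dice by counting pairs.

```python
import random

def random_die(n, rng):
    # Balanced-sequences model: a uniformly random element of {1,...,n}^n
    # conditioned on the sum being n(n+1)/2, sampled here by rejection.
    target = n * (n + 1) // 2
    while True:
        a = [rng.randint(1, n) for _ in range(n)]
        if sum(a) == target:
            return a

def beats(A, B):
    # A beats B if #{(i,j): a_i > b_j} exceeds #{(i,j): b_j > a_i}.
    over = sum(1 for a in A for b in B if a > b)
    under = sum(1 for a in A for b in B if a < b)
    return over > under

rng = random.Random(0)
A, B = random_die(6, rng), random_die(6, rng)
```

The rejection step succeeds with probability comparable to $n^{-3/2}$, so this is only sensible for small experiments.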
Theorem. Let $A$, $B$ and $C$ be random $n$-sided dice. Then the probability that $A$ beats $C$ given that $A$ beats $B$ and $B$ beats $C$ is $\frac12+o(1)$.
In this post I want to give a fairly detailed sketch of the proof, which will I hope make it clearer what is going on in the write-up.
The first step is to show that the theorem is equivalent to the following statement.
Theorem. Let $A$ be a random $n$-sided die. Then with probability $1-o(1)$, the proportion of $n$-sided dice that $A$ beats is $\frac12+o(1)$.
We had two proofs of this statement in earlier posts and comments on this blog. In the write-up I have used a very nice short proof supplied by Luke Pebody. There is no need to repeat it here, since there isn’t much to say that will make it any easier to understand than it already is. I will, however, mention once again an example that illustrates quite well what this statement does and doesn’t say. The example is of a tournament (that is, a complete graph where every edge is given a direction) where every vertex beats half the other vertices (meaning that half the edges at the vertex go out and half go in) but the tournament does not look at all random. One just takes an odd integer $n$, takes $\mathbb{Z}_n$ as the vertex set, and puts arrows out from $x$ to $x+y$ mod $n$ for every $1\leq y\leq(n-1)/2$, and arrows into $x$ from $x+y$ for every $(n+1)/2\leq y\leq n-1$. It is not hard to check that the probability that there is an arrow from $x$ to $z$ given that there are arrows from $x$ to $y$ and from $y$ to $z$ is approximately 1/2, and this turns out to be a general phenomenon.
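The rotational example is easy to play with. The following sketch (mine, for illustration) builds the tournament on $\mathbb{Z}_n$ and checks both claims numerically.

```python
def arrow(x, z, n):
    # Rotational tournament on Z_n (n odd): an arrow from x to x+y mod n
    # for every y in {1, ..., (n-1)/2}.
    return 1 <= (z - x) % n <= (n - 1) // 2

def out_degree(x, n):
    # Number of vertices that x beats.
    return sum(arrow(x, z, n) for z in range(n))

def transitivity_fraction(n):
    # Among ordered triples with arrows x -> y and y -> z, the fraction
    # that also have an arrow x -> z.
    good = total = 0
    for x in range(n):
        for y in range(n):
            for z in range(n):
                if arrow(x, y, n) and arrow(y, z, n):
                    total += 1
                    good += arrow(x, z, n)
    return good / total
```

For $n=101$ every vertex beats exactly $(n-1)/2=50$ others, yet the conditional probability above works out to $0.49$, close to $1/2$ even though the tournament is as far from random-looking as one could imagine.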
So how do we prove that almost all $n$-sided dice beat approximately half the other $n$-sided dice?
The first step is to recast the problem as one about sums of independent random variables. Let $[n]$ stand for $\{1,2,\dots,n\}$ as usual. Given a sequence $A=(a_1,\dots,a_n)\in[n]^n$ we define a function $f_A:[n]\to\mathbb{R}$ by setting $f_A(j)$ to be the number of $i$ such that $a_i<j$ plus half the number of $i$ such that $a_i=j$. We also define $g_A(j)$ to be $f_A(j)-(j-\frac12)$. It is not hard to verify that $B$ beats $A$ if $\sum_jg_A(b_j)>0$, ties with $A$ if $\sum_jg_A(b_j)=0$, and loses to $A$ if $\sum_jg_A(b_j)<0$.
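This equivalence is easy to sanity-check in code (my own check, not from the draft). As far as I can see the criterion needs $B$ to be balanced, though $A$ can be an arbitrary sequence in $[n]^n$:

```python
import random

def f_A(A, j):
    # f_A(j): the number of i with a_i < j plus half the number with a_i = j.
    return sum(a < j for a in A) + 0.5 * sum(a == j for a in A)

def g_A(A, j):
    # g_A(j) = f_A(j) - (j - 1/2).
    return f_A(A, j) - (j - 0.5)

def beats_directly(B, A):
    # B beats A: more pairs with b_j > a_i than pairs with a_i > b_j.
    return sum(b > a for a in A for b in B) > sum(b < a for a in A for b in B)

def beats_via_g(B, A):
    # Equivalent criterion when B is balanced: sum_j g_A(b_j) > 0.
    return sum(g_A(A, b) for b in B) > 0
```

Note also that the standard die $(1,2,\dots,n)$ has $g_A$ identically zero, so it ties with every balanced die.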
So our question now becomes the following. Suppose we choose a random sequence $B=(b_1,\dots,b_n)\in[n]^n$ with the property that $\sum_jb_j=n(n+1)/2$. What is the probability that $\sum_jg_A(b_j)>0$? (Of course, the answer depends on $A$, and most of the work of the proof comes in showing that a “typical” $A$ has properties that ensure that the probability is about 1/2.)
It is convenient to rephrase the problem slightly, replacing $[n]$ by the set $\{-(n-1)/2,\dots,(n-1)/2\}$. We can then ask it as follows. Suppose we choose a sequence $(b_1,\dots,b_n)$ of $n$ elements of the set $\{-(n-1)/2,\dots,(n-1)/2\}$, where the terms of the sequence are independent and uniformly distributed. For each $i$ let $Z_i=(g_A(b_i),b_i)$. What is the probability that $\sum_ig_A(b_i)>0$ given that $\sum_ib_i=0$?
This is a question about the distribution of $\sum_iZ_i$, where the $Z_i$ are i.i.d. random variables taking values in $\mathbb{Z}^2$ (at least if $n$ is odd — a small modification is needed if $n$ is even). Everything we know about probability would lead us to expect that this distribution is approximately Gaussian, and since it has mean $(0,0)$, it ought to be the case that if we sum up the probabilities that $\sum_iZ_i=(x,0)$ over positive $x$, we should get roughly the same as if we sum them up over negative $x$. Also, it is highly plausible that the probability of getting $(0,0)$ will be a lot smaller than either of these two sums.
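To see the heuristic in action on a small scale, here is an exact computation (my own illustration; the write-up does nothing like this) of the conditional distribution of the sign of $\sum_jg_A(b_j)$, by dynamic programming over the pair $(\sum_ib_i,\sum_i2g_A(b_i))$; doubling $g_A$ keeps everything in integers.

```python
from collections import defaultdict

def g2(A, j):
    # 2*g_A(j) as an exact integer: 2#{a_i < j} + #{a_i = j} - (2j - 1).
    return 2 * sum(a < j for a in A) + sum(a == j for a in A) - (2 * j - 1)

def conditional_sign_probs(A):
    # Exact joint distribution of (sum_i b_i, sum_i 2 g_A(b_i)) over all n^n
    # uniform sequences (b_1,...,b_n), then conditioned on B being balanced.
    n = len(A)
    dp = defaultdict(int)
    dp[(0, 0)] = 1
    for _ in range(n):
        new = defaultdict(int)
        for (t, s), c in dp.items():
            for b in range(1, n + 1):
                new[(t + b, s + g2(A, b))] += c
        dp = new
    target = n * (n + 1) // 2
    win = sum(c for (t, s), c in dp.items() if t == target and s > 0)
    lose = sum(c for (t, s), c in dp.items() if t == target and s < 0)
    tie = sum(c for (t, s), c in dp.items() if t == target and s == 0)
    total = win + lose + tie
    return win / total, lose / total, tie / total
```

For the standard die $(1,2,\dots,n)$ every balanced $B$ ties, a reminder that not every $A$ behaves typically; and replacing $A$ by its mirror image $(n+1-a_1,\dots,n+1-a_n)$ swaps the win and lose probabilities exactly.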
So there we have a heuristic argument for why the second theorem, and hence the first, ought to be true.
There are several theorems in the literature that initially seemed as though they should be helpful. And indeed they were helpful, but we were unable to apply them directly, and had instead to develop our own modifications of their proofs.
The obvious theorem to mention is the central limit theorem. But this is not strong enough for two reasons. The first is that it tells you about the probability that a sum of random variables will lie in some rectangular region of $\mathbb{R}^2$ of size comparable to the standard deviation. It will not tell you the probability of belonging to some subset of the $y$-axis (even for discrete random variables). Another problem is that the central limit theorem on its own does not give information about the rate of convergence to a Gaussian, whereas here we require one.
The second problem is dealt with for many applications by the Berry-Esseen theorem, but not the first.
The first problem is dealt with for many applications by local central limit theorems, about which Terence Tao has blogged in the past. These tell you not just about the probability of landing in a region, but about the probability of actually equalling some given value, with estimates that are precise enough to give, in many situations, the kind of information that we seek here.
What we did not find, however, was precisely the theorem we were looking for: a statement that would be local and 2-dimensional and would give information about the rate of convergence that was sufficiently strong that we would be able to obtain good enough convergence after only $n$ steps. (I use the word “step” here because we can think of a sum of $n$ independent copies of a 2D random variable as an $n$-step random walk.) It was not even clear in advance what such a theorem should say, since we did not know what properties we would be able to prove about the random variables $Z_i$ when $A$ was “typical”. That is, we knew that not every $A$ worked, so the structure of the proof (probably) had to be as follows.
1. Prove that $g_A$ has certain properties with probability $1-o(1)$.
2. Using these properties, deduce that the sum $\sum_iZ_i$ converges very well after $n$ steps to a Gaussian.
3. Conclude that the heuristic argument is indeed correct.
The key properties that $g_A$ needed to have were the following two. First, there needed to be a bound on the higher moments of $g_A(b)$. This we achieved in a slightly wasteful way — but the cost was a log factor that we could afford — by arguing that with high probability no value of $g_A$ has magnitude greater than $C\sqrt{n\log n}$. To prove this the steps were as follows.
- Let $A$ be a random element of $[n]^n$. Then the probability that there exists $j$ with $|g_A(j)|\geq C\sqrt{n\log n}$ is at most $n^{-c}$ (for some $c$ such as 10).
- The probability that $\sum_ia_i=n(n+1)/2$ is at least $cn^{-3/2}$ for some absolute constant $c$.
- It follows that if $A$ is a random $n$-sided die, then with probability $1-O(n^{3/2-c})$ we have $|g_A(j)|\leq C\sqrt{n\log n}$ for every $j$.
The proofs of the first two statements are standard probabilistic estimates about sums of independent random variables.
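The $\sqrt{n\log n}$ bound is easy to see experimentally. The sketch below (mine, not from the write-up) samples unconditioned sequences, as in the first bullet point above, and looks at the ratio $\max_j|g_A(j)|/\sqrt{n\log n}$.

```python
import math
import random

def max_abs_g(A):
    # max_j |g_A(j)|, computed in O(n) from value counts, where
    # f_A(j) = #{i: a_i < j} + (1/2)#{i: a_i = j} and
    # g_A(j) = f_A(j) - (j - 1/2).
    n = len(A)
    cnt = [0] * (n + 1)
    for a in A:
        cnt[a] += 1
    below, best = 0, 0.0
    for j in range(1, n + 1):
        f_j = below + 0.5 * cnt[j]
        best = max(best, abs(f_j - (j - 0.5)))
        below += cnt[j]
    return best

rng = random.Random(3)
n = 2000
ratios = [max_abs_g([rng.randint(1, n) for _ in range(n)])
          / math.sqrt(n * math.log(n))
          for _ in range(20)]
```

In experiments of this kind the ratio stays bounded by a modest constant (typically well below 1), consistent with the claim, though of course a simulation proves nothing about the tail.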
The second property that $g_A$ needed to have is more difficult to obtain. There is a standard Fourier-analytic approach to proving central limit theorems, and in order to get good convergence it turns out that what one wants is for a certain Fourier transform to be sufficiently well bounded away from 1. More precisely, we define the characteristic function $\hat f$ of the random variable $(g_A(b),b)$ to be
$\hat f(\alpha,\beta)=\mathbb{E}\,e(\alpha g_A(b)+\beta b)$,
where $e(x)$ is shorthand for $e^{2\pi ix}$, $b$ is chosen uniformly from $\{-(n-1)/2,\dots,(n-1)/2\}$, and $\alpha$ and $\beta$ range over $\mathbb{R}$.
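Since $b$ ranges over only $n$ values, $\hat f$ is a finite average and easy to evaluate exactly. Here is a small sketch (my own, for experiment) with the face value $j$ recentred to $b=j-(n+1)/2$:

```python
import cmath

def e(x):
    # e(x) = exp(2 pi i x).
    return cmath.exp(2j * cmath.pi * x)

def g_val(A, j):
    # g_A(j) = #{a_i < j} + (1/2)#{a_i = j} - (j - 1/2).
    return sum(a < j for a in A) + 0.5 * sum(a == j for a in A) - (j - 0.5)

def char_fn(A, alpha, beta):
    # \hat f(alpha, beta) = E e(alpha g_A(b) + beta b), with b uniform on
    # {-(n-1)/2, ..., (n-1)/2} and j = b + (n+1)/2 the original face value.
    n = len(A)
    shift = (n + 1) / 2
    return sum(e(alpha * g_val(A, j) + beta * (j - shift))
               for j in range(1, n + 1)) / n
```

One can check directly that $\hat f(0,0)=1$ and $|\hat f(\alpha,\beta)|\leq 1$ always, and that for the standard die (where $g_A\equiv 0$) the transform stays equal to 1 along the whole line $\beta=0$, which is the degenerate behaviour we need to rule out for typical dice.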
I’ll come later to why it is good for $\hat f(\alpha,\beta)$ not to be too close to 1. But for now I want to concentrate on how one proves a statement like this, since that is perhaps the least standard part of the argument.
To get an idea, let us first think what it would take for $|\hat f(\alpha,\beta)|$ to be very close to 1. This condition basically tells us that $\alpha g_A(b)+\beta b$ is highly concentrated mod 1: indeed, if $\alpha g_A(b)+\beta b$ is highly concentrated mod 1, then $e(\alpha g_A(b)+\beta b)$ takes approximately the same value almost all the time, so the average is roughly equal to that value, which has modulus 1; conversely, if $\alpha g_A(b)+\beta b$ is not highly concentrated mod 1, then there is plenty of cancellation between the different values of $e(\alpha g_A(b)+\beta b)$ and the result is that the average has modulus appreciably smaller than 1.
So the task is to prove that the values of $\alpha g_A(b)+\beta b$ are reasonably well spread about mod 1. Note that when $\beta=0$ this is saying that the values of $\alpha g_A(b)$ are reasonably well spread about mod 1.
The way we prove this is roughly as follows. Let $\alpha\geq n^{-1/2}$, let $r$ and $s$ be of order of magnitude $\alpha^{-2}$, and consider the values of $g_A$ at the four points $j$, $j+r$, $j+s$ and $j+r+s$. Then a typical order of magnitude of $g_A(j)-g_A(j+r)-g_A(j+s)+g_A(j+r+s)$ is around $\alpha^{-1}$, and one can prove without too much trouble (here the Berry-Esseen theorem was helpful to keep the proof short) that the probability that
$|g_A(j)-g_A(j+r)-g_A(j+s)+g_A(j+r+s)|\geq c\alpha^{-1}$
is at least $c$, for some positive absolute constant $c$. It follows by Markov’s inequality that with positive probability one has the above inequality for many values of $j$.
That’s not quite good enough, since we want a probability that’s very close to 1. This we obtain by chopping up $[n]$ into intervals of length about $\alpha^{-2}$ and applying the above argument in each interval. (While writing this I’m coming to think that I could just as easily have gone for progressions of length 3, not that it matters much.) Then in each interval there is a reasonable probability of getting the above inequality to hold many times, from which one can prove that with very high probability it holds many times.
But since $r$ and $s$ are of order $\alpha^{-2}$, $\alpha(g_A(j)-g_A(j+r)-g_A(j+s)+g_A(j+r+s))$ is of order 1, which gives that the values $e(\alpha g_A(b)+\beta b)$ are far from constant whenever the above inequality holds (the $\beta b$ part of the phase cancels exactly on such quadruples). So by averaging we end up with a good upper bound for $|\hat f(\alpha,\beta)|$.
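One can see the quadruple phenomenon numerically. In this sketch (mine, with arbitrary illustrative parameters) the alternating sum of $g_A$ values over $j$, $j+r$, $j+s$, $j+r+s$ indeed has typical size about $\sqrt{r}=\alpha^{-1}$, and a substantial fraction of quadruples exceed a threshold of that order:

```python
import math
import random

def g_values(A):
    # The list (g_A(1), ..., g_A(n)) for a sequence A in [n]^n.
    n = len(A)
    cnt = [0] * (n + 1)
    for a in A:
        cnt[a] += 1
    vals, below = [], 0
    for j in range(1, n + 1):
        vals.append(below + 0.5 * cnt[j] - (j - 0.5))
        below += cnt[j]
    return vals

rng = random.Random(4)
n = 10000
g = g_values([rng.randint(1, n) for _ in range(n)])
r = s = 100  # of order alpha^{-2} for alpha = 0.1
threshold = 0.5 * math.sqrt(r)  # c * alpha^{-1} with c = 0.5
hits = [abs(g[j] - g[j + r] - g[j + s] + g[j + r + s]) >= threshold
        for j in range(n - r - s)]
fraction = sum(hits) / len(hits)
```

With these parameters well over half the quadruples clear the threshold in a typical run.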
The alert reader will have noticed that if $\alpha<n^{-1/2}$, then the above argument doesn’t work, because we can’t choose $r$ and $s$ to be bigger than $n$. In that case, however, we just do the best we can: we choose $r$ and $s$ to be of order $n/\log n$, the logarithmic factor being there because we need to operate in many different intervals in order to get the probability to be high. We will get many quadruples where
$|g_A(j)-g_A(j+r)-g_A(j+s)+g_A(j+r+s)|\geq c(n/\log n)^{1/2}$
and this translates into a lower bound for $1-|\hat f(\alpha,\beta)|$ of order $\alpha^2n/\log n$, basically because $1-\cos(2\pi x)$ has order $x^2$ for small $x$. This is a good bound for us as long as we can use it to prove that $|\hat f(\alpha,\beta)|^n$ is bounded above by a large negative power of $n$. For that we need $\alpha^2n^2/\log n$ to be at least $C\log n$ (since $(1-\alpha^2n/\log n)^n$ is about $\exp(-\alpha^2n^2/\log n)$), so we are in good shape provided that $\alpha\geq Cn^{-1}\log n$.
The alert reader will also have noticed that the probabilities for different intervals are not independent: for example, if some $f_A(j)$ is equal to $n$, then beyond that point $g_A$ depends linearly on $j$. However, except when $j$ is very large, this is extremely unlikely, and it is basically the only thing that can go wrong. To make this rigorous we formulated a concentration inequality that states, roughly speaking, that if you have a bunch of $m$ events, and almost always (that is, always, unless some very unlikely event occurs) the probability that the $i$th event holds given that all the previous events hold is at least $c$, then the probability that fewer than $cm/2$ of the events hold is exponentially small in $m$. The proof of the concentration inequality is a standard exponential-moment argument, with a small extra step to show that the low-probability events don’t mess things up too much.
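The inequality is in the spirit of the following toy simulation (entirely mine, with made-up parameters): each event holds with some conditional probability at least $p$ given the past, however the earlier events came out, so the number of events that hold stochastically dominates a $\mathrm{Bin}(m,p)$ variable and rarely drops below $pm/2$.

```python
import random

def run_events(m, p, rng):
    # Each event holds with a history-dependent conditional probability
    # q >= p; returns how many of the m events hold.
    count = 0
    for _ in range(m):
        q = p + (1 - p) * 0.5 * rng.random()  # varies, but always >= p
        count += rng.random() < q
    return count

rng = random.Random(5)
m, p = 200, 0.3
counts = [run_events(m, p, rng) for _ in range(300)]
```

In such runs the count essentially never falls below $pm/2$; the exponential-moment proof makes this quantitative, and the extra step in the write-up handles the rare histories where the conditional bound fails.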
Incidentally, the idea of splitting up the interval in this way came from an answer by Serguei Popov to a MathOverflow question I asked, when I got slightly stuck trying to prove a lower bound for the second moment of $g_A(b)$. I eventually didn’t use that bound, but the interval-splitting idea helped for the bound for the Fourier coefficient as well.
So in this way we prove that $\hat f(\alpha,\beta)$ is very small if $\alpha\geq Cn^{-1}\log n$. A simpler argument of a similar flavour shows that $\hat f(\alpha,\beta)$ is also very small if $\alpha$ is smaller than this and $\beta\geq Cn^{-3/2}\log n$.
Now let us return to the question of why we might like $\hat f(\alpha,\beta)$ to be small. It follows from the inversion and convolution formulae in Fourier analysis. The convolution formula tells us that the characteristic function of the sum of the $Z_i$ (which are independent and each have characteristic function $\hat f$) is $\hat f^n$. And then the inversion formula tells us that
$\mathbb{P}[\sum_iZ_i=(x,y)]=\int\int\hat f(\alpha,\beta)^ne(-\alpha x-\beta y)\,d\alpha\,d\beta.$
What we have proved can be used to show that the contribution to the integral on the right-hand side from those pairs $(\alpha,\beta)$ that lie outside a small rectangle (of width $n^{-1}$ in the $\alpha$ direction and $n^{-3/2}$ in the $\beta$ direction, up to log factors) is negligible.
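As a sanity check on the convolution and inversion steps (my own check, with a tiny $n$ and with $g_A$ doubled so that both coordinates are integers), one can compare the brute-force distribution of $\sum_iZ_i$ with the inverted $n$th power of the characteristic function on a discrete grid:

```python
import cmath
import itertools

n = 5
A = [1, 1, 3, 5, 5]    # a balanced 5-sided die
X = [1, 1, 0, -1, -1]  # 2*g_A(j) for j = 1..5 (integers)
Y = [-2, -1, 0, 1, 2]  # recentred face values b = j - 3

def e(x):
    return cmath.exp(2j * cmath.pi * x)

# Brute force: exact distribution of (sum 2g_A(b_i), sum b_i) over [n]^n.
dist = {}
for seq in itertools.product(range(n), repeat=n):
    key = (sum(X[i] for i in seq), sum(Y[i] for i in seq))
    dist[key] = dist.get(key, 0) + 1
total = n ** n

def inverted(x, y, M=64):
    # Inversion on a grid: (1/M^2) sum_{u,v} \hat f(u/M, v/M)^n e(-(ux+vy)/M).
    # Exact here because every achievable pair of sums fits in an M x M window.
    acc = 0
    for u in range(M):
        for v in range(M):
            f_hat = sum(e((u * X[i] + v * Y[i]) / M) for i in range(n)) / n
            acc += f_hat ** n * e(-(u * x + v * y) / M)
    return (acc / M ** 2).real
```

The two computations agree to floating-point accuracy, which is just the discrete version of the convolution-plus-inversion identity above.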
All the above is true provided the random $n$-sided die $A$ satisfies two properties (the bound on the values of $g_A$ and the bound on $\hat f(\alpha,\beta)$ outside the small rectangle), which it does with probability $1-o(1)$.
We now take a die with these properties and turn our attention to what happens inside this box. First, it is a standard fact about characteristic functions that their derivatives tell us about moments. Indeed, writing $(X,Y)$ for $(g_A(b),b)$, we have
$\partial_\alpha^p\partial_\beta^q\hat f(\alpha,\beta)=(2\pi i)^{p+q}\,\mathbb{E}\,X^pY^qe(\alpha X+\beta Y)$,
and when $(\alpha,\beta)=(0,0)$ this is $(2\pi i)^{p+q}\,\mathbb{E}\,X^pY^q$. It therefore follows from the two-dimensional version of Taylor’s theorem that
$\hat f(\alpha,\beta)=1-2\pi^2(\alpha^2\mathbb{E}X^2+2\alpha\beta\,\mathbb{E}XY+\beta^2\mathbb{E}Y^2)$
plus a remainder term that can be bounded above by a constant times $\mathbb{E}|\alpha X+\beta Y|^3$.
Writing $Q(\alpha,\beta)$ for $\alpha^2\mathbb{E}X^2+2\alpha\beta\,\mathbb{E}XY+\beta^2\mathbb{E}Y^2$, we have that $Q$ is a positive semidefinite quadratic form in $\alpha$ and $\beta$. (In fact, it turns out to be positive definite.) Provided the remainder term is small enough, replacing it by zero does not have much effect on $\hat f(\alpha,\beta)^n$, and provided $Q(\alpha,\beta)$ is small enough, $(1-2\pi^2Q(\alpha,\beta))^n$ is well approximated by $e^{-2\pi^2nQ(\alpha,\beta)}$.
It turns out, crucially, that the approximations just described are valid in a box that is much bigger than the box inside which $\hat f(\alpha,\beta)^n$ has a chance of not being small. That implies that the Gaussian decays quickly (and is why we know that $Q$ is positive definite).
There is a bit of back-of-envelope calculation needed to check this, but the upshot is that the probability that $\sum_iZ_i=(x,y)$ is very well approximated, at least when $x$ and $y$ aren’t too big, by a formula of the form
$\int\int e^{-2\pi^2nQ(\alpha,\beta)}e(-\alpha x-\beta y)\,d\alpha\,d\beta.$
But this is the formula for the Fourier transform of a Gaussian (at least if we let $\alpha$ and $\beta$ range over all of $\mathbb{R}$, which makes very little difference to the integral because the Gaussian decays so quickly), so it is the restriction to $\mathbb{Z}^2$ of a Gaussian, just as we wanted.
When we sum over infinitely many values of $x$ and $y$, uniform estimates are not good enough, but we can deal with that very directly by using simple measure-concentration estimates to prove that the probability that $\sum_iZ_i=(x,y)$ is very small outside a not too large box.
That completes the sketch of the main ideas that go into showing that the heuristic argument is indeed correct.
Any comments about the current draft would be very welcome, and if anyone feels like working on it directly rather than through me, that is certainly a possibility — just let me know. I will try to post soon on the following questions, since it would be very nice to be able to add answers to them.
1. Is the more general quasirandomness conjecture false, as the experimental evidence suggests? (It is equivalent to the statement that if $A$ and $B$ are two random $n$-sided dice, then with probability $1-o(1)$, the four possibilities for whether a third random die beats $A$ and whether it beats $B$ each have probability $\frac14+o(1)$.)
2. What happens in the multiset model? Can the above method of proof be adapted to this case?
3. The experimental evidence suggests that transitivity almost always occurs if we pick purely random sequences from $[n]^n$. Can we prove this rigorously? (I think I basically have a proof of this, by showing that whether or not $A$ beats $B$ almost always depends on whether $A$ has a bigger sum than $B$. I’ll try to find time reasonably soon to add this to the draft.)
Of course, other suggestions for follow-up questions will be very welcome, as will ideas about the first two questions above.