Intransitive dice IV: first problem more or less solved?

Gowers's Weblog 2018-03-10

I hope, but am not yet sure, that this post is a counterexample to Betteridge’s law of headlines. To back up that hope, let me sketch an argument that has arisen from the discussion so far, which appears to get us close to showing that if A,B and C are three n-sided dice chosen independently at random from the sequences model (I will recap some of these definitions in a moment), then the probability that A beats C given that A beats B and B beats C is 1/2+o(1). Loosely speaking, the information that A beats B and B beats C tells you nothing about how likely A is to beat C. Having given the argument, I will try to isolate a statement that looks plausible, not impossible to prove, and sufficient to finish off the argument.

Basic definitions

An n-sided die in the sequences model is a sequence (a_1,\dots,a_n) of elements of [n]=\{1,2,\dots,n\} such that \sum_ia_i=n(n+1)/2, or equivalently such that the average value of a_i is (n+1)/2, which is of course the average value of a random element of [n]. A random n-sided die in this model is simply an n-sided die chosen uniformly at random from the set of all such dice.

Given n-sided dice A=(a_1,\dots,a_n) and B=(b_1,\dots,b_n), we say that A beats B if

|\{(i,j):a_i>b_j\}|>|\{(i,j):a_i<b_j\}|.

If the two sets above have equal size, then we say that A ties with B.
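
To make these definitions concrete, here is a small Python sketch of the sequences model. It is purely illustrative: the helper names random_die and beats are mine, a die is sampled by rejection (drawing uniformly from [n]^n until the sum condition holds), and two dice are compared by the counting rule above.

import numpy as np

rng = np.random.default_rng(0)

def random_die(n):
    # Rejection sampling: draw from [n]^n until the sum equals n(n+1)/2.
    target = n * (n + 1) // 2
    while True:
        a = rng.integers(1, n + 1, size=n)
        if a.sum() == target:
            return a

def beats(a, b):
    # +1 if A beats B, -1 if B beats A, 0 if they tie.
    return int(np.sign(np.sign(a[:, None] - b[None, :]).sum()))

A, B = random_die(100), random_die(100)
print(A.sum(), beats(A, B))   # 5050, then one of -1, 0, 1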

A first reduction

When looking at this problem, it is natural to think about the following directed graph: the vertex set is the set of all n-sided dice and we put an arrow from A to B if A beats B.

We believe (and even believe we can prove) that ties are rare. Assuming that to be the case, the conjecture above is equivalent to the statement that if A,B,C are three vertices chosen independently at random in this graph, then the probability that ABC is a directed cycle is what you expect for a random tournament, namely 1/4.

One can also make a more general conjecture, namely that the entire (almost) tournament is quasirandom in a sense defined by Chung and Graham, which turns out to be equivalent to the statement that for almost all pairs (A,B) of dice, the four possible pairs of truth values for the pair of statements

(A beats C, B beats C)

each occur with probability approximately 1/4. If this is true, then given k random dice A_1,\dots,A_k, all the 2^{\binom k2} possibilities for which A_i beat which A_j have probability approximately 2^{-\binom k2}. This would imply, for example, that if A_1,\dots,A_k are independent random n-sided dice, then the probability that A_1 beats A_k given that A_i beats A_j for all other pairs (i,j) with i<j is still 1/2+o(1).
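
To make the quasirandomness condition concrete, here is an illustrative Python sketch (the helper names are mine, and it is not the code anyone in the discussion actually used) that fixes a random pair (A,B) and estimates how often each of the four truth-value pairs occurs over random C.

import numpy as np
from collections import Counter

rng = np.random.default_rng(1)

def random_die(n):
    # Rejection sampling from the sequences model, as in the earlier sketch.
    while True:
        a = rng.integers(1, n + 1, size=n)
        if a.sum() == n * (n + 1) // 2:
            return a

def beats(a, b):
    # Ties (which are rare) count here as a failure to beat.
    return bool(np.sign(a[:, None] - b[None, :]).sum() > 0)

n, trials = 40, 2000
A, B = random_die(n), random_die(n)
counts = Counter()
for _ in range(trials):
    C = random_die(n)
    counts[(beats(A, C), beats(B, C))] += 1
print({k: round(v / trials, 3) for k, v in counts.items()})
# The stronger conjecture would require all four frequencies to be close to 1/4
# for almost every fixed pair (A, B).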

Several of us have done computer experiments to test these conjectures, and it looks as though the first one is true and the second one false. A further reason to be suspicious of the stronger conjecture is that a natural approach to prove it appears to be morally equivalent to a relationship between the correlations of certain random variables that doesn’t seem to have any heuristic justification or to fit with experimental evidence. So although we don’t have a disproof of the stronger conjecture (I think it would be very interesting to find one), it doesn’t seem like a good idea to spend a lot of effort trying to prove it, unless we can somehow explain away the evidence that appears to be stacking up against it.
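
For the first conjecture, the corresponding experiment is to condition on A beating B and B beating C and record how often A beats C. Again, this is only a rough sketch of the kind of computation meant, not the code actually used.

import numpy as np

rng = np.random.default_rng(2)

def random_die(n):
    while True:
        a = rng.integers(1, n + 1, size=n)
        if a.sum() == n * (n + 1) // 2:
            return a

def beats(a, b):
    return bool(np.sign(a[:, None] - b[None, :]).sum() > 0)

n, trials = 40, 2000
wins = conditioned = 0
for _ in range(trials):
    A, B, C = random_die(n), random_die(n), random_die(n)
    if beats(A, B) and beats(B, C):        # the conditioning event
        conditioned += 1
        wins += beats(A, C)
print(conditioned, wins / conditioned)     # conjecturally 1/2 + o(1)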

The first conjecture turns out to be equivalent to a statement that doesn’t mention transitivity. The very quick proof I’ll give here was supplied by Luke Pebody. Suppose we have a tournament (that is, a complete graph with each edge directed in one of the two possible directions) and write d_+(x) for the out-degree of a vertex x (that is, the number of y such that there is an arrow from x to y) and d_-(x) for the in-degree. Then let us count the number of ordered triples (x,y,z) such that x\to y\to z. Any directed triangle xyz in the tournament will give rise to three such triples, namely (x,y,z), (y,z,x) and (z,x,y). And any other triangle will give rise to just one: for example, if x\to y, x\to z and y\to z we get just the triple (x,y,z). So the number of ordered triples (x,y,z) such that x\to y and y\to z is \binom n3 plus twice the number of directed triangles. Note that \binom n3 is approximately n^3/6.

But the number of these ordered triples is also \sum_yd_+(y)d_-(y). If almost all in-degrees and almost all out-degrees are roughly n/2, then this is approximately n^3/4, which means that the number of directed triangles is approximately (n^3/4-n^3/6)/2\approx\binom n3/4. That is, in this case, the probability that three dice form an intransitive triple is approximately 1/4, as we are hoping from the conjecture. If on the other hand a non-negligible proportion of the in-degrees differ substantially from n/2, then \sum_yd_+(y)d_-(y) is substantially lower than n^3/4 and we get a noticeably smaller proportion of intransitive triples.

Thus, the weaker conjecture is equivalent to the statement that almost every die beats approximately half the other dice.
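
Here is a quick numerical check of the counting argument on a random tournament (an illustrative sketch; the choice of N and the variable names are mine).

import numpy as np
from itertools import combinations
from math import comb

rng = np.random.default_rng(3)
N = 50                                   # number of vertices

# Random tournament: T[x, y] is True when there is an arrow from x to y.
coin = rng.random((N, N)) < 0.5
T = np.triu(coin, 1) | np.tril(~coin.T, -1)

out_deg, in_deg = T.sum(axis=1), T.sum(axis=0)
paths = int((out_deg * in_deg).sum())    # ordered triples (x, y, z) with x -> y -> z

# A triple {x, y, z} with x < y < z is a directed cycle exactly when the three
# "cyclic" edges (x,y), (y,z), (z,x) are all oriented the same way round.
cyclic = sum(1 for x, y, z in combinations(range(N), 3)
             if T[x, y] == T[y, z] == T[z, x])
assert paths == comb(N, 3) + 2 * cyclic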

Why should a “typical” die beat approximately half the other dice?

The answer to this is fairly simple, heuristically at least. Let A be an arbitrary die. For j=1,2,\dots,n define g_A(j) to be the number of i with a_i\leq j and define h_A(j) to be g_A(j)-j. Then

\sum_jg_A(j)=\sum_j\sum_i\mathbf 1_{[a_i\leq j]}=\sum_i(n-a_i+1)

=n(n+1)-\sum_ia_i=n(n+1)/2,

from which it follows that \sum_jh_A(j)=0.

We also have that if B is another die, then

\sum_jh_A(b_j)=\sum_j(\sum_i\mathbf 1_{[a_i\leq b_j]}-b_j)=\sum_{i,j}\mathbf 1_{[a_i\leq b_j]}-n(n+1)/2.

If we make the simplifying assumption that a_i=b_j sufficiently infrequently to make no real difference to what is going on (which is not problematic, as a slightly more complicated but still fairly simple function can be used instead of h_A to avoid this problem), then we find that to a reasonable approximation B beats A if and only if \sum_jh_A(b_j) is positive.
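
The following Python sketch (again only illustrative, with helper names of my own) computes h_A for a random die and checks both facts: that \sum_jh_A(j)=0, and that the sign of \sum_jh_A(b_j) agrees with whether B beats A, up to the tie and boundary effects just mentioned.

import numpy as np

rng = np.random.default_rng(4)

def random_die(n):
    while True:
        a = rng.integers(1, n + 1, size=n)
        if a.sum() == n * (n + 1) // 2:
            return a

def h_values(a):
    # h_A(j) = #{i : a_i <= j} - j for j = 1, ..., n.
    n = len(a)
    j = np.arange(1, n + 1)
    return (a[:, None] <= j[None, :]).sum(axis=0) - j

n = 100
A, B = random_die(n), random_die(n)
hA = h_values(A)
assert hA.sum() == 0                              # \sum_j h_A(j) = 0

score = hA[B - 1].sum()                           # \sum_j h_A(b_j)
margin = np.sign(B[:, None] - A[None, :]).sum()   # positive exactly when B beats A
print(score, margin)
# The two numbers typically have the same sign: they differ only through tie
# terms and an O(n) shift, both small compared with the typical size of margin.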

So what we would like to prove is that if b_1,\dots,b_n are chosen independently at random from [n], then

\mathbb P\Bigl[\sum_jh_A(b_j)>0\Bigm|\sum_jb_j=n(n+1)/2\Bigr]\approx 1/2.

We are therefore led to consider the random variable

(X,Y)=(\sum_jh_A(b_j),\sum_j(b_j-(n+1)/2))

where now B=(b_1,\dots,b_n) is chosen uniformly at random from [n]^n without any condition on the sum. To write this in a more transparent way, let (U,V) be the random variable (h_A(x), x-(n+1)/2), where x is chosen uniformly at random from [n]. Then (X,Y) is a sum of n independent copies of (U,V). What we are interested in is the distribution we obtain when we condition the random variable (X,Y) on Y=0.

This should mean that we are in an excellent position, since under appropriate conditions, a lot is known about sums of independent random variables, and it looks very much as though those conditions are satisfied by (U,V), at least when A is “typical”. Indeed, what we would expect, by the central limit theorem, is that (X,Y) will approximate a bivariate normal distribution with mean 0 (since both U and V have mean zero). But a bivariate normal distribution is centrally symmetric, so we expect the distribution of [X|Y=0] to be approximately centrally symmetric, which would imply what we wanted above, since that is equivalent to the statement that \mathbb P[X>0|Y=0]\approx 1/2.
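
Here is an illustrative Monte Carlo version of this conditioning for a single random die A (my own sketch, estimating \mathbb P[X>0|Y=0] by rejection on the event Y=0, which is crude but matches the set-up above).

import numpy as np

rng = np.random.default_rng(5)

def random_die(n):
    while True:
        a = rng.integers(1, n + 1, size=n)
        if a.sum() == n * (n + 1) // 2:
            return a

n = 60
A = random_die(n)
j = np.arange(1, n + 1)
hA = (A[:, None] <= j[None, :]).sum(axis=0) - j      # h_A(1), ..., h_A(n)

# (X, Y) is a sum of n i.i.d. copies of (U, V) = (h_A(x), x - (n+1)/2), and
# conditioning on Y = 0 is the same as requiring sum(b_j) = n(n+1)/2.
samples = 100_000
b = rng.integers(1, n + 1, size=(samples, n))
Y = b.sum(axis=1) - n * (n + 1) // 2
X = hA[b - 1].sum(axis=1)

X0 = X[Y == 0]
print(len(X0), (X0 > 0).mean(), (X0 == 0).mean())
# The second number should heuristically be close to 1/2 for a typical A.
# The estimate is noisy; increase samples to get more conditioned points.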

Why are we not immediately done?

How can we make the above argument rigorous? The central limit theorem on its own is not enough, for two reasons. The first is that it does not give us information about the speed of convergence to a normal distribution, whereas we need a sum of n copies of (U,V) to be close to normal. The second is that the notion of “close to normal” is not precise enough for our purposes: it will allow us to approximate the probability of an event such as [X\geq x\wedge Y\geq y] but not of a “probability zero” event such as [X>0\wedge Y=0].

The first of these difficulties is not too worrying, since plenty of work has been done on the speed of convergence in the central limit theorem. In particular, there is a famous theorem of Berry and Esseen that is often used when this kind of information is needed.

However, the Berry-Esseen theorem still suffers from the second drawback. To get round that one needs to turn to more precise results still, known as local central limit theorems, often abbreviated to LCLTs. With a local central limit theorem, one can even talk about the probability that (X,Y) takes a specific value after a specific number of steps. Roughly speaking, it says (in its 2-dimensional version) that if (U,V) is a random variable of mean zero taking values in \mathbb Z^2 and if (U,V) satisfies suitable moment conditions and is not supported in a proper sublattice of \mathbb Z^2, then writing (X,Y) for a sum of n copies of (U,V), we have that the probability that (X,Y) takes a particular value differs from the “expected” probability (given by a suitable Gaussian formula) by O(n^{-2}). (I’m not 100% sure I’ve got that right: the theorem in question is Theorem 2.1.1 from this book.)
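
To spell out the kind of formula meant: if \Sigma denotes the covariance matrix of (U,V), then the Gaussian prediction for \mathbb P[(X,Y)=(x,y)] should, if I have understood the statement correctly, be roughly

\frac{1}{2\pi n\sqrt{\det\Sigma}}\exp\Bigl(-\tfrac{1}{2n}(x,y)\Sigma^{-1}(x,y)^T\Bigr),

and the LCLT asserts that the true point probabilities differ from this by a suitably small error, uniformly in (x,y).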

That looks very close to what we want, but it still falls short. The problem is that the implied constant depends on the random variable (U,V). A simple way to see this is to note that if (U,V) is not supported in a sublattice but very nearly is — for example, if the probability that it takes a value outside the sublattice is 2^{-2^n} — then one will have to add together an extremely large number of copies of (U,V) before the sum ceases to be concentrated in the sublattice.

So the situation we appear to be in is the following. We have more precise information about the random variable (U,V) than is assumed in the LCLT in the reference above, and we want to use that to obtain an explicit constant in the theorem.

It could be that out there in the literature is exactly the result we need, which would be nice, but it also seems possible that we will have to prove an appropriate version of the LCLT for ourselves. I’d prefer the first, but the second wouldn’t be too disappointing, as the problem is quite appealing and even has something of an additive-combinatorial flavour (since it is about describing an iterated convolution of a subset of \mathbb Z^2 under appropriate assumptions).

What properties can we establish about the random variable (U,V)?

I said above, with no justification, that we have more precise information about the random variable (U,V). Let me now try to give the justification.

First of all, we know everything we could possibly want to know about V: it is the uniform distribution on [n] - (n+1)/2. (In particular, if n is odd, then it is the uniform distribution on the set of integers in [-(n-1)/2, (n-1)/2].)

How about the distribution of U? That question is equivalent to asking about the values taken by h_A(j), and their multiplicities. There is quite a lot one can say about those. For example, I claim that with high probability (if A is a random n-sided die) h_A(j) is never bigger than C\sqrt{n\log n} in absolute value. That is because if we choose a fully random sequence (a_1,\dots,a_n)\in[n]^n, then the expected number of i such that a_i\leq j is j, and the probability that this number differs from j by more than t\sqrt n is at most \exp(-ct^2), by standard probabilistic estimates, so if we set t=C\sqrt{\log n}, then this is at most n^{-cC^2}, which we can make a lot smaller than n^{-3} by choosing C to be, say, 10c^{-1}. (I think c can be taken to be 1/8 if you want me to be more explicit.) Since the probability that \sum_ia_i=n(n+1)/2 is proportional to n^{-3/2}, it follows (after taking a union bound over the n possible values of j) that this conclusion continues to hold even after we condition on that event.

Another simple observation is that the values taken by U are not contained in a sublattice (assuming, that is, that h_A is ever non-zero). That is simply because h_A(j+1)\geq h_A(j)-1 and h_A averages zero.

A third simple observation is that with probability 1-o(1) h_A will take a value of at least \sqrt n/\log n in absolute value somewhere. I’ll sketch a proof of this. Let k be around \log n and let m_1<\dots<m_k be evenly spaced in [n], staying away from the end points 1 and n. Let A be a purely random sequence in [n]^n. Then the standard deviation of h_A(m_1) is around \sqrt{n/\log n}, so the probability that it is less than \sqrt n/\log n in absolute value is around (\log n)^{-1/2}. The same is true of the conditional probability that h_A(m_i) is less than \sqrt n/\log n in absolute value conditioned on the value of h_A(m_{i-1}) (the worst case being when this value is 0). So the probability that this happens for every i is at most (\log n)^{-\log n/2}. This is much smaller than n^{-3/2}, so the conclusion remains valid when we condition on the sum of the a_i being n(n+1)/2. So the claim follows. Note that because of the previous simple observation, it follows that h_A must be at least c\sqrt n/\log n in magnitude at least c\sqrt n/\log n times, so up to log factors we get that \sum_jh_A(j)^2 is at least n^{3/2}. With a bit more effort, it should be possible to push this up to something more like n^2, since one would expect that h_A(j) would have rough order of magnitude \sqrt n for a positive fraction of the j. Maybe this would be a good subproblem to think about, and ideally not too difficult.
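
The claims in the last two paragraphs are easy to test numerically. The following sketch (illustrative, with parameter choices of my own) records \max_j|h_A(j)|/\sqrt{n\log n} and \sum_jh_A(j)^2/n^2 for a few random dice.

import numpy as np

rng = np.random.default_rng(6)

def random_die(n):
    while True:
        a = rng.integers(1, n + 1, size=n)
        if a.sum() == n * (n + 1) // 2:
            return a

def h_values(a):
    n = len(a)
    j = np.arange(1, n + 1)
    return (a[:, None] <= j[None, :]).sum(axis=0) - j

n, trials = 200, 50
max_ratios, sumsq_ratios = [], []
for _ in range(trials):
    hA = h_values(random_die(n))
    max_ratios.append(np.abs(hA).max() / np.sqrt(n * np.log(n)))
    sumsq_ratios.append((hA ** 2).sum() / n ** 2)
print(np.mean(max_ratios), np.mean(sumsq_ratios))
# The first ratio should stay bounded, in line with the C*sqrt(n log n) bound
# above, and the second should be of constant order, consistent with the hope
# that \sum_j h_A(j)^2 is typically of order n^2.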

How about the joint distribution (U,V)? It seems highly likely that for typical A this will not be concentrated in a lattice, and that elementary arguments such as the above can be used to prove this. But let me indicate the kind of situation that we would have to prove is not typical. Suppose that n=15 and A=(1,1,1,1,1,8,8,8,8,8,15,15,15,15,15). Then as j runs from 1 to 15 the values taken by g_A are (5,5,5,5,5,5,5,10,10,10,10,10,10,10,15) and the values taken by h_A are (4,3,2,1,0,-1,-2,2,1,0,-1,-2,-3,-4,0). For this example, all the points (h_A(x),x) live in the lattice of points (u,v) such that u+v is a multiple of 5.
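
Here is a quick check of this example (illustrative only).

import numpy as np

n = 15
A = np.array([1] * 5 + [8] * 5 + [15] * 5)
j = np.arange(1, n + 1)
g = (A[:, None] <= j[None, :]).sum(axis=0)    # g_A(1), ..., g_A(15)
h = g - j                                     # h_A(1), ..., h_A(15)
print(g.tolist())   # [5, 5, 5, 5, 5, 5, 5, 10, 10, 10, 10, 10, 10, 10, 15]
print(h.tolist())   # [4, 3, 2, 1, 0, -1, -2, 2, 1, 0, -1, -2, -3, -4, 0]
print(set(((h + j) % 5).tolist()))            # {0}: h_A(x) + x is always a multiple of 5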

This wouldn’t necessarily be a disaster for us actually, since the LCLT can be restricted to a sublattice and if after conditioning on Y=0 we happen to have that X is always a multiple of 5, that isn’t a problem if we still have the central symmetry. But it would probably be nicer to prove that it is an atypical occurrence, so that we don’t have to worry about (U,V) living inside a sublattice (or even being concentrated in one).

My guess is that if we were to pursue these kinds of thoughts, we would end up being able to prove a statement that would say something like that (U,V) takes a pretty representative sample of values with V being between -(n-1)/2 and (n-1)/2 and U being in a range of width around \sqrt n. I would expect, for example, that if we add three or four independent copies of (U,V), then we will have a distribution that is similar in character to the uniform distribution on a rectangle of width of order of magnitude n and height of order of magnitude \sqrt n. And if that’s true, then adding n of them should give us something very close to normal (in an appropriate discrete sense of the word “normal”).

What next?

There are two obvious tasks here. One is to try to prove as much as we can about the random variable (U,V). The other is to try to prove a suitable LCLT that is strong enough to give us that the probability that X>0 given that Y=0 is approximately 1/2, under suitable assumptions about (U,V). And then we have to hope that what we achieve for the first is sufficient for the second.

It’s possible that the second task can be achieved by simply going through one of the existing proofs of the LCLT and being more careful about the details. But if that’s the case, then we should spend some time trying to find out whether anyone has done it already, since there wouldn’t be much point in duplicating that work. I hope I’ve set out what we want clearly enough for any probabilist who might stumble upon this blog post to be able to point us in the right direction if indeed the result we want is out there somewhere.