Intransitive dice IV: first problem more or less solved?
Gowers's Weblog 2018-03-10
I hope, but am not yet sure, that this post is a counterexample to Betteridge’s law of headlines. To back up that hope, let me sketch an argument that has arisen from the discussion so far, which appears to get us close to showing that if $A$, $B$ and $C$ are three $n$-sided dice chosen independently at random from the sequences model (I will recap some of these definitions in a moment), then the probability that $A$ beats $C$ given that $A$ beats $B$ and $B$ beats $C$ is $\frac12+o(1)$. Loosely speaking, the information that $A$ beats $B$ and $B$ beats $C$ tells you nothing about how likely $A$ is to beat $C$. Having given the argument, I will try to isolate a statement that looks plausible, not impossible to prove, and sufficient to finish off the argument.
Basic definitions
An $n$-sided die in the sequence model is a sequence $A=(a_1,\dots,a_n)$ of elements of $[n]=\{1,2,\dots,n\}$ such that $\sum_ia_i=n(n+1)/2$, or equivalently such that the average value of the $a_i$ is $(n+1)/2$, which is of course the average value of a random element of $[n]$. A random $n$-sided die in this model is simply an $n$-sided die chosen uniformly at random from the set of all such dice.
Given $n$-sided dice $A=(a_1,\dots,a_n)$ and $B=(b_1,\dots,b_n)$, we say that $A$ beats $B$ if

$|\{(i,j):a_i>b_j\}| > |\{(i,j):a_i<b_j\}|.$

If the two sets above have equal size, then we say that $A$ ties with $B$.
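To make the definitions concrete, here is a small Python sketch of the sequence model and the “beats” relation. It is only an illustration: the function names random_die and beats, the rejection sampling, and the value $n=20$ are my own choices rather than anything fixed by the discussion.

```python
import random
from itertools import product

def random_die(n):
    """Sample an n-sided die in the sequence model: an element of {1,...,n}^n
    whose entries sum to n(n+1)/2, obtained here by simple rejection sampling."""
    target = n * (n + 1) // 2
    while True:
        a = [random.randint(1, n) for _ in range(n)]
        if sum(a) == target:
            return a

def beats(a, b):
    """Return +1 if A beats B, -1 if B beats A and 0 if they tie, by comparing
    |{(i,j): a_i > b_j}| with |{(i,j): a_i < b_j}|."""
    wins = sum(1 for ai, bj in product(a, b) if ai > bj)
    losses = sum(1 for ai, bj in product(a, b) if ai < bj)
    return (wins > losses) - (wins < losses)

if __name__ == "__main__":
    n = 20
    A, B = random_die(n), random_die(n)
    print(A, B, beats(A, B))
```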
A first reduction
When looking at this problem, it is natural to think about the following directed graph: the vertex set is the set of all $n$-sided dice and we put an arrow from $A$ to $B$ if $A$ beats $B$.
We believe (and even believe we can prove) that ties are rare. Assuming that to be the case, then the conjecture above is equivalent to the statement that if $A,B,C$ are three vertices chosen independently at random in this graph, then the probability that $ABC$ is a directed cycle is what you expect for a random tournament, namely $1/4$.
One can also make a more general conjecture, namely that the entire (almost) tournament is quasirandom in a sense defined by Chung and Graham, which turns out to be equivalent to the statement that for almost all pairs $(A,B)$ of dice, the four possible pairs of truth values for the pair of statements

“$A$ beats $C$”, “$B$ beats $C$”

each occur with probability approximately $1/4$ when $C$ is a random die. If this is true, then given $k$ random dice $A_1,\dots,A_k$, all $2^{\binom k2}$ possibilities for which $A_i$ beat which $A_j$ have probability approximately $2^{-\binom k2}$. This would imply, for example, that if $A_1,\dots,A_k$ are independent random $n$-sided dice, then the probability that $A_1$ beats $A_k$ given that $A_i$ beats $A_j$ for all other pairs $(i,j)$ with $i<j$ is still approximately $1/2$.
Several of us have done computer experiments to test these conjectures, and it looks as though the first one is true and the second one false. A further reason to be suspicious of the stronger conjecture is that a natural approach to prove it appears to be morally equivalent to a relationship between the correlations of certain random variables that doesn’t seem to have any heuristic justification or to fit with experimental evidence. So although we don’t have a disproof of the stronger conjecture (I think it would be very interesting to find one), it doesn’t seem like a good idea to spend a lot of effort trying to prove it, unless we can somehow explain away the evidence that appears to be stacking up against it.
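For concreteness, here is the shape of one such experiment, as a rough sketch only (it reuses random_die and beats from the snippet above); the parameters $n=12$ and trials $=1000$ are arbitrary, and if the first conjecture is correct the printed estimate should be close to 1/2.

```python
def conditional_estimate(n=12, trials=1000):
    """Estimate P(A beats C | A beats B and B beats C) over random triples of
    dice, discarding triples in which A ties with C."""
    relevant = wins = 0
    while relevant < trials:
        A, B, C = random_die(n), random_die(n), random_die(n)
        if beats(A, B) == 1 and beats(B, C) == 1 and beats(A, C) != 0:
            relevant += 1
            wins += (beats(A, C) == 1)
    return wins / relevant

print(conditional_estimate())
```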
The first conjecture turns out to be equivalent to a statement that doesn’t mention transitivity. The very quick proof I’ll give here was supplied by Luke Pebody. Suppose we have a tournament on $N$ vertices (that is, a complete graph with each edge directed in one of the two possible directions) and write $d^+(x)$ for the out-degree of a vertex $x$ (that is, the number of $y$ such that there is an arrow from $x$ to $y$) and $d^-(x)$ for the in-degree. Then let us count the number of ordered triples $(x,y,z)$ such that $x\to y\to z$. Any directed triangle in the tournament will give rise to three such triples, namely $(x,y,z)$, $(y,z,x)$ and $(z,x,y)$. And any other triangle will give rise to just one: for example, if $x\to y$, $y\to z$ and $x\to z$, we get just the triple $(x,y,z)$. So the number of ordered triples $(x,y,z)$ such that $x\to y$ and $y\to z$ is $\binom N3$ plus twice the number of directed triangles. Note that $\binom N3$ is approximately $N^3/6$.

But the number of these ordered triples is also $\sum_yd^-(y)d^+(y)$, since we may choose the middle vertex $y$ first and then an in-neighbour and an out-neighbour. If almost all in-degrees and almost all out-degrees are roughly $N/2$, then this is approximately $N^3/4$, which means that the number of directed triangles is approximately $N^3/24\approx\binom N3/4$. That is, in this case, the probability that three dice form an intransitive triple is approximately 1/4, as we are hoping from the conjecture. If on the other hand several in-degrees fail to be roughly $N/2$, then $\sum_yd^-(y)d^+(y)$ is substantially lower than $N^3/4$ (since $d^-(y)+d^+(y)=N-1$ for every $y$) and we get a noticeably smaller proportion of intransitive triples.
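The counting argument is easy to check on an explicit random tournament; the following self-contained snippet does so (the number of vertices $N=30$ and the fair coin used to orient each edge are just illustrative choices).

```python
import itertools
import random

def check_triple_identity(N=30, seed=0):
    """On a random tournament with N vertices, verify that the number of ordered
    triples (x, y, z) with x -> y and y -> z equals both sum_y d^-(y)d^+(y)
    and C(N,3) + 2 * (number of directed triangles)."""
    rng = random.Random(seed)
    arrow = [[False] * N for _ in range(N)]   # arrow[x][y] is True if x -> y
    for x, y in itertools.combinations(range(N), 2):
        if rng.random() < 0.5:
            arrow[x][y] = True
        else:
            arrow[y][x] = True

    paths = sum(1 for x, y, z in itertools.permutations(range(N), 3)
                if arrow[x][y] and arrow[y][z])
    cyclic = sum(1 for x, y, z in itertools.combinations(range(N), 3)
                 if (arrow[x][y] and arrow[y][z] and arrow[z][x])
                 or (arrow[y][x] and arrow[z][y] and arrow[x][z]))
    in_deg = [sum(arrow[x][y] for x in range(N)) for y in range(N)]
    out_deg = [sum(arrow[y][z] for z in range(N)) for y in range(N)]
    degree_sum = sum(in_deg[y] * out_deg[y] for y in range(N))
    binom = N * (N - 1) * (N - 2) // 6

    assert paths == degree_sum == binom + 2 * cyclic
    print(f"cyclic triangles: {cyclic} out of {binom} "
          f"(fraction {cyclic / binom:.3f})")

check_triple_identity()
```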
Thus, the weaker conjecture is equivalent to the statement that almost every die beats approximately half the other dice.
Why should a “typical” die beat approximately half the other dice?
The answer to this is fairly simple, heuristically at least. Let $A$ be an arbitrary die. For $j\in[n]$ define $f_A(j)$ to be the number of $i$ with $a_i<j$ and define $g_A(j)$ to be $f_A(j)-(j-1)$. Then

$\sum_jf_A(j)=\sum_j|\{i:a_i<j\}|=\sum_i|\{j:j>a_i\}|=\sum_i(n-a_i)=n^2-\frac{n(n+1)}2=\frac{n(n-1)}2,$

from which it follows that $\sum_jg_A(j)=0$.
We also have that if $B=(b_1,\dots,b_n)$ is another die, then

$\sum_jf_A(b_j)=\sum_j|\{i:a_i<b_j\}|=|\{(i,j):a_i<b_j\}|.$

If we make the simplifying assumption that $a_i=b_j$ occurs sufficiently infrequently to make no real difference to what is going on (which is not problematic, as a slightly more complicated but still fairly simple function can be used instead of $f_A$ to avoid this problem), then we find that to a reasonable approximation $B$ beats $A$ if and only if $\sum_jg_A(b_j)$ is positive.
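As a quick sanity check of that approximation, the sketch below computes $g_A$ straight from the definition and reports how often the sign of $\sum_jg_A(b_j)$ agrees with the outcome of $B$ against $A$ (it reuses random_die and beats from the first snippet; the parameters are arbitrary, and the agreement should be high but not perfect, reflecting the approximation).

```python
import bisect

def g(A, n):
    """g_A(j) = |{i : a_i < j}| - (j - 1) for j = 1, ..., n."""
    sorted_A = sorted(A)
    return [bisect.bisect_left(sorted_A, j) - (j - 1) for j in range(1, n + 1)]

def check_sign_criterion(n=20, trials=500):
    agree = 0
    for _ in range(trials):
        A, B = random_die(n), random_die(n)
        gA = g(A, n)
        s = sum(gA[b - 1] for b in B)
        # "B beats A" should agree with "s > 0" in the large majority of cases
        agree += (beats(B, A) == 1) == (s > 0)
    print(f"agreement on {trials} random pairs: {agree / trials:.3f}")

check_sign_criterion()
```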
So what we would like to prove is that if $b_1,\dots,b_n$ are chosen independently at random from $[n]$, then

$\mathbb P\bigl[\sum_jg_A(b_j)>0\ \big|\ \sum_jb_j=n(n+1)/2\bigr]=\frac12+o(1).$
We are therefore led to consider the random variable

$\bigl(\sum_jg_A(b_j),\ \sum_j(b_j-\tfrac{n+1}2)\bigr),$

where now $(b_1,\dots,b_n)$ is chosen uniformly at random from $[n]^n$ without any condition on the sum. To write this in a more transparent way, let $(U,V)$ be the random variable $(g_A(b),\,b-\frac{n+1}2)$, where $b$ is chosen uniformly at random from $[n]$. Then the random variable above is a sum of $n$ independent copies of $(U,V)$. What we are interested in is the distribution we obtain when we condition that sum on its second coordinate being equal to $0$.

This should mean that we are in an excellent position, since under appropriate conditions, a lot is known about sums of independent random variables, and it looks very much as though those conditions are satisfied by $(U,V)$, at least when $A$ is “typical”. Indeed, what we would expect, by the central limit theorem, is that the sum of $n$ copies of $(U,V)$ will approximate a bivariate normal distribution with mean $(0,0)$ (since both $U$ and $V$ have mean zero). But a mean-zero bivariate normal distribution is centrally symmetric, so we expect the conditioned distribution of $\sum_jg_A(b_j)$ to be approximately centrally symmetric, which would imply what we wanted above, since that is equivalent to the statement that $\mathbb P\bigl[\sum_jg_A(b_j)>0\ \big|\ \sum_jb_j=n(n+1)/2\bigr]$ is approximately $1/2$.
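This heuristic is easy to probe numerically for a fixed die: the sketch below (reusing random_die and g from earlier snippets) estimates $\mathbb P\bigl[\sum_jg_A(b_j)>0\ \big|\ \sum_jb_j=n(n+1)/2\bigr]$ for one randomly chosen $A$, using the fact that random_die already samples uniformly from sequences with the required sum; the parameters are again arbitrary.

```python
def conditional_sign_probability(n=20, trials=2000):
    """For one fixed random die A, estimate the probability that
    sum_j g_A(b_j) > 0 when B is uniform over sequences with the correct sum.
    The central-limit heuristic predicts a value close to 1/2."""
    A = random_die(n)
    gA = g(A, n)
    positive = sum(sum(gA[b - 1] for b in random_die(n)) > 0
                   for _ in range(trials))
    return positive / trials

print(conditional_sign_probability())
```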
Why are we not immediately done?
How can we make the above argument rigorous? The central limit theorem on its own is not enough, for two reasons. The first is that it does not give us information about the speed of convergence to a normal distribution, whereas we need a sum of $n$ copies of $(U,V)$ to be close to normal. The second is that the notion of “close to normal” is not precise enough for our purposes: it will allow us to approximate the probability of an event such as the second coordinate being at least $0$, but not of a “probability zero” event such as the second coordinate being exactly $0$.
The first of these difficulties is not too worrying, since plenty of work has been done on the speed of convergence in the central limit theorem. In particular, there is a famous theorem of Berry and Esseen that is often used when this kind of information is needed.
However, the Berry-Esseen theorem still suffers from the second drawback. To get round that one needs to turn to more precise results still, known as local central limit theorems, often abbreviated to LCLTs. With a local central limit theorem, one can even talk about the probability that a random walk takes a specific value after a specific number of steps. Roughly speaking, it says (in its 2-dimensional version) that if $(U,V)$ is a random variable of mean zero taking values in $\mathbb Z^2$ and if $(U,V)$ satisfies suitable moment conditions and is not supported in a proper sublattice of $\mathbb Z^2$, then writing $(U_N,V_N)$ for a sum of $N$ independent copies of $(U,V)$, we have that the probability that $(U_N,V_N)$ takes a particular value differs from the “expected” probability (given by a suitable Gaussian formula) by $o(N^{-1})$. (I’m not 100% sure I’ve got that right: the theorem in question is Theorem 2.1.1 from this book.)
That looks very close to what we want, but it still falls short. The problem is that the implied constant depends on the random variable $(U,V)$. A simple way to see that it must is the following: if $(U,V)$ is not supported in a sublattice but very nearly is (for example, if the probability that it takes a value outside the sublattice is extremely small), then one will have to add together an extremely large number of copies of $(U,V)$ before the sum ceases to be concentrated in the sublattice.
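A toy example (nothing to do with dice) makes the point concrete: let a step equal 2 with probability $1-\epsilon$ and 1 with probability $\epsilon$, so that it is not supported in the even numbers but very nearly is. The parity of a sum of $N$ such steps can be computed exactly, and it stays badly biased until $N$ is comparable with $1/\epsilon$, so no error bound with a constant independent of the step distribution is possible.

```python
def prob_sum_is_odd(N, eps):
    """Each step is odd with probability eps, independently, so the parity of
    the sum of N steps is a lazy walk on Z/2Z and
    P(sum is odd) = (1 - (1 - 2*eps)**N) / 2 exactly."""
    return (1 - (1 - 2 * eps) ** N) / 2

eps = 1e-6
for N in (10**3, 10**5, 10**6, 10**7):
    print(N, prob_sum_is_odd(N, eps))
# The parity only approaches 1/2 once N is of order 1/eps, however many
# moment conditions the step distribution satisfies.
```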
So the situation we appear to be in is the following. We have more precise information about the random variable $(U,V)$ than is assumed in the LCLT in the reference above, and we want to use that information to obtain an explicit constant in the theorem.
It could be that out there in the literature is exactly the result we need, which would be nice, but it also seems possible that we will have to prove an appropriate version of the LCLT for ourselves. I’d prefer the first, but the second wouldn’t be too disappointing, as the problem is quite appealing and even has something of an additive-combinatorial flavour (since it is about describing an iterated convolution of a subset of $\mathbb Z^2$ under appropriate assumptions).
What properties can we establish about the random variable $(U,V)$?
I said above, with no justification, that we have more precise information about the random variable $(U,V)$. Let me now try to give the justification.
First of all, we know everything we could possibly want to know about $V$: it has the uniform distribution on $\{-\frac{n-1}2,-\frac{n-3}2,\dots,\frac{n-1}2\}$. (In particular, if $n$ is odd, then it is the uniform distribution on the set of integers in the interval $[-\frac{n-1}2,\frac{n-1}2]$.)
How about the distribution of $U$? That question is equivalent to asking about the values taken by $g_A$, and their multiplicities. There is quite a lot one can say about those. For example, I claim that with high probability (if $A$ is a random $n$-sided die) $|g_A(j)|$ is never bigger than $C\sqrt{n\log n}$. That is because if we choose a fully random sequence $(a_1,\dots,a_n)\in[n]^n$, then the expected number of $i$ such that $a_i<j$ is $j-1$, and the probability that this number differs from $j-1$ by more than $t\sqrt n$ is at most $e^{-ct^2}$, by standard probabilistic estimates, so if we set $t=C\sqrt{\log n}$, then this is at most $n^{-cC^2}$, which we can make a lot smaller than $n^{-3/2}$ by choosing $C$ to be a suitably large absolute constant. (I think $c$ can be taken to be 1/8 if you want me to be more explicit.) Since the probability that $\sum_ia_i=n(n+1)/2$ is proportional to $n^{-3/2}$, it follows that this conclusion continues to hold even after we condition on that event.
Another simple observation is that the values taken by $g_A$ are not contained in a proper sublattice of $\mathbb Z$ (assuming, that is, that $g_A$ is ever non-zero). That is simply because $g_A$ can decrease by at most 1 as $j$ increases by 1 (since $g_A(j+1)-g_A(j)=|\{i:a_i=j\}|-1$) and $g_A$ averages zero.
A third simple observation is that with probability $1-o(1)$, $g_A$ will take a value of magnitude at least $\sqrt n/\log n$ somewhere. I’ll sketch a proof of this. Let $m$ be around $\log n$ and let $j_1<\dots<j_m$ be evenly spaced in $[n]$, staying away from the end points 1 and $n$. Let $(a_1,\dots,a_n)$ be a purely random sequence in $[n]^n$. Then the standard deviation of $g_A(j_r)$ is of order $\sqrt n$, so the probability that $|g_A(j_r)|$ is less than $\sqrt n/\log n$ is of order $1/\log n$. A similar bound (of order $1/\sqrt{\log n}$, say) holds for the conditional probability that $|g_A(j_r)|$ is less than $\sqrt n/\log n$ conditioned on the value of $g_A(j_{r-1})$ (the worst case being when this value is 0). So the probability that this happens for every $r$ is at most $(C/\sqrt{\log n})^m$. This is much smaller than $n^{-3/2}$, so the conclusion remains valid when we condition on the sum of the $a_i$ being $n(n+1)/2$. So the claim follows. Note that because of the previous simple observation, it follows that $|g_A|$ must be at least $\frac{\sqrt n}{2\log n}$ in magnitude at least $\frac{\sqrt n}{2\log n}$ times, so up to log factors we get that $\sum_jg_A(j)^2$ is at least $n^{3/2}$. With a bit more effort, it should be possible to push this up to something more like $n^2$, since one would expect that $|g_A(j)|$ would have rough order of magnitude $\sqrt n$ for a positive fraction of the $j$. Maybe this would be a good subproblem to think about, and ideally not too difficult.
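Both estimates are easy to explore numerically. The sketch below (reusing random_die and g from earlier snippets) prints, for a few random dice, the ratio of $\max_j|g_A(j)|$ to $\sqrt{n\log n}$ and the ratios of $\sum_jg_A(j)^2$ to $n^{3/2}$ and to $n^2$; the point is the order of magnitude rather than the constants, and $n=100$ is an arbitrary choice.

```python
import math

def g_statistics(n=100, dice=5):
    """For a few random n-sided dice, compare max |g_A| with sqrt(n log n)
    and sum_j g_A(j)^2 with n^{3/2} and with n^2."""
    for _ in range(dice):
        gA = g(random_die(n), n)
        m = max(abs(x) for x in gA)
        s = sum(x * x for x in gA)
        print(f"max|g_A|/sqrt(n log n) = {m / math.sqrt(n * math.log(n)):.2f}, "
              f"sum g_A^2 / n^1.5 = {s / n**1.5:.1f}, "
              f"sum g_A^2 / n^2 = {s / n**2:.3f}")

g_statistics()
```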
How about the joint distribution of $(U,V)$? It seems highly likely that for typical $A$ this will not be concentrated in a lattice, and that elementary arguments such as the above can be used to prove this. But let me indicate the kind of situation that we would have to prove is not typical. Suppose that $n=15$ and that $A$ consists of the values 3, 8 and 13, each repeated five times (a valid die, since the sum is 120). Then as $j$ runs from 1 to 15 the values taken by $f_A(j)$ are 0,0,0,5,5,5,5,5,10,10,10,10,10,15,15 and the values taken by $g_A(j)$ are 0,-1,-2,2,1,0,-1,-2,2,1,0,-1,-2,2,1. For this example, all the points $(j,g_A(j))$ live in a coset of the lattice of points $(x,y)$ such that $x+y$ is a multiple of 5.
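Here is a short check of that example (the specific die is just one convenient instance of the phenomenon):

```python
n = 15
A = [3] * 5 + [8] * 5 + [13] * 5    # a valid die: its sum is 120 = 15*16/2
f = [sum(a < j for a in A) for j in range(1, n + 1)]
gA = [f[j - 1] - (j - 1) for j in range(1, n + 1)]
print(f)                            # 0,0,0,5,...,15: every value is a multiple of 5
print(all((j + gA[j - 1]) % 5 == 1 for j in range(1, n + 1)))   # True
# Since j + g_A(j) = f_A(j) + 1 is always 1 mod 5, the points (j, g_A(j)) all
# lie in a coset of the lattice {(x, y) : x + y is a multiple of 5}.
```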
This wouldn’t necessarily be a disaster for us actually, since the LCLT can be restricted to a sublattice, and if after conditioning on $\sum_jb_j=n(n+1)/2$ we happen to have that $\sum_jg_A(b_j)$ is always a multiple of 5, that isn’t a problem if we still have the central symmetry. But it would probably be nicer to prove that it is an atypical occurrence, so that we don’t have to worry about $(U,V)$ living inside a sublattice (or even being concentrated in one).
My guess is that if we were to pursue these kinds of thoughts, we would end up being able to prove a statement that would say something like that $(U,V)$ takes a pretty representative sample of values with $U$ being between $-C\sqrt{n\log n}$ and $C\sqrt{n\log n}$ and $V$ being in a range of width $n$ around 0. I would expect, for example, that if we add three or four independent copies of $(U,V)$, then we will have a distribution that is similar in character to the uniform distribution on a rectangle of width of order of magnitude $\sqrt n$ and height of order of magnitude $n$. And if that’s true, then adding $n$ of them should give us something very close to normal (in an appropriate discrete sense of the word “normal”).
What next?
There are two obvious tasks here. One is to try to prove as much as we can about the random variable $(U,V)$. The other is to try to prove a suitable LCLT that is strong enough to give us that the probability that $\sum_jg_A(b_j)>0$ given that $\sum_jb_j=n(n+1)/2$ is approximately 1/2, under suitable assumptions about $(U,V)$. And then we have to hope that what we achieve for the first is sufficient for the second.
It’s possible that the second task can be achieved by simply going through one of the existing proofs of the LCLT and being more careful about the details. But if that’s the case, then we should spend some time trying to find out whether anyone has done it already, since there wouldn’t be much point in duplicating that work. I hope I’ve set out what we want clearly enough for any probabilist who might stumble upon this blog post to be able to point us in the right direction if indeed the result we want is out there somewhere.