Intransitive dice II

Gowers's Weblog 2018-03-10

I’m not getting the feeling that this intransitive-dice problem is taking off as a Polymath project. However, I myself like the problem enough to want to think about it some more. So here’s a post with some observations and with a few suggested subproblems that shouldn’t be hard to solve and that should shed light on the main problem. If the rate of comments by people other than me doesn’t pick up, then I think I’ll simply conclude that there wasn’t sufficient interest to run the project. However, if I do that, I have a back-up plan, which is to switch to a more traditional collaboration — that is, done privately with a small number of people. The one non-traditional aspect of it will be that the people who join the collaboration will select themselves by emailing me and asking to be part of it. And if the problem gets solved, it will be a normal multi-author paper. (There’s potentially a small problem if someone asks to join in with the collaboration and then contributes very little to it, but we can try to work out some sort of “deal” in advance.)

But I haven’t got to that point yet: let me see whether a second public post generates any more reaction.

I’ll start by collecting a few thoughts that have already been made in comments. And I’ll start that with some definitions. First of all, I’m going to change the definition of a die. This is because it probably makes sense to try to prove rigorous results for the simplest model for which they are true, and random multisets are a little bit frightening. But I am told that experiments suggest that the conjectured phenomenon occurs for the following model as well. We define an n-sided die to be a sequence A=(a_1,\dots,a_n) of integers between 1 and n such that \sum_ia_i=n(n+1)/2. A random n-sided die is just one of those chosen uniformly from the set of all of them. We say that A beats B if \sum_{i,j}\mathbf 1_{[a_i>b_j]}>\sum_{i,j}\mathbf 1_{[a_i<b_j]}. That is, A beats B if the probability, when you roll the two dice, that A shows a higher number than B is greater than the probability that B shows a higher number than A. If the two probabilities are equal then we say that A ties with B.
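
For anyone who would like to experiment, here is a minimal Python sketch of this model (the function names and parameter choices are just illustrative): it samples a random n-sided die by rejection, drawing uniform n-tuples from [n] until the sum condition holds, and decides the “beats” relation by direct counting.

```python
import random

def random_die(n, rng=random):
    """Sample a random n-sided die: a uniform n-tuple from {1,...,n},
    conditioned on the face-sum being n(n+1)/2, by rejection sampling."""
    target = n * (n + 1) // 2
    while True:
        a = [rng.randint(1, n) for _ in range(n)]
        if sum(a) == target:
            return a

def beats(A, B):
    """Return +1 if A beats B, -1 if B beats A, and 0 if they tie."""
    wins = sum(a > b for a in A for b in B)
    losses = sum(a < b for a in A for b in B)
    return (wins > losses) - (wins < losses)

if __name__ == "__main__":
    n = 30
    A, B = random_die(n), random_die(n)
    print(A)
    print(B)
    print(beats(A, B))
```

The rejection step succeeds with probability roughly of order n^{-3/2}, in line with the standard-deviation calculation later in the post, so this is only sensible for moderate n.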

The main two conjectures are that the probability that two dice tie with each other tends to zero as n tends to infinity and that the “beats” relation is pretty well random. This has a precise meaning, but one manifestation of this randomness is that if you choose three dice A, B and C uniformly at random and are given that A beats B and B beats C, then the probability that A beats C is, for large n, approximately 1/2. In other words, transitivity doesn’t happen any more often than it does for a random tournament. (Recall that a tournament is a complete graph in which every edge is directed.)
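
One quick sanity check of this prediction is the following experiment (again only an illustrative sketch, with the same sampler as above): generate random triples (A,B,C), keep those in which A beats B and B beats C, and record how often A also beats C. The conjecture is that the resulting proportion should be close to 1/2 for large n.

```python
import random

def random_die(n, rng):
    # Rejection sampler, as in the sketch above.
    target = n * (n + 1) // 2
    while True:
        a = [rng.randint(1, n) for _ in range(n)]
        if sum(a) == target:
            return a

def beats(A, B):
    # +1 if A beats B, -1 if B beats A, 0 for a tie, as above.
    wins = sum(a > b for a in A for b in B)
    losses = sum(a < b for a in A for b in B)
    return (wins > losses) - (wins < losses)

def transitivity_experiment(n, triples=200, seed=0):
    """Among random triples with A beats B and B beats C, estimate the
    proportion in which A also beats C (conjecturally about 1/2)."""
    rng = random.Random(seed)
    hits = kept = 0
    while kept < triples:
        A, B, C = (random_die(n, rng) for _ in range(3))
        if beats(A, B) == 1 and beats(B, C) == 1:
            kept += 1
            hits += beats(A, C) == 1
    return hits / kept

# e.g. transitivity_experiment(30); the rejection sampling makes this slow
# for much larger n.
```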

Now let me define a function that helps one think about dice. Given a die A, define a function f_A on the set [n]=\{1,2,\dots,n\} by f_A(j)=\sum_i(\mathbf 1_{[a_i>j]}-\mathbf 1_{[a_i<j]})=|\{i:a_i>j\}|-|\{i:a_i<j\}|. Then it follows immediately from the definitions that A beats B if and only if \sum_jf_A(b_j)>0, which is equivalent to the statement that \sum_jf_B(a_j)<0.
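
In code (continuing the illustrative sketches), f_A and the resulting criterion are one-liners, and the equivalence with the original definition of “beats” is easy to test:

```python
def f(A, j):
    """f_A(j) = |{i : a_i > j}| - |{i : a_i < j}|."""
    return sum(a > j for a in A) - sum(a < j for a in A)

def beats_directly(A, B):
    """Original definition: more pairs (i, j) with a_i > b_j than with a_i < b_j."""
    return sum(a > b for a in A for b in B) > sum(a < b for a in A for b in B)

def beats_via_f(A, B):
    """Equivalent criterion from above: sum_j f_A(b_j) > 0."""
    return sum(f(A, b) for b in B) > 0

# For any two dice A and B, beats_directly(A, B) == beats_via_f(A, B).
```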

If the “beats” tournament is quasirandom, then we would expect that for almost every pair of dice A,B the remaining dice are split into four parts of roughly equal sizes, according to whether they beat A and whether they beat B. So for a typical pair of dice A,B we would like to show that \sum_jf_A(c_j)>0 for roughly half of all dice C, and \sum_jf_B(c_j)>0 for roughly half of all dice C, and that these two events have almost no correlation.

It is critical here that the sums should be fixed. Otherwise, if we are told that A beats B, the most likely explanation is that the sum of A is a bit bigger than the sum of B, and then A is significantly more likely to beat C than B is.

Note that for every die A we have \sum_jf_A(j)=\sum_{i,j}(\mathbf 1_{[a_i>j]}-\mathbf 1_{[a_i<j]}) =\sum_i((a_i-1)-(n-a_i))=2\sum_ia_i-n(n+1)=0. That is, every die ties with the die (1,2,\dots,n).

Now let me modify the functions f_A to make them a bit easier to think about, though not quite as directly related to the “beats” relation (though everything can be suitably translated). Define g_A(j) to be |\{i:a_i\leq j\}| and h_A(j) to be g_A(j)-j. Note that f_A(j)=(n-g_A(j))-g_A(j-1), which would normally be approximately equal to n-2g_A(j).
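
In the same spirit, g_A and h_A are easy to compute, and the identity relating f_A to g_A can be checked numerically:

```python
def g(A, j):
    """g_A(j) = |{i : a_i <= j}|."""
    return sum(a <= j for a in A)

def h(A, j):
    """h_A(j) = g_A(j) - j."""
    return g(A, j) - j

# Identity from the text, for 1 <= j <= n with n = len(A):
#   f_A(j) = (n - g(A, j)) - g(A, j - 1),
# since |{i : a_i > j}| = n - g_A(j) and |{i : a_i < j}| = g_A(j - 1).
```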

We are therefore interested in sums such as \sum_jg_A(c_j). I would therefore like to get a picture of what a typical sequence (g_A(1),\dots,g_A(n)) looks like. I’m pretty sure that g_A(j) has mean j. I also think it is distributed approximately normally around j. But I would also like to know about how g_A(j_1) and g_A(j_2) correlate, since this will help us get some idea of the variance of \sum_jg_A(c_j), which, if everything in sight is roughly normal, will pin down the distribution. I’d also like to know about the covariance of \sum_jg_A(c_j) and \sum_jg_B(c_j), or similar quantities anyway, but I don’t want to walk before I can fly.

Anyhow, I had the good fortune to see Persi Diaconis a couple of days ago, and he assured me that the kind of thing I wanted to understand had been studied thoroughly by probabilists and comes under the name “constrained limit theorems”. I’ve subsequently Googled that phrase and found some fairly old papers written in the typical uncompromising style and level of generality of their day, which leaves me thinking that it may be simpler to work a few things out from scratch. The main purpose of this post is to set out some exercises that have that as their goal.

What is the average of g_A(j)?

Suppose, then, that we have a random n-sided die A. Let’s begin by asking for a proper proof that the mean of g_A(j) is j. It clearly is if we choose a purely random n-tuple of elements of [n], but what happens if we constrain the average to be (n+1)/2?

I don’t see an easy proof. In fact, I’m not sure it’s true, and here’s why. The average will always be j if and only if the probability that a_1\leq j is always equal to j/n, and that is true if and only if a_1 is uniformly distributed. (The distributions of the a_i are of course identical, but — equally of course — not independent.) But do we expect a_1 to be uniformly distributed? No we don’t: if a_1=(n+1)/2 that will surely make it easier for the global average to be (n+1)/2 than if a_1=n.

However, I would be surprised if it were not at least approximately true. Here is how I would suggest proving it. (I stress that I am not claiming that this is an unknown result, or something that would detain a professional probabilist for more than two minutes — that is why I used the word “exercise” above. But I hope these questions will be useful exercises.)

The basic problem we want to solve is this: if a_1,\dots,a_n are chosen independently and uniformly from [n], then what is the conditional probability that a_1=j given that the average of the a_i is exactly (n+1)/2?

It’s not the aim of this post to give solutions, but I will at least say why I think that the problems aren’t too hard. In this case, we can use Bayes’s theorem. Using well-known estimates for sums of independent random variables, we can give good approximations to the probability that the sum is n(n+1)/2 and to the probability of that event given that a_1=j (which is just the probability that the sum of the remaining n-1 of the a_i is n(n+1)/2-j). We also know that the probability that a_1=j is 1/n. So we have all the information we need. I haven’t done the calculation, but my guess is that the tendency for a_1 to be closer to the middle than to the extremes is not very pronounced.

In fact, here’s a rough argument for that. If we choose a_i uniformly from [n], then the variance is about n^2/12. So the variance of the sum of the a_i (in the fully independent case) is about n^3/12, so the standard deviation is proportional to n^{3/2}. But if that’s the case, then the probability that the sum equals n(n+1)/2+t is roughly constant for t=O(n), since a shift of size O(n) is small compared with a standard deviation of order n^{3/2}.

I think it should be possible to use similar reasoning to prove that if m=o(\sqrt n), then a_1,\dots,a_m are approximately independent. (Of course, this would apply to any m of the a_i, if correct.)
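
Before moving on, I should add that for small n one does not even need the normal approximation: the Bayes calculation above can be carried out exactly, by counting tuples with a given sum using a simple dynamic program. Here is a sketch (the function names are mine); the output can be compared with the uniform value 1/n and should show the mild bias towards middle values guessed at above.

```python
def sum_counts(m, lo, hi):
    """counts[s] = number of m-tuples with entries in {lo,...,hi} summing to s."""
    counts = {0: 1}
    for _ in range(m):
        new = {}
        for total, c in counts.items():
            for v in range(lo, hi + 1):
                new[total + v] = new.get(total + v, 0) + c
        counts = new
    return counts

def conditional_law_of_a1(n):
    """P[a_1 = j | a_1 + ... + a_n = n(n+1)/2] for j = 1,...,n.  By Bayes's
    theorem this is proportional to the number of (n-1)-tuples from {1,...,n}
    summing to n(n+1)/2 - j."""
    target = n * (n + 1) // 2
    rest = sum_counts(n - 1, 1, n)
    weights = [rest.get(target - j, 0) for j in range(1, n + 1)]
    z = sum(weights)
    return [w / z for w in weights]

# e.g. conditional_law_of_a1(20): the entries should all be close to 1/20.
```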

How is g_A(j) distributed?

What is the probability that j+s of the a_i are at most j? Again, it seems to me that Bayes’s theorem and facts about sums of independent random variables are enough for this. We want the probability of the above event given that \sum_ia_i=n(n+1)/2. By Bayes’s theorem, we can work this out if we know the probability that \sum_ia_i=n(n+1)/2 given that g_A(j)=j+s, together with the probability that \sum_ia_i=n(n+1)/2 and the probability that g_A(j)=j+s, in both cases when A is chosen fully independently. The last two calculations are simple. The first one isn’t 100% simple, but it doesn’t look too bad. We have a sum of j+s random variables that are uniform on [j] and n-j-s that are uniform on \{j+1,\dots,n\} and we want to know how likely it is that they add up to n(n+1)/2. We could do this by conditioning on the possible values of the two sums, which then leaves us with sums of independent variables, and adding up all the results. It looks to me as though that calculation shouldn’t be too unpleasant. What I would recommend is to do the calculation on the assumption that the distributions are normal (in a suitable discrete sense) with whatever mean and variance they have to have, since that will yield an answer that is almost certainly correct. A rigorous proof can come later, and shouldn’t be too much harder.

The answer I expect and hope for is that g_A(j) is approximately normally distributed with mean j and a variance that would come out of the calculations.
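
For small n this expectation can be tested exactly, by conditioning on the sum of the entries that are at most j in just the way described above. The sketch below reuses the sum_counts routine from the earlier sketch and returns the exact law of g_A(j) for a random n-sided die.

```python
from math import comb

def law_of_g(n, j):
    """Exact law of g_A(j) = |{i : a_i <= j}| for a random n-sided die, obtained
    by splitting the entries into those <= j and those > j and conditioning on
    the sum of the first group.  Uses sum_counts from the earlier sketch."""
    target = n * (n + 1) // 2
    weights = {}
    for k in range(n + 1):                       # candidate value of g_A(j)
        low = sum_counts(k, 1, j)                # sums of the k entries <= j
        high = sum_counts(n - k, j + 1, n)       # sums of the n-k entries > j
        ways = sum(c * high.get(target - s, 0) for s, c in low.items())
        weights[k] = comb(n, k) * ways
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items() if w}

# e.g. law_of_g(20, 10): the mean should be close to j = 10, and the shape
# should look roughly normal, if the expectation above is right.
```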

What is the joint distribution of g_A(j_1) and g_A(j_2)?

This can in principle be done by exactly the same technique, except that now things get one step nastier because we have to condition on the sum of the a_i that are at most j_1, the sum of the a_i that are between j_1+1 and j_2, and the sum of the rest. So we end up with a double sum of products of three probabilities at the end instead of a single sum of products of two probabilities. The reason I haven’t done this is that I am quite busy with other things and the calculation will need a strong stomach. I’d be very happy if someone else did it. But if not, I will attempt it at some point over the next … well, I don’t want to commit myself too strongly, but perhaps the next week or two. At this stage I’m just interested in the heuristic approach — assume that distributions one knows to be roughly normal are in fact given by an exact formula of the form Ae^{-\lambda (x-\mu)^2}.
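
For what it’s worth, the calculation is easy to do exactly by computer for small n, again by splitting the entries into the three ranges and summing over the two partial sums. Here is a sketch (it uses sum_counts from an earlier sketch; it is slow, but no stomach is required):

```python
from math import comb

def joint_law_of_g(n, j1, j2):
    """Exact joint law of (g_A(j1), g_A(j2)), 1 <= j1 < j2 <= n, for a random
    n-sided die: split the entries into the ranges [1,j1], [j1+1,j2], [j2+1,n]
    and sum over the two partial sums.  Uses sum_counts from an earlier sketch;
    feasible for small n (say n <= 20)."""
    target = n * (n + 1) // 2
    law = {}
    for k1 in range(n + 1):                      # k1 = g_A(j1)
        low = sum_counts(k1, 1, j1)
        for k2 in range(k1, n + 1):              # k2 = g_A(j2), at least k1
            mid = sum_counts(k2 - k1, j1 + 1, j2)
            high = sum_counts(n - k2, j2 + 1, n)
            ways = sum(c1 * c2 * high.get(target - s1 - s2, 0)
                       for s1, c1 in low.items() for s2, c2 in mid.items())
            if ways:
                law[(k1, k2)] = comb(n, k1) * comb(n - k1, k2 - k1) * ways
    z = sum(law.values())
    return {pair: w / z for pair, w in law.items()}

# From the output one can read off the covariance of g_A(j1) and g_A(j2).
```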

For some experimental evidence about this, see a comment by Ian on the previous post, which links to some nice visualizations. Ian, if you’re reading this, it would take you about another minute, I’d have thought, to choose a few random dice A and plot the graphs h_A. It would be interesting to see such plots to get an idea of what a typical one looks like: roughly how often does it change sign, for example?
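
In the meantime, for anyone who wants to try this, here is roughly the kind of experiment I have in mind, as a sketch only (matplotlib for the pictures, with a count of sign changes printed alongside; the parameters are illustrative):

```python
import random
import matplotlib.pyplot as plt

def random_die(n, rng):
    # Rejection sampler, as in the first sketch.
    target = n * (n + 1) // 2
    while True:
        a = [rng.randint(1, n) for _ in range(n)]
        if sum(a) == target:
            return a

def h_values(A):
    """h_A(j) = |{i : a_i <= j}| - j for j = 1,...,n."""
    n = len(A)
    return [sum(a <= j for a in A) - j for j in range(1, n + 1)]

def sign_changes(values):
    """Number of sign changes along the sequence, ignoring zeros."""
    signs = [v > 0 for v in values if v != 0]
    return sum(s != t for s, t in zip(signs, signs[1:]))

if __name__ == "__main__":
    n, rng = 100, random.Random(0)
    for _ in range(5):
        A = random_die(n, rng)
        h = h_values(A)
        print("sign changes:", sign_changes(h))
        plt.plot(range(1, n + 1), h, alpha=0.7)
    plt.xlabel("j")
    plt.ylabel("h_A(j)")
    plt.show()
```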

What is a heuristic argument for the “A beats B” tournament being quasirandom?

I have much less to say here — in particular, I don’t have a satisfactory answer. But I haven’t spent serious time on it, and I think it should be possible to get one.

One slight simplification is that we don’t have to think too hard about whether A beats B when we are thinking about the three dice A, B and C. As I commented above, the tournament will be quasirandom (I think I’m right in saying) if for almost every A and B the events “A beats C” and “B beats C” have probability roughly 1/2 each and are hardly correlated.

A good starting point would be the first part. Is it true that almost every die beats approximately half the other dice? This question was also recommended by Bogdan Grechuk in a comment on the previous post. He suggested, as a preliminary question, the question of finding a good sufficient condition on a die for this to be the case.

That I think is approachable too. Let’s fix some function g_A without worrying too much about whether it comes from a die (but I have no objection to assuming that it is non-decreasing and that g_A(n)=n, should that be helpful). Under what conditions can we be confident that the sum \sum_jg_A(c_j) is greater than n(n+1)/2 with probability roughly 1/2, where C=(c_1,\dots,c_n) is a random die?
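
The preliminary question, at least, is easy to probe by simulation before looking for a sufficient condition. Here is a sketch, using the same rejection sampler as in the first block: fix a random die A, sample many further random dice C, and record the proportions that A beats and ties with. If the conjectures are right, the first proportion should come out close to 1/2 and the second should be small.

```python
import random

def random_die(n, rng):
    # Rejection sampler, as in the first sketch.
    target = n * (n + 1) // 2
    while True:
        a = [rng.randint(1, n) for _ in range(n)]
        if sum(a) == target:
            return a

def beat_and_tie_rates(A, trials=1000, seed=0):
    """For a fixed die A, estimate the proportions of random dice that A beats
    and ties with."""
    n = len(A)
    rng = random.Random(seed)
    wins = ties = 0
    for _ in range(trials):
        C = random_die(n, rng)
        w = sum(a > c for a in A for c in C)
        losses = sum(a < c for a in A for c in C)
        wins += w > losses
        ties += w == losses
    return wins / trials, ties / trials

# e.g. with n = 30, a handful of random dice A should each give a win rate
# close to 1/2 and a small tie rate, if the conjectures above are right.
```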

Assuming it’s correct that each c_j is roughly uniform, \sum_jg_A(c_j) is going to average \sum_jg_A(j), which if A is a die will be close to n(n+1)/2. But we need to know rather more than that in order to obtain the probability in question.

But I think the Bayes approach may still work. We’d like to nail down the distribution of \sum_jg_A(c_j) given that \sum_jc_j=n(n+1)/2. So we can look at \mathbb P[\sum_jg_A(c_j)=n(n+1)/2+t|\sum_jc_j=n(n+1)/2], where now the c_i are chosen uniformly and independently. Calling that \mathbb P[X|Y], we find that it’s going to be fairly easy to estimate the probabilities of X and Y. However, it doesn’t seem to be notably easier to calculate \mathbb P[Y|X] than it is to calculate \mathbb P[X|Y]. But we have made at least one huge gain, which is that now the c_j are independent, so I’d be very surprised if people don’t know how to estimate this probability. Indeed, the probability we really want to know is \mathbb P[X\wedge Y]. From that all else should follow. And I think that what we’d like is a nice condition on g_A that would tell us that the two events are approximately independent.
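
To make this concrete, in the independent model all the first and second moments of X and Y can be written down exactly, because the c_j are independent and uniform on [n]. The following sketch (reusing the rejection sampler to produce a test die A) computes them; these are the ingredients one would feed into the bivariate-normal heuristic.

```python
import random

def random_die(n, rng=random):
    # Rejection sampler, as in the first sketch.
    target = n * (n + 1) // 2
    while True:
        a = [rng.randint(1, n) for _ in range(n)]
        if sum(a) == target:
            return a

def moments_for_heuristic(A):
    """With c_1,...,c_n independent and uniform on {1,...,n}, return the means,
    variances and covariance of X = sum_j g_A(c_j) and Y = sum_j c_j.  Since
    the coordinates are i.i.d., everything reduces to single-coordinate sums."""
    n = len(A)
    gvals = [sum(a <= j for a in A) for j in range(1, n + 1)]   # g_A(1),...,g_A(n)
    Eg = sum(gvals) / n                                         # = (n+1)/2 for a die
    Ec = (n + 1) / 2
    var_g = sum(v * v for v in gvals) / n - Eg ** 2
    var_c = (n * n - 1) / 12
    cov_gc = sum(j * v for j, v in enumerate(gvals, start=1)) / n - Eg * Ec
    return {"E[X]": n * Eg, "E[Y]": n * Ec,
            "Var[X]": n * var_g, "Var[Y]": n * var_c,
            "Cov[X,Y]": n * cov_gc}

# e.g. moments_for_heuristic(random_die(30)); note that E[X] equals n(n+1)/2,
# which is also the threshold that X has to exceed.
```

Since E[Y] is also n(n+1)/2, conditioning a bivariate normal on Y taking its mean value leaves the conditional mean of X at E[X], which is at least consistent with the hoped-for probability of roughly 1/2; but turning that into a statement about actual dice is of course the whole problem.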

I’d better stop here, but I hope I will have persuaded at least some people that there’s some reasonably low-hanging fruit around, at least for the time being.