Intransitive dice III

Gowers's Weblog 2018-03-10

I now feel more optimistic about the prospects for this project. I don’t know whether we’ll solve the problem, but I think there’s a chance. But it seems that there is after all enough appetite to make it an “official” Polymath project. Perhaps we could also have an understanding that the pace of the project will be a little slower than it has been for most other projects. I myself have various other mathematical projects on the boil, so can’t spend too much time on this one, but quite like the idea of giving it an occasional go when the mood takes me, and trying to make slow but steady progress. So I’ve created a polymath13 category, into which this post now fits. I’ve also retrospectively changed the category for the previous two posts. I don’t think we’ve got to the stage where a wiki will be particularly useful, but I don’t rule that out at some point in the future.

In this post I want to expand on part of the previous one, to try to understand better what would need to be true for the quasirandomness assertion to be true. I’ll repeat a few simple definitions and simple facts needed to make the post more self-contained.

By an n-sided die I mean a sequence in [n]^n (where [n] is shorthand for \{1,2,\dots,n\}) that adds up to n(n+1)/2. Given an n-sided die A=(a_1,\dots,a_n) and j\in[n], I define g_A(j) to be the number of i such that a_i\leq j and h_A(j) to be g_A(j)-j.

We can write h_A(j) as \sum_i\mathbf 1_{[a_i\leq j]}-j. Therefore, if C=(c_1,\dots,c_n) is another die, or even just an arbitrary sequence in [n]^n, we have that \sum_jh_A(c_j)=\sum_j(\sum_i\mathbf 1_{[a_i\leq c_j]}-c_j)=\sum_{i,j}\mathbf 1_{[a_i\leq c_j]} - \sum_jc_j. If \sum_jc_j=n(n+1)/2 and no a_i is equal to any c_j, then this sum equals \#\{(i,j):a_i<c_j\}-n(n+1)/2, whereas A beats C precisely when \#\{(i,j):a_i<c_j\}<n^2/2. So the sign of the sum is a reasonable, but not perfect, proxy for which of the two dice wins: it can go wrong only when the contest is very close, and for most A we don't expect many ties either. (With a slightly more complicated function we can avoid both of these problems: I shall stick with the simpler one for ease of exposition, but would expect that if proofs could be got to work, then we would switch to the more complicated function.)
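To fix ideas, here is a minimal Python/NumPy sketch of these definitions (purely illustrative: the function names and the rejection-sampling method are arbitrary choices, and this is not the code referred to later in the post). It samples a random pair of dice, computes \sum_jh_A(c_j), and compares its sign with the actual win/loss count.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_die(n):
    """Rejection-sample a sequence in [n]^n whose entries sum to n(n+1)/2."""
    target = n * (n + 1) // 2
    while True:
        a = rng.integers(1, n + 1, size=n)
        if a.sum() == target:
            return a

def h_values(a):
    """Return the vector (h_A(1), ..., h_A(n)), where h_A(j) = #{i : a_i <= j} - j."""
    n = len(a)
    js = np.arange(1, n + 1)
    g = np.searchsorted(np.sort(a), js, side="right")  # g_A(j) = #{i : a_i <= j}
    return g - js

n = 50
A, C = random_die(n), random_die(n)
hA = h_values(A)

proxy = hA[C - 1].sum()                    # sum_j h_A(c_j)
wins = (A[:, None] > C[None, :]).sum()     # #{(i,j) : a_i > c_j}
losses = (A[:, None] < C[None, :]).sum()   # #{(i,j) : a_i < c_j}

# With no ties, proxy = losses - n(n+1)/2, while A beats C exactly when
# losses < n^2/2, so a clearly negative proxy indicates that A beats C.
print("sum_j h_A(c_j) =", proxy, "  wins - losses =", wins - losses)
```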

This motivates the following question. Let A and B be two random dice. Is it the case that with high probability the remaining dice C are split into four sets of roughly equal size according to the signs of \sum_jh_A(c_j) and \sum_jh_B(c_j)? I expect the answer to this question to be the same as the answer to the original transitivity question, but I haven't checked as carefully as I should that my cavalier approach to ties isn't problematic.
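One crude way to probe this question numerically, continuing the sketch above (so random_die and h_values are assumed to be already defined, and all parameters are arbitrary), is to fix a random pair (A,B), sample many further dice C, and count the four sign patterns:

```python
# Do random dice C fall roughly equally into the four sign classes of
# (sum_j h_A(c_j), sum_j h_B(c_j)) for a fixed random pair (A, B)?
n, trials = 30, 2000
A, B = random_die(n), random_die(n)
hA, hB = h_values(A), h_values(B)

counts = {"++": 0, "+-": 0, "-+": 0, "--": 0, "zero": 0}
for _ in range(trials):
    C = random_die(n)
    x, y = hA[C - 1].sum(), hB[C - 1].sum()
    if x == 0 or y == 0:
        counts["zero"] += 1          # ties under the proxy, lumped together
    else:
        counts[("+" if x > 0 else "-") + ("+" if y > 0 else "-")] += 1

# The quasirandomness guess predicts four roughly equal counts.
print(counts)
```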

I propose the following way of tackling this question. We fix A and B and then choose a purely random sequence C=(c_1,\dots,c_n) (that is, with no constraint on the sum) and look at the 3D random variable (\sum_jh_A(c_j),\sum_jh_B(c_j),\sum_j(c_j-(n+1)/2)). Each coordinate separately is a sum of independent random variables with mean zero, so provided not too many of the h_A or h_B are zero, which for random A and B is a reasonable assumption, we should get something that approximates a trivariate normal distribution.
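Continuing the same sketch (again with arbitrary sample sizes), the correlation matrix of this 3D random variable can be estimated directly, with C drawn as a purely random sequence in [n]^n:

```python
# Empirical correlation matrix of (sum_j h_A(c_j), sum_j h_B(c_j),
# sum_j (c_j - (n+1)/2)) for purely random sequences C (no conditioning).
n, samples = 100, 20000
A, B = random_die(n), random_die(n)
hA, hB = h_values(A), h_values(B)

Cs = rng.integers(1, n + 1, size=(samples, n))   # unconstrained sequences in [n]^n
X = hA[Cs - 1].sum(axis=1)
Y = hB[Cs - 1].sum(axis=1)
Z = (Cs - (n + 1) / 2).sum(axis=1)

print(np.corrcoef(np.vstack([X, Y, Z])))         # 3x3 correlation matrix
```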

Therefore, we should expect that when we condition on \sum_j(c_j-(n+1)/2) being zero, we will get something that approximates a bivariate normal distribution. Although that may not be completely straightforward to prove rigorously, tools such as the Berry-Esseen theorem ought to be helpful, and I’d be surprised if this was impossibly hard. But for now I’m aiming at a heuristic argument, so I want simply to assume it.

What we want is for the signs of the first two coordinates to be approximately independent, which I think is equivalent to saying (assuming normality) that the first two coordinates themselves are approximately independent.

However, what makes the question interesting is that the first two coordinates are definitely not independent without the conditioning: the random variables \sum_jh_A(c_j) and \sum_jh_B(c_j) are typically quite strongly correlated. (There are good reasons to expect this to be the case, and I’ve tested it computationally too.) Also, we expect correlations between these variables and \sum_j(c_j-(n+1)/2). So what we are asking for is that all these correlations should disappear when we condition appropriately. More geometrically, there is a certain ellipsoid, and we want its intersection with a certain plane to be a circle.

The main aim of this post is to make the last paragraph more precise. That is, I want to take three standard normal random variables X, Y and Z that are not independent, and understand precisely the circumstances that guarantee that X and Y become independent when we condition on Z.

The joint distribution of (X,Y,Z) is determined by the matrix of correlations. Let this matrix be split up as \begin{pmatrix}\Sigma_{11}&\Sigma_{12}\\ \Sigma_{21}&\Sigma_{22}\\ \end{pmatrix}, where \Sigma_{11} is the 2\times 2 covariance matrix of (X,Y), \Sigma_{12} is a 2\times 1 matrix, \Sigma_{21} is a 1\times 2 matrix and \Sigma_{22} is the matrix (1). A general result about conditioning joint normal distributions on a subset of the variables tells us, if I understand the result correctly, that the covariance matrix of (X,Y) when we condition on the value of Z is \Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}. (I got this from Wikipedia. It seems to be quite tricky to prove, so I hope it really can be used as a black box.) So in our case if we have a covariance matrix \begin{pmatrix}1&\lambda&\mu\\ \lambda&1&\nu\\ \mu&\nu&1\\ \end{pmatrix} then the covariance matrix of (X,Y) conditioned on Z should be \begin{pmatrix}1-\mu^2&\lambda-\mu\nu\\ \lambda-\mu\nu&1-\nu^2\\ \end{pmatrix}.

That looks dimensionally odd because I normalized the random variables to have variance 1. If instead I had started with the more general covariance matrix \begin{pmatrix}a&\lambda&\mu\\ \lambda&b&\nu\\ \mu&\nu&c\\ \end{pmatrix} I would have ended up with \begin{pmatrix}a-c^{-1}\mu^2&\lambda-c^{-1}\mu\nu\\ \lambda-c^{-1}\mu\nu&b-c^{-1}\nu^2\\ \end{pmatrix}.
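As a sanity check on this black-box formula, here is a small illustrative computation (separate from anything discussed elsewhere in the post, with an arbitrarily chosen covariance matrix): sample from a trivariate normal, keep only the samples with Z very close to zero, and compare the empirical covariance of (X,Y) with \Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}.

```python
import numpy as np

rng = np.random.default_rng(1)

# An arbitrary positive-definite covariance matrix for (X, Y, Z).
Sigma = np.array([[2.0, 0.6, 0.8],
                  [0.6, 1.5, 0.4],
                  [0.8, 0.4, 1.0]])
S11, S12, S21, S22 = Sigma[:2, :2], Sigma[:2, 2:], Sigma[2:, :2], Sigma[2:, 2:]

predicted = S11 - S12 @ np.linalg.inv(S22) @ S21     # Schur complement

samples = rng.multivariate_normal(np.zeros(3), Sigma, size=2_000_000)
slab = samples[np.abs(samples[:, 2]) < 0.01]         # condition on Z being near 0
empirical = np.cov(slab[:, :2], rowvar=False)

print("predicted conditional covariance:\n", predicted)
print("empirical covariance given Z ~ 0:\n", empirical)
```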

So after the conditioning, if we want X and Y to become independent, we appear to want \lambda c to equal \mu\nu. That is, we want \langle X,Z\rangle\langle Y,Z\rangle=\langle X,Y\rangle\langle Z,Z\rangle where I am using angle brackets for covariances.

If we divide each variable by its standard deviation, that gives us that the correlation between X and Y should be the product of the correlation between X and Z and the correlation between Y and Z.
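In symbols, writing a=\langle X,X\rangle, b=\langle Y,Y\rangle and c=\langle Z,Z\rangle as above, the off-diagonal entry of the conditional covariance matrix vanishes exactly when

\lambda-c^{-1}\mu\nu=0, \qquad\text{that is,}\qquad \frac{\lambda}{\sqrt{ab}}=\frac{\mu}{\sqrt{ac}}\cdot\frac{\nu}{\sqrt{bc}},

and the three ratios here are precisely the correlations \mathrm{corr}(X,Y), \mathrm{corr}(X,Z) and \mathrm{corr}(Y,Z).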

I wrote some code to test this, and it seemed not to be even approximately the case, but I am not confident that I didn't make careless mistakes in the code. (However, my correlations were reasonable numbers in the range [-1,1], so any mistakes there might have been didn't jump out at me. I might just rewrite the code from scratch without looking at the old version.)

One final remark I’d like to make is that if you feel there is something familiar about the expression \langle x,z\rangle\langle y,z\rangle-\langle x,y\rangle\langle z,z\rangle, then you are not completely wrong. The formula for the vector triple product is

x\wedge(y\wedge z)=\langle x,z\rangle y - \langle x,y\rangle z.

Therefore, the expression can be condensed to \langle x\wedge(y\wedge z),z\rangle. Now this is the scalar triple product of the three vectors x, y\wedge z, and z. For this to be zero, we need x to lie in the plane generated by y\wedge z and z. Note that y\wedge z is orthogonal to both y and z. So if P is the orthogonal projection to the subspace generated by z, we want x-Px to be orthogonal to y. Actually, that can be read out of the original formula too, since it is \langle\langle x,z\rangle z - \langle z,z\rangle x,y\rangle. A nicer way of thinking of it (because it is more symmetrical) is that we want the orthogonal projections of x and y to the subspace orthogonal to z to be orthogonal. To check that, assume (WLOG) that \|z\|_2=1 and note that \langle x-\langle x,z\rangle z,\,y-\langle y,z\rangle z\rangle=\langle x,y\rangle-\langle x,z\rangle\langle y,z\rangle.
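A quick numerical spot-check of that last identity, purely for reassurance (illustrative code, not from the post):

```python
import numpy as np

rng = np.random.default_rng(2)
x, y, z = rng.standard_normal((3, 5))     # three random vectors in R^5
z = z / np.linalg.norm(z)                 # WLOG ||z||_2 = 1

lhs = np.dot(x - np.dot(x, z) * z, y - np.dot(y, z) * z)   # <x - <x,z>z, y - <y,z>z>
rhs = np.dot(x, y) - np.dot(x, z) * np.dot(y, z)           # <x,y> - <x,z><y,z>
print(lhs, rhs)                           # should agree up to rounding error
```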

So what I’d like to see done (but I’m certainly not saying it’s the only thing worth doing) is the following.

1. Test experimentally whether for a random pair (A,B) of n-sided dice we find that the correlations of the random variables X=\sum_jh_A(c_j), Y=\sum_jh_B(c_j) and Z=\sum_j(c_j-(n+1)/2) really do appear to satisfy the relationship

\mathrm{corr}(X,Z)\cdot\mathrm{corr}(Y,Z)=\mathrm{corr}(X,Y).

Here the c_j are chosen randomly without any conditioning on their sum. My experiment seemed to indicate not, but I'm hoping I made a mistake. (One possible way of setting up such a test is sketched just after this list.)

2. If they do satisfy that relationship, then we can start to think about why.

3. If they do not satisfy it, then we can start to think about why not. In particular, which of the heuristic assumptions used to suggest that they should satisfy that relationship is wrong — or is it my understanding of multivariate normals that is faulty?
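For what it is worth, here is one way item 1 could be set up, reusing random_die, h_values and rng from the sketch near the start of the post (this is only a possible template, not the code I referred to above):

```python
# For several random pairs (A, B), compare corr(X,Z) * corr(Y,Z) with corr(X,Y),
# where X = sum_j h_A(c_j), Y = sum_j h_B(c_j), Z = sum_j (c_j - (n+1)/2) and
# the c_j are uniform in [n] with no conditioning on their sum.
n, samples, pairs = 100, 20000, 5
for _ in range(pairs):
    A, B = random_die(n), random_die(n)
    hA, hB = h_values(A), h_values(B)
    Cs = rng.integers(1, n + 1, size=(samples, n))
    X = hA[Cs - 1].sum(axis=1)
    Y = hB[Cs - 1].sum(axis=1)
    Z = (Cs - (n + 1) / 2).sum(axis=1)
    R = np.corrcoef(np.vstack([X, Y, Z]))
    print(f"corr(X,Z)*corr(Y,Z) = {R[0, 2] * R[1, 2]:+.3f}   corr(X,Y) = {R[0, 1]:+.3f}")
```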

If we manage to prove that they typically do satisfy that relationship, at least approximately, then we can think about whether various distributions become sufficiently normal sufficiently quickly for that to imply that intransitivity occurs with probability 1/4.