Intransitive dice V: we want a local central limit theorem
Gowers's Weblog 2018-03-10
It has become clear that what we need in order to finish off one of the problems about intransitive dice is a suitable version of the local central limit theorem. Roughly speaking, we need a version that is two-dimensional — that is, concerning a random walk on $\mathbb{Z}^2$ — and completely explicit — that is, giving precise bounds for error terms so that we can be sure that they get small fast enough.
There is a recent paper that does this in the one-dimensional case, though it uses an elementary argument, whereas I would prefer to use Fourier analysis. Here I’d like to begin the process of proving a two-dimensional result that is designed with our particular application in mind. If we are successful in doing that, then it would be natural to try to extract from the proof a more general statement, but that is not a priority just yet.
As people often do, I’ll begin with a heuristic argument, and then I’ll discuss how we might try to sharpen it up to the point where it gives us good bounds for the probabilities of individual points of $\mathbb{Z}^2$. Much of this post is cut and pasted from comments on the previous post, since it should be more convenient to have it in one place.
Using characteristic functions
The rough idea of the characteristic-functions approach, which I’ll specialize to the 2-dimensional case, is as follows. (Apologies to anyone who knows about this properly for anything idiotic I might accidentally write.) Let $X = (X_1, X_2)$ be a random variable on $\mathbb{Z}^2$ and write $f(x,y)$ for $\mathbb{P}[X = (x,y)]$. If we take $n$ independent copies of $X$ and add them together, then the probability of being at $(x,y)$ is $f^{*n}(x,y)$,
where $f^{*n}$ denotes the $n$-fold convolution of $f$ with itself.
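To make the convolution picture concrete, here is a small numerical sketch. The step distribution is a toy one chosen purely for illustration (a mean-zero nearest-neighbour walk on $\mathbb{Z}^2$, not the distribution that comes from dice), and the check confirms that the $n$-fold convolution really is the law of the sum of $n$ independent copies.

```python
import itertools
from collections import Counter
from fractions import Fraction

# Toy step distribution f on Z^2 (an illustration only, not the dice
# distribution from the post): a mean-zero nearest-neighbour walk.
f = {(1, 0): Fraction(1, 4), (-1, 0): Fraction(1, 4),
     (0, 1): Fraction(1, 4), (0, -1): Fraction(1, 4)}

def convolve(p, q):
    """Convolution of two finitely supported distributions on Z^2."""
    out = Counter()
    for (a, b), w1 in p.items():
        for (c, d), w2 in q.items():
            out[(a + c, b + d)] += w1 * w2
    return out

def convolve_power(p, n):
    """The n-fold convolution p * p * ... * p."""
    out = dict(p)
    for _ in range(n - 1):
        out = convolve(out, p)
    return out

# Sanity check: the n-fold convolution agrees with brute-force enumeration
# of all n-step walks, weighted by the product of the step probabilities.
n = 4
brute = Counter()
for steps in itertools.product(f, repeat=n):
    point = (sum(s[0] for s in steps), sum(s[1] for s in steps))
    weight = Fraction(1)
    for s in steps:
        weight *= f[s]
    brute[point] += weight

assert convolve_power(f, n) == brute
```

The exact arithmetic with `Fraction` means the agreement is exact rather than up to rounding; for instance, the 4-step walk returns to the origin with probability exactly $9/64$.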
Now let’s define the Fourier transform of $f$, which probabilists call the characteristic function, in the usual way by

$\hat f(\alpha,\beta) = \sum_{x,y} f(x,y)\, e^{2\pi i(\alpha x + \beta y)} = \mathbb{E}\, e^{2\pi i(\alpha X_1 + \beta X_2)}.$
Here $\alpha$ and $\beta$ belong to $\mathbb{T} = \mathbb{R}/\mathbb{Z}$, but I’ll sometimes think of them as belonging to the interval $[-1/2, 1/2)$ too.
We have the convolution law that $\widehat{f * g} = \hat f\, \hat g$ and the inversion formula

$f(x,y) = \int_{\mathbb{T}^2} \hat f(\alpha,\beta)\, e^{-2\pi i(\alpha x + \beta y)}\, d\alpha\, d\beta.$
Putting these together, we find that if the random variables $X^{(1)}, \dots, X^{(n)}$ are independent copies of $X$, then the probability that their sum is $(x,y)$ is

$f^{*n}(x,y) = \int_{\mathbb{T}^2} \hat f(\alpha,\beta)^n\, e^{-2\pi i(\alpha x + \beta y)}\, d\alpha\, d\beta.$
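Here is a quick numerical sketch of the inversion formula, again with a made-up nearest-neighbour step distribution rather than the dice one. Since the characteristic function of a finitely supported distribution is a trigonometric polynomial, the integral over $\mathbb{T}^2$ can be evaluated exactly (up to rounding) by averaging over a sufficiently fine grid.

```python
import cmath

# Toy step distribution (an illustration, not the dice distribution from
# the post): the mean-zero nearest-neighbour walk on Z^2.
f = {(1, 0): 0.25, (-1, 0): 0.25, (0, 1): 0.25, (0, -1): 0.25}
n = 4

def fhat(alpha, beta):
    """Characteristic function  E exp(2 pi i (alpha X1 + beta X2))."""
    return sum(w * cmath.exp(2j * cmath.pi * (alpha * x + beta * y))
               for (x, y), w in f.items())

def prob_of_sum(x, y, N=16):
    """Inversion formula: integral over T^2 of fhat^n e(-alpha x - beta y).

    Because fhat(alpha, beta)^n is a trigonometric polynomial of degree at
    most n, averaging the integrand over an N x N grid with N large enough
    reproduces the integral exactly, up to floating-point error."""
    total = 0.0
    for k in range(N):
        for l in range(N):
            a, b = k / N, l / N
            total += (fhat(a, b) ** n
                      * cmath.exp(-2j * cmath.pi * (a * x + b * y)))
    return (total / N ** 2).real
```

For this walk `prob_of_sum(0, 0)` recovers the return probability $9/64$ after four steps, and points of the wrong parity, such as $(1,0)$, correctly get probability zero.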
The very rough reason that we should now expect a Gaussian formula is that we consider a Taylor expansion of $\hat f$. We can assume for our application that $X_1$ and $X_2$ have mean zero. From that one can argue that the coefficients of the linear terms in the Taylor expansion are zero. (I’ll give more details in a subsequent comment.) The constant term is 1, and the quadratic terms give us the covariance matrix of $X_1$ and $X_2$. If we assume that we can approximate $\hat f(\alpha,\beta)$ by an expression of the form $e^{-q(\alpha,\beta)}$ for some suitable quadratic form $q$ in $\alpha$ and $\beta$, then the $n$th power should be close to $e^{-nq(\alpha,\beta)}$, and then, since Fourier transforms (and inverse Fourier transforms) take Gaussians to Gaussians, when we invert this one, we should get a Gaussian-type formula for $f^{*n}(x,y)$. So far I’m glossing over the point that Gaussians are defined on $\mathbb{R}^2$, whereas $\alpha$ and $\beta$ live in $\mathbb{T}$ and $x$ and $y$ live in $\mathbb{Z}$, but if most of $\hat f^n$ is concentrated in a small region around 0, then this turns out not to be too much of a problem.
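As a sanity check on this heuristic, one can compare the $n$-fold convolution with the corresponding Gaussian density in a toy example (again the nearest-neighbour walk on $\mathbb{Z}^2$, not the dice distribution; the factor 2 below is a parity correction, because this particular walk only reaches points with $x + y$ congruent to $n$ mod 2).

```python
from collections import Counter
from math import pi, exp

# Toy illustration of the Gaussian heuristic (a nearest-neighbour walk on
# Z^2, not the dice distribution from the post).
f = {(1, 0): 0.25, (-1, 0): 0.25, (0, 1): 0.25, (0, -1): 0.25}

def convolve(p, q):
    """Convolution of two finitely supported distributions on Z^2."""
    out = Counter()
    for (a, b), w1 in p.items():
        for (c, d), w2 in q.items():
            out[(a + c, b + d)] += w1 * w2
    return out

n = 100
dist = dict(f)
for _ in range(n - 1):
    dist = convolve(dist, f)

def gaussian(x, y):
    """Density of N(0, (n/2) I) at (x, y), times a parity factor of 2,
    since the walk only reaches points with x + y congruent to n mod 2."""
    return 2 * exp(-(x * x + y * y) / n) / (pi * n)

# The n-fold convolution is within a couple of per cent of the Gaussian
# prediction at and near the origin.
assert abs(dist[(0, 0)] / gaussian(0, 0) - 1) < 0.02
assert abs(dist[(10, 0)] / gaussian(10, 0) - 1) < 0.05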
The Taylor-expansion part
If we take the formula

$\hat f(\alpha,\beta) = \mathbb{E}\, e^{2\pi i(\alpha X_1 + \beta X_2)}$

and partially differentiate $r$ times with respect to $\alpha$ and $s$ times with respect to $\beta$ we obtain the expression

$(2\pi i)^{r+s}\, \mathbb{E}\, X_1^r X_2^s\, e^{2\pi i(\alpha X_1 + \beta X_2)}.$
Setting $\alpha = \beta = 0$ turns this into $(2\pi i)^{r+s}\, \mathbb{E}\, X_1^r X_2^s$. Also, for every $\alpha$ and $\beta$ the absolute value of the partial derivative is at most $(2\pi)^{r+s}\, \mathbb{E}\, |X_1|^r |X_2|^s$. This allows us to get a very good handle on the Taylor expansion of $\hat f$ when $\alpha$ and $\beta$ are close to the origin.
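A small numerical check of these derivative formulas, on a made-up nearest-neighbour step distribution: the first $\alpha$-derivative at the origin vanishes because the mean is zero, and the second equals $-4\pi^2\,\mathbb{E}X_1^2$.

```python
import cmath

# Toy step distribution (not the dice distribution from the post).
f = {(1, 0): 0.25, (-1, 0): 0.25, (0, 1): 0.25, (0, -1): 0.25}

def fhat(alpha, beta):
    """Characteristic function  E exp(2 pi i (alpha X1 + beta X2))."""
    return sum(w * cmath.exp(2j * cmath.pi * (alpha * x + beta * y))
               for (x, y), w in f.items())

h = 1e-3
# Central second difference in alpha at the origin.
second_diff = (fhat(h, 0) - 2 * fhat(0, 0) + fhat(-h, 0)).real / h ** 2
moment = sum(w * x * x for (x, y), w in f.items())   # E X1^2 = 1/2

# Second derivative at 0 is (2 pi i)^2 E X1^2 = -4 pi^2 E X1^2.
assert abs(second_diff - (-4 * cmath.pi ** 2 * moment)) < 1e-3

# First derivative at 0 vanishes because X1 has mean zero.
first_diff = (fhat(h, 0) - fhat(-h, 0)).real / (2 * h)
assert abs(first_diff) < 1e-9
```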
Recall that the two-dimensional Taylor expansion of a function $g$ about $(0,0)$, up to the quadratic terms, is given by the formula

$g(\alpha,\beta) = g(0,0) + \alpha D_1 g(0,0) + \beta D_2 g(0,0) + \tfrac12 \alpha^2 D_{11} g(0,0) + \alpha\beta D_{12} g(0,0) + \tfrac12 \beta^2 D_{22} g(0,0) + E(\alpha,\beta),$

where $D_1$ is the partial derivative operator with respect to the first coordinate, $D_{12}$ the mixed partial derivative, and so on, and $E(\alpha,\beta)$ is an error term.
In our case, $\hat f(0,0) = 1$, $D_1 \hat f(0,0) = D_2 \hat f(0,0) = 0$ (since $X_1$ and $X_2$ have mean zero), and $D_{ij} \hat f(0,0) = -4\pi^2\, \mathbb{E}\, X_i X_j$.
As in the one-dimensional case, the error term has an integral representation, namely

$E(\alpha,\beta) = \sum_{r+s=3} \frac{3\,\alpha^r \beta^s}{r!\, s!} \int_0^1 (1-t)^2\, D_1^r D_2^s \hat f(t\alpha, t\beta)\, dt,$

which has absolute value at most

$\sum_{r+s=3} \frac{(2\pi)^3\, |\alpha|^r |\beta|^s}{r!\, s!}\, \mathbb{E}\, |X_1|^r |X_2|^s,$

which in turn is at most

$\frac{(2\pi)^3}{6}\, \mathbb{E}\, \big( |\alpha X_1| + |\beta X_2| \big)^3.$
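Here is a numerical check of this error bound on a toy step distribution (not the dice one). For that distribution $\mathbb{E}X_1^2 = \mathbb{E}X_2^2 = 1/2$ and $\mathbb{E}X_1X_2 = 0$, and $|\alpha X_1| + |\beta X_2|$ equals $|\alpha|$ or $|\beta|$ with probability $1/2$ each, so both the quadratic approximation and the bound take a very simple form.

```python
import cmath

# Toy step distribution (not the dice distribution from the post).
f = {(1, 0): 0.25, (-1, 0): 0.25, (0, 1): 0.25, (0, -1): 0.25}

def fhat(alpha, beta):
    return sum(w * cmath.exp(2j * cmath.pi * (alpha * x + beta * y))
               for (x, y), w in f.items())

def quadratic_approx(alpha, beta):
    # 1 - 2 pi^2 (alpha^2 E X1^2 + 2 alpha beta E X1 X2 + beta^2 E X2^2),
    # with E X1^2 = E X2^2 = 1/2 and E X1 X2 = 0 for this distribution.
    return 1 - 2 * cmath.pi ** 2 * (alpha ** 2 + beta ** 2) / 2

def error_bound(alpha, beta):
    # ((2 pi)^3 / 6) E (|alpha X1| + |beta X2|)^3
    return (2 * cmath.pi) ** 3 / 6 * (abs(alpha) ** 3 + abs(beta) ** 3) / 2

# The third-order bound holds on a grid of small (alpha, beta).
for i in range(-5, 6):
    for j in range(-5, 6):
        a, b = 0.02 * i, 0.02 * j
        assert abs(fhat(a, b) - quadratic_approx(a, b)) \
            <= error_bound(a, b) + 1e-12
```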
When $X$ is the random variable $(g_A(j), j - (n+1)/2)$ (where $A$ is a fixed die and $j$ is chosen randomly from $[n]$), we have that $\mathbb{E}X = (0,0)$.
With very slightly more effort we can get bounds for the moments of $g_A(j)$ as well. For any particular $j$ and a purely random sequence $A \in [n]^n$, the probability that $|g_A(j)| \geq t\sqrt n$ is bounded above by $e^{-ct^2}$ for an absolute constant $c$. (Something like 1/8 will do.) So the probability that there exists such a $j$, conditional on $\sum_i a_i = n(n+1)/2$ (which happens with probability about $n^{-3/2}$), is at most $n^{5/2} e^{-ct^2}$, and in particular is small when $t \geq C\sqrt{\log n}$ for a suitable constant $C$. I think that with a bit more effort we could probably prove that $\mathbb{E}|g_A(j)|^3$ is at most $Cn^{3/2}$, which would allow us to improve the bound for the error term, but I think we can afford the logarithmic factor here, so I won’t worry about this. So we get an error of $O\big( |\alpha|^3 (n\log n)^{3/2} + |\beta|^3 n^3 \big)$.
For this error to count as small, we want it to be small compared with the second moments. For the time being I’m just going to assume that the rough size of the second-moment contribution is around $\alpha^2 n + \beta^2 n^2$. So for our error to be small, we want $\alpha$ to be $o\big(n^{-1/2} (\log n)^{-3/2}\big)$ and $\beta$ to be $o(n^{-1})$.
That gives us a rough idea of the domain in which we can say confidently that the terms up to the quadratic ones give a good approximation to $\hat f(\alpha,\beta)$, and hence that $\hat f(\alpha,\beta)^n$ is well approximated by a Gaussian.
The remaining part
Outside the domain, we have to do something different, and that something is fairly simple: we shall show that $|\hat f(\alpha,\beta)|^n$ is very small. This is equivalent to showing that $|\hat f(\alpha,\beta)|$ is bounded away from 1 by significantly more than $n^{-1}$. This we do by looking more directly at the formula for the Fourier transform:

$\hat f(\alpha,\beta) = \mathbb{E}\, e^{2\pi i(\alpha X_1 + \beta X_2)}.$
We would like this to have absolute value bounded away from 1 by significantly more than $n^{-1}$ except when $\alpha$ is quite a bit smaller than $n^{-1/2}$ and $\beta$ is quite a bit smaller than $n^{-1}$.
Now in our case $X$ is uniformly distributed on the $n$ points $(g_A(j), j - (n+1)/2)$ with $j \in [n]$. So, up to a global phase that does not affect the modulus, we can write $\hat f(\alpha,\beta)$ as

$\frac 1n \sum_{j=1}^n e^{2\pi i(\alpha g_A(j) + \beta j)}.$
Here’s a possible way that we might try to bound that sum. Let $m = n/2$ and let us split up the sum into pairs of terms with $j$ and $j+m$, for $1 \leq j \leq m$. So each pair of terms will take the form

$e^{2\pi i(\alpha g_A(j) + \beta j)} + e^{2\pi i(\alpha g_A(j+m) + \beta (j+m))}.$
The ratio of these two terms is

$e^{2\pi i(\alpha(g_A(j+m) - g_A(j)) + \beta m)}.$
And if the ratio is $e^{2\pi i\delta}$ with $|\delta| \leq 1/2$, then the modulus of the sum of the two terms is at most $2 - c\delta^2$ for an absolute constant $c > 0$.
Now let us suppose that as $j$ varies, the differences $g_A(j+m) - g_A(j)$ are mostly reasonably well distributed in an interval between $-C\sqrt n$ and $C\sqrt n$, as seems very likely to be the case. Then the ratios above vary in a range from about $e^{2\pi i(\beta m - C\alpha\sqrt n)}$ to $e^{2\pi i(\beta m + C\alpha\sqrt n)}$. But that should imply that the entire sum, when divided by $n$, has modulus at most $1 - c\alpha^2 n$. (This analysis obviously isn’t correct when $c\alpha^2 n$ is bigger than 1, since the modulus can’t be negative, but once we’re in that regime, then it really is easy to establish the bounds we want.)
If $\alpha$ is, say, $n^{-3/4}$, then this gives us $1 - cn^{-1/2}$, and raising that to the power $n$ gives us $\exp(-cn^{1/2})$, which is tiny.
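In the two-term bound above one can in fact take $c = 4$ (my choice of constant, not one fixed in the discussion): $|1 + e^{2\pi i\delta}| = 2|\cos(\pi\delta)|$, and $1 - \cos(\pi\delta) \geq 2\delta^2$ for $|\delta| \leq 1/2$. A quick numerical confirmation:

```python
import cmath

def pair_modulus(delta):
    """Modulus of the sum of two unit vectors whose ratio is e(delta)."""
    return abs(1 + cmath.exp(2j * cmath.pi * delta))

# |1 + e(delta)| <= 2 - 4 delta^2 on the whole range |delta| <= 1/2.
deltas = [k / 1000 for k in range(-500, 501)]
assert all(pair_modulus(d) <= 2 - 4 * d * d + 1e-12 for d in deltas)
```

At $\delta = 1/2$ the two terms cancel completely, which matches the bound $2 - 4\delta^2 = 1$ comfortably; the bound is tightest as $\delta \to 0$, where the true modulus is $2 - \pi^2\delta^2 + O(\delta^4)$.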
As a quick sanity check, note that for $\hat f(\alpha,\beta)^n$ not to be tiny we need $\alpha$ to be not much more than $n^{-1}$. This reflects the fact that a random walk of $n$ steps of typical size about $\sqrt n$ will tend to be at a distance comparable to $n$ from the origin, and when you take the Fourier transform, you take the reciprocals of the distance scales.
If $\alpha$ is quite a bit smaller than $n^{-1/2}$ and $\beta$ is not too much smaller than $n^{-1}$, then the numbers $\alpha g_A(j)$ are all small but the numbers $\beta j$ vary quite a bit, so a similar argument can be used to show that in this case too the Fourier transform is not close enough to 1 for its $n$th power to be large. I won’t give details here.
What is left to do?
If the calculations above are not too wide of the mark, then the main thing that needs to be done is to show that for a typical die $A$ the numbers $g_A(j)$ are reasonably uniform in a range of width around $\sqrt n$, and more importantly that the numbers $g_A(j + n/2) - g_A(j)$ are not too constant: basically I’d like them to be pretty uniform too.
It’s possible that we might want to try a slightly different approach, which is to take the uniform distribution on the set of points $(g_A(j), j)$, convolve it once with itself, and argue that the resulting probability distribution is reasonably uniform in a rectangle of width around $\sqrt n$ and height around $n$. By that I mean that a significant proportion of the points are hit around $\sqrt n$ times each (because there are $n^2$ sums and they lie in a rectangle of area around $n^{3/2}$). But one way or another, I feel pretty confident that we will be able to bound this Fourier transform and get the local central limit theorem we need.
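As a rough illustration of this convolution idea, here is a simulation with synthetic data: the function `g` below is a random stand-in for $g_A$ (it is not computed from an actual die), with values spread over a window of width about $\sqrt n$. The $n^2$ pairwise sums then land in a rectangle of area about $n^{3/2}$, and one can check that points hit more than once account for most of the sums, in line with the area count above.

```python
import random
from collections import Counter

# Synthetic stand-in for g_A (an assumption for illustration, not a real
# die): n values in a window of width about sqrt(n).
random.seed(0)
n = 400
width = int(n ** 0.5)                      # about sqrt(n)
g = {j: random.randint(-width, width) for j in range(1, n + 1)}

# All n^2 pairwise sums of the points (g(j), j).
hits = Counter()
for j in range(1, n + 1):
    for k in range(1, n + 1):
        hits[(g[j] + g[k], j + k)] += 1

# The sums lie in a rectangle of area about n^(3/2) (width about
# 4*sqrt(n), height about 2n), so the average multiplicity is on the
# order of sqrt(n); in particular, multiple hits dominate.
total = sum(hits.values())
assert total == n * n
multiple = sum(c for c in hits.values() if c >= 2)
assert multiple > total // 2
```

The final assertion is in fact guaranteed by pigeonhole here: there are $n^2 = 160{,}000$ sums but at most $81 \times 799 < 65{,}000$ available points, so singly-hit points cannot account for half the sums.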