The counter-intuitive behaviour of high-dimensional spaces

Wildon's Weblog 2020-04-19

This post is an extended version of a talk I last gave a few years ago on an extended Summer visit to Bristol; three weeks into lockdown seems like a good time to write it up for a wider audience. The overall thesis is that it’s hard to have a good intuition for really high-dimensional spaces, and that the reason is that, asked to picture such a space, most of it come far closer to something like \mathbb{R}^4 then the more accurate \mathbb{F}_2^{1000}. Some details are left to the (highly optional) exercise sheet (with solutions) that accompanies this post. Sources are acknowledged at the end of the post.

For the moment, I’ve only typed up the section on Euclidean spaces.

As a warm up for my talk, I invited to audience to order the following cardinals:

2, 2^{56}, 10^{120}, \mathbb{N}, 2^\mathbb{N}, 2^{2^\mathbb{N}}.

Of course they are already correctly ordered by size. From the perspective of ‘effective computation’, I claim that 2 and 2^{56} are indistinguishable and ‘tiny’, 10^{120}, \mathbb{N} and 2^\mathbb{N} are indistinguishable and ‘huge’ and 2^{2^\mathbb{N}} sits somewhere in the middle. To give some hint of the most surprising part of this, interpret 2^\mathbb{N} as the Cantor set C, so 2^{2^\mathbb{N}} is the set of subsets of C. Computationally definable subsets of C are then computable predicates on C (i.e. functions C \rightarrow \{T, F\}), and since C is compact, it is computable whether two such predicates are equal. In contrast, there is no algorithm that, given two predicates on \mathbb{N}, will run for a finite length of time and then return the correct answer.

Euclidean space and unit balls.

Let B^n = \{x \in \mathbb{R}^n : ||x|| \le 1\} be the solid n-dimensional unit ball and let S^n = \{x \in \mathbb{R}^{n+1} : ||x|| = 1\} be the n-dimensional surface of B^{n+1}.

House prices in Sphereland

Imagine a hypothetical ‘Sphereland’ whose inhabitants live uniformly distributed on the surface S^n. (For instance, \mathbb{S}^2 is the one-point compactification of Flatland.) With a god’s eye view you are at the origin, and survey the world. At what values of the nth coordinate are most of the inhabitants to be found?

For example, the diagram below shows the case n=2 with two regions of equal height shaded.

It is a remarkable fact, going all the way back to Archimedes, that the surface areas of the red and blue regions are equal: the reduction in cross-section as we go upwards is exactly compensated by the shallower slope. For a calculus proof, observe that in the usual Cartesian coordinate system, we can parametrize the part of the surface with x=0 and z > 0 by (0,y,\sqrt{1-y^2}). Choose k > 0. Then since (0,-z,y) is orthogonal to the gradient of the sphere at (0,y,z), to first order, if we increase the height z from z to z+k then we must decrease the second coordinate y from $y$ to $y-z/y k$. This is shown in the diagram below.

Hence the norm squared of the marked line segment tangent to the surface is

\displaystyle || (0, -\frac{z}{y}k, k) ||^2 = \frac{z^2k^2}{y^2} + k^2 = k^2 \frac{z^2+y^2}{y^2}  = k^2 \frac{1}{1-z^2}.

Hence, as Archimedes knew, the area of the region R between the lines of latitude of height between z and z+k is (to first order in k) the product of k/\sqrt{1-z^2} and the circumference of the line of latitude at height z. It is therefore

\displaystyle \frac{k}{\sqrt{1-z^2}} \times 2 \pi \sqrt{1-z^2} = 2\pi k

which is independent of z. As a small check, integrating over $altex z$ from -1 to 1 we get 4\pi for the surface area of the sphere; as expected this is also the surface area of the enclosing cylinder of radius 1 and height 2.

This cancellation in the displayed formula above is special to the case n=2. For instance, when n=1, by the argument just given, the arclength at height z is proportional to 1/\sqrt{1-z^2}. Therefore in the one-point compactification of Lineland (a place one might feel is already insular enough), from the god’s perspective (considering slices with varying z), most of the inhabitants are near the north and south poles (z = \pm 1), and almost no-one lives on the equation (z=0).

More generally, the surface area of S^n is given by integrating the product of the arclength 1/\sqrt{1-z^2} and the surface area of the ‘latitude crosssection’ at height z. The latter is the (n-1)-dimensional sphere of radius \sqrt{1-z^2}. By dimensional analysis, its surface area is C\bigl( \sqrt{1-z^2}\bigr)^{n-1} for some constant C. Hence the density of Sphereland inhabitants at height z is proportional to \bigl( \sqrt{1-z^2} \bigr)^{n-2}. (A complete proof of this using only elementary calculus is given in Exercise 1 of the problem sheet.) In particular, we see that n is large, almost all the density is at the equator z=0. From the god’s eye perspective, in high-dimensional Sphereland, everyone lives on the equator, or a tiny distance from it. To emphasise this point, here are the probability density functions for small values of n.

This seems deeply unintuitive to me. There is, after all, nothing intrinsic about the z-coordinate. Indeed, the god can pick any hyperplane in \mathbb{R}^{n+1} through the origin, and get a similar conclusion. In particular, if a random inhabitant of Spaceland is at (x_1,\ldots, x_{n+1}) then it is almost certain that most of the x_i are very small.

Concentration of measure

One could make this precise by continuing with differential geometry, but Henry Cohn’s answer to this Mathoverflow question on concentration of measure gives an elegant alternative approach. Let X_1, \ldots, X_n be independent identically distributed normal random variables each with mean 0 and variance 1/n. Then \mathbb{E}[X_1^2 + \cdots + X_n^2] = 1, and so, by the Weak Law of Large Numbers, the probability that X_1^2 + \cdots + X_n^2 \not\in (1-\epsilon,1+\epsilon) tends to 0 as n \rightarrow \infty. Therefore we can, with small error, regard (X_1,\ldots, X_n) as a uniformly chosen point on S^{n-1}. By Markov’s inequality, the probability that |X_1| \ge c/\sqrt{n} (or equivalently, X_1^2 \ge c^2/n) is at most 1/c^2, and the probability that |X_1| \ge 1/n^{1/3} (or equivalently, X_1^2 \ge n^{2/3}) is at most 1/n^{1/6}. Since

(1 - \frac{1}{n^{1/6}})^n \approx e^{-n/n^{1/6}} \rightarrow 0

as n \rightarrow \infty, we see that with high probability, a random Sphereland inhabitant has all its coordinates in (-1/n^{1/3}, 1/n^{1/3}). I think this makes the counter-intuitive conclusion of the previous subsection even starker.

Volume of the unit ball

The ‘length of tangent line times area of cross-section’ argument says that if A(S^n) is the surface area of S^n then

\begin{aligned} \displaystyle A(S^n) &= \int_{-1}^1 \frac{A(S^{n-1}) \sqrt{1-z^2}^{n-1}}{\sqrt{1-z^2}} \mathrm{d} z \\ &= A(S^{n-1}) \int_{-1}^1 \sqrt{1-z^2}^{n-2} \mathrm{d} z. \end{aligned}

A quick consultation of Mathematica to evaluate the integral shows that

\displaystyle A(S^n) = A(S^{n-1}) \frac{\sqrt{\pi}\Gamma(\frac{n}{2})}{\Gamma(\frac{n+1}{2})}.

It follows easily by induction that A(S^n) = 2\sqrt{\pi}^{n+1} / \Gamma(\frac{n+1}{2}). Since B^n = \bigcup_{r=0}^1 rS^{n-1} and A(rS^{n-1}) = r^{n-1} A(S^{n-1}), the volume V_n of B^n satisfies

V_n = \displaystyle \frac{2\sqrt{\pi}^n}{n\Gamma(\frac{n}{2} )}.

In particular, from \Gamma(t) = (t-1)! for $t \in \mathbb{N}$ we get V_{2m} = \pi^m / (m-1)!. Hence V_{2(m+1)} = \pi/m V_{2m} (is there a quick way to see this?), and the proportion of the cube [-1,1]^{2m} occupied by the sphere S^{2m} is

\displaystyle \frac{(\pi/2)^m}{m!}.

Clearly this proportion tends to 0, very quickly. I find this somewhat surprising, since my mental picture of a sphere is as a bulging convex object that should somehow `fill out’ an enclosing cube. This is clearly completely misconceived.

Sources

The results on volumes of spheres and balls are cobbled together from several MathOverflow questions and answers, in particular Joseph O’Rourke’s question asking for an intuitive explanation of the concentration of measure phenomenon, and S. Carnahan’s answer to a question on the volume of the n-dimensional unit ball. The coding theory results go back to Shannon, and can be found in many textbooks, for example Van Lint’s Introduction to coding theory. The diagrams below were drawn using TikZ with the help of these macros due to Tomasz M. Trzeciak. The material about computability is either very basic or comes from a blog post by Martín Escardó (on Andrej Bauer’s blog) giving a wonderfully accessible account of Infinite sets that admit fast exhaustive search, published in the proceedings of Logic in Computer Science 2007.