Reflections on the recent solution of the cap-set problem I

Gowers's Weblog 2018-03-10

Sometimes blog posts about recent breakthroughs can be useful because they convey the main ideas of a proof without getting bogged down in the technical details. But the recent solution of the cap-set problem by Jordan Ellenberg, and independently and fractionally later by Dion Gijswijt, both making crucial use of an amazing lemma of Croot, Lev and Pach that was made public a week or so before, does not really invite that kind of post, since the papers are so short, and the ideas so transparent, that it’s hard to know how a blog post can explain them more clearly.

But as I’ve got a history with this problem, including posting about it on this blog in the past, I feel I can’t just not react. So in this post and a subsequent one (or ones) I want to do three things. The first is just to try to describe my own personal reaction to these events. The second is more mathematically interesting. As regular readers of this blog will know, I have a strong interest in the question of where mathematical ideas come from, and a strong conviction that they always result from a fairly systematic process — and that the opposite impression, that some ideas are incredible bolts from the blue that require “genius” or “sudden inspiration” to find, is an illusion that results from the way mathematicians present their proofs after they have discovered them.

From time to time an argument comes along that appears to present a stiff challenge to my view. The solution to the cap-set problem is a very good example: it’s easy to understand the proof, but the argument has a magic quality that leaves one wondering how on earth anybody thought of it. I’m referring particularly to the Croot-Lev-Pach lemma here. I don’t pretend to have a complete account of how the idea might have been discovered (if any of Ernie, Seva or Peter, or indeed anybody else, want to comment about this here, that would be extremely welcome), but I have some remarks.

The third thing I’d like to do reflects another interest of mine, which is avoiding duplication of effort. I’ve spent a little time thinking about whether there is a cheap way of getting a Behrend-type bound for Roth’s theorem out of these ideas (and I’m not the only one). Although I wasn’t expecting the answer to be yes, I think there is some value in publicizing some of the dead ends I’ve come across. Maybe it will save others from exploring them, or maybe, just maybe, it will stimulate somebody to find a way past the barriers that seem to be there.

Personal reflections

There’s not actually all that much to say here. I just wanted to comment on a phenomenon that’s part of mathematical life: the feeling of ambivalence one has when a favourite problem is solved by someone else. The existence of such a feeling is hardly a surprise, but slightly more interesting are the conditions that make it more or less painful. For me, an extreme example where it was not at all painful was Wiles’s solution of Fermat’s Last Theorem. I was in completely the wrong area of mathematics to have a hope of solving that problem, so although I had been fascinated by it since boyhood, I could nevertheless celebrate in an uncomplicated way the fact that it had been solved in my lifetime, something that I hadn’t expected.

Towards the other end of the spectrum for me personally was Tom Sanders’s quasipolynomial version of the Bogolyubov-Ruzsa lemma (which was closely related to his bound for Roth’s theorem). That was a problem I had worked on very hard, and some of the ideas I had had were, it turned out, somewhat in the right direction. But Tom got things to work, with the help of further ideas that I had definitely not had, and by the time he solved the problem I had gone for several years without seriously working on it. So on balance, my excitement at the solution was a lot greater than the disappointment that that particular dream had died.

The cap-set problem was another of my favourite problems, and one I intended to return to. But here I feel oddly un-disappointed. The main reason is that I know that if I had started work on it again, I would have continued to try to push the Fourier methods that have been so thoroughly displaced by the Croot-Lev-Pach lemma, and would probably have got nowhere. So the discovery of this proof has saved me from wasting a lot of time at some point in the future. It’s also an incredible bonus that the proof is so short and easy to understand. I could almost feel my brain expanding as I read Jordan Ellenberg’s preprint and realized that here was a major new technique to add to the toolbox. Of course, the polynomial method is not new, but somehow this application of it, at least for me, feels like one where I can make some headway with understanding why it works, rather than just gasping in admiration at each new application and wondering how on earth anyone thought of it.

Why polynomials?

That brings me neatly on to the next theme of this post. From now on I shall assume familiarity with the argument as presented by Jordan Ellenberg, but here is a very brief recap.

The key to it is the lemma of Croot, Lev and Pach (very slightly modified), which states that if A\subset\mathbb F_3^n and P is a polynomial of degree d in variables x_1,\dots,x_n such that P(x+y)=0 for every pair of distinct elements x,y\in A, then P(2x) is non-zero for at most 2m_{d/2} values of x\in A, where m_{d/2} is the dimension of the space of polynomials in x_1,\dots,x_n of degree at most d/2.

Why does this help? Well, the monomials we consider are of the form x_1^{a_1}\dots x_n^{a_n} where each a_i\in\{0,1,2\}. The expected degree of a random such monomial is n, and for large n the degree is strongly concentrated about its mean. In particular, if we choose d=4n/3, then the probability that a random monomial has degree greater than d is exponentially small, and the probability that a random monomial has degree less than d/2 is also exponentially small.

Therefore, the dimension of the space of polynomials of degree at most d (for this d) is at least 3^n(1-c^n), while the dimension of the space of polynomials of degree at most d/2 is at most 3^nc^n. Here c is some constant less than 1. It follows that if A is a set of density greater than c^n we can find a polynomial of degree d that vanishes everywhere on (2.A)^c and doesn’t vanish on 2.A. Furthermore, if A has density a bit bigger than this — say 6c^n, we can find a polynomial of degree d that vanishes on (2.A)^c and is non-zero at more than 2m_{d/2} points of 2.A. Therefore, by the lemma, it cannot vanish on all x+y with x,y distinct elements of A, which implies that there exist distinct x,y\in A such that x+y=2z for some z\in A.
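Since these dimension counts drive everything, here is a quick numerical sanity check (the code, and names such as degree_counts, are mine, not from any of the papers). The dimension m_d is just a partial sum of the coefficients of (1+t+t^2)^n, and already for n=12 one sees that degree at most 4n/3 captures almost all monomials while degree at most 2n/3 captures very few.

```python
# Count monomials x_1^{a_1}...x_n^{a_n} with each a_i in {0,1,2} and total
# degree at most d: these are partial sums of coefficients of (1+t+t^2)^n.

def degree_counts(n):
    """Return c with c[s] = number of exponent vectors in {0,1,2}^n summing to s."""
    c = [1]
    for _ in range(n):  # repeatedly convolve with [1, 1, 1]
        c = [sum(c[s - j] for j in (0, 1, 2) if 0 <= s - j < len(c))
             for s in range(len(c) + 2)]
    return c

def m(n, d):
    """Dimension of the space of polynomials of degree at most d."""
    return sum(degree_counts(n)[:d + 1])

n = 12
total = 3 ** n
# Almost all monomials have degree at most 4n/3 ...
print(m(n, 4 * n // 3) / total)
# ... while very few have degree at most 2n/3.
print(m(n, 2 * n // 3) / total)
```

The concentration gets stronger as n grows, exactly as the mean-n, variance-2n/3 heuristic predicts.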

Now let us think about the Croot-Lev-Pach lemma. It is proved by a linear algebra argument: we define a map \phi:\mathbb F_3^n\to V, where V is a certain vector space over \mathbb F_3 of dimension k=2m_{d/2}, and we also define a bilinear form \langle .,.\rangle on V, with the property that P(x+y)=\langle\phi(x),\phi(y)\rangle for every x,y\in\mathbb F_3^n. Then the conditions on P translate into the condition that \langle\phi(x),\phi(y)\rangle=0 for all distinct x,y\in A. But if P is non-zero at more than k points in 2.A, that gives us x_1,\dots,x_{k+1} such that \langle \phi(x_i),\phi(x_j)\rangle=0 if and only if i\ne j, which implies that \phi(x_1),\dots,\phi(x_{k+1}) are linearly independent, which they can’t be as they all live in the k-dimensional space V.

The crucial thing that makes this lemma useful is that we have a huge space of functions — of almost full dimension — each of which can be represented this way with a very small k.

The question I want to think about is the following. Suppose somebody had realized that they could bound the size of an AP-free set by finding an almost full-dimensional space of functions, each of which had a representation of the form f(x+y)=\langle\phi(x),\phi(y)\rangle, where \phi took values in a low-dimensional vector space V. How might they have come to realize that polynomials could do the job? Answering this question doesn’t solve the mystery of how the proof was discovered, since the above realization seems hard to come by: until you’ve seen it, the idea that almost all functions could be represented very efficiently like that seems somewhat implausible. But at least it’s a start.

Let’s turn the question round. Suppose we know that f has the property that f(x+y)=\langle\phi(x),\phi(y)\rangle for every x,y, with \phi taking values in a k-dimensional space. That is telling us that if we think of f as a matrix — that is, we write F(x,y) for f(x+y) — then that matrix has rank k. So we can ask the following question: given a matrix F(x,y) that happens to be of the special form f(x+y) (where the indexing variables x,y live in \mathbb F_3^n), under what circumstances can it possibly have low rank? That is, what about f makes F have low rank?

We can get some purchase on this question by thinking about how F operates as a linear map on functions defined on \mathbb F_3^n. Indeed, we have that if u is a function defined on \mathbb F_3^n (I’m being a bit vague for the moment about where u takes its values, though the eventual answer will be \mathbb F_3), then we have the formula Fu(x)=\sum_yf(x+y)u(y). Now F has rank k if and only if the functions of the form Fu form a k-dimensional subspace. Note that if u is the function \delta_z, we have that Fu(x)=f(x+z). Since every u is a linear combination of delta functions, we are requiring that the translates of f should span a subspace of dimension k. Of course, we’d settle for a lower dimension, so it’s perhaps more natural to say at most k. I won’t actually write that, but it should be understood that it’s what I basically mean.
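For anyone who likes to see such identifications in action, here is a tiny sanity check over \mathbb F_3^2 (the whole setup, including the particular choice of f, is my own illustration): the rank of the matrix F(x,y)=f(x+y) is computed alongside the dimension of the span of the translates of f, and the two agree, as the discussion above says they must.

```python
import itertools

def rank_mod3(rows):
    """Rank of an integer matrix over F_3, by Gaussian elimination."""
    rows = [[v % 3 for v in r] for r in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, 3)  # inverse mod 3
        rows[rank] = [inv * v % 3 for v in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                c = rows[i][col]
                rows[i] = [(v - c * w) % 3 for v, w in zip(rows[i], rows[rank])]
        rank += 1
    return rank

n = 2
points = list(itertools.product(range(3), repeat=n))  # the nine points of F_3^2

def add(x, y):
    return tuple((a + b) % 3 for a, b in zip(x, y))

# an arbitrary test function f : F_3^2 -> F_3, here f(x) = x_1^2 x_2 + x_1
f = {x: (x[0] ** 2 * x[1] + x[0]) % 3 for x in points}

F = [[f[add(x, y)] for y in points] for x in points]           # the matrix F(x, y)
translates = [[f[add(x, z)] for x in points] for z in points]  # row z is f(. + z)

print(rank_mod3(F), rank_mod3(translates))  # the two ranks agree
```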

What kinds of functions have the nice property that their translates span a low-dimensional subspace? And can we find a huge space of such functions?

The answer that occurs most naturally to me is that characters have this property: if \chi is a character, then every translate of \chi is a multiple of \chi, since \chi(x+z)=\chi(z)\chi(x). So if f is a linear combination of k characters, then its translates span a k-dimensional space. (So now, just to be explicit about it, my functions are taking values in \mathbb C.)

Moreover, the converse is true. What we are asking for is equivalent to asking for the convolutions of f with other functions to live in a k-dimensional subspace. If we take Fourier transforms, we now want the pointwise products of \hat f with other functions to live in a k-dimensional subspace. Well, that’s exactly saying that \hat f takes k non-zero values. Transforming back, that gives us that f needs to be a linear combination of k characters.
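This equivalence is easy to verify numerically. In the sketch below (my own illustration, with an arbitrary choice of frequencies), f is a sum of four characters of \mathbb F_3^3, and the rank of the matrix F(x,y)=f(x+y) over \mathbb C comes out as exactly four, the size of the Fourier support.

```python
import itertools
import numpy as np

n, omega = 3, np.exp(2j * np.pi / 3)
points = list(itertools.product(range(3), repeat=n))  # the 27 points of F_3^3

def character(r):
    """The character x -> omega^(r.x) of F_3^n."""
    return lambda x: omega ** (sum(ri * xi for ri, xi in zip(r, x)) % 3)

# f = a linear combination of k = 4 distinct characters (frequencies chosen arbitrarily)
freqs = [(0, 0, 0), (1, 0, 0), (1, 2, 0), (2, 2, 1)]
chars = [character(r) for r in freqs]
f = lambda x: sum(c(x) for c in chars)

# the 27-by-27 matrix F(x, y) = f(x + y)
F = np.array([[f(tuple((a + b) % 3 for a, b in zip(x, y)))
               for y in points] for x in points])
print(np.linalg.matrix_rank(F))  # equals the number of characters used
```

Each character contributes one rank-one piece \chi(x)\chi(y), and distinct characters are orthogonal, so nothing cancels: the rank is exactly k.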

But that’s a bit of a disaster. If we want an r-dimensional space of functions such that each one is a linear combination of at most k characters, we cannot do better than to take r=3k/2. The proof is the same as one of the arguments in Ellenberg’s preprint: in an r-dimensional space there must be at least r active coordinates, and then a random element of the space is on average non-zero on at least 2r/3 of those.

So we have failed in our quest to make r exponentially close to 3^n and k exponentially close to zero.

But before we give up, shouldn’t we at least consider backtracking and trying again with a different field of scalars? The complex numbers didn’t work out for us, but there is one other choice that stands out as natural, namely \mathbb F_3.

So now we ask a question that’s exactly analogous to the question we asked earlier: what kinds of functions have the property that they and their translates generate a subspace of dimension k?

Let’s see whether the characters idea works here. Are there functions \phi:\mathbb F_3^n\to\mathbb F_3 with the property that \phi(x+y)=\phi(x)\phi(y)? No there aren’t, or at least not any interesting ones, since that would give us that \phi(0)=\phi(x+x+x)=\phi(x)^3 for every x, which implies that \phi is constant (and because \phi(x)^2=\phi(2x), that constant has to be 0 or 1).
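For n=1 this can even be checked by brute force. In the little search below (my own), a function \phi is encoded by its three values, and the only solutions of \phi(x+y)=\phi(x)\phi(y) that turn up are the two constants 0 and 1.

```python
import itertools

# Enumerate all 27 functions phi : F_3 -> F_3, encoded as (phi(0), phi(1), phi(2)),
# and keep those satisfying phi(x + y) = phi(x) * phi(y) for all x, y.
multiplicative = []
for values in itertools.product(range(3), repeat=3):
    phi = dict(zip(range(3), values))
    if all(phi[(x + y) % 3] == phi[x] * phi[y] % 3
           for x in range(3) for y in range(3)):
        multiplicative.append(values)

print(multiplicative)  # only the two constant functions survive
```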

OK, let’s ask a slightly different question. Is there some fairly small space of functions from \mathbb F_3^n to \mathbb F_3 that is closed under taking translates? That is, we would like that if f belongs to the space, then for each y the function x\mapsto f(x+y) also belongs to the space.

One obvious space of functions with this property is linear maps. There aren’t that many of these — just an n-dimensional space of them (or (n+1)-dimensional if we interpret “linear” in the polynomial sense rather than the vector-space sense) — sitting inside the 3^n-dimensional space of all functions from \mathbb F_3^n to \mathbb F_3.

It’s not much of a stretch to get from here to noticing that polynomials of degree at most d form another such space. For example, we might think, “What’s the simplest function I can think of that isn’t linear?” and we might then go for something like x\mapsto x_1^2. That and its translates generate the space of all quadratic polynomials that depend on x_1 only. Then we’d start to spot that there are several spaces of functions that are closed under translation. Given any monomial, it and its translates generate the space generated by all smaller monomials. So for example the monomial x_1^2x_2 and its translates generate the space of polynomials of the form ax_1^2x_2+bx_1x_2+cx_2+dx_1^2+ex_1+f. So any down-set of monomials defines a subspace that is closed under translation.

I think, but have not carefully checked, that these are in fact the only subspaces that are closed under translation. Let me try to explain why. Given any function f from \mathbb F_3^n to \mathbb F_3, it must be given by a polynomial P made out of cube-free monomials. That’s simply because the dimension of the space of such polynomials is 3^n. And I think that if you take any polynomial, then the subspace that it and its translates generate is generated by all the monomials that are less than a monomial that occurs in P with a non-zero coefficient.

Actually no, that’s false. If I take the polynomial x_1+\dots+x_n, then every translate of it is of the form x_1+\dots+x_n+C. So without thinking a bit more, I don’t have a characterization of the spaces of functions that are closed under translation. But we can at least say that polynomials give us a rich supply of them.

Is the rank miracle a miracle?

I’m starting this section a day after writing the sections above, and after a good night’s sleep I have clarified in my mind something I sort of knew already, as it’s essential to the whole argument, which is that the conjectures that briefly flitted across my mind two paragraphs ago and that turned out to be false absolutely had to be false. Their falsity is pretty much the whole point of what is going on. So let me come to that now.

Let me call a subspace closed if it is closed under translation. (Just to be completely explicit about this, by “translation” I am referring to operations of the form T_a, which take a function f to the function x\mapsto f(x-a).) Note that the sum of two closed subspaces is closed. Therefore, if we want to find out what closed subspaces are like, we could do a lot worse than thinking about the closed subspaces generated by a single function, which it now seems good to think of as a polynomial.

Unfortunately, it’s not easy to illustrate what I’m about to say with a simple example, because simple examples tend to be too small for the phenomenon to manifest itself. So let us argue in full generality. Let P be a polynomial of degree at most d. We would like to understand the rank of the matrix F(x,y)=P(x+y), which is equal to the dimension of the closed subspace generated by P, or equivalently the subspace generated by all functions of the form P(x+a).

At first sight it looks as though this subspace could contain pretty well all linear combinations of monomials that are dominated by monomials that occur with non-zero coefficients in P. For example, consider the 2-variable polynomial x^2y. In this case we are trying to work out the dimension of the space spanned by the polynomials

(x+a)^2(y+b)=x^2y+bx^2+2axy+2abx+a^2y+a^2b.

These live in the space spanned by six monomials, so we’d like to know whether the vectors of the form (1,b,2a,2ab,a^2,a^2b) span the whole of \mathbb F_3^6 or just some proper subspace. Setting a=0 we see that we can generate the standard basis vectors e_1 and e_2. Setting b=0 it’s not hard to see that we can also get e_3 and e_5. And setting b=1 we see that we can get the fourth and sixth coordinates to be any pair we like. So these do indeed span the full space. Thus, in this particular case one of my false conjectures from earlier happens to be true.
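The little calculation above can also be done by machine. The sketch below (my own; the rank routine is generic Gaussian elimination over \mathbb F_3) lists the coefficient vectors (1,b,2a,2ab,a^2,a^2b) for all nine pairs (a,b) and confirms that they span the full six-dimensional space.

```python
def rank_mod3(rows):
    """Rank of an integer matrix over F_3, by Gaussian elimination."""
    rows = [[v % 3 for v in r] for r in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, 3)  # inverse mod 3
        rows[rank] = [inv * v % 3 for v in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                c = rows[i][col]
                rows[i] = [(v - c * w) % 3 for v, w in zip(rows[i], rows[rank])]
        rank += 1
    return rank

# coefficients of (x+a)^2 (y+b) on the monomials x^2 y, x^2, x y, x, y, 1,
# for each of the nine translation pairs (a, b)
vectors = [[1, b, 2 * a, 2 * a * b, a * a, a * a * b]
           for a in range(3) for b in range(3)]

print(rank_mod3(vectors))  # 6: the translates of x^2 y span the full space
```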

Let’s see why it is false in general. The argument is basically repeating the proof of the Croot-Lev-Pach lemma, but using that proof to prove an equivalent statement (a bound for the rank of the closed subspace generated by P) rather than the precise statement they proved. (I’m not claiming that this is a radically different way of looking at things, but I find it slightly friendlier.)

Let P be a polynomial. One thing that’s pretty clear, and I think this is why I got slightly confused yesterday, is that for every monomial m_1 that’s dominated by a monomial m_2 that occurs non-trivially in P we can find some linear combination of translates P(x+a) such that m_1 occurs with a non-zero coefficient. So if we want to prove that these translates generate a low-dimensional space, we need to show that there are some heavy-duty linear dependences amongst these coefficients. And there are! Here’s how the proof goes. Suppose that P has degree at most d. Then we won’t worry at all about the coefficients of the monomials of degree at most d/2: sure, these generate a subspace of dimension m_{d/2} (that’s the definition of m_{d/2}, by the way), but unless d is very close to 2n, that’s going to be very small.

But what about the coefficients of the monomials of degree greater than d/2? This is where the linear dependences come in. Let x_1^{r_1}\dots x_n^{r_n} be such a monomial. What can we say about its coefficient in the polynomial P(x+a)? Well, if we expand out P(x+a) and write it as a linear combination of monomials, then the coefficient of x_1^{r_1}\dots x_n^{r_n} will work out as a gigantic polynomial in a_1,\dots,a_n. However, and this is the key point, this “gigantic” polynomial will have degree at most d/2. That is, for each such monomial x_1^{r_1}\dots x_n^{r_n}, we have a polynomial Q_{r_1,\dots,r_n} of degree at most d/2 such that Q_{r_1,\dots,r_n}(a) gives the coefficient of x_1^{r_1}\dots x_n^{r_n} in the polynomial P(x+a). But these polynomials all live in the m_{d/2}-dimensional space of polynomials of degree at most d/2, so we can find a spanning subset of them of size at most m_{d/2}. In other words, we can pick out at most m_{d/2} of the polynomials Q_{r_1,\dots,r_n}, and all the rest are linear combinations of those ones. This is the huge linear dependence we wanted, and it shows that the projection of the closed subspace generated by P to the span of the monomials of degree greater than d/2 has dimension at most m_{d/2}.

So in total we get that P and its translates span a space of dimension at most 2m_{d/2}, which for suitable d is much much smaller than m_d. This is what I am referring to when I talk about a “rank miracle”.
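One can watch the rank miracle happen numerically. In the sketch below (my own setup: n=4, d=5, and a randomly chosen P of degree at most d), the matrix P(x+y) is an 81-by-81 matrix built from a polynomial with 66 available monomials, yet its rank over \mathbb F_3 is at most the bound 2m_{d/2}=30.

```python
import itertools
import random

def rank_mod3(rows):
    """Rank of an integer matrix over F_3, by Gaussian elimination."""
    rows = [[v % 3 for v in r] for r in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, 3)  # inverse mod 3
        rows[rank] = [inv * v % 3 for v in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                c = rows[i][col]
                rows[i] = [(v - c * w) % 3 for v, w in zip(rows[i], rows[rank])]
        rank += 1
    return rank

n, d = 4, 5
points = list(itertools.product(range(3), repeat=n))   # the 81 points of F_3^4
monomials = [e for e in points if sum(e) <= d]         # exponent vectors of degree <= d
m_half = sum(1 for e in points if sum(e) <= d // 2)    # m_{d/2}

random.seed(0)
coeffs = {e: random.randrange(3) for e in monomials}   # a random P of degree <= d

def P(x):
    total = 0
    for e, c in coeffs.items():
        term = c
        for xi, ei in zip(x, e):
            term = term * pow(xi, ei) % 3  # note pow(0, 0) == 1
        total = (total + term) % 3
    return total

F = [[P(tuple((a + b) % 3 for a, b in zip(x, y))) for y in points] for x in points]
print(rank_mod3(F), 2 * m_half, len(monomials))
```

Here m_{d/2}=15, so the rank cannot exceed 30, even though a generic 81-by-81 matrix would have full rank.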

Note that we could have phrased the entire discussion in terms of the rank of P(x+y). That is, we could have started with the thought that if f is a function defined on \mathbb F_3^n such that f(x+y)=0 whenever x,y are distinct elements of A, and f(2x)\ne 0 for at least m points x\in A, then the matrix F(x,y)=f(x+y) would have rank at least m, which is the same as saying that f and its translates span a space of dimension at least m. So then we would be on the lookout for a high-dimensional space of functions such that for each function f in the class, f and its translates span a much lower-dimensional space. That is what the polynomials give us, and we don’t have to mention a funny non-linear function \phi from \mathbb F_3^n to a vector space V.

I still haven’t answered the question of whether the rank miracle is a miracle. I actually don’t have a very good answer to this. In the abstract, it is a big surprise that there is a space of functions of dimension that’s exponentially close to the maximal dimension 3^n such that for every single function f in that space, the rank of the matrix F(x,y)=f(x+y) is exponentially small. (Here “exponentially small/close” means as a fraction of 3^n.) And yet, once one has seen the proof, it begins to feel like a fairly familiar concentration of measure argument: it isn’t a surprise that the polynomials of degree at most 4n/3 form a space of almost full dimension and the polynomials of degree at most 2n/3 form a space of tiny dimension. And it’s not completely surprising (again with hindsight) that because in the expansion of P(x+y) you can’t use more than half the degree for both x and y, there might be some way of arguing that the translates of P live in a subspace of dimension closer to m_{d/2}.


This post has got rather long, so this seems like a good place to cut it off. To be continued.