On a conjecture of Marton

What's new 2023-11-14

Tim Gowers, Ben Green, Freddie Manners, and I have just uploaded to the arXiv our paper “On a conjecture of Marton“. This paper establishes a version of the notorious Polynomial Freiman–Ruzsa conjecture (first proposed by Katalin Marton):

Theorem 1 (Polynomial Freiman–Ruzsa conjecture) Let {A \subset {\bf F}_2^n} be such that {|A+A| \leq K|A|}. Then {A} can be covered by at most {2K^{12}} translates of a subspace {H} of {{\bf F}_2^n} of cardinality at most {|A|}.
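
To make the objects in this theorem concrete, here is a minimal numerical illustration (my own, not from the paper): encoding elements of {{\bf F}_2^n} as integers, with addition given by bitwise XOR, one can compute doubling constants directly.

```python
# Minimal sketch (not from the paper): doubling constants in F_2^n,
# with elements encoded as integers and group addition given by XOR.
import random

def sumset(A, B):
    """The sumset A+B in F_2^n (XOR of all pairs)."""
    return {a ^ b for a in A for b in B}

# A subspace H (here: the low 4 bits of F_2^8) has doubling constant 1.
H = set(range(16))
print(len(sumset(H, H)) / len(H))   # -> 1.0

# A random 16-element subset of F_2^8 typically has doubling close to
# the maximum possible value of roughly |A|/2.
A = set(random.sample(range(2 ** 8), 16))
print(len(sumset(A, A)) / len(A))   # typically around 7.5
```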

The previous best known result towards this conjecture was by Konyagin (as communicated in this paper of Sanders), who obtained a similar result but with {2K^{12}} replaced by {\exp(O_\varepsilon(\log^{3+\varepsilon} K))} for any {\varepsilon>0} (assuming, say, that {K \geq 3/2} to avoid some degeneracies as {K} approaches {1}; that regime is not the difficult case of the conjecture). The conjecture (with {12} replaced by an unspecified constant {C}) has a number of equivalent forms; see this survey of Green, and these papers of Lovett and of Green and myself for some examples; in particular, as discussed in the latter two references, the constants in the inverse {U^3({\bf F}_2^n)} theorem are now polynomial in nature (although we did not try to optimize the constant).

The exponent {12} here was the product of a large number of optimizations to the argument (our original exponent here was closer to {1000}), but can be improved even further with additional effort (our current argument, for instance, allows one to replace it with {7+\sqrt{17} = 11.123\dots}, but we decided to state our result using integer exponents instead).

In this paper we will focus exclusively on the characteristic {2} case (so that addition and subtraction coincide, and we will freely identify the two), but in a followup paper we will establish similar results in other finite characteristics.

Much of the prior progress on this sort of result has proceeded via Fourier analysis. Perhaps surprisingly, our approach uses no Fourier analysis whatsoever, being conducted instead entirely in “physical space”. Broadly speaking, it follows a natural strategy, which is to induct on the doubling constant {|A+A|/|A|}. Indeed, suppose for instance that one could show that every set {A} of doubling constant {K} was “commensurate” in some sense to a set {A'} of doubling constant at most {K^{0.99}}. One measure of commensurability, for instance, might be the Ruzsa distance {\log \frac{|A+A'|}{|A|^{1/2} |A'|^{1/2}}}, which one might hope to control by {O(\log K)}. Then one could iterate this procedure until the doubling constant dropped below, say, {3/2}, at which point the conjecture is known to hold (there is an elementary argument that if {A} has doubling constant less than {3/2}, then {A+A} is in fact a subspace of {{\bf F}_2^n}). One can then use several applications of the Ruzsa triangle inequality

\displaystyle  \log \frac{|A+C|}{|A|^{1/2} |C|^{1/2}} \leq \log \frac{|A+B|}{|A|^{1/2} |B|^{1/2}} + \log \frac{|B+C|}{|B|^{1/2} |C|^{1/2}}

to conclude (the fact that we reduce {K} to {K^{0.99}} means that the various Ruzsa distances that need to be summed are controlled by a convergent geometric series).
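
Both ingredients of this strategy can be checked numerically on small examples. The following sketch (my own illustration, with arbitrary parameter choices) brute-forces the “doubling below {3/2} forces a subspace” fact in {{\bf F}_2^3}, and tests the Ruzsa triangle inequality on random triples of sets:

```python
# Two quick sanity checks (illustrative only). (a) In F_2^3, every set A
# with |A+A| < (3/2)|A| has A+A equal to a subspace. (b) The Ruzsa
# triangle inequality holds on random triples of sets in F_2^8.
import math, random
from itertools import combinations

def sumset(A, B):
    return {a ^ b for a in A for b in B}

def is_subspace(S):
    return 0 in S and all(a ^ b in S for a in S for b in S)

def ruzsa_dist(A, B):
    return math.log(len(sumset(A, B)) / math.sqrt(len(A) * len(B)))

# (a) exhaustive check over all nonempty subsets of F_2^3
for r in range(1, 9):
    for A in map(set, combinations(range(8), r)):
        if len(sumset(A, A)) < 1.5 * len(A):
            assert is_subspace(sumset(A, A))

# (b) randomized check of d(A,C) <= d(A,B) + d(B,C)
pts = range(2 ** 8)
for _ in range(1000):
    A, B, C = (set(random.sample(pts, random.randint(4, 32))) for _ in range(3))
    assert ruzsa_dist(A, C) <= ruzsa_dist(A, B) + ruzsa_dist(B, C) + 1e-9
print("both checks passed")
```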

There are a number of possible ways to try to “improve” a set {A} of not too large doubling by replacing it with a commensurate set of better doubling. We note two particular potential improvements (both model examples are illustrated in the numerical sketch following the list):

  • (i) Replacing {A} with {A+A}. For instance, if {A} was a random subset (of density {1/K}) of a large subspace {H} of {{\bf F}_2^n}, then replacing {A} with {A+A} usually drops the doubling constant from {K} down to nearly {1} (under reasonable choices of parameters).
  • (ii) Replacing {A} with {A \cap (A+h)} for a “typical” {h \in A+A}. For instance, if {A} was the union of {K} random cosets of a subspace {H} of large codimension, then replacing {A} with {A \cap (A+h)} again usually drops the doubling constant from {K} down to nearly {1}.
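
Here is a rough simulation of both model examples (my own sketch; the parameter choices are ad hoc and only meant to be indicative):

```python
# Illustrative simulation of the model examples for operations (i), (ii);
# XOR plays the role of addition, parameters chosen loosely.
import random

def sumset(A, B):
    return {a ^ b for a in A for b in B}

def doubling(A):
    return len(sumset(A, A)) / len(A)

K = 16

# (i) A = random density-1/K subset of a subspace H (here H = F_2^10).
A = {x for x in range(2 ** 10) if random.random() < 1.0 / K}
print(doubling(A))              # roughly K
print(doubling(sumset(A, A)))   # close to 1: A+A nearly fills H

# (ii) A = union of K random cosets of a subspace V of large codimension
# (here V = the low 6 bits inside F_2^16, so V has codimension 10).
V = range(2 ** 6)
reps = random.sample(range(2 ** 16), K)
A = {c ^ v for c in reps for v in V}
S = sumset(A, A)
print(len(S) / len(A))          # of order K (about K/2 in this model)
h = random.choice(sorted(S))    # a typical h in A+A
B = A & {a ^ h for a in A}      # A ∩ (A+h): usually two cosets of V
print(doubling(B))              # very close to 1
```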

Unfortunately, there are sets {A} where neither of the above two operations (i), (ii) significantly improves the doubling constant. For instance, if {A} is a random density {1/\sqrt{K}} subset of {\sqrt{K}} random translates of a medium-sized subspace {H}, one can check that the doubling constant stays close to {K} if one applies either operation (i) or operation (ii). But in this case these operations don’t actually worsen the doubling constant much either, and by applying some combination of (i) and (ii) (either intersecting {A+A} with a translate, or taking a sumset of {A \cap (A+h)} with itself) one can start lowering the doubling constant again.

This begins to suggest a potential strategy: show that at least one of the operations (i) or (ii) will improve the doubling constant, or at least not worsen it too much; and in the latter case, perform some more complicated operation to locate the desired doubling constant improvement.

A sign that this strategy might have a chance of working is provided by the following heuristic argument. If {A} has doubling constant {K}, then the Cartesian product {A \times A} has doubling constant {K^2}. On the other hand, by using the projection map {\pi: {\bf F}_2^n \times {\bf F}_2^n \rightarrow {\bf F}_2^n} defined by {\pi(x,y) := x+y}, we see that {A \times A} projects to {\pi(A \times A) = A+A}, with the fibre {\pi^{-1}(\{h\}) \cap (A \times A)} above each {h} being essentially a copy of {A \cap (A+h)}. So, morally, {A \times A} also behaves like a “skew product” of {A+A} and the fibres {A \cap (A+h)}, which suggests (non-rigorously) that the doubling constant {K^2} of {A \times A} is also something like the doubling constant of {A + A}, times the doubling constant of a typical fibre {A \cap (A+h)}. This would imply that at least one of {A +A} and {A \cap (A+h)} would have doubling constant at most {K}, and thus that at least one of operations (i), (ii) would not worsen the doubling constant.

Unfortunately, this argument does not seem to be easily made rigorous using the traditional doubling constant; even just showing the significantly weaker statement that {A+A} has doubling constant at most {K^2} is unclear (and perhaps even false). However, it turns out (as discussed in this recent paper of myself with Green and Manners) that the situation is much better in the entropic setting. Here, the analogue of a subset {A} of {{\bf F}_2^n} is a random variable {X} taking values in {{\bf F}_2^n}, and the analogue of the (logarithmic) doubling constant {\log \frac{|A+A|}{|A|}} is the entropic doubling constant {d(X;X) := {\bf H}(X_1+X_2)-{\bf H}(X)}, where {X_1,X_2} are independent copies of {X}. If {X} is a random variable taking values in some additive group {G} and {\pi: G \rightarrow H} is a homomorphism, one then has what we call the fibring inequality

\displaystyle  d(X;X) \geq d(\pi(X);\pi(X)) + d(X|\pi(X); X|\pi(X)),

where the conditional doubling constant {d(X|\pi(X); X|\pi(X))} is defined as

\displaystyle  d(X|\pi(X); X|\pi(X)) = {\bf H}(X_1 + X_2 | \pi(X_1), \pi(X_2)) - {\bf H}( X | \pi(X) ).
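
The fibring inequality is easy to test numerically. The sketch below (my own, with an arbitrarily chosen distribution) computes each term exactly for {G = {\bf F}_2^4} and {\pi} the projection onto the low two bits:

```python
# Exact numerical check (illustrative) of the fibring inequality
#   d(X;X) >= d(pi(X);pi(X)) + d(X|pi(X); X|pi(X))
# for G = F_2^4 and pi = projection onto the low two bits.
import random
from collections import defaultdict
from math import log2

def H(dist):
    """Shannon entropy (in bits) of a distribution given as {value: prob}."""
    return -sum(q * log2(q) for q in dist.values() if q > 0)

G = range(16)

def pi(x):
    return x & 3          # group homomorphism onto F_2^2

# a random probability distribution on G
w = [random.random() for _ in G]
p = {x: w[x] / sum(w) for x in G}

def law(f):
    """Distribution of f(X_1, X_2) for independent copies X_1, X_2 of X."""
    out = defaultdict(float)
    for x in G:
        for y in G:
            out[f(x, y)] += p[x] * p[y]
    return out

d_X = H(law(lambda x, y: x ^ y)) - H(p)                        # d(X;X)
d_pi = H(law(lambda x, y: pi(x) ^ pi(y))) \
     - H(law(lambda x, y: pi(x)))                              # d(pi(X);pi(X))

# d(X|pi(X); X|pi(X)) = H(X1+X2 | pi(X1), pi(X2)) - H(X | pi(X))
cond_sum = H(law(lambda x, y: (x ^ y, pi(x), pi(y)))) \
         - H(law(lambda x, y: (pi(x), pi(y))))
cond_single = H(p) - H(law(lambda x, y: pi(x)))                # H(X|pi(X))
d_cond = cond_sum - cond_single

print(d_X, d_pi + d_cond, d_X >= d_pi + d_cond - 1e-9)         # True
```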

Indeed, from the chain rule for Shannon entropy one has

\displaystyle  {\bf H}(X) = {\bf H}(\pi(X)) + {\bf H}(X|\pi(X))

and

\displaystyle  {\bf H}(X_1+X_2) = {\bf H}(\pi(X_1) + \pi(X_2)) + {\bf H}(X_1 + X_2|\pi(X_1) + \pi(X_2))

while from the non-negativity of conditional mutual information one has

\displaystyle  {\bf H}(X_1 + X_2|\pi(X_1) + \pi(X_2)) \geq {\bf H}(X_1 + X_2|\pi(X_1), \pi(X_2))

and it is an easy matter to combine all these identities and inequalities to obtain the claim.
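
Explicitly, subtracting the first identity from the second and then applying the inequality, one computes

\displaystyle  d(X;X) = {\bf H}(X_1+X_2) - {\bf H}(X)

\displaystyle  = [{\bf H}(\pi(X_1)+\pi(X_2)) - {\bf H}(\pi(X))] + [{\bf H}(X_1+X_2|\pi(X_1)+\pi(X_2)) - {\bf H}(X|\pi(X))]

\displaystyle  \geq d(\pi(X);\pi(X)) + [{\bf H}(X_1+X_2|\pi(X_1),\pi(X_2)) - {\bf H}(X|\pi(X))]

\displaystyle  = d(\pi(X);\pi(X)) + d(X|\pi(X); X|\pi(X)),

where in the third line we used that {\pi(X_1)+\pi(X_2)} is the independent sum of two copies of {\pi(X)}, so that the first bracket is {d(\pi(X);\pi(X))}, and the last line is the definition of the conditional doubling constant.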

Applying this inequality with {X} replaced by the pair {(X_1,X_2)} formed by two independent copies of itself, and using the addition map {(x,y) \mapsto x+y} for {\pi}, we obtain in particular that

\displaystyle  2 d(X;X) \geq d(X_1+X_2; X_1+X_2) + d(X_1,X_2|X_1+X_2; X_1,X_2|X_1+X_2)

or (since {X_2} is determined by {X_1} once one fixes {X_1+X_2})

\displaystyle  2 d(X;X) \geq d(X_1+X_2; X_1+X_2) + d(X_1|X_1+X_2; X_1|X_1+X_2).

So if {d(X;X) = \log K}, then at least one of {d(X_1+X_2; X_1+X_2)} or {d(X_1|X_1+X_2; X_1|X_1+X_2)} will be less than or equal to {\log K}. This is the entropy analogue of the assertion that at least one of (i) or (ii) improves, or at least does not degrade, the doubling constant, although there are some minor technicalities involving how one deals with the conditioning on {X_1+X_2} in the second term {d(X_1|X_1+X_2; X_1|X_1+X_2)} that we will gloss over here (one can pigeonhole the instances of {X_1} into different events {X_1+X_2=x}, {X_1+X_2=x'}, and “depolarise” the induction hypothesis to deal with distances {d(X;Y)} between pairs of random variables {X,Y} that do not necessarily have the same distribution). Furthermore, we can even calculate the defect in the above inequality: a careful inspection of the above argument eventually reveals that

\displaystyle  2 d(X;X) = d(X_1+X_2; X_1+X_2) + d(X_1|X_1+X_2; X_1|X_1+X_2)

\displaystyle  + {\bf I}( X_1 + X_2 : X_1 + X_3 | X_1 + X_2 + X_3 + X_4)

where we now take four independent copies {X_1,X_2,X_3,X_4}. This leads (modulo some technicalities) to the following interesting conclusion: if neither (i) nor (ii) leads to an improvement in the entropic doubling constant, then {X_1+X_2} and {X_1+X_3} are conditionally independent relative to {X_1+X_2+X_3+X_4}. This situation (or an approximation to this situation) is what we refer to in the paper as the “endgame”.
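
This exact identity can again be verified by brute force on small examples. The sketch below (my own; the distribution is arbitrary) enumerates four independent copies of a random variable on {{\bf F}_2^3} and computes both sides:

```python
# Brute-force check (illustrative) of the identity
#   2 d(X;X) = d(X1+X2; X1+X2) + d(X1|X1+X2; X1|X1+X2)
#              + I(X1+X2 : X1+X3 | X1+X2+X3+X4)
# for a random distribution on F_2^3.
import random
from collections import defaultdict
from math import log2

def H(dist):
    return -sum(q * log2(q) for q in dist.values() if q > 0)

G = range(8)                                  # F_2^3, addition = XOR
w = [random.random() for _ in G]
p = [v / sum(w) for v in w]

def law(f):
    """Distribution of f(x1,x2,x3,x4) over four independent copies of X."""
    out = defaultdict(float)
    for x1 in G:
        for x2 in G:
            for x3 in G:
                for x4 in G:
                    out[f(x1, x2, x3, x4)] += p[x1] * p[x2] * p[x3] * p[x4]
    return out

HX = H(law(lambda a, b, c, d: a))              # H(X)
H2 = H(law(lambda a, b, c, d: a ^ b))          # H(X1+X2)
H4 = H(law(lambda a, b, c, d: a ^ b ^ c ^ d))  # H(X1+X2+X3+X4)

d_XX = H2 - HX                                 # d(X;X)
d_SS = H4 - H2                                 # d(X1+X2; X1+X2)

# d(X1|X1+X2; X1|X1+X2) = H(X1+X3 | X1+X2, X3+X4) - H(X1 | X1+X2),
# using the independent copies (X1, X1+X2) and (X3, X3+X4)
d_cond = H(law(lambda a, b, c, d: (a ^ c, a ^ b, c ^ d))) \
       - H(law(lambda a, b, c, d: (a ^ b, c ^ d))) \
       - (2 * HX - H2)                         # H(X1|X1+X2) = 2H(X) - H(X1+X2)

# I(A:B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)
mi = H(law(lambda a, b, c, d: (a ^ b, a ^ b ^ c ^ d))) \
   + H(law(lambda a, b, c, d: (a ^ c, a ^ b ^ c ^ d))) \
   - H(law(lambda a, b, c, d: (a ^ b, a ^ c, a ^ b ^ c ^ d))) \
   - H4

print(2 * d_XX, d_SS + d_cond + mi)            # equal up to rounding error
```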

A version of this endgame conclusion is in fact valid in any characteristic. But in characteristic {2}, we can take advantage of the identity

\displaystyle  (X_1+X_2) + (X_1+X_3) = X_2 + X_3.

Conditioning on {X_1+X_2+X_3+X_4}, and using symmetry, we now conclude that if we are in the endgame exactly (so that the mutual information vanishes), then the independent sum of two copies of {(X_1+X_2|X_1+X_2+X_3+X_4)} has exactly the same distribution as a single copy: indeed, conditionally on {X_1+X_2+X_3+X_4}, the variables {X_1+X_2} and {X_1+X_3} are independent, and their sum {X_2+X_3} has the same conditional distribution as {X_1+X_2} by symmetry. In particular, the entropic doubling constant here is zero, which is certainly a reduction in the doubling constant.

To deal with the situation where the conditional mutual information is small but not completely zero, we have to use an entropic version of the Balog–Szemerédi–Gowers lemma, but fortunately this was already worked out in an old paper of mine (although in order to optimize the final constant, we ended up using a slight variant of that lemma).

I am planning to formalize this paper in the Lean 4 language. Further discussion of this project will take place on this Zulip stream, and the project itself will be hosted at this GitHub repository.