Marton’s conjecture in abelian groups with bounded torsion

What's new 2024-04-04

Tim Gowers, Ben Green, Freddie Manners, and I have just uploaded to the arXiv our paper “Marton’s conjecture in abelian groups with bounded torsion“. This paper fully resolves a conjecture of Katalin Marton (the bounded torsion case of the Polynomial Freiman–Ruzsa conjecture (first proposed by Katalin Marton):

Theorem 1 (Marton’s conjecture) Let ${G = (G,+)}$ be an abelian ${m}$ -torsion group (thus, ${mx=0}$ for all ${x \in G}$ ), and let ${A \subset G}$ be such that ${|A+A| \leq K|A|}$ . Then ${A}$ can be covered by at most ${(2K)^{O(m^3)}}$ translates of a subgroup ${H}$ of ${G}$ of cardinality at most ${|A|}$ . Moreover, ${H}$ is contained in ${\ell A - \ell A}$ for some ${\ell \ll (2 + m \log K)^{O(m^3 \log m)}}$ .

We had previously established the ${m=2}$ case of this result, with the number of translates bounded by ${(2K)^{12}}$ (which was subsequently improved to ${(2K)^{11}}$ by Jyun-Jie Liao), but without the additional containment ${H \subset \ell A - \ell A}$ . It remains a challenge to replace ${\ell}$ by a bounded constant (such as ${2}$ ); this is essentially the “polynomial Bogulybov conjecture”, which is still open. The ${m=2}$ result has been formalized in the proof assistant language Lean, as discussed in this previous blog post. As a consequence of this result, many of the applications of the previous theorem may now be extended from characteristic ${2}$ to higher characteristic. Our proof techniques are a modification of those in our previous paper, and in particular continue to be based on the theory of Shannon entropy. For inductive purposes, it turns out to be convenient to work with the following version of the conjecture (which, up to ${m}$ -dependent constants, is actually equivalent to the above theorem):

Theorem 2 (Marton’s conjecture, entropy form) Let ${G}$ be an abelian ${m}$ -torsion group, and let ${X_1,\dots,X_m}$ be independent finitely supported random variables on ${G}$ , such that

$\displaystyle {\bf H}[X_1+\dots+X_m] - \frac{1}{m} \sum_{i=1}^m {\bf H}[X_i] \leq \log K,$

where ${{\bf H}}$ denotes Shannon entropy. Then there is a uniform random variable ${U_H}$ on a subgroup ${H}$ of ${G}$ such that

$\displaystyle \frac{1}{m} \sum_{i=1}^m d[X_i; U_H] \ll m^3,$

where ${d}$ denotes the entropic Ruzsa distance (see previous blog post for a definition); furthermore, if all the ${X_i}$ take values in some symmetric set ${S}$ , then ${H}$ lies in ${\ell S}$ for some ${\ell \ll (2 + \log K)^{O(m^3 \log m)}}$ .

As a first approximation, one should think of all the ${X_i}$ as identically distributed, and having the uniform distribution on ${A}$ , as this is the case that is actually relevant for implying Theorem 1; however, the recursive nature of the proof of Theorem 2 requires one to manipulate the ${X_i}$ separately. It also is technically convenient to work with ${m}$ independent variables, rather than just a pair of variables as we did in the ${m=2}$ case; this is perhaps the biggest additional technical complication needed to handle higher characteristics. The strategy, as with the previous paper, is to attempt an entropy decrement argument: to try to locate modifications ${X'_1,\dots,X'_m}$ of ${X_1,\dots,X_m}$ that are reasonably close (in Ruzsa distance) to the original random variables, while decrementing the “multidistance”

$\displaystyle {\bf H}[X_1+\dots+X_m] - \frac{1}{m} \sum_{i=1}^m {\bf H}[X_i]$

which turns out to be a convenient metric for progress (for instance, this quantity is non-negative, and vanishes if and only if the ${X_i}$ are all translates of a uniform random variable ${U_H}$ on a subgroup ${H}$ ). In the previous paper we modified the corresponding functional to minimize by some additional terms in order to improve the exponent ${12}$ , but as we are not attempting to completely optimize the constants, we did not do so in the current paper (and as such, our arguments here give a slightly different way of establishing the ${m=2}$ case, albeit with somewhat worse exponents). As before, we search for such improved random variables ${X'_1,\dots,X'_m}$ by introducing more independent random variables – we end up taking an array of ${m^2}$ random variables ${Y_{i,j}}$ for ${i,j=1,\dots,m}$ , with each ${Y_{i,j}}$ a copy of ${X_i}$ , and forming various sums of these variables and conditioning them against other sums. Thanks to the magic of Shannon entropy inequalities, it turns out that it is guaranteed that at least one of these modifications will decrease the multidistance, except in an “endgame” situation in which certain random variables are nearly (conditionally) independent of each other, in the sense that certain conditional mutual informations are small. In particular, in the endgame scenario, the row sums ${\sum_j Y_{i,j}}$ of our array will end up being close to independent of the column sums ${\sum_i Y_{i,j}}$ , subject to conditioning on the total sum ${\sum_{i,j} Y_{i,j}}$ . Not coincidentally, this type of conditional independence phenomenon also shows up when considering row and column sums of iid independent gaussian random variables, as a specific feature of the gaussian distribution. It is related to the more familiar observation that if ${X,Y}$ are two independent copies of a Gaussian random variable, then ${X+Y}$ and ${X-Y}$ are also independent of each other. Up until now, the argument does not use the ${m}$ -torsion hypothesis, nor the fact that we work with an ${m \times m}$ array of random variables as opposed to some other shape of array. But now the torsion enters in a key role, via the obvious identity

$\displaystyle \sum_{i,j} i Y_{i,j} + \sum_{i,j} j Y_{i,j} + \sum_{i,j} (-i-j) Y_{i,j} = 0.$

In the endgame, the any pair of these three random variables are close to independent (after conditioning on the total sum ${\sum_{i,j} Y_{i,j}}$ ). Applying some “entropic Ruzsa calculus” (and in particular an entropic version of the Balog–Szeméredi–Gowers inequality), one can then arrive at a new random variable ${U}$ of small entropic doubling that is reasonably close to all of the ${X_i}$ in Ruzsa distance, which provides the final way to reduce the multidistance. Besides the polynomial Bogulybov conjecture mentioned above (which we do not know how to address by entropy methods), the other natural question is to try to develop a characteristic zero version of this theory in order to establish the polynomial Freiman–Ruzsa conjecture over torsion-free groups, which in our language asserts (roughly speaking) that random variables of small entropic doubling are close (in Ruzsa distance) to a discrete Gaussian random variable, with good bounds. The above machinery is consistent with this conjecture, in that it produces lots of independent variables related to the original variable, various linear combinations of which obey the same sort of entropy estimates that gaussian random variables would exhibit, but what we are missing is a way to get back from these entropy estimates to an assertion that the random variables really are close to Gaussian in some sense. In continuous settings, Gaussians are known to extremize the entropy for a given variance, and of course we have the central limit theorem that shows that averages of random variables typically converge to a Gaussian, but it is not clear how to adapt these phenomena to the discrete Gaussian setting (without the circular reasoning of assuming the polynoimal Freiman–Ruzsa conjecture to begin with).