An abridged proof of Marton’s conjecture
What's new 2024-06-22
[This post is dedicated to Luca Trevisan, who recently passed away due to cancer. Though far from his most significant contribution to the field, I would like to mention that, as with most of my other blog posts on this site, this page was written with the assistance of Luca’s LaTeX to WordPress converter. Mathematically, his work and insight on pseudorandomness in particular have greatly informed how I myself think about the concept. – T.]
Recently, Timothy Gowers, Ben Green, Freddie Manners, and I were able to establish the following theorem:
Theorem 1 (Marton's conjecture) Let $A \subset \mathbb{F}_2^n$ be non-empty with $|A+A| \leq K|A|$. Then there exists a subgroup $H$ of $\mathbb{F}_2^n$ with $|H| \leq |A|$ such that $A$ is covered by at most $2K^C$ translates of $H$, for some absolute constant $C$.
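(As a quick sanity check, not part of the original argument: the shape of the conclusion is the natural one, since a covering of this form forces $|A+A|$ to be small again. Indeed, if $A \subseteq \bigcup_{i=1}^{M} (x_i + H)$ with $|H| \leq |A|$, then
$$A + A \subseteq \bigcup_{i,j} (x_i + x_j + H), \qquad |A+A| \leq M^2 |H| \leq M^2 |A|,$$
so a covering by $M = 2K^C$ translates gives back $|A+A| \leq 4K^{2C}|A|$; thus a polynomial dependence on $K$ is the correct type of bound to aim for in both directions.)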
We established this result with $C = 12$, although the constant has since been improved by Jyun-Jie Liao.
Our proof was written in order to optimize the constant $C$ as much as possible; similarly for the more detailed blueprint of the proof that was prepared in order to formalize the result in Lean. I have been asked a few times whether it is possible to present a streamlined and more conceptual version of the proof in which one does not try to establish an explicit constant $C$, but just to show that the result holds for some constant $C$. This is what I will attempt to do in this post, though some of the more routine steps will be outsourced to the aforementioned blueprint.
The key concept here is that of the entropic Ruzsa distance $d[X;Y]$ between two random variables $X, Y$ taking values in an abelian group $G = (G,+)$, defined as
$$d[X;Y] := H[X' - Y'] - \tfrac{1}{2} H[X'] - \tfrac{1}{2} H[Y']$$
where $X', Y'$ are independent copies of $X, Y$ respectively, and $H[X]$ denotes the Shannon entropy of $X$. This distance is symmetric and non-negative, and obeys the triangle inequality
$$d[X;Z] \leq d[X;Y] + d[Y;Z]$$
for any random variables $X, Y, Z$; see the blueprint for a proof. The above theorem then follows from an entropic analogue:

Theorem 2 (Entropic Marton's conjecture) Let $X$ be a $\mathbb{F}_2^n$-valued random variable with $d[X;X] \leq \log K$. Then there exists a uniform random variable $U_H$ on a subgroup $H$ of $\mathbb{F}_2^n$ such that $d[X; U_H] \leq C \log K$ for some absolute constant $C$.
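(For orientation, here is the basic example behind Theorem 2, spelled out directly from the definition: if $U_H$ is uniform on a subgroup $H$ of $\mathbb{F}_2^n$ and $U'_H, U''_H$ are independent copies, then $U'_H + U''_H$ is again uniform on $H$, so
$$d[U_H; U_H] = H[U'_H + U''_H] - \tfrac{1}{2} H[U'_H] - \tfrac{1}{2} H[U''_H] = \log|H| - \log|H| = 0.$$
Thus uniform distributions on subgroups are the model "zero distance" objects that Theorem 2 asserts $X$ must be close to.)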
We were able to establish Theorem 2 with $C = 11$, which implies Theorem 1 with $C = 12$ by fairly standard additive combinatorics manipulations; see the blueprint for details.
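(To give a flavour of the deduction of Theorem 1 from Theorem 2, here is the easy half, as a sketch rather than the precise argument from the blueprint: if $U_A$ is uniform on $A$ and $U'_A, U''_A$ are independent copies, then $U'_A + U''_A$ takes values in $A+A$, hence
$$d[U_A; U_A] = H[U'_A + U''_A] - \log|A| \leq \log|A+A| - \log|A| \leq \log K.$$
Theorem 2 then produces a subgroup $H$ with $d[U_A; U_H] \leq C \log K$, and the remaining, more combinatorial, step is to convert this entropic closeness into the covering statement of Theorem 1 at the cost of a bounded power of $K$; that conversion is carried out in the blueprint.)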
The key proposition needed to establish Theorem 2 is the following distance decrement property:
Proposition 3 (Distance decrement) If $X_1, X_2$ are $\mathbb{F}_2^n$-valued random variables, then one can find $\mathbb{F}_2^n$-valued random variables $X'_1, X'_2$ such that
$$d[X'_1;X'_2] \leq (1-\eta)\, d[X_1;X_2]$$
and
$$d[X_1;X'_1],\ d[X_2;X'_2] \leq C\, d[X_1;X_2]$$
for some absolute constants $\eta, C > 0$.
Indeed, suppose this proposition held. Starting with $X_1, X_2$ both equal to $X$ and iterating, one can then find sequences of random variables $X_1^{(n)}, X_2^{(n)}$ with $X_1^{(0)} = X_2^{(0)} = X$,
$$d[X_1^{(n+1)}; X_2^{(n+1)}] \leq (1-\eta)\, d[X_1^{(n)}; X_2^{(n)}]$$
and
$$d[X_1^{(n)}; X_1^{(n+1)}],\ d[X_2^{(n)}; X_2^{(n+1)}] \leq C\, d[X_1^{(n)}; X_2^{(n)}].$$
In particular, $d[X_1^{(n)}; X_2^{(n)}] \leq (1-\eta)^n\, d[X;X]$, and from the triangle inequality and geometric series
$$d[X; X_1^{(n)}],\ d[X; X_2^{(n)}] \leq \sum_{k=0}^{n-1} C (1-\eta)^k\, d[X;X] \leq \frac{C}{\eta}\, d[X;X].$$
By weak compactness, some subsequence of the $X_1^{(n)}, X_2^{(n)}$ converge to some limiting random variables $X_1^{(\infty)}, X_2^{(\infty)}$, and by some simple continuity properties of entropic Ruzsa distance, we conclude that
$$d[X_1^{(\infty)}; X_2^{(\infty)}] = 0$$
and
$$d[X; X_1^{(\infty)}],\ d[X; X_2^{(\infty)}] \leq \frac{C}{\eta}\, d[X;X].$$
Theorem 2 then follows from the "100% inverse theorem" for entropic Ruzsa distance; see the blueprint for details.

To prove Proposition 3, we can reformulate it as follows:
Proposition 4 (Lack of distance decrement implies vanishing) If $X_1, X_2$ are $\mathbb{F}_2^n$-valued random variables with $d[X_1;X_2] \neq 0$, with the property that
$$d[X'_1;X'_2] \geq d[X_1;X_2] - \eta \bigl( d[X_1;X'_1] + d[X_2;X'_2] \bigr)$$
for all $\mathbb{F}_2^n$-valued random variables $X'_1, X'_2$ and some sufficiently small absolute constant $\eta > 0$, then one can derive a contradiction.
Indeed, we may assume from the above proposition that
$$d[X'_1;X'_2] \geq d[X_1;X_2] - \eta \bigl( d[X_1;X'_1] + d[X_2;X'_2] \bigr) \qquad (1)$$
holds for all $X'_1, X'_2$, for some pair $X_1, X_2$ with $d[X_1;X_2] \neq 0$, as the negation of this scenario will imply Proposition 3 with suitable choices of the constants $\eta, C$ there. The entire game is now to use Shannon entropy inequalities and "entropic Ruzsa calculus" to deduce a contradiction from (1) for $\eta$ small enough. This we will do below the fold, but before doing so, let us first make some adjustments to (1) that will make it more useful for our purposes. Firstly, because conditional entropic Ruzsa distance (see blueprint for definitions) is an average of unconditional entropic Ruzsa distance, we can automatically upgrade (1) to the conditional version
$$d[Y_1|Z_1;\ Y_2|Z_2] \geq d[X_1;X_2] - \eta \bigl( d[X_1; Y_1|Z_1] + d[X_2; Y_2|Z_2] \bigr) \qquad (2)$$
for any random variables $Y_1, Y_2$ that are possibly coupled with auxiliary data $Z_1, Z_2$ respectively. In particular, if we define a "relevant" random variable $Y$ (conditioned with respect to some auxiliary data $Z$) to be a random variable for which
$$d[X_1; Y|Z] = O(d[X_1;X_2]),$$
or equivalently (by the triangle inequality)
$$d[X_2; Y|Z] = O(d[X_1;X_2]),$$
then we have the useful lower bound
$$d[Y_1|Z_1;\ Y_2|Z_2] \geq (1 - O(\eta))\, d[X_1;X_2]$$
whenever $Y_1$ and $Y_2$ are relevant conditioning on $Z_1, Z_2$ respectively. This is quite a useful bound, since the laws of "entropic Ruzsa calculus" will tell us, roughly speaking, that virtually any random variable that we can create from taking various sums of copies of $X_1, X_2$, and conditioning against other sums, will be relevant. (Informally: the space of relevant random variables is $(1-O(\eta))\, d[X_1;X_2]$-separated with respect to the entropic Ruzsa distance.)

— 1. Main argument —
Now we derive more and more consequences of (2) – at some point crucially using the hypothesis that we are in characteristic two – before we reach a contradiction.
Right now, our hypothesis (2) only supplies lower bounds on entropic distances. The crucial ingredient that allows us to proceed is what we call the fibring identity, which lets us convert these lower bounds into useful upper bounds as well, which in fact match up very nicely when $\eta$ is small. Informally, the fibring identity captures the intuitive fact that the doubling constant of a set should be at least as large as the doubling constant of the image of that set under a homomorphism, times the doubling constant of a typical fiber of that homomorphism; and furthermore, one should only be close to equality if the fibers "line up" in some sense.
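(A toy example, not needed for the argument, may help here: if $A = B \times C \subseteq G_1 \times G_2$ and $\pi: G_1 \times G_2 \to G_1$ is the projection homomorphism, then $\pi(A) = B$, every fiber of $A$ is a copy of $C$, and
$$\frac{|A+A|}{|A|} = \frac{|B+B|}{|B|} \cdot \frac{|C+C|}{|C|},$$
so the doubling constant of $A$ is exactly the doubling constant of the image times the doubling constant of a fiber; here the fibers "line up" perfectly. The entropic fibring identity below makes this precise in general, with the deviation from this product behaviour captured by a conditional mutual information term.)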
Here is the fibring identity:
Proposition 5 (Fibring identity) Let $\pi \colon G \to H$ be a homomorphism. Then for any independent $G$-valued random variables $X, Y$, one has
$$d[X;Y] = d[\pi(X);\pi(Y)] + d[X|\pi(X);\ Y|\pi(Y)] + I\bigl[X - Y : (\pi(X),\pi(Y)) \,\bigm|\, \pi(X) - \pi(Y)\bigr],$$
where $I[\cdot : \cdot \mid \cdot]$ denotes conditional mutual information.
The proof is of course in the blueprint, but given that it is a central pillar of the argument, I reproduce it here.
Proof: Expanding out the definition of Ruzsa distance, and using the conditional entropy chain rule
$$H[X] = H[\pi(X)] + H[X \mid \pi(X)]$$
and
$$H[Y] = H[\pi(Y)] + H[Y \mid \pi(Y)],$$
it suffices to establish the identity
$$H[X - Y] = H[\pi(X) - \pi(Y)] + H[X - Y \mid \pi(X), \pi(Y)] + I\bigl[X - Y : (\pi(X),\pi(Y)) \,\bigm|\, \pi(X) - \pi(Y)\bigr].$$
But from the chain rule again we have
$$H[X - Y] = H[\pi(X) - \pi(Y)] + H[X - Y \mid \pi(X) - \pi(Y)],$$
and from the definition of conditional mutual information (using the fact that $\pi(X) - \pi(Y)$ is determined both by $X - Y$ and by $(\pi(X), \pi(Y))$) one has
$$H[X - Y \mid \pi(X) - \pi(Y)] = H[X - Y \mid \pi(X), \pi(Y)] + I\bigl[X - Y : (\pi(X),\pi(Y)) \,\bigm|\, \pi(X) - \pi(Y)\bigr],$$
giving the claim. $\Box$

We will only care about the characteristic two setting here, so we will now assume that all groups involved are $2$-torsion, so that we can replace all subtractions with additions. If we specialize the fibring identity to the case where $G = \mathbb{F}_2^n \times \mathbb{F}_2^n$, $H = \mathbb{F}_2^n$, $\pi$ is the addition map $\pi(x,y) := x + y$, and $X = (X_1, X_3)$, $Y = (X_2, X_4)$ are pairs of independent random variables in $\mathbb{F}_2^n$, we obtain the following corollary:
Corollary 6 Let $X_1, X_2, X_3, X_4$ be independent $\mathbb{F}_2^n$-valued random variables. Then we have the identity
$$d[X_1;X_2] + d[X_3;X_4] = d[X_1+X_3;\ X_2+X_4] + d[X_1|X_1+X_3;\ X_2|X_2+X_4] + I\bigl[X_1+X_2 : (X_1+X_3,\ X_2+X_4) \,\bigm|\, X_1+X_2+X_3+X_4\bigr].$$
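(It may be worth spelling out, as a sketch, how this follows from Proposition 5 with the indicated choices $G = \mathbb{F}_2^n \times \mathbb{F}_2^n$, $H = \mathbb{F}_2^n$, $\pi(x,y) := x+y$, $X := (X_1, X_3)$, $Y := (X_2, X_4)$. By independence,
$$d[X;Y] = H[(X_1+X_2,\ X_3+X_4)] - \tfrac12 H[(X_1,X_3)] - \tfrac12 H[(X_2,X_4)] = d[X_1;X_2] + d[X_3;X_4],$$
while $\pi(X) = X_1+X_3$ and $\pi(Y) = X_2+X_4$, so $d[\pi(X);\pi(Y)] = d[X_1+X_3;\ X_2+X_4]$. Conditionally on $X_1+X_3$, the pair $(X_1,X_3)$ carries the same information as $X_1$ alone (and similarly for $(X_2,X_4)$), which is why the conditional distance term can be written as $d[X_1|X_1+X_3;\ X_2|X_2+X_4]$; and conditionally on the total sum $X_1+X_2+X_3+X_4$, the difference $X - Y = (X_1+X_2,\ X_3+X_4)$ carries the same information as $X_1+X_2$, which is why the mutual information term takes the displayed form.)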
This is a useful and flexible identity, especially when combined with (2). For instance, we can discard the conditional mutual information term as being non-negative, to obtain the inequality
$$d[X_1+X_3;\ X_2+X_4] + d[X_1|X_1+X_3;\ X_2|X_2+X_4] \leq d[X_1;X_2] + d[X_3;X_4].$$
If we let $X_1, X_2, X_3, X_4$ be independent copies of $X_1, X_2, X_2, X_1$ respectively (note the swap in the last two variables!) we obtain
$$d[X_1+X_3;\ X_2+X_4] + d[X_1|X_1+X_3;\ X_2|X_2+X_4] \leq 2\, d[X_1;X_2],$$
where both $X_1+X_3$ and $X_2+X_4$ are now distributed as the sum of independent copies of $X_1$ and $X_2$. From entropic Ruzsa calculus, one can check that $X_1+X_3$, $X_2+X_4$, $X_1|X_1+X_3$, and $X_2|X_2+X_4$ are all relevant random variables, so from (2) we now obtain both upper and lower bounds for $d[X_1+X_3;\ X_2+X_4]$:
$$d[X_1+X_3;\ X_2+X_4] = (1 + O(\eta))\, d[X_1;X_2].$$
A pleasant upshot of this is that we now get to work in the symmetric case without loss of generality. Indeed, if we set $X := X_1 + X_3$ (thus $X$ is distributed as the sum of independent copies of the original $X_1$ and $X_2$), we now have from (2) that
$$d[Y_1|Z_1;\ Y_2|Z_2] \geq (1 - O(\eta))\, d[X;X] \qquad (3)$$
whenever $Y_1|Z_1$ and $Y_2|Z_2$ are relevant, which by entropic Ruzsa calculus is equivalent to asking that
$$d[X;\ Y_1|Z_1],\ d[X;\ Y_2|Z_2] = O(d[X;X]).$$
Now we use the fibring identity again, relabeling so that $X_1, X_2, X_3, X_4$ are now required to be independent copies of $X$. We conclude that
$$2\, d[X;X] = d[X_1+X_3;\ X_2+X_4] + d[X_1|X_1+X_3;\ X_2|X_2+X_4] + I\bigl[X_1+X_2 : (X_1+X_3,\ X_2+X_4) \,\bigm|\, X_1+X_2+X_3+X_4\bigr].$$
As before, the random variables $X_1+X_3$, $X_2+X_4$, $X_1|X_1+X_3$, $X_2|X_2+X_4$ are all relevant, so from (3) we have
$$d[X_1+X_3;\ X_2+X_4],\ d[X_1|X_1+X_3;\ X_2|X_2+X_4] \geq (1 - O(\eta))\, d[X;X].$$
We could now also match these lower bounds with upper bounds, but the more important takeaway from this analysis is a really good bound on the conditional mutual information:
$$I\bigl[X_1+X_2 : (X_1+X_3,\ X_2+X_4) \,\bigm|\, X_1+X_2+X_3+X_4\bigr] = O(\eta)\, d[X;X].$$
By the data processing inequality, we can discard some of the randomness here, and conclude
$$I\bigl[X_1+X_2 : X_1+X_3 \,\bigm|\, X_1+X_2+X_3+X_4\bigr] = O(\eta)\, d[X;X].$$
Let us introduce the random variables
$$Z := X_1+X_2+X_3+X_4, \qquad U := X_1+X_2, \qquad V := X_1+X_3;$$
then we have
$$I[U : V \mid Z] = O(\eta)\, d[X;X].$$
Intuitively, this means that $U$ and $V$ are very nearly independent given $Z$. For sake of argument, let us assume that they are actually independent; one can achieve something resembling this by invoking the entropic Balog-Szemerédi-Gowers theorem, established in the blueprint, after conceding some small losses in the entropy, but we skip over the details for this blog post. The key point now is that because we are in characteristic two, $U + V$ has the same form as $U$ or $V$:
$$U + V = (X_1+X_2) + (X_1+X_3) = X_2 + X_3.$$
In particular, by permutation symmetry of the four independent copies, we have
$$H[U+V \mid Z] = H[U \mid Z] = H[V \mid Z],$$
and so by the definition of conditional Ruzsa distance (and the assumed conditional independence of $U$ and $V$ given $Z$) we have a massive distance decrement
$$d[U|Z;\ V|Z] = H[U+V \mid Z] - \tfrac12 H[U \mid Z] - \tfrac12 H[V \mid Z] = 0,$$
contradicting (1) as desired. (In reality, we end up decreasing the distance not all the way to zero, but instead to a small fraction of $d[X;X]$, due to losses in the Balog-Szemerédi-Gowers theorem, but this is still enough to reach a contradiction.)

Remark 7 A similar argument works in the $m$-torsion case for general $m$. Instead of decrementing the entropic Ruzsa distance, one instead decrements a "multidistance"
$$D[X_1,\dots,X_m] := H\Bigl[\sum_{i=1}^m X_i\Bigr] - \frac{1}{m} \sum_{i=1}^m H[X_i]$$
for independent $X_1, \dots, X_m$. By an iterated version of the fibring identity, one can first reduce again to the symmetric case where the random variables are all copies of the same variable $X$. If one then takes $X_{i,j}$, $1 \leq i, j \leq m$, to be an array of copies of $X$, one can get to the point where the row sums and the column sums of this array have small conditional mutual information with respect to the double sum $\sum_{i,j} X_{i,j}$. If we then set $U$ and $V$ to be suitable combinations of these row and column sums, the data processing inequality again shows that $U$ and $V$ are nearly independent given the double sum. The $m$-torsion now crucially intervenes as before to ensure that an appropriate combination of $U$ and $V$ has the same form as $U$ or $V$, leading to a contradiction as before.
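(As a quick consistency check on Remark 7, not taken from the original post: for $m = 2$ and independent $2$-torsion variables $X_1, X_2$, the multidistance
$$D[X_1, X_2] = H[X_1 + X_2] - \tfrac12 H[X_1] - \tfrac12 H[X_2]$$
is just the entropic Ruzsa distance $d[X_1;X_2]$, so the multidistance decrement scheme does specialize to the distance decrement scheme used above when $m = 2$.)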