The sunflower lemma via Shannon entropy
What's new 2020-07-21
A family of sets $A_1,\dots,A_r$ for some $r \geq 1$ is a sunflower if there is a core set $A_0$ contained in each of the $A_i$, such that the petal sets $A_i \setminus A_0$, $i=1,\dots,r$ are disjoint. If $k, r \geq 1$, let $\mathrm{Sun}(k,r)$ denote the smallest natural number with the property that any family of $\mathrm{Sun}(k,r)$ distinct sets of cardinality at most $k$ contains $r$ distinct elements $A_1,\dots,A_r$ that form a sunflower. The celebrated Erdős–Rado theorem asserts that $\mathrm{Sun}(k,r)$ is finite; in fact Erdős and Rado gave the bounds

$$(r-1)^k \leq \mathrm{Sun}(k,r) \leq k! (r-1)^k + 1. \qquad (1)$$

The breakthrough work of Alweiss, Lovett, Wu, and Zhang, together with a subsequent simplification of Rao, improved the upper bound substantially, giving

$$\mathrm{Sun}(k,r) \leq O( r \log(kr) )^k. \qquad (2)$$
Rao’s argument used the Shannon noiseless coding theorem. It turns out that the argument can be arranged in the very slightly different language of Shannon entropy, and I would like to present it here. The argument proceeds by locating the core and petals of the sunflower separately (this strategy is also followed in Alweiss-Lovett-Wu-Zhang). In both cases the following definition will be key. In this post all random variables, such as random sets, will be understood to be discrete random variables taking values in a finite range. We always use boldface symbols to denote random variables, and non-boldface for deterministic quantities.
Definition 1 (Spread set) Let $R \geq 1$. A random set $\mathbf{A}$ is said to be $R$-spread if one has

$$\mathbb{P}( S \subset \mathbf{A} ) \leq R^{-|S|}$$

for all sets $S$. A family $(A_i)_{i \in I}$ of sets is said to be $R$-spread if $I$ is non-empty and the random variable $A_{\mathbf{i}}$ is $R$-spread, where $\mathbf{i}$ is drawn uniformly from $I$.
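To make Definition 1 concrete, here is a minimal Python sketch (my own illustration, not part of the original argument; the helper name `is_R_spread` is hypothetical) that tests the spread condition for the uniform random set attached to a small finite family:

```python
from itertools import combinations

def is_R_spread(family, R):
    """Check Definition 1 for the uniform random set A_i drawn from `family`:
    P(S subset A_i) <= R^(-|S|) for every non-empty set S.  Only subsets of
    members of the family can have positive probability, so it suffices to
    test those."""
    n = len(family)
    candidates = {S for A in family
                  for m in range(1, len(A) + 1)
                  for S in combinations(sorted(A), m)}
    return all(sum(1 for A in family if set(S) <= A) * R ** len(S) <= n
               for S in candidates)

# A family concentrated on a common element is badly spread ...
print(is_R_spread([{1, 2}, {1, 3}, {1, 4}, {1, 5}], 2))  # False: P(1 in A) = 1
# ... while pairwise disjoint sets of the same size are 2-spread.
print(is_R_spread([{1, 2}, {3, 4}, {5, 6}, {7, 8}], 2))  # True
```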
The core of a sunflower can then be selected greedily, in such a way that the remainder of the family becomes spread:
Lemma 2 (Locating the core) Let $A_1,\dots,A_n$ be a family of subsets of a finite set $X$, each of cardinality at most $k$, and let $R \geq 1$. Then there exists a "core" set $S$ of cardinality at most $k$ such that the set

$$B := \{ 1 \leq i \leq n : S \subset A_i \} \qquad (3)$$

has cardinality at least $R^{-|S|} n$, and such that the family $(A_i \setminus S)_{i \in B}$ is $R$-spread. Furthermore, if $n > R^k$ and the $A_i$ are distinct, then $|S| < k$.
Proof: We may assume $n \geq 1$, as the claim is trivial otherwise. For any set $S \subset X$ of cardinality at most $k$, define the quantity

$$Q(S) := R^{|S|} |\{ 1 \leq i \leq n : S \subset A_i \}|,$$

and let $S$ be a set of cardinality at most $k$ maximizing $Q(S)$. Let $B$ be the set (3). Since $Q(S) \geq Q(\emptyset) = n$, $B$ has cardinality $R^{-|S|} Q(S) \geq R^{-|S|} n$; in particular $B$ is non-empty. It remains to check that the family $(A_i \setminus S)_{i \in B}$ is $R$-spread. But for any non-empty set $S'$ (which we may take to be disjoint from $S$, since the claim is trivial otherwise) and $\mathbf{i}$ drawn uniformly at random from $B$ one has

$$\mathbb{P}( S' \subset A_{\mathbf{i}} \setminus S ) = \frac{|\{ i \in B : S \cup S' \subset A_i \}|}{|B|} = \frac{R^{-|S|-|S'|} Q(S \cup S')}{R^{-|S|} Q(S)} \leq R^{-|S'|},$$

since $Q(S \cup S') \leq Q(S)$ by maximality when $|S \cup S'| \leq k$, and $Q(S \cup S') = 0$ otherwise. Finally, if $n > R^k$ and the $A_i$ are distinct, then $|S| < k$: for if $|S| = k$, every $A_i$ with $i \in B$ would equal $S$, forcing $|B| \leq 1$ and contradicting $|B| \geq R^{-k} n > 1$. $\Box$
In view of the above lemma, the bound (2) will then follow from
Proposition 3 (Locating the petals) Let $k, r \geq 1$ be natural numbers, and suppose that $R \geq C r \log(kr)$ for a sufficiently large constant $C$. Let $(A_i)_{i \in B}$ be a finite family of subsets of a finite set $X$, each of cardinality at most $k$, which is $R$-spread. Then there exist $i_1,\dots,i_r \in B$ such that the family of sets $A_{i_1},\dots,A_{i_r}$ is disjoint.
Indeed, to prove (2), we assume that $A_1,\dots,A_n$ is a family of distinct sets of cardinality at most $k$ with $n$ greater than $R^k$, where $R := C r \log(kr)$ for a sufficiently large constant $C$; by discarding redundant elements and sets we may assume that $n$ is finite and that all the $A_i$ are contained in a common finite set $X$. Apply Lemma 2 to find a core set $S$ of cardinality $|S| < k$ such that the set $B$ defined in (3) is non-empty and the family $(A_i \setminus S)_{i \in B}$ is $R$-spread. By Proposition 3 we can find $i_1,\dots,i_r \in B$ such that the sets $A_{i_1} \setminus S, \dots, A_{i_r} \setminus S$ are disjoint; since these sets are disjoint, and at most one of them can be empty (the $A_i$ being distinct), this implies that the $A_{i_1},\dots,A_{i_r}$ are distinct. Hence $A_{i_1},\dots,A_{i_r}$ form a sunflower with core $S$, as required.
Remark 4 Proposition 3 is easy to prove if we strengthen the condition on $R$ to $R > k(r-1)$. In this case, we have $\mathbb{P}( x \in A_{\mathbf{i}} ) \leq 1/R < \frac{1}{k(r-1)}$ for every $x \in X$, hence by the union bound we see that for any $i_1,\dots,i_j \in B$ with $j < r$ there exists $i_{j+1} \in B$ such that $A_{i_{j+1}}$ is disjoint from the set $A_{i_1} \cup \dots \cup A_{i_j}$, which has cardinality at most $jk \leq k(r-1)$. Iterating this, we obtain the conclusion of Proposition 3 in this case. This recovers a bound of the form $\mathrm{Sun}(k,r) \leq O(kr)^k$, and by pursuing this idea a little further one can recover the original upper bound (1) of Erdős and Rado.
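For illustration, the greedy iteration described in Remark 4 is easily expressed in Python (a sketch under the Remark's hypothesis $R > k(r-1)$, which is what guarantees that the greedy step never gets stuck; the function name is mine):

```python
def greedy_disjoint(family, r):
    """Greedily select r pairwise disjoint sets, as in Remark 4.  When the
    family is R-spread with R > k(r-1), the union-bound argument guarantees
    that a set disjoint from all previously chosen ones always exists."""
    chosen, used = [], set()
    for A in family:
        if used.isdisjoint(A):
            chosen.append(A)
            used |= set(A)
            if len(chosen) == r:
                return chosen
    return None  # cannot happen under the hypothesis of Remark 4
```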
It remains to prove Proposition 3. In fact we can locate the petals one at a time, placing each petal inside a random set.
Proposition 5 (Locating a single petal) Let the notation and hypotheses be as in Proposition 3. Let $\mathbf{V}$ be a random subset of $X$, such that each $x \in X$ lies in $\mathbf{V}$ with an independent probability of $1/r$. Then with probability greater than $1 - \frac{1}{r}$, $\mathbf{V}$ contains one of the $A_i$.
To see that Proposition 5 implies Proposition 3, we randomly partition $X$ into $\mathbf{V}_1 \cup \dots \cup \mathbf{V}_r$ by placing each $x \in X$ into one of the $\mathbf{V}_j$, $j = 1,\dots,r$, chosen uniformly and independently at random; each $\mathbf{V}_j$ then has the distribution of the random set $\mathbf{V}$ from Proposition 5. By Proposition 5 and the union bound, we see that with positive probability, it is simultaneously true for all $j = 1,\dots,r$ that each $\mathbf{V}_j$ contains one of the $A_i$. Selecting one such set $A_{i_j} \subset \mathbf{V}_j$ for each $j$, we obtain the required disjoint petals.
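In code, this partitioning step looks as follows (an illustrative sketch with hypothetical names; it only demonstrates the construction, not the probabilistic analysis):

```python
import random

def random_partition(X, r, seed=0):
    """Place each element of X into one of r classes uniformly and
    independently.  Each class V_j is then marginally a random subset of X
    in which every element appears with independent probability 1/r, which
    is the distribution required by Proposition 5."""
    rng = random.Random(seed)
    classes = [set() for _ in range(r)]
    for x in X:
        classes[rng.randrange(r)].add(x)
    return classes

V = random_partition(range(20), 4)
assert sum(len(Vj) for Vj in V) == 20  # the classes partition X
```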
We will prove Proposition 5 by gradually increasing the density of the random set and arranging the sets to get quickly absorbed by this random set. The key iteration step is
Proposition 6 (Refinement inequality) Let $R \geq 1$ and $0 < \delta \leq 1$. Let $\mathbf{A}$ be a random subset of a finite set $X$ which is $R$-spread, and let $\mathbf{V}$ be a random subset of $X$ independent of $\mathbf{A}$, such that each $x \in X$ lies in $\mathbf{V}$ with an independent probability of $\delta$. Then there exists another random subset $\mathbf{A}'$ of $X$ with the same distribution as $\mathbf{A}$, such that $\mathbf{A}' \setminus \mathbf{V} \subset \mathbf{A}$ and

$$\mathbb{E} |\mathbf{A}' \setminus \mathbf{V}| \leq \frac{5}{\log_2( R \delta )} \mathbb{E} |\mathbf{A}|.$$

Note that a direct application of the first moment method gives only the bound

$$\mathbb{E} |\mathbf{A} \setminus \mathbf{V}| = (1-\delta) \mathbb{E} |\mathbf{A}|.$$
One can iterate the above proposition, writing a random set of density $1 - (1-\delta)^m$ as the union of $m$ independent random sets of density $\delta$ and repeatedly replacing $\mathbf{A}$ with $\mathbf{A}' \setminus \mathbf{V}$ (noting that this preserves the $R$-spread nature of the random set, since $\mathbb{P}( S \subset \mathbf{A}' \setminus \mathbf{V} ) \leq \mathbb{P}( S \subset \mathbf{A}' )$ for every $S$), to conclude
Corollary 7 (Iterated refinement inequality) Let $R \geq 1$, $0 < \delta \leq 1$, and let $m \geq 1$ be a natural number. Let $\mathbf{A}$ be a random subset of a finite set $X$ which is $R$-spread, and let $\mathbf{V}$ be a random subset of $X$ independent of $\mathbf{A}$, such that each $x \in X$ lies in $\mathbf{V}$ with an independent probability of $1 - (1-\delta)^m$. Then there exists another random subset $\mathbf{A}'$ of $X$ with the same distribution as $\mathbf{A}$, such that

$$\mathbb{E} |\mathbf{A}' \setminus \mathbf{V}| \leq \left( \frac{5}{\log_2( R \delta )} \right)^m \mathbb{E} |\mathbf{A}|.$$
Now we can prove Proposition 5. Let $m \geq 1$ be a natural number to be chosen shortly. Applying Corollary 7 with $\mathbf{A}$ drawn uniformly at random from the $A_i$, $i \in B$, and setting $1 - (1-\delta)^m = 1/r$, or equivalently $\delta := 1 - (1 - 1/r)^{1/m}$, we obtain a random set $\mathbf{A}'$, also distributed uniformly among the $A_i$, with

$$\mathbb{E} |\mathbf{A}' \setminus \mathbf{V}| \leq \left( \frac{5}{\log_2( R \delta )} \right)^m k,$$

where $\mathbf{V}$ is the random set from Proposition 5. One easily checks that $\delta \geq \frac{1}{2mr}$, so if we choose $m := \lceil \log_2(kr) \rceil + 1$ and the constant $C$ in Proposition 3 is large enough, then $\frac{5}{\log_2(R\delta)} \leq \frac{1}{2}$, and hence

$$\mathbb{E} |\mathbf{A}' \setminus \mathbf{V}| \leq 2^{-m} k < \frac{1}{r}.$$

In particular, by Markov's inequality, $\mathbf{A}' \subset \mathbf{V}$ with probability greater than $1 - \frac{1}{r}$; since $\mathbf{A}'$ is uniformly distributed among the $A_i$, this implies that $\mathbf{V}$ contains one of the $A_i$ with probability greater than $1 - \frac{1}{r}$, as required.
It remains to establish Proposition 6. This is the difficult step, and requires a clever way to find the variant $\mathbf{A}'$ of $\mathbf{A}$ that has better containment properties in $\mathbf{V}$ than $\mathbf{A}$ does. The main trick is to make a conditional copy $(\mathbf{A}', \mathbf{V}')$ of $(\mathbf{A}, \mathbf{V})$ that is conditionally independent of $(\mathbf{A}, \mathbf{V})$ subject to the constraint

$$\mathbf{A} \cup \mathbf{V} = \mathbf{A}' \cup \mathbf{V}'.$$

The point here is that this constraint implies the inclusions

$$\mathbf{A}' \subset \mathbf{A} \cup \mathbf{V}, \qquad \mathbf{A}' \setminus \mathbf{V} \subset \mathbf{A},$$

so that $\mathbf{A}'$ automatically enjoys the containment property required by Proposition 6; the task is then to show that a judicious choice of this coupling makes $\mathbf{A}' \setminus \mathbf{V}$ small in expectation.
— 1. Shannon entropy —
In this section we lay out all the tools from the theory of Shannon entropy that we will need.
Define an empirical sequence for a random variable $\mathbf{X}$ taking values in a discrete set $S$ to be a sequence $x^{(1)}, x^{(2)}, \dots$ in $S$ such that the empirical samples of this sequence converge in distribution to $\mathbf{X}$, in the sense that

$$\frac{1}{n} |\{ 1 \leq j \leq n : x^{(j)} = x \}| \to \mathbb{P}( \mathbf{X} = x ) \quad \text{as } n \to \infty$$

for each $x \in S$. If $\mathbf{X}$ is a random variable taking values in some set $S$, its Shannon entropy $H(\mathbf{X})$ is defined by the formula

$$H(\mathbf{X}) := \sum_{x \in S} \mathbb{P}( \mathbf{X} = x ) \log_2 \frac{1}{\mathbb{P}( \mathbf{X} = x )},$$

with the convention that $0 \log_2 \frac{1}{0} = 0$; given a further random variable $\mathbf{Y}$, the conditional Shannon entropy is defined as

$$H( \mathbf{X} | \mathbf{Y} ) := \sum_y \mathbb{P}( \mathbf{Y} = y ) H( \mathbf{X} | \mathbf{Y} = y ).$$
We record the following standard and easily verified facts:
Lemma 8 (Basic Shannon inequalities) Let $\mathbf{X}, \mathbf{Y}, \mathbf{Z}$ be random variables.

- (i) (Monotonicity) If $\mathbf{Y} = f(\mathbf{X})$ is a deterministic function of $\mathbf{X}$, then $H(\mathbf{Y}) \leq H(\mathbf{X})$. More generally, if $\mathbf{Y} = f(\mathbf{X}, \mathbf{Z})$ is a deterministic function of $\mathbf{X}$ and $\mathbf{Z}$, then $H(\mathbf{Y}|\mathbf{Z}) \leq H(\mathbf{X}|\mathbf{Z})$. If $\mathbf{Y}$ is a deterministic function of $\mathbf{X}$, then $H(\mathbf{X}, \mathbf{Y}) = H(\mathbf{X})$.
- (ii) (Subadditivity) One has $H(\mathbf{X}, \mathbf{Y}) \leq H(\mathbf{X}) + H(\mathbf{Y})$, with equality iff $\mathbf{X}, \mathbf{Y}$ are independent. More generally, one has $H(\mathbf{X}, \mathbf{Y} | \mathbf{Z}) \leq H(\mathbf{X}|\mathbf{Z}) + H(\mathbf{Y}|\mathbf{Z})$, with equality iff $\mathbf{X}, \mathbf{Y}$ are conditionally independent with respect to $\mathbf{Z}$.
- (iii) (Chain rule) One has $H(\mathbf{X}, \mathbf{Y}) = H(\mathbf{X}) + H(\mathbf{Y}|\mathbf{X})$. More generally $H(\mathbf{X}, \mathbf{Y}|\mathbf{Z}) = H(\mathbf{X}|\mathbf{Z}) + H(\mathbf{Y}|\mathbf{X}, \mathbf{Z})$. In particular $H(\mathbf{Y}|\mathbf{X}) \leq H(\mathbf{Y})$, and $H(\mathbf{Y}|\mathbf{X}) = H(\mathbf{Y})$ iff $\mathbf{X}, \mathbf{Y}$ are independent; similarly, $H(\mathbf{Y}|\mathbf{X}, \mathbf{Z}) \leq H(\mathbf{Y}|\mathbf{Z})$, and $H(\mathbf{Y}|\mathbf{X}, \mathbf{Z}) = H(\mathbf{Y}|\mathbf{Z})$ iff $\mathbf{X}, \mathbf{Y}$ are conditionally independent with respect to $\mathbf{Z}$.
- (iv) (Jensen's inequality) If $\mathbf{X}$ takes values in a finite set $S$ then $H(\mathbf{X}) \leq \log_2 |S|$, with equality iff $\mathbf{X}$ is uniformly distributed in $S$. More generally, if $\mathbf{X}$ takes values in a finite set $S_{\mathbf{Y}}$ that depends on $\mathbf{Y}$, then $H(\mathbf{X}|\mathbf{Y}) \leq \mathbb{E} \log_2 |S_{\mathbf{Y}}|$, with equality iff $\mathbf{X}$ is uniformly distributed in $S_{\mathbf{Y}}$ after conditioning on $\mathbf{Y}$.
- (v) (Gibbs inequality) If $\mathbf{X}, \mathbf{Y}$ take values in the same finite or countable set $S$, then $H(\mathbf{X}) \leq \sum_{x \in S} \mathbb{P}( \mathbf{X} = x ) \log_2 \frac{1}{\mathbb{P}( \mathbf{Y} = x )}$ (we permit the right-hand side to be infinite, which makes the inequality vacuously true).
See this previous blog post for some intuitive analogies to understand Shannon entropy.
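As a quick numerical sanity check of these identities (my own illustration, not part of the original argument), one can verify the chain rule of Lemma 8(iii) on a toy joint distribution:

```python
from math import log2

def H(dist):
    """Shannon entropy, in bits, of a distribution given as {outcome: prob}."""
    return sum(p * log2(1 / p) for p in dist.values() if p > 0)

# Joint distribution of (X, Y); check H(X, Y) = H(X) + H(Y | X).
joint = {("a", 0): 0.5, ("a", 1): 0.25, ("b", 0): 0.25}
pX = {"a": 0.75, "b": 0.25}
# H(Y | X) = sum over x of P(X = x) * H(Y | X = x)
H_Y_given_X = 0.75 * H({0: 2 / 3, 1: 1 / 3}) + 0.25 * H({0: 1.0})
assert abs(H(joint) - (H(pX) + H_Y_given_X)) < 1e-9
```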
Now we establish some inequalities of relevance to random sets.
We first observe that any small random set largely determines any of its subsets. Define a random subset of a random set $\mathbf{A}$ to be a random set $\mathbf{B}$ such that $\mathbf{B} \subset \mathbf{A}$ holds almost surely.
Lemma 9 (Subsets of small sets have small conditional entropy) Let $\mathbf{A}$ be a random finite set.

- (i) One has $H( \mathbf{B} | \mathbf{A} ) \leq \mathbb{E} |\mathbf{A}|$ for any random subset $\mathbf{B}$ of $\mathbf{A}$.
- (ii) One has $H( |\mathbf{A}| ) \leq \mathbb{E} |\mathbf{A}| + 1$. If $\mathbf{A}$ is almost surely non-empty, we can improve this to $H( |\mathbf{A}| ) \leq \mathbb{E} |\mathbf{A}|$.
Proof: The set $\mathbf{B}$ takes values in the power set $2^{\mathbf{A}}$ of $\mathbf{A}$, which has cardinality $2^{|\mathbf{A}|}$, so the claim (i) follows from Lemma 8(iv). (Note how it is convenient here that we are using the base $2$ for the logarithm.)

For (ii), apply Lemma 8(v) with $\mathbf{X} := |\mathbf{A}|$ and $\mathbf{Y}$ the geometric random variable $\mathbb{P}( \mathbf{Y} = a ) = 2^{-a-1}$ for natural numbers $a \geq 0$ (or $\mathbb{P}( \mathbf{Y} = a ) = 2^{-a}$ for positive $a$, if $\mathbf{A}$ is almost surely non-empty). $\Box$
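As a numerical sanity check of part (ii) (again my own illustration):

```python
from math import log2

# Take A = {} w.p. 1/2, {1} w.p. 1/4, {1, 2} w.p. 1/4, so |A| is distributed
# as below; Lemma 9(ii) asserts H(|A|) <= E|A| + 1 (in bits).
size_dist = {0: 0.5, 1: 0.25, 2: 0.25}
entropy = sum(p * log2(1 / p) for p in size_dist.values())
expected = sum(a * p for a, p in size_dist.items())
assert entropy <= expected + 1  # 1.5 <= 1.75
```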
Now we encode the property of a random variable being $R$-spread in the language of Shannon entropy.
Lemma 10 (Information-theoretic interpretation of spread) Let $\mathbf{A}$ be a random finite set that is $R$-spread for some $R \geq 1$.

- (i) If $\mathbf{A}$ is uniformly distributed amongst some finite collection $\mathcal{A}$ of sets, then

$$H( \mathbf{A} | \mathbf{B} ) \leq H( \mathbf{A} ) - ( \log_2 R ) \, \mathbb{E} |\mathbf{B}|$$

for all random subsets $\mathbf{B}$ of $\mathbf{A}$.
- (ii) In the general case, if $A^{(1)}, A^{(2)}, \dots$ is an empirical sequence for $\mathbf{A}$, then

$$H( \mathbf{j} | \mathbf{B} ) \leq \log_2 n - ( \log_2 R ) \, \mathbb{E} |\mathbf{B}| + o(1)$$

as $n \to \infty$, where $\mathbf{j}$ is drawn uniformly from $\{1,\dots,n\}$ and $\mathbf{B}$ is a random subset of $A^{(\mathbf{j})}$.

Informally: large random subsets $\mathbf{B}$ of an $R$-spread set $\mathbf{A}$ necessarily have a lot of mutual information with $\mathbf{A}$. Conversely, one can bound the size of a random subset of an $R$-spread set by bounding its mutual information with $\mathbf{A}$.
Proof: In case (i), it suffices by Lemma 8(iv) to establish the bound

$$|\{ A \in \mathcal{A} : B \subset A \}| \leq R^{-|B|} |\mathcal{A}|$$

for every set $B$, since after conditioning on the event $\mathbf{B} = B$ the random set $\mathbf{A}$ is confined to the collection on the left-hand side, and $H(\mathbf{A}) = \log_2 |\mathcal{A}|$; but this bound is immediate from Definition 1. Claim (ii) follows by applying the same argument to the index $\mathbf{j}$, after noting that the empirical samples $A^{(1)}, \dots, A^{(n)}$ form an $R(1-o(1))$-spread family thanks to the convergence of the empirical distribution to that of $\mathbf{A}$. $\Box$
Given a finite non-empty set $X$ and an integer $0 \leq m \leq |X|$, let $\binom{X}{m}$ denote the collection of $m$-element subsets of $X$. A uniformly chosen element $\mathbf{V}$ of $\binom{X}{m}$ is thus a random $m$-element subset of $X$; we refer to the quantity $\delta := m/|X|$ as the density of this random subset, and $\mathbf{V}$ as a uniformly chosen random subset of $X$ of density $\delta$. (Of course, this is only defined when $\delta$ is an integer multiple of $1/|X|$.) Uniformly chosen random sets have the following information-theoretic properties:
Lemma 11 Let $X$ be a finite non-empty set, and let $\mathbf{V}$ be a uniformly chosen random subset of $X$ of some density $0 < \delta \leq 1$ (which is a multiple of $1/|X|$).

- (i) (Spread) If $\mathbf{B}$ is a random subset of $\mathbf{V}$, then

$$H( \mathbf{V} | \mathbf{B} ) \leq H( \mathbf{V} ) - \left( \log_2 \frac{1}{\delta} \right) \mathbb{E} |\mathbf{B}|.$$

- (ii) (Absorption) If $\mathbf{B}$ is a random subset of $X$, then

$$H( \mathbf{V} \cup \mathbf{B} ) \leq H( \mathbf{V} ) + \left( \log_2 \frac{2}{\delta} \right) \mathbb{E} |\mathbf{B}| + 1.$$

Proof: To prove (i), it suffices by Lemma 10(i) to show that $\mathbf{V}$ is $\frac{1}{\delta}$-spread, which amounts to showing that

$$\binom{|X| - |S|}{\delta |X| - |S|} \leq \delta^{|S|} \binom{|X|}{\delta |X|}$$

for all sets $S \subset X$ with $|S| \leq \delta |X|$; this follows since $\mathbb{P}( S \subset \mathbf{V} ) = \prod_{a=0}^{|S|-1} \frac{\delta |X| - a}{|X| - a} \leq \delta^{|S|}$.

For (ii), by replacing $\mathbf{B}$ with $\mathbf{B} \setminus \mathbf{V}$ we may assume that $\mathbf{B}, \mathbf{V}$ are disjoint. From Lemma 8(iii) and Lemma 9(ii) it suffices to show that

$$H( \mathbf{V} \cup \mathbf{B} \mid |\mathbf{B}| ) \leq H( \mathbf{V} ) + \left( \log_2 \frac{1}{\delta} \right) \mathbb{E} |\mathbf{B}|.$$

But after conditioning on $|\mathbf{B}| = b$, the set $\mathbf{V} \cup \mathbf{B}$ takes values among the $(\delta |X| + b)$-element subsets of $X$, and

$$\binom{|X|}{\delta |X| + b} \leq \delta^{-b} \binom{|X|}{\delta |X|} = \delta^{-b} \, 2^{H(\mathbf{V})},$$

so the claim follows from Lemma 8(iv). $\Box$
The following “relative product” construction will be important for us. Given a random variable $\mathbf{X}$ and a deterministic function $F(\mathbf{X})$ of that variable, one can construct a conditionally independent copy $\mathbf{X}'$ of $\mathbf{X}$ subject to the condition $F(\mathbf{X}') = F(\mathbf{X})$, with the joint distribution

$$\mathbb{P}( \mathbf{X} = x, \mathbf{X}' = x' ) := \frac{\mathbb{P}( \mathbf{X} = x ) \, \mathbb{P}( \mathbf{X} = x' )}{\mathbb{P}( F(\mathbf{X}) = F(x) )} \quad \text{whenever } F(x) = F(x'),$$

and $\mathbb{P}( \mathbf{X} = x, \mathbf{X}' = x' ) := 0$ otherwise; in particular, $\mathbf{X}'$ has the same distribution as $\mathbf{X}$.
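As a concrete illustration (mine, with hypothetical names), the relative product joint distribution can be tabulated directly from the defining formula:

```python
from collections import defaultdict

def relative_product(dist, F):
    """Joint distribution of (X, X'), where X' is a conditionally independent
    copy of X subject to F(X) = F(X').  `dist` maps outcomes of X to their
    probabilities, and F is any deterministic function of those outcomes."""
    pF = defaultdict(float)
    for x, p in dist.items():
        pF[F(x)] += p
    return {(x, y): px * py / pF[F(x)]
            for x, px in dist.items()
            for y, py in dist.items()
            if F(x) == F(y)}

joint = relative_product({0: 0.25, 1: 0.25, 2: 0.5}, F=lambda x: x % 2)
assert abs(sum(joint.values()) - 1) < 1e-9  # a genuine probability distribution
marginal = defaultdict(float)
for (x, y), p in joint.items():
    marginal[y] += p
assert abs(marginal[2] - 0.5) < 1e-9  # the copy X' has the original distribution
```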
— 2. Proof of refinement inequality —
Now we have enough tools to prove Proposition 6. Let $\mathbf{A}, \mathbf{V}, R, \delta$ be as in that proposition. On the event that $\mathbf{A}$ is empty we can simply set $\mathbf{A}' := \mathbf{A}$, so we can instead condition to the event that $\mathbf{A}$ is non-empty. In particular

$$\mathbb{E} |\mathbf{A}| \geq 1.$$

In order to use Lemma 10 we fix an empirical sequence $A^{(1)}, A^{(2)}, \dots$ for $\mathbf{A}$. We relabel the elements of $X$ as $1, \dots, |X|$, and let $n \geq |X|$ be a parameter going off to infinity (so in particular $X$ is identified with a subset of $\{1,\dots,n\}$). We let $\mathbf{j}$ be drawn uniformly at random from $\{1,\dots,n\}$, and let $\mathbf{W}$ be a uniform random subset of $\{1,\dots,n\}$ of density $\delta$ independent of $\mathbf{j}$. Observe from Stirling's formula that $\mathbf{W} \cap X$ converges in distribution to $\mathbf{V}$ as $n \to \infty$. Thus it will suffice to find another uniform random variable $\mathbf{j}'$ from $\{1,\dots,n\}$ such that

$$A^{(\mathbf{j}')} \setminus \mathbf{W} \subset A^{(\mathbf{j})} \quad \text{and} \quad \mathbb{E} \bigl| A^{(\mathbf{j}')} \setminus \mathbf{W} \bigr| \leq \left( \frac{5}{\log_2( R \delta )} + o(1) \right) \mathbb{E} |\mathbf{A}|,$$

since one can then pass to a limit along a subsequence to obtain the required coupling. From $(\mathbf{j}, \mathbf{W})$ we can form the random set $A^{(\mathbf{j})} \cup \mathbf{W}$; we then form a conditionally independent copy $(\mathbf{j}', \mathbf{W}')$ of $(\mathbf{j}, \mathbf{W})$ subject to the constraint

$$A^{(\mathbf{j})} \cup \mathbf{W} = A^{(\mathbf{j}')} \cup \mathbf{W}'.$$

Now, from this constraint we have

$$A^{(\mathbf{j}')} \subset A^{(\mathbf{j})} \cup \mathbf{W}, \qquad \text{hence} \qquad A^{(\mathbf{j}')} \setminus \mathbf{W} \subset A^{(\mathbf{j})}.$$