Frankl’s union-closed conjecture — a possible Polymath project?

Gowers's Weblog 2018-03-10

Although it was from only a couple of people, I had an enthusiastic response to a very tentative suggestion that it might be rewarding to see whether a polymath project could say anything useful about Frankl’s union-closed conjecture. A potentially worrying aspect of the idea is that the problem is extremely elementary to state, does not seem to yield to any standard techniques, and is rather notorious. But, as one of the commenters said, that is not necessarily an argument against trying it. A notable feature of the polymath experiment has been that it throws up surprises, so while I wouldn’t expect a polymath project to solve Frankl’s union-closed conjecture, I also know that I need to be rather cautious about my expectations — which in this case is an argument in favour of giving it a try.

A less serious problem is what acronym one would use for the project. For the density Hales-Jewett problem we went for DHJ, and for the Erdős discrepancy problem we used EDP. That general approach runs into difficulties with Frankl’s union-closed conjecture, so I suggest FUNC. This post, if the project were to go ahead, could be FUNC0; in general I like the idea that we would be engaged in a funky line of research.

The problem, for anyone who doesn’t know, is this. Suppose you have a family \mathcal{A} that consists of n distinct subsets of a finite set X, at least one of which is non-empty. Suppose also that it is union closed, meaning that if A,B\in\mathcal{A}, then A\cup B\in\mathcal{A} as well. Must there be an element of X that belongs to at least n/2 of the sets? This seems like the sort of question that ought to have an easy answer one way or the other, but it has turned out to be surprisingly difficult.
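
For anyone who wants to play with small cases, here is a short Python sketch (my own illustration, not anything from the literature) that checks whether a family of sets is union closed and lists the elements that appear in at least half of the sets; the conjecture asserts that for a union-closed family containing a non-empty set this list is never empty.

```python
from itertools import combinations

def is_union_closed(family):
    """Check that the union of any two members is again a member."""
    fam = set(map(frozenset, family))
    return all(a | b in fam for a, b in combinations(fam, 2))

def frankl_witnesses(family):
    """Return the elements that belong to at least half of the sets."""
    fam = set(map(frozenset, family))
    n = len(fam)
    ground = set().union(*fam)
    counts = {x: sum(x in a for a in fam) for x in ground}
    return [x for x, c in counts.items() if 2 * c >= n]

# The conjecture says this list is non-empty for every union-closed
# family that contains at least one non-empty set.
example = [set(), {1}, {1, 2}]
print(is_union_closed(example), frankl_witnesses(example))
```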

If you are potentially interested, then one good thing to do by way of preparation is to look at this survey article by Henning Bruhn and Oliver Schaudt. It is very nicely written and seems to be a pretty comprehensive account of the current state of knowledge about the problem. It includes some quite interesting reformulations (interesting because you don’t just look at them and see that they are trivially equivalent to the original problem).

For the remainder of this post, I want to discuss a couple of failures. The first is a natural idea for generalizing the problem to make it easier that completely fails, at least initially, but can perhaps be rescued, and the second is a failed attempt to produce a counterexample. I’ll present these just in case one or other of them stimulates a useful idea in somebody else.

A proof idea that may not be completely hopeless

An immediate reaction of any probabilistic combinatorialist is likely to be to wonder whether, in order to prove that there exists a point in at least half the sets, it might be easier to show that in fact an average point belongs to at least half the sets.

Unfortunately, it is very easy to see that that is false: consider, for example, the three sets \emptyset, \{1\}, and \{1,2,3,4,5,6,7,8,9,10,11,12,13\}. The average (over \{1,2,3,4,5,6,7,8,9,10,11,12,13\}) of the number of sets containing a random element is 14/13<3/2, but there are three sets.

However, this example doesn't feel like a genuine counterexample somehow, because the set system is just a dressed up version of \emptyset,\{1\},\{1,2\}: we replace the singleton \{2\} by the set \{2,3,4,5,6,7,8,9,10,11,12,13\} and that's it. So for this set system it seems more natural to consider a weighted average, or equivalently to take not the uniform distribution on \{1,2,3,4,5,6,7,8,9,10,11,12,13\}, but some other distribution that reflects more naturally the properties of the set system at hand. For example, we could give a probability 1/2 to the element 1 and 1/24 to each of the remaining 12 elements of the set. If we do that, then the average number of sets containing a random element will be the same as it is for the example \emptyset, \{1\},\{1,2\} with the uniform distribution (not that the uniform distribution is obviously the most natural distribution for that example).
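
To make the arithmetic above concrete, here is a tiny Python check (again just an illustration) of the two averages: the uniform distribution on the thirteen elements gives 14/13, while the reweighted distribution gives exactly 3/2.

```python
from fractions import Fraction

sets = [set(), {1}, set(range(1, 14))]   # the three sets above
ground = sorted(set().union(*sets))

def average_membership(distribution):
    """Expected number of sets containing an element drawn from the distribution."""
    return sum(p * sum(x in a for a in sets) for x, p in distribution.items())

uniform = {x: Fraction(1, 13) for x in ground}
weighted = {x: Fraction(1, 2) if x == 1 else Fraction(1, 24) for x in ground}

print(average_membership(uniform))    # 14/13, below 3/2
print(average_membership(weighted))   # 3/2, exactly half of the three sets
```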

This suggests a very slightly more sophisticated version of the averaging-argument idea: does there exist a probability distribution on the elements of the ground set such that the expected number of sets containing a random element (drawn according to that probability distribution) is at least half the number of sets?

With this question we have in a sense the opposite problem. Instead of the answer being a trivial no, it is a trivial yes — if, that is, the union-closed conjecture holds. That’s because if the conjecture holds, then some x belongs to at least half the sets, so we can assign probability 1 to that x and probability zero to all the other elements.

Of course, this still doesn’t feel like a complete demolition of the approach. It just means that for it not to be a trivial reformulation we will have to put conditions on the probability distribution. There are two ways I can imagine getting the approach to work. The first is to insist on some property that the distribution is required to have that means that its existence does not follow easily from the conjecture. That is, the idea would be to prove a stronger statement. It seems paradoxical, but as any experienced mathematician knows, it can sometimes be easier to prove a stronger statement, because there is less room for manoeuvre. In extreme cases, once a statement has been suitably strengthened, you have so little choice about what to do that the proof becomes almost trivial.

A second idea is that there might be a nice way of defining the probability distribution in terms of the set system. This would be a situation rather like the one I discussed in my previous post, on entropy and Sidorenko’s conjecture. There, the basic idea was to prove that a set A had cardinality at least m by proving that there is a probability distribution on A with entropy at least \log_2m. At first, this seems like an unhelpful idea, because if |A|\geq m then the uniform distribution on A will trivially do the job. But it turns out that there is a different distribution for which it is easier to prove that it does the job, even though it usually has lower entropy than the uniform distribution. Perhaps with the union-closed conjecture something like this works too: obviously the best distribution is supported on the set of elements that are contained in a maximal number of sets from the set system, but perhaps one can construct a different distribution out of the set system that gives a smaller average in general but about which it is easier to prove things.

I have no doubt that thoughts of the above kind have occurred to a high percentage of people who have thought about the union-closed conjecture, and can probably be found in the literature as well, but it would be odd not to mention them in this post.

To finish this section, here is a wild guess at a distribution that does the job. Like almost all wild guesses, its chances of being correct are very close to zero, but it gives the flavour of the kind of thing one might hope for.

Given a finite set X and a collection \mathcal{A} of subsets of X, we can pick a random set A\in\mathcal{A} (uniformly from \mathcal{A}) and look at the events x\in A for each x\in X. In general, these events are correlated.

Now let us define a matrix M by M_{xy}=\mathbb{P}[y\in A|x\in A]. We could now try to find a probability distribution \mu on X that minimizes the sum \sum_{x,y}\mu(x)\mu(y)M_{xy}. That is, in a certain sense we would be trying to make the events x\in A as uncorrelated as possible on average. (There may be much better ways of measuring this — I’m just writing down the first thing that comes into my head that I can’t immediately see is stupid.)

What does this give in the case of the three sets \emptyset, \{1\} and \{1,2,3,4,5,6,7,8,9,10,11,12,13\}? We have that M_{xy}=1 if x=y=1, if x,y\geq 2, or if y=1 and x\geq 2; that is, in every case except when x=1 and y\geq 2. If x=1 and y\geq 2, then M_{xy}=1/2, since if x\in A, then A is one of the two sets \{1\} and \{1,2,3,4,5,6,7,8,9,10,11,12,13\}, with equal probability.

So to minimize the sum \sum_{x,y}\mu(x)\mu(y)M_{xy}, where x and y are drawn independently from \mu, we should choose \mu so as to maximize the probability that x=1 and y\geq 2. If \mathbb{P}[x=1]=p, then this probability is p(1-p), which is maximized when p=1/2, so in fact we get the distribution mentioned earlier (at least as far as the probability of the element 1 is concerned; how the remaining probability is shared out does not affect the sum). In particular, for this distribution the average number of sets containing a random point is 3/2, which is precisely half the total number of sets. (I find this slightly worrying, since for a successful proof of this kind I would expect equality to be achieved only in the case that you have disjoint sets A_1,\dots,A_k and you take all their unions, including the empty set. But since this definition of a probability distribution isn’t supposed to be a serious candidate for a proof of the whole conjecture, I’m not too worried about being worried.)
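
Here is a short Python sketch of this calculation (purely illustrative; for simplicity the distribution \mu is restricted to the one-parameter family that gives probability p to the element 1 and shares 1-p equally among the other twelve elements). It builds the matrix M for the three sets above, scans over p, and confirms that the quadratic form is minimized at p=1/2, where the average number of sets containing a random point is 3/2.

```python
from fractions import Fraction

sets = [frozenset(), frozenset({1}), frozenset(range(1, 14))]
ground = sorted(set().union(*sets))

def M(x, y):
    """M_{xy} = P[y in A | x in A] for A chosen uniformly from the family."""
    containing_x = [a for a in sets if x in a]
    return Fraction(sum(y in a for a in containing_x), len(containing_x))

def quadratic_form(mu):
    return sum(mu[x] * mu[y] * M(x, y) for x in ground for y in ground)

def average_membership(mu):
    return sum(mu[x] * sum(x in a for a in sets) for x in ground)

best = None
for k in range(1, 100):
    p = Fraction(k, 100)
    mu = {x: p if x == 1 else (1 - p) / 12 for x in ground}
    value = quadratic_form(mu)
    if best is None or value < best[0]:
        best = (value, p, average_membership(mu))

print(best)   # minimum 7/8 at p = 1/2, with average membership 3/2
```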

Just to throw in another thought, perhaps some entropy-based distribution would be good. I wondered, for example, about defining a probability distribution as follows. Given any probability distribution, we obtain weights on the sets A\in\mathcal{A} by taking w(A) to be the probability that a random element (chosen from the distribution) belongs to A. We can then form a probability distribution on \mathcal{A} by taking the probabilities to be proportional to the weights. Finally, we can choose a distribution on the elements to maximize the entropy of the distribution on \mathcal{A}.

If we try that with the example above, and if p is the probability assigned to the element 1, then the three weights are 0, p and 1, so the probabilities we will assign will be 0, p/(1+p) and 1/(1+p). The entropy of this distribution will be maximized when the two non-zero probabilities are equal, which gives us p=1, so in this case we will pick out the element 1. It isn’t completely obvious that that is a bad thing to do for this particular example — indeed, we will do it whenever there is an element that is contained in all the non-empty sets from \mathcal{A}. Again, there is virtually no chance that this rather artificial construction will work, but perhaps after a lot of thought and several modifications and refinements, something like it could be got to work.
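
The corresponding calculation for the entropy-based guess is just as easy to script (same toy example, same one-parameter family of distributions): the entropy of the induced distribution on \mathcal{A} increases all the way up to p=1, so the recipe does indeed pick out the element 1.

```python
from math import log2

sets = [frozenset(), frozenset({1}), frozenset(range(1, 14))]
ground = sorted(set().union(*sets))

def induced_entropy(p):
    """Entropy of the distribution on the family with probabilities
    proportional to w(A) = P[random element lies in A]."""
    mu = {x: p if x == 1 else (1 - p) / 12 for x in ground}
    weights = [sum(mu[x] for x in a) for a in sets]   # 0, p, 1
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(q * log2(q) for q in probs)

for p in (0.2, 0.5, 0.8, 0.99):
    print(p, round(induced_entropy(p), 4))
# the entropy increases with p, approaching 1 bit as p -> 1,
# so the maximizing distribution concentrates on the element 1
```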

A failed attempt at a counterexample

I find the non-example I’m about to present interesting because I don’t have a good conceptual understanding of why it fails — it’s just that the numbers aren’t kind to me. But I think there is a proper understanding to be had. Can anyone give me a simple argument that no construction that is anything like what I tried can possibly work? (I haven’t even checked properly whether the known positive results about the problem ruled out my attempt before I even started.)

The idea was as follows. Let m<n/2 and p be parameters to be chosen later, and let \mathcal{A} be a random set system obtained by choosing each subset of \{1,2,\dots,n\} of size m with probability p, the choices being independent. We then take as our attempted counterexample the set of all unions of sets in \mathcal{A}.

Why might one entertain even for a second the thought that this could be a counterexample? Well, if we choose m to be rather close to n/2, but just slightly less, then a typical pair of sets of size m has a union of size close to 3n/4, and more generally a typical union of sets of size m has size at least this. There are vastly fewer sets of size greater than (1-\epsilon)3n/4 than there are of size m, so we could perhaps dare to hope that almost all the sets in the set system are the ones of size m, so that the average size is close to m, which is less than n/2. And since the sets are spread around, the elements are likely to be contained in roughly the same number of sets each, so this would give a counterexample.
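
The heuristic about union sizes is easy to check by simulation; here is a small Python sketch (with arbitrary illustrative parameters) confirming that two random m-sets with m close to n/2 typically have a union of size close to 3n/4.

```python
import random

n, m, trials = 1000, 500, 2000   # m chosen close to n/2
ground = list(range(n))
total = 0
for _ in range(trials):
    a = set(random.sample(ground, m))
    b = set(random.sample(ground, m))
    total += len(a | b)
print(total / trials, 3 * n / 4)   # empirical mean union size versus 3n/4
```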

Of course, the problem here is that although a typical union is large, there are many atypical unions, so we need to get rid of them somehow — or at least the vast majority of them. This is where choosing a random subset comes in. The hope is that if we choose a fairly sparse random subset, then all the unions will be large rather than merely almost all.

However, this introduces a new problem, which is that if we have passed to a sparse random subset, then it is no longer clear that the size of that subset is bigger than the number of possible unions. So it becomes a question of balance: can we choose p small enough for the unions of those sets to be typical, but still large enough for the sets of size m to dominate the set system? We’re also free to choose m of course.
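
For what it is worth, here is a toy Python version of the construction (the parameters are far too small for the asymptotic heuristics above to mean anything, so this is only meant to make the definition concrete): it samples the m-sets, closes them under unions, and compares the count of the most frequent element with half the size of the resulting family.

```python
import random
from itertools import combinations

n, m, p = 12, 5, 0.02   # toy parameters, far too small for the heuristics above
rng = random.Random(0)
ground = range(1, n + 1)

# choose each m-subset independently with probability p
A = [frozenset(c) for c in combinations(ground, m) if rng.random() < p]

# close under unions, iterating until no new union appears
family = set(A)
while True:
    new = {a | b for a, b in combinations(family, 2)} - family
    if not new:
        break
    family |= new

counts = {x: sum(x in a for a in family) for x in ground}
print(len(family), max(counts.values()), len(family) / 2)
# compares the most frequent element with half the size of the family
```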

I usually find when I’m in a situation like this, where I’m hoping for a miracle, that a miracle doesn’t occur, and that indeed seems to be the case here. Let me explain my back-of-envelope calculation.

I’ll write \mathcal{U} for the set of unions of sets in \mathcal{A}. Let us now take s>m and give an upper bound for the expected number of sets in \mathcal{U} of size s. So fix a set B of size s and let us give a bound for the probability that B\in\mathcal{U}. We know that B must contain at least two sets in \mathcal{A}. But the number of pairs of sets of size m contained in B is at most \binom sm^2 and each such pair has a probability p^2 of being a pair of sets in \mathcal{A}, so the probability that B\in\mathcal{U} is at most p^2\binom sm^2. Therefore, the expected number of sets in \mathcal{U} of size s is at most p^2\binom ns\binom sm^2.

As for the expected number of sets in \mathcal{A}, it is p\binom nm, so if we want the example to work, we would very much like it to be the case that when s>n-m, we have the inequality

\displaystyle p\binom nm>p^2\binom ns\binom sm^2.

We can weaken this requirement by observing that the expected number of sets in \mathcal{U} of size s is also trivially at most \binom ns, so it is enough to go for

\displaystyle p\binom nm>p^2\binom ns\binom sm^2\wedge\binom ns,

where a\wedge b denotes \min(a,b).

If the left-hand side is not just greater than the right-hand side, but greater by a factor of say n^2 for each s, then we should be in good shape: the average size of a set in \mathcal{U} will be not much greater than m and we’ll be done.

If s is not much bigger than n/2, then things look quite promising. In this case, \binom ns will be comparable in size to \binom nm, but \binom sm will be quite small — it equals \binom s{s-m}, and s-m is small. A crude estimate says that we’ll be OK provided that p is significantly smaller than n^{-2(s-m)}. And that looks OK, since n^{2(s-m)} is a lot smaller than \binom nm, so we aren’t being made to choose a ridiculously small value of p.

If on the other hand s is quite a lot larger than n/2, then \binom ns is much much smaller than \binom nm, so we’re in great shape as long as we haven’t chosen p so tiny that p\binom nm has dropped below \binom ns.

So what goes wrong? Well, the problem is that the first argument requires smaller and smaller values of p as s gets further and further away from n/2, and the result seems to be that by the time the second regime takes over, p has become too small for the trivial argument to work.
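
The squeeze is easy to see numerically. The following Python sketch (with an arbitrary illustrative choice of n and m, not anything canonical) compares p\binom nm with the better of the two bounds p^2\binom ns\binom sm^2 and \binom ns as s varies; for each of the values of p tried there is a middle value of s where the comparison goes the wrong way, in line with the identity below.

```python
from math import comb

n, m = 40, 18   # an arbitrary illustrative choice with m a little below n/2

def worst_case(p):
    """Return (ratio, s) minimizing p*C(n,m) / min(p^2*C(n,s)*C(s,m)^2, C(n,s)).
    For the construction to have a chance this ratio would need to be large."""
    lhs = p * comb(n, m)
    return min(
        (lhs / min(p * p * comb(n, s) * comb(s, m) ** 2, comb(n, s)), s)
        for s in range(m + 1, n + 1)
    )

# keep p large enough that p*C(n,m) >= 1, i.e. A is non-empty on average
for exponent in (2, 4, 6, 8, 10):
    p = 10.0 ** (-exponent)
    ratio, s = worst_case(p)
    print(f"p = 1e-{exponent}: worst s = {s}, ratio = {ratio:.2e}")
# for each of these p the worst ratio is far below 1
```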

Let me try to be a bit more precise about this. The point at which \binom ns becomes smaller than p^2\binom ns \binom sm^2 is of course the point at which p=\binom sm^{-1}. For that value of p, we require p\binom nm>\binom ns, so we need \binom nm>\binom ns\binom sm. However, an easy calculation reveals that

\displaystyle \binom ns\binom sm\binom nm^{-1}=\binom{n-m}{s-m},

(or observe that if you multiply both sides by \binom nm, then both expressions are equal to the multinomial coefficient that counts the number of ways of writing an n-element set as A\cup B\cup C with |A|=n-s, |B|=s-m and |C|=m). So unfortunately we find that however we choose the value of p there is a value of s such that the number of sets in \mathcal{U} of size s is greater than p\binom nm, the expected size of \mathcal{A}. (I should remark that the estimate p^2\binom sm^2=p^2\binom sm\binom s{s-m} for the number of sets in \mathcal{U} of size s can be improved to p^2\binom sm \binom m{s-m}, but this does not make enough of a difference to rescue the argument.)
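
The identity is also easy to confirm numerically; here is a two-line Python check for one arbitrary choice of n, m and s.

```python
from math import comb

n, m, s = 40, 18, 29   # any 0 <= m <= s <= n will do
assert comb(n, s) * comb(s, m) == comb(n, m) * comb(n - m, s - m)
print(comb(n - m, s - m))   # the common value of C(n,s)C(s,m)/C(n,m)
```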

So unfortunately it turns out that the middle of the range is worse than the two ends, and indeed worse by enough to kill off the idea. However, it seemed to me to be good to make at least some attempt to find a counterexample in order to understand the problem better.

From here there are two obvious ways to go. One is to try to modify the above idea to give it a better chance of working. The other, which I have already mentioned, is to try to generalize the failure: that is, to explain why that example, and many others like it, had no hope of working. Alternatively, somebody could propose a completely different line of enquiry.


I’ll stop there. Experience with Polymath projects so far seems to suggest that, as with individual projects, it is hard to predict how long they will continue before there is a general feeling of being stuck. So I’m thinking of this as a slightly tentative suggestion, and if it provokes a sufficiently healthy conversation and interesting new (or at least new to me) ideas, then I’ll write another post and launch a project more formally. In particular, only at that point will I call it Polymath11 (or should that be Polymath12? — I don’t know whether the almost instantly successful polynomial-identities project got round to assigning itself a number). Also, for various reasons I don’t want to get properly going on a Polymath project for at least a week, though I realize I may not be in complete control of what happens in response to this post.

Just before I finish, let me remark that Polymath10, attempting to prove Erdős’s sunflower conjecture, is still continuing on Gil Kalai’s blog. What’s more, I think it is still at a stage where a newcomer could catch up with what is going on — it might take a couple of hours to find and digest a few of the more important comments. But Gil and I agree that there may well be room to have more than one Polymath project going at the same time, since a common pattern is for the group of participants to shrink down to a smallish number of “enthusiasts”, and there are enough mathematicians to form many such groups.

And a quick reminder, as maybe some people reading this will be new to the concept of Polymath projects. The aim is to try to make the problem-solving process easier in various ways. One is to have an open discussion, in the form of blog posts and comments, so that anybody can participate, and with luck a process of self-selection will take place that results in a team of enthusiastic people with a good mixture of skills and knowledge. Another is to encourage people to express ideas that may well be half-baked or even wrong, or even completely obviously wrong. (It’s surprising how often a completely obviously wrong idea can stimulate a different idea that turns out to be very useful. Naturally, expressing such an idea can be embarrassing, but it shouldn’t be, as it is an important part of what we do when we think about problems privately.) Another is to provide a mechanism where people can get very quick feedback on their ideas — this too can be extremely stimulating and speed up the process of thought considerably. If you like the problem but don’t feel like pursuing either of the approaches I’ve outlined above, that’s of course fine — your ideas are still welcome and may well be more fruitful than those ones, which are there just to get the discussion started.