Sampling, separators, and adaptivity

Thoughts 2023-12-16

(This replaces an old post, where my understanding was wrong. I am grateful to the authors of [YZ24] for discussions which prompted me to look into this again.)

A recent paper, [YZ24], makes progress on two problems from my work on sampling permutations [Vio20]. First, it improves the cell-probe lower bound for sampling permutations over [n] from about \log \log n to about \log n. Second, they prove a lower bound for sampling a certain distribution which was put forth in my work as a candidate for a separation between adaptive and non-adaptive sampling, thus establishing the separation.

There are several ideas in their work, so it is not immediately clear where they gain from my work. They suggest that a key difference is what they call flowers, which is similar to sunflowers where the kernel just needs to be a little less than trivial, and then such objects start to appear even for smaller families.

Actually, this result was already known, and at times it was called “separator.” I quote a relevant lemma (similar statements appear in other works).

Lemma 1. (Lemma 2.3 in [Vio09]) [Separator] For every n sets Q(1),Q(2),\ldots ,Q(n) of size q each and every desired “gap” g, there is w\in [n/(gq)^{q},n] and a “kernel” set B of size |B|\le w/g such that there are \ge w disjoint sets among Q(1)\setminus B,Q(2)\setminus B,\ldots ,Q(n)\setminus B.

This appears to somewhat improve and simplify the proof of Theorem 2.1 in their paper.

The novelty in their work lies in the other arguments, which I find quite ingenious.

For the first type of results, I always needed that |B|\le w^{2}/n, while they show that in fact you can always work with w slightly larger than |B|, as long as w^{2}>n. So for example, for w around \sqrt {n} I didn’t get much, while they do. Their proof decouples the statistical-distance bound from the size of the kernel, while in mine they are tied together.

For the second result, they reduce sampling two equal sets with non-trivial error to the sampling of the candidate distribution. Then they prove a lower bound for the former (and for this they don’t need any decoupling). The argument is simple and, in my opinion, very cool.

Adaptivity

A favorite open problem of mine in this area is: Can you sample a near-uniform permutation of [n] with 2 adaptive queries (per output element)? Today is your turn to solve this problem. A related obsession of mine over the years has been to somehow port the separator to the adaptive setting. The closest I think I got is the separator in [Vio23], which basically shows that you can restrict the input to the sampler so that many queries are nearly independent. This can be used to prove some new sampling lower bounds which are also strong enough to imply succinct data-structure lower bounds. Alas, the parameters don’t seem strong enough for the permutation problem… or maybe they are, and I am again missing it. It’s worth looking at this problem again and see if their new ideas can help here.

References

[Vio09]   Emanuele Viola. Cell-probe lower bounds for prefix sums, 2009. arXiv:0906.1370v1.

[Vio20]   Emanuele Viola. Sampling lower bounds: boolean average-case and permutations. SIAM J. on Computing, 49(1), 2020. Available at http://www.ccs.neu.edu/home/viola/.

[Vio23]   Emanuele Viola. New sampling lower bounds via the separator. In Conf. on Computational Complexity (CCC), 2023. Available at http://www.ccs.neu.edu/home/viola/.

[YZ24]    Huacheng Yu and Wei Zhan. Sampling, flowers and communication. In ACM Innovations in Theoretical Computer Science conf. (ITCS), 2024.