15 years of complexity of distributions

Thoughts 2024-11-12

This month marks 15 years of complexity of distributions, counting since [Vio09]. This area has seen substantial activity and progress, including [Vio12a, LV12, Vio14, DW11, BIL12, BCS14, Vio12b, Vio20, CGZ21, Vio23b, YZ24, FLRS23, SS24, KOW24]. Also, several connections to other areas have been established, as discussed below. And there’s more coming up! In this post I start with some historical perspective and then discuss the connections to other areas. For more you can see the thread [Viob] on this blog, the section on AC0 sampling in my book [Vio23a], or the (somewhat dated) survey talk I gave at Simons [Vioa].

On my inspiration, and the relationship with previous work..

The original manuscript makes clear what my inspiration has been. The cute (and in hindsight elementary) result [Bab87, BL87] that even a 2-local (NC0) circuit can sample $(x,\text {parity}(x))$ , even though parity is hard to compute even in AC0, has been a great inspiration for me. A first reaction is that this is an isolated example, but an extension in [IN96] shows a similar example for inner product, and this has an application to constructing pseudorandom generators.

When a revision of the manuscript was accepted to stoc/focs [Vio12a], I was pointed to [GGN10] and I didn’t mind adding a reference. In later works I have added even more [Vio12b]:

”The first work on sampling complexity may be the one by Jerrum, Valiant, and Vazirani [JVV86] who define sampling complexity classes and prove reductions among various problems. An unconditional communication complexity lower bound for sampling disjointness appears in the work [ASTS $^{+}$ 03] by Ambainis, Schulman, Ta-Shma, Vazirani, and Wigderson. Goldreich, Goldwasser, and Nussboim study the complexity of sampling in [GGN10] as part of a general study of the implementation of huge random objects. Aaronson proves in [Aar11] a connection between sampling and searching problems.”

Text like this has unfortunately led people to believe—and even assert in papers—that this line of research originated in earlier works. I feel earlier works had little or no influence, and I ask authors to not make such claims unless they have looked at the papers and really believe them.

Let’s discuss the papers that are sometimes mentioned as precursors:

[JVV86] is about reductions; it would be like saying that the parity lower bound for AC0 originates in the NP-completeness of 3Sat.

[ASTS $^{+}$ 03]: The only relevant result is a lower bound on the communication complexity of sampling disjointness. I like this result, which I didn’t know back then, but there is nothing about computation. Some manuscripts claim that local (NC0) sampling lower bounds go back to this paper, but this isn’t true. The bulk of the paper is about quantum protocols.

[GGN10]. Anecdote: GGN was presented during my time at Harvard, but I relied on email announcements which due to a system problem didn’t go out that day, so I missed the talk; symmetrically, [Vio09] was presented at Oberwolfach, with some of the authors of GGN in first row (I am cursed with strong memory) but didn’t elicit a pointer to [GGN10] (until after stoc/focs accept). This paper is long. The most relevant section appears to be 2.5:

2.5. Objects of feasible size. In contrast to the rest of this work, in the current subsection we (shortly) discuss the complexity of generating random objects of feasible size (rather than huge random objects). In other words, we are talking about implementing a distribution on poly(n)-bit long strings, and doing so in poly(n)-time. This problem can be cast in our general formulation by considering specifications that ignore their input (i.e., have output that depend only on their random-tape). In other words, we may view objects of feasible size as constant functions, and consider a specification of such random objects as a distribution on constant functions. Thus, w.l.o.g., the implementation may also ignore its input, and consequently, in this case there is no difference between an implementation by ordinary machine and an implementation by oracle machine with a random oracle. We note that perfect implementations of such distributions were considered before (e.g., in [1, 5, 17]), and distributions for which such implementations exist are called sampleable. In the current context, where the observer sees the entire object, the distinction between perfect implementation and close-implementation seems quite technical. What seems fundamentally different is the study of pseudoimplementations

I think this gives a sense of the (non-)relationship.

If you want to draw a line, an easy one is this: None of these earlier works are concerned with computational lower bounds. The distinction also manifests itself in the following applications and connections to five different research areas:

(1) Randomness extractors..

The results on sampling lower bounds had an impact on breakthrough constructions of two-source extractors: the papers [CZ16, Li16, CS16, Coh16, BDT16] build on models or results in this line. Specifically, [Vio14] introduced a new class of sources (some bits are $k$ -wise uniform, the others adversarially chosen), gave the first extractor for them (majority, analyzed using the PI’s previous work [DGJ $^{+}$ 10]), and finally asked if better extractors exist. Answering the question affirmatively is a main step in the breakthrough construction of two-source extractors for polylogarithmic entropy by Chattopadhyay and Zuckerman [CZ16]. Follow-up work [Li16] gives better yet extractors for these sources. Subsequent papers leading to better and better two-source extractors find it beneficial to use instead the original extractor [Vio14], see [CS16, Coh16, BDT16]. For discussion and the state of the art, see [Li23].

(2) Data structures..

The goal of a succinct data structure is to store data using space (or memory) close to the information-theoretic limit, while at the same time being able to answer many queries. The original papers [Vio09, Vio12a] point out a connection between sampling lower bounds and succinct data structures which has been used in a number of subsequent works, including for example [BIL12, CGZ21, YZ24, KOW24]. This connection is technically simple and shows that if a sampling lower bound is established with statistical distance very close to one, then a data structure lower bound follows. My recent work [Vio23b] uses this connection, in conjunction with new sampling lower bounds, to give a new proof of several data-structure lower bounds (including for the succinct rank problem, obtained with Pătraşcu [PV10]) thus unifying sampling and data-structure lower bounds. (This new proof doesn’t seem too well-known among people working on data-structure lower bounds; presumably the fact that the paper appeared in CCC didn’t help.)

(3) Low-distortion embeddings..

Motivated by sampling lower bounds, Lovett and I asked in [LV12] for explicit subsets of the hypercube of density $1/2$ (corresponding to boolean functions) that are not the image of low-stretch maps. This question has given rise to an autonomous research area, where a number of surprising constructions have been obtained, for example for the set corresponding to Majority [BCS14]. Still, many open questions remain, including our original question. We refer to [BS23] for discussion and pointers.

(4) Quantum-classical separations..

Sampling lower bounds have been used to obtain new separations between classical and quantum models of computation in [BGK17, WP23, KOW24]. A sampling lower bound of mine in [Vio23b] gives the strongest separation – see discussion in [KOW24] and [Viob].

(5) Stochastic codes for computationally bounded channels..

In a recent work which I like very much, Shaltiel and Silbak [SS24] draw an intriguing connection between sampling lower bounds and constructing error-correcting codes for computationally bounded channels. The latter were originally introduced in the seminal work by Guruswami and Smith [GS16] who showed that explicit (stochastic) codes with nearly optimal rate can be built against channels with certain computational limitations (whereas this is not known for standard Hamming channels with no computational limitations). The work [SS24] gives constructions for the most natural class of computational channels: those implemented by small circuits. A main tool is a construction of functions that are hard to sample by small circuits. [SS24] needs a strong type of sampling lower bounds which was in fact already established for AC $^{0}$ in [Vio20]. [SS24] obtains such sampling lower bounds for general circuits under strong computational assumptions.

References

[Aar11] Scott Aaronson. The equivalence of sampling and searching. In Computer Science Symp. in Russia (CSR), pages 1–14, 2011.

[ASTS $^{+}$ 03] Andris Ambainis, Leonard J. Schulman, Amnon Ta-Shma, Umesh V. Vazirani, and Avi Wigderson. The quantum communication complexity of sampling. SIAM J. on Computing, 32(6):1570–1585, 2003.

[Bab87] Lßszl≤ Babai. Random oracles separate $\rm PSPACE$ from the polynomial-time hierarchy. Information Processing Letters, 26(1):51–53, 1987.

[BCS14] Itai Benjamini, Gil Cohen, and Igor Shinkar. Bi-lipschitz bijection between the boolean cube and the hamming ball. In IEEE Symp. on Foundations of Computer Science (FOCS), 2014.

[BDT16] Avraham Ben-Aroya, Dean Doron, and Amnon Ta-Shma. Explicit two-source extractors for near-logarithmic min-entropy. Electronic Colloquium on Computational Complexity (ECCC), 23:88, 2016.

[BGK17] Sergey Bravyi, David Gosset, and Robert K÷nig. Quantum advantage with shallow circuits. CoRR, abs/1704.00690, 2017.

[BIL12] Chris Beck, Russell Impagliazzo, and Shachar Lovett. Large deviation bounds for decision trees and sampling lower bounds for AC0-circuits. Electronic Colloquium on Computational Complexity (ECCC), 19:42, 2012.

[BL87] Ravi Boppana and Jeffrey Lagarias. One-way functions and circuit complexity. Information and Computation, 74(3):226–240, 1987.

[BS23] Lucas Boczkowski and Igor Shinkar. On mappings on the hypercube with small average stretch. Comb. Probab. Comput., 32(2):334–348, 2023.

[CGZ21] Eshan Chattopadhyay, Jesse Goodman, and David Zuckerman. The space complexity of sampling. Electron. Colloquium Comput. Complex., page 106, 2021.

[Coh16] Gil Cohen. Making the most of advice: New correlation breakers and their applications. In IEEE Symp. on Foundations of Computer Science (FOCS), pages 188–196, 2016.

[CS16] Gil Cohen and Leonard J. Schulman. Extractors for near logarithmic min-entropy. Electronic Colloquium on Computational Complexity (ECCC), 23:14, 2016.

[CZ16] Eshan Chattopadhyay and David Zuckerman. Explicit two-source extractors and resilient functions. In ACM Symp. on the Theory of Computing (STOC), pages 670–683, 2016.

[DGJ $^{+}$ 10] Ilias Diakonikolas, Parikshit Gopalan, Ragesh Jaiswal, Rocco A. Servedio, and Emanuele Viola. Bounded independence fools halfspaces. SIAM J. on Computing, 39(8):3441–3462, 2010.

[DW11] Anindya De and Thomas Watson. Extractors and lower bounds for locally samplable sources. In Workshop on Randomization and Computation (RANDOM), 2011.

[FLRS23] Yuval Filmus, Itai Leigh, Artur Riazanov, and Dmitry Sokolov. Sampling and certifying symmetric functions. In APPROX-RANDOM, 2023.

[GGN10] Oded Goldreich, Shafi Goldwasser, and Asaf Nussboim. On the implementation of huge random objects. SIAM J. Comput., 39(7):2761–2822, 2010.

[GS16] Venkatesan Guruswami and Adam D. Smith. Optimal rate code constructions for computationally simple channels. J. ACM, 63(4):35:1–35:37, 2016.

[IN96] Russell Impagliazzo and Moni Naor. Efficient cryptographic schemes provably as secure as subset sum. J. of Cryptology, 9(4):199–216, 1996.

[JVV86] Marc R. Jerrum, Leslie G. Valiant, and Vijay V. Vazirani. Random generation of combinatorial structures from a uniform distribution. Theoretical Computer Science, 43(2–3):169–188, 1986.

[KOW24] Daniel M. Kane, Anthony Ostuni, and Kewen Wu. Locality bounds for sampling hamming slices. In ACM Symp. on the Theory of Computing (STOC), 2024.

[Li16] Xin Li. Improved two-source extractors, and affine extractors for polylogarithmic entropy. In IEEE Symp. on Foundations of Computer Science (FOCS), 2016.

[Li23] Xin Li. Two source extractors for asymptotically optimal entropy, and (many) more. In FOCS, pages 1271–1281. IEEE, 2023.

[LV12] Shachar Lovett and Emanuele Viola. Bounded-depth circuits cannot sample good codes. Computational Complexity, 21(2):245–266, 2012.

[PV10] Mihai Pǎtraşcu and Emanuele Viola. Cell-probe lower bounds for succinct partial sums. In 21th ACM-SIAM Symp. on Discrete Algorithms (SODA), pages 117–122, 2010.

[SS24] Ronen Shaltiel and Jad Silbak. Explicit codes for poly-size circuits and functions that are hard to sample on low entropy distributions. In ACM Symp. on the Theory of Computing (STOC), 2024.

[Vioa] Emanuele Viola. The Complexity of Distributions, Fall 2018 talk at the Simons Institute. https://www.youtube.com/watch?v=O78b085HE3w.

[Viob] Emanuele Viola. Thoughts with tag = the-complexity-of-distributions-2/. https://emanueleviola.wordpress.com/tag/the-complexity-of-distributions-2/.

[Vio09] Emanuele Viola. Are all distributions easy? Electronic Colloquium on Computational Complexity, Technical Report TR09-114, 2009. http://www.eccc.uni-trier.de/.

[Vio12a] Emanuele Viola. The complexity of distributions. SIAM J. on Computing, 41(1):191–218, 2012.

[Vio12b] Emanuele Viola. Extractors for turing-machine sources. In Workshop on Randomization and Computation (RANDOM), 2012.

[Vio14] Emanuele Viola. Extractors for circuit sources. SIAM J. on Computing, 43(2):355–972, 2014.

[Vio20] Emanuele Viola. Sampling lower bounds: boolean average-case and permutations. SIAM J. on Computing, 49(1), 2020. Available at http://www.ccs.neu.edu/home/viola/.

[Vio23a] Emanuele Viola. Mathematics of the impossible: The uncharted complexity of computation. 2023.

[Vio23b] Emanuele Viola. New sampling lower bounds via the separator. In Conf. on Computational Complexity (CCC), 2023. Available at http://www.ccs.neu.edu/home/viola/.

[WP23] Adam Bene Watts and Natalie Parham. Unconditional quantum advantage for sampling with shallow circuits. CoRR, abs/2301.00995, 2023.

[YZ24] Huacheng Yu and Wei Zhan. Sampling, flowers and communication. In ACM Innovations in Theoretical Computer Science conf. (ITCS), 2024.