Source author record

Mark Bun

Mark Bun appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning; Cryptography and Security; Data Structures and Algorithms; Computational Complexity; Information Theory; math.IT; quant-ph

Catalog footprint

14 works

7 topics

4 close collaborators


Preprint, 2022, arXiv

Strong Memory Lower Bounds for Learning Natural Models

We give lower bounds on the amount of memory required by one-pass streaming algorithms for solving several natural learning problems. In a setting where examples lie in $\{0,1\}^d$ and the optimal classifier can be encoded using $κ$ bits, we show that algorithms which learn using a near-minimal number of examples, $\tilde{O}(κ)$, must use $\tilde{Ω}(dκ)$ bits of space. Our space bounds match the dimension of the ambient space of the problem's natural parametrization, even when it is quadratic in the size of examples and the final classifier. For instance, in the setting of $d$-sparse linear classifiers over degree-2 polynomial features, for which $κ = Θ(d \log d)$, our space lower bound is $\tilde{Ω}(d^2)$. Our bounds degrade gracefully with the stream length $N$, generally having the form $\tilde{Ω}\left(dκ \cdot \frac{κ}{N}\right)$. Bounds of the form $Ω(dκ)$ were known for learning parity and other problems defined over finite fields. Bounds that apply in a narrow range of sample sizes are also known for linear regression. Ours are the first such bounds for problems of the type commonly seen in recent learning applications that apply for a large range of input sizes.

Preprint, 2021, arXiv

Differentially Private Correlation Clustering

Correlation clustering is a widely used technique in unsupervised machine learning. Motivated by applications where individual privacy is a concern, we initiate the study of differentially private correlation clustering. We propose an algorithm that achieves subquadratic additive error compared to the optimal cost. In contrast, straightforward adaptations of existing non-private algorithms all lead to a trivial quadratic error. Finally, we give a lower bound showing that any pure differentially private algorithm for correlation clustering requires additive error of $Ω(n)$.

Preprint, 2021, arXiv

Private Hypothesis Selection

We provide a differentially private algorithm for hypothesis selection. Given samples from an unknown probability distribution $P$ and a set of $m$ probability distributions $\mathcal{H}$, the goal is to output, in a $\varepsilon$-differentially private manner, a distribution from $\mathcal{H}$ whose total variation distance to $P$ is comparable to that of the best such distribution (which we denote by $α$). The sample complexity of our basic algorithm is $O\left(\frac{\log m}{α^2} + \frac{\log m}{α\varepsilon}\right)$, representing a minimal cost for privacy when compared to the non-private algorithm. We also can handle infinite hypothesis classes $\mathcal{H}$ by relaxing to $(\varepsilon,δ)$-differential privacy. We apply our hypothesis selection algorithm to give learning algorithms for a number of natural distribution classes, including Gaussians, product distributions, sums of independent random variables, piecewise polynomials, and mixture classes. Our hypothesis selection procedure allows us to generically convert a cover for a class to a learning algorithm, complementing known learning lower bounds which are in terms of the size of the packing number of the class. As the covering and packing numbers are often closely related, for constant $α$, our algorithms achieve the optimal sample complexity for many classes of interest. Finally, we describe an application to private distribution-free PAC learning.
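As a rough illustration of the sample complexity stated above, the following Python sketch evaluates the two terms of the bound with every hidden constant set to 1. That choice is an assumption, since the abstract gives only an asymptotic rate, and the function name and parameter values are hypothetical.

```python
import math


def hypothesis_selection_samples(m, alpha, epsilon):
    """Evaluate log(m)/alpha^2 + log(m)/(alpha*epsilon) with constants set to 1.

    Returns (non_private_term, privacy_term, total). Illustrative only:
    the O(.) in the abstract hides constants that are not modeled here.
    """
    if m < 2 or alpha <= 0 or epsilon <= 0:
        raise ValueError("need m >= 2, alpha > 0 and epsilon > 0")
    log_m = math.log(m)
    non_private_term = log_m / alpha ** 2
    privacy_term = log_m / (alpha * epsilon)
    return non_private_term, privacy_term, non_private_term + privacy_term


if __name__ == "__main__":
    # Hypothetical parameters: 1000 candidate distributions, accuracy 0.1.
    for eps in (1.0, 0.1, 0.01):
        base, extra, total = hypothesis_selection_samples(1000, 0.1, eps)
        print(f"epsilon={eps}: non-private={base:.1f}, privacy={extra:.1f}, total={total:.1f}")
```

Under this simplification the privacy term exceeds the non-private term only when $\varepsilon < α$, which is one way to read the abstract's remark that privacy comes at a minimal cost.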

Preprint, 2020, arXiv

A Computational Separation between Private Learning and Online Learning

A recent line of work has shown a qualitative equivalence between differentially private PAC learning and online learning: A concept class is privately learnable if and only if it is online learnable with a finite mistake bound. However, both directions of this equivalence incur significant losses in both sample and computational efficiency. Studying a special case of this connection, Gonen, Hazan, and Moran (NeurIPS 2019) showed that uniform or highly sample-efficient pure-private learners can be time-efficiently compiled into online learners. We show that, assuming the existence of one-way functions, such an efficient conversion is impossible even for general pure-private learners with polynomial sample complexity. This resolves a question of Neel, Roth, and Wu (FOCS 2019).

Preprint, 2020, arXiv

Efficient, Noise-Tolerant, and Private Learning via Boosting

We introduce a simple framework for designing private boosting algorithms. We give natural conditions under which these algorithms are differentially private, efficient, and noise-tolerant PAC learners. To demonstrate our framework, we use it to construct noise-tolerant and private PAC learners for large-margin halfspaces whose sample complexity does not depend on the dimension. We give two sample complexity bounds for our large-margin halfspace learner. One bound is based only on differential privacy, and uses this guarantee as an asset for ensuring generalization. This first bound illustrates a general methodology for obtaining PAC learners from privacy, which may be of independent interest. The second bound uses standard techniques from the theory of large-margin classification (the fat-shattering dimension) to match the best known sample complexity for differentially private learning of large-margin halfspaces, while additionally tolerating random label noise.

Preprint, 2020, arXiv

New Oracle-Efficient Algorithms for Private Synthetic Data Release

We present three new algorithms for constructing differentially private synthetic data, a sanitized version of a sensitive dataset that approximately preserves the answers to a large collection of statistical queries. All three algorithms are oracle-efficient in the sense that they are computationally efficient when given access to an optimization oracle. Such an oracle can be implemented using many existing (non-private) optimization tools such as sophisticated integer program solvers. While the accuracy of the synthetic data is contingent on the oracle's optimization performance, the algorithms satisfy differential privacy even in the worst case. For all three algorithms, we provide theoretical guarantees for both accuracy and privacy. Through empirical evaluation, we demonstrate that our methods scale well with both the dimensionality of the data and the number of queries. Compared to the state-of-the-art method High-Dimensional Matrix Mechanism (McKenna et al., 2018), our algorithms provide better accuracy in the large workload and high privacy regime (corresponding to low privacy loss $\varepsilon$).

Preprint, 2016, arXiv

Concentrated Differential Privacy: Simplifications, Extensions, and Lower Bounds

"Concentrated differential privacy" was recently introduced by Dwork and Rothblum as a relaxation of differential privacy, which permits sharper analyses of many privacy-preserving computations. We present an alternative formulation of the concept of concentrated differential privacy in terms of the Rényi divergence between the distributions obtained by running an algorithm on neighboring inputs. With this reformulation in hand, we prove sharper quantitative results, establish lower bounds, and raise a few new questions. We also unify this approach with approximate differential privacy by giving an appropriate definition of "approximate concentrated differential privacy."
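A minimal sketch of what a Rényi-divergence formulation looks like in the simplest case. The abstract does not spell out the definition, so this assumes the commonly stated form in which a mechanism $M$ satisfies the condition with parameter $ρ$ when $D_α(M(x) \| M(x')) \le ρα$ for every $α > 1$ and every pair of neighboring inputs. For two Gaussians with equal variance the Rényi divergence is $α(μ_1 - μ_2)^2 / (2σ^2)$, so adding $N(0, σ^2)$ noise to a query of sensitivity $Δ$ corresponds to $ρ = Δ^2 / (2σ^2)$. All function names below are hypothetical.

```python
import math


def renyi_divergence_gaussians(alpha, mu1, mu2, sigma):
    """Closed form of D_alpha(N(mu1, sigma^2) || N(mu2, sigma^2)) for alpha > 1."""
    if alpha <= 1 or sigma <= 0:
        raise ValueError("need alpha > 1 and sigma > 0")
    return alpha * (mu1 - mu2) ** 2 / (2 * sigma ** 2)


def renyi_divergence_numeric(alpha, mu1, mu2, sigma, lo=-30.0, hi=30.0, steps=60000):
    """Midpoint-rule estimate of (1/(alpha-1)) * log of the integral of p^alpha * q^(1-alpha)."""

    def log_pdf(x, mu):
        return -((x - mu) ** 2) / (2 * sigma ** 2) - math.log(sigma * math.sqrt(2 * math.pi))

    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        # Work in log space so the tails do not divide by an underflowed density.
        total += math.exp(alpha * log_pdf(x, mu1) + (1 - alpha) * log_pdf(x, mu2)) * dx
    return math.log(total) / (alpha - 1)


def gaussian_mechanism_rho(sensitivity, sigma):
    """rho such that the closed-form divergence equals rho * alpha (assumed definition)."""
    return sensitivity ** 2 / (2 * sigma ** 2)


if __name__ == "__main__":
    rho = gaussian_mechanism_rho(sensitivity=1.0, sigma=1.0)
    for alpha in (1.5, 2.0, 5.0):
        exact = renyi_divergence_gaussians(alpha, 0.0, 1.0, 1.0)
        approx = renyi_divergence_numeric(alpha, 0.0, 1.0, 1.0)
        print(alpha, exact, approx, rho * alpha)
```

The numeric routine is there only to cross-check the closed form; the linear growth of the divergence in $α$ is what the single parameter $ρ$ summarizes.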

Preprint, 2016, arXiv

Make Up Your Mind: The Price of Online Queries in Differential Privacy

We consider the problem of answering queries about a sensitive dataset subject to differential privacy. The queries may be chosen adversarially from a larger set Q of allowable queries in one of three ways, which we list in order from easiest to hardest to answer:

Offline: The queries are chosen all at once and the differentially private mechanism answers the queries in a single batch.

Online: The queries are chosen all at once, but the mechanism only receives the queries in a streaming fashion and must answer each query before seeing the next query.

Adaptive: The queries are chosen one at a time and the mechanism must answer each query before the next query is chosen. In particular, each query may depend on the answers given to previous queries.

Many differentially private mechanisms are just as efficient in the adaptive model as they are in the offline model. Meanwhile, most lower bounds for differential privacy hold in the offline setting. This suggests that the three models may be equivalent. We prove that these models are all, in fact, distinct. Specifically, we show that there is a family of statistical queries such that exponentially more queries from this family can be answered in the offline model than in the online model. We also exhibit a family of search queries such that exponentially more queries from this family can be answered in the online model than in the adaptive model. We also investigate whether such separations might hold for simple queries like threshold queries over the real line.

Preprint, 2015, arXiv

Dual Polynomials for Collision and Element Distinctness

The approximate degree of a Boolean function $f: \{-1, 1\}^n \to \{-1, 1\}$ is the minimum degree of a real polynomial that approximates $f$ to within error $1/3$ in the $\ell_\infty$ norm. In an influential result, Aaronson and Shi (J. ACM 2004) proved tight $\tilde{Ω}(n^{1/3})$ and $\tilde{Ω}(n^{2/3})$ lower bounds on the approximate degree of the Collision and Element Distinctness functions, respectively. Their proof was non-constructive, using a sophisticated symmetrization argument and tools from approximation theory. More recently, several open problems in the study of approximate degree have been resolved via the construction of dual polynomials. These are explicit dual solutions to an appropriate linear program that captures the approximate degree of any function. We reprove Aaronson and Shi's results by constructing explicit dual polynomials for the Collision and Element Distinctness functions.

Preprint, 2015, arXiv

Order-Revealing Encryption and the Hardness of Private Learning

An order-revealing encryption scheme gives a public procedure by which two ciphertexts can be compared to reveal the ordering of their underlying plaintexts. We show how to use order-revealing encryption to separate computationally efficient PAC learning from efficient $(ε, δ)$-differentially private PAC learning. That is, we construct a concept class that is efficiently PAC learnable, but for which every efficient learner fails to be differentially private. This answers a question of Kasiviswanathan et al. (FOCS '08, SIAM J. Comput. '11). To prove our result, we give a generic transformation from an order-revealing encryption scheme into one with strongly correct comparison, which enables the consistent comparison of ciphertexts that are not obtained as the valid encryption of any message. We believe this construction may be of independent interest.

Preprint, 2015, arXiv

Simultaneous Private Learning of Multiple Concepts

We investigate the direct-sum problem in the context of differentially private PAC learning: What is the sample complexity of solving $k$ learning tasks simultaneously under differential privacy, and how does this cost compare to that of solving $k$ learning tasks without privacy? In our setting, an individual example consists of a domain element $x$ labeled by $k$ unknown concepts $(c_1,\ldots,c_k)$. The goal of a multi-learner is to output $k$ hypotheses $(h_1,\ldots,h_k)$ that generalize the input examples. Without concern for privacy, the sample complexity needed to simultaneously learn $k$ concepts is essentially the same as needed for learning a single concept. Under differential privacy, the basic strategy of learning each hypothesis independently yields sample complexity that grows polynomially with $k$. For some concept classes, we give multi-learners that require fewer samples than the basic strategy. Unfortunately, however, we also give lower bounds showing that even for very simple concept classes, the sample cost of private multi-learning must grow polynomially in $k$.

Preprint, 2014, arXiv

Dual Lower Bounds for Approximate Degree and Markov-Bernstein Inequalities

The $ε$-approximate degree of a Boolean function $f: \{-1, 1\}^n \to \{-1, 1\}$ is the minimum degree of a real polynomial that approximates $f$ to within $ε$ in the $\ell_\infty$ norm. We prove several lower bounds on this important complexity measure by explicitly constructing solutions to the dual of an appropriate linear program. Our first result resolves the $ε$-approximate degree of the two-level AND-OR tree for any constant $ε > 0$. We show that this quantity is $Θ(\sqrt{n})$, closing a line of incrementally larger lower bounds. The same lower bound was recently obtained independently by Sherstov using related techniques. Our second result gives an explicit dual polynomial that witnesses a tight lower bound for the approximate degree of any symmetric Boolean function, addressing a question of Špalek. Our final contribution is to reprove several Markov-type inequalities from approximation theory by constructing explicit dual solutions to natural linear programs. These inequalities underlie the proofs of many of the best-known approximate degree lower bounds, and have important uses throughout theoretical computer science.

Preprint, 2014, arXiv

Hardness Amplification and the Approximate Degree of Constant-Depth Circuits

We establish a generic form of hardness amplification for the approximability of constant-depth Boolean circuits by polynomials. Specifically, we show that if a Boolean circuit cannot be pointwise approximated by low-degree polynomials to within constant error in a certain one-sided sense, then an OR of disjoint copies of that circuit cannot be pointwise approximated even with very high error. As our main application, we show that for every sequence of degrees $d(n)$, there is an explicit depth-three circuit $F: \{-1,1\}^n \to \{-1,1\}$ of polynomial size such that any degree-$d$ polynomial cannot pointwise approximate $F$ to error better than $1-\exp\left(-\tilde{Ω}(nd^{-3/2})\right)$. As a consequence of our main result, we obtain an $\exp\left(-\tilde{Ω}(n^{2/5})\right)$ upper bound on the discrepancy of a function in AC$^0$, and an $\exp\left(\tilde{Ω}(n^{2/5})\right)$ lower bound on the threshold weight of AC$^0$, improving over the previous best results of $\exp\left(-Ω(n^{1/3})\right)$ and $\exp\left(Ω(n^{1/3})\right)$ respectively. Our techniques also yield a new lower bound of $Ω\left(n^{1/2}/\log^{(d-2)/2}(n)\right)$ on the approximate degree of the AND-OR tree of depth $d$, which is tight up to polylogarithmic factors for any constant $d$, as well as new bounds for read-once DNF formulas. In turn, these results imply new lower bounds on the communication and circuit complexity of these classes, and demonstrate strong limitations on existing PAC learning algorithms.

Preprint, 2014, arXiv

Weighted Polynomial Approximations: Limits for Learning and Pseudorandomness

Polynomial approximations to Boolean functions have led to many positive results in computer science. In particular, polynomial approximations to the sign function underlie algorithms for agnostically learning halfspaces, as well as pseudorandom generators for halfspaces. In this work, we investigate the limits of these techniques by proving inapproximability results for the sign function. Firstly, the polynomial regression algorithm of Kalai et al. (SIAM J. Comput. 2008) shows that halfspaces can be learned with respect to log-concave distributions on $\mathbb{R}^n$ in the challenging agnostic learning model. The power of this algorithm relies on the fact that under log-concave distributions, halfspaces can be approximated arbitrarily well by low-degree polynomials. We ask whether this technique can be extended beyond log-concave distributions, and establish a negative result. We show that polynomials of any degree cannot approximate the sign function to within arbitrarily low error for a large class of non-log-concave distributions on the real line, including those with densities proportional to $\exp(-|x|^{0.99})$. Secondly, we investigate the derandomization of Chernoff-type concentration inequalities. Chernoff-type tail bounds on sums of independent random variables have pervasive applications in theoretical computer science. Schmidt et al. (SIAM J. Discrete Math. 1995) showed that these inequalities can be established for sums of random variables with only $O(\log(1/δ))$-wise independence, for a tail probability of $δ$. We show that their results are tight up to constant factors. These results rely on techniques from weighted approximation theory, which studies how well functions on the real line can be approximated by polynomials under various distributions. We believe that these techniques will have further applications in other areas of computer science.
