Source author record

Aleksandar Nikolov

Aleksandar Nikolov appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Data Structures and Algorithms Computational Geometry Cryptography and Security Machine Learning math.CO Discrete Mathematics Computational Complexity math.FA Distributed, Parallel, and Cluster Computing Information Theory math.IT math.NT math.OC math.PR

Catalog footprint

What is connected

21works

14topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Private Query Release via the Johnson-Lindenstrauss Transform

We introduce a new method for releasing answers to statistical queries with differential privacy, based on the Johnson-Lindenstrauss lemma. The key idea is to randomly project the query answers to a lower dimensional space so that the distance between any two vectors of feasible query answers is preserved up to an additive error. Then we answer the projected queries using a simple noise-adding mechanism, and lift the answers up to the original dimension. Using this method, we give, for the first time, purely differentially private mechanisms with optimal worst case sample complexity under average error for answering a workload of $k$ queries over a universe of size $N$. As other applications, we give the first purely private efficient mechanisms with optimal sample complexity for computing the covariance of a bounded high-dimensional distribution, and for answering 2-way marginal queries. We also show that, up to the dependence on the error, a variant of our mechanism is nearly optimal for every given query workload.

preprint2020arXiv

Locally Private Hypothesis Selection

We initiate the study of hypothesis selection under local differential privacy. Given samples from an unknown probability distribution $p$ and a set of $k$ probability distributions $\mathcal{Q}$, we aim to output, under the constraints of $\varepsilon$-local differential privacy, a distribution from $\mathcal{Q}$ whose total variation distance to $p$ is comparable to the best such distribution. This is a generalization of the classic problem of $k$-wise simple hypothesis testing, which corresponds to when $p \in \mathcal{Q}$, and we wish to identify $p$. Absent privacy constraints, this problem requires $O(\log k)$ samples from $p$, and it was recently shown that the same complexity is achievable under (central) differential privacy. However, the naive approach to this problem under local differential privacy would require $\tilde O(k^2)$ samples. We first show that the constraint of local differential privacy incurs an exponential increase in cost: any algorithm for this problem requires at least $Ω(k)$ samples. Second, for the special case of $k$-wise simple hypothesis testing, we provide a non-interactive algorithm which nearly matches this bound, requiring $\tilde O(k)$ samples. Finally, we provide sequentially interactive algorithms for the general case, requiring $\tilde O(k)$ samples and only $O(\log \log k)$ rounds of interactivity. Our algorithms are achieved through a reduction to maximum selection with adversarial comparators, a problem of independent interest for which we initiate study in the parallel setting. For this problem, we provide a family of algorithms for each number of allowed rounds of interaction $t$, as well as lower bounds showing that they are near-optimal for every $t$. Notably, our algorithms result in exponential improvements on the round complexity of previous methods.

preprint2020arXiv

Maximizing Determinants under Matroid Constraints

Given vectors $v_1,\dots,v_n\in\mathbb{R}^d$ and a matroid $M=([n],I)$, we study the problem of finding a basis $S$ of $M$ such that $\det(\sum_{i \in S}v_i v_i^\top)$ is maximized. This problem appears in a diverse set of areas such as experimental design, fair allocation of goods, network design, and machine learning. The current best results include an $e^{2k}$-estimation for any matroid of rank $k$ and a $(1+ε)^d$-approximation for a uniform matroid of rank $k\ge d+\frac dε$, where the rank $k\ge d$ denotes the desired size of the optimal set. Our main result is a new approximation algorithm with an approximation guarantee that depends only on the dimension $d$ of the vectors and not on the size $k$ of the output set. In particular, we show an $(O(d))^{d}$-estimation and an $(O(d))^{d^3}$-approximation for any matroid, giving a significant improvement over prior work when $k\gg d$. Our result relies on the existence of an optimal solution to a convex programming relaxation for the problem which has sparse support; in particular, no more than $O(d^2)$ variables of the solution have fractional values. The sparsity results rely on the interplay between the first-order optimality conditions for the convex program and matroid theory. We believe that the techniques introduced to show sparsity of optimal solutions to convex programs will be of independent interest. We also give a randomized algorithm that rounds a sparse fractional solution to a feasible integral solution to the original problem. To show the approximation guarantee, we utilize recent works on strongly log-concave polynomials and show new relationships between different convex programs studied for the problem. Finally, we use the estimation algorithm and sparsity results to give an efficient deterministic approximation algorithm with an approximation guarantee that depends solely on the dimension $d$.

preprint2020arXiv

On the Computational Complexity of Linear Discrepancy

Many problems in computer science and applied mathematics require rounding a vector $\mathbf{w}$ of fractional values lying in the interval $[0,1]$ to a binary vector $\mathbf{x}$ so that, for a given matrix $\mathbf{A}$, $\mathbf{A}\mathbf{x}$ is as close to $\mathbf{A}\mathbf{w}$ as possible. For example, this problem arises in LP rounding algorithms used to approximate $\mathsf{NP}$-hard optimization problems and in the design of uniformly distributed point sets for numerical integration. For a given matrix $\mathbf{A}$, the worst-case error over all choices of $\mathbf{w}$ incurred by the best possible rounding is measured by the linear discrepancy of $\mathbf{A}$, a quantity studied in discrepancy theory, and introduced by Lovasz, Spencer, and Vesztergombi (EJC, 1986). We initiate the study of the computational complexity of linear discrepancy. Our investigation proceeds in two directions: (1) proving hardness results and (2) finding both exact and approximate algorithms to evaluate the linear discrepancy of certain matrices. For (1), we show that linear discrepancy is $\mathsf{NP}$-hard. Thus we do not expect to find an efficient exact algorithm for the general case. Restricting our attention to matrices with a constant number of rows, we present a poly-time exact algorithm for matrices consisting of a single row and matrices with a constant number of rows and entries of bounded magnitude. We also present an exponential-time approximation algorithm for general matrices, and an algorithm that approximates linear discrepancy to within an exponential factor.

preprint2020arXiv

Private Query Release Assisted by Public Data

We study the problem of differentially private query release assisted by access to public data. In this problem, the goal is to answer a large class $\mathcal{H}$ of statistical queries with error no more than $α$ using a combination of public and private samples. The algorithm is required to satisfy differential privacy only with respect to the private samples. We study the limits of this task in terms of the private and public sample complexities. First, we show that we can solve the problem for any query class $\mathcal{H}$ of finite VC-dimension using only $d/α$ public samples and $\sqrt{p}d^{3/2}/α^2$ private samples, where $d$ and $p$ are the VC-dimension and dual VC-dimension of $\mathcal{H}$, respectively. In comparison, with only private samples, this problem cannot be solved even for simple query classes with VC-dimension one, and without any private samples, a larger public sample of size $d/α^2$ is needed. Next, we give sample complexity lower bounds that exhibit tight dependence on $p$ and $α$. For the class of decision stumps, we give a lower bound of $\sqrt{p}/α$ on the private sample complexity whenever the public sample size is less than $1/α^2$. Given our upper bounds, this shows that the dependence on $\sqrt{p}$ is necessary in the private sample complexity. We also give a lower bound of $1/α$ on the public sample complexity for a broad family of query classes, which by our upper bound, is tight in $α$.

preprint2016arXiv

Towards a Constructive Version of Banaszczyk's Vector Balancing Theorem

An important theorem of Banaszczyk (Random Structures & Algorithms `98) states that for any sequence of vectors of $\ell_2$ norm at most $1/5$ and any convex body $K$ of Gaussian measure $1/2$ in $\mathbb{R}^n$, there exists a signed combination of these vectors which lands inside $K$. A major open problem is to devise a constructive version of Banaszczyk's vector balancing theorem, i.e. to find an efficient algorithm which constructs the signed combination. We make progress towards this goal along several fronts. As our first contribution, we show an equivalence between Banaszczyk's theorem and the existence of $O(1)$-subgaussian distributions over signed combinations. For the case of symmetric convex bodies, our equivalence implies the existence of a universal signing algorithm (i.e. independent of the body), which simply samples from the subgaussian sign distribution and checks to see if the associated combination lands inside the body. For asymmetric convex bodies, we provide a novel recentering procedure, which allows us to reduce to the case where the body is symmetric. As our second main contribution, we show that the above framework can be efficiently implemented when the vectors have length $O(1/\sqrt{\log n})$, recovering Banaszczyk's results under this stronger assumption. More precisely, we use random walk techniques to produce the required $O(1)$-subgaussian signing distributions when the vectors have length $O(1/\sqrt{\log n})$, and use a stochastic gradient ascent method to implement the recentering procedure for asymmetric bodies.

preprint2015arXiv

An Improved Private Mechanism for Small Databases

We study the problem of answering a workload of linear queries $\mathcal{Q}$, on a database of size at most $n = o(|\mathcal{Q}|)$ drawn from a universe $\mathcal{U}$ under the constraint of (approximate) differential privacy. Nikolov, Talwar, and Zhang~\cite{NTZ} proposed an efficient mechanism that, for any given $\mathcal{Q}$ and $n$, answers the queries with average error that is at most a factor polynomial in $\log |\mathcal{Q}|$ and $\log |\mathcal{U}|$ worse than the best possible. Here we improve on this guarantee and give a mechanism whose competitiveness ratio is at most polynomial in $\log n$ and $\log |\mathcal{U}|$, and has no dependence on $|\mathcal{Q}|$. Our mechanism is based on the projection mechanism of Nikolov, Talwar, and Zhang, but in place of an ad-hoc noise distribution, we use a distribution which is in a sense optimal for the projection mechanism, and analyze it using convex duality and the restricted invertibility principle.

preprint2015arXiv

Factorization Norms and Hereditary Discrepancy

The $γ_2$ norm of a real $m\times n$ matrix $A$ is the minimum number $t$ such that the column vectors of $A$ are contained in a $0$-centered ellipsoid $E\subseteq\mathbb{R}^m$ which in turn is contained in the hypercube $[-t, t]^m$. We prove that this classical quantity approximates the \emph{hereditary discrepancy} $\mathrm{herdisc}\ A$ as follows: $γ_2(A) = {O(\log m)}\cdot \mathrm{herdisc}\ A$ and $\mathrm{herdisc}\ A = O(\sqrt{\log m}\,)\cdotγ_2(A) $. Since $γ_2$ is polynomial-time computable, this gives a polynomial-time approximation algorithm for hereditary discrepancy. Both inequalities are shown to be asymptotically tight. We then demonstrate on several examples the power of the $γ_2$ norm as a tool for proving lower and upper bounds in discrepancy theory. Most notably, we prove a new lower bound of $Ω(\log^{d-1} n)$ for the \emph{$d$-dimensional Tusnády problem}, asking for the combinatorial discrepancy of an $n$-point set in $\mathbb{R}^d$ with respect to axis-parallel boxes. For $d>2$, this improves the previous best lower bound, which was of order approximately $\log^{(d-1)/2}n$, and it comes close to the best known upper bound of $O(\log^{d+1/2}n)$, for which we also obtain a new, very simple proof.

preprint2015arXiv

On The Hereditary Discrepancy of Homogeneous Arithmetic Progressions

We show that the hereditary discrepancy of homogeneous arithmetic progressions is lower bounded by $n^{1/O(\log \log n)}$. This bound is tight up to the constant in the exponent. Our lower bound goes via proving an exponential lower bound on the discrepancy of set systems of subcubes of the boolean cube $\{0, 1\}^d$.

preprint2015arXiv

Randomized Rounding for the Largest Simplex Problem

The maximum volume $j$-simplex problem asks to compute the $j$-dimensional simplex of maximum volume inside the convex hull of a given set of $n$ points in $\mathbb{Q}^d$. We give a deterministic approximation algorithm for this problem which achieves an approximation ratio of $e^{j/2 + o(j)}$. The problem is known to be $\mathrm{NP}$-hard to approximate within a factor of $c^{j}$ for some constant $c > 1$. Our algorithm also gives a factor $e^{j + o(j)}$ approximation for the problem of finding the principal $j\times j$ submatrix of a rank $d$ positive semidefinite matrix with the largest determinant. We achieve our approximation by rounding solutions to a generalization of the $D$-optimal design problem, or, equivalently, the dual of an appropriate smallest enclosing ellipsoid problem. Our arguments give a short and simple proof of a restricted invertibility principle for determinants.

preprint2014arXiv

Approximating Hereditary Discrepancy via Small Width Ellipsoids

The Discrepancy of a hypergraph is the minimum attainable value, over two-colorings of its vertices, of the maximum absolute imbalance of any hyperedge. The Hereditary Discrepancy of a hypergraph, defined as the maximum discrepancy of a restriction of the hypergraph to a subset of its vertices, is a measure of its complexity. Lovasz, Spencer and Vesztergombi (1986) related the natural extension of this quantity to matrices to rounding algorithms for linear programs, and gave a determinant based lower bound on the hereditary discrepancy. Matousek (2011) showed that this bound is tight up to a polylogarithmic factor, leaving open the question of actually computing this bound. Recent work by Nikolov, Talwar and Zhang (2013) showed a polynomial time $\tilde{O}(\log^3 n)$-approximation to hereditary discrepancy, as a by-product of their work in differential privacy. In this paper, we give a direct simple $O(\log^{3/2} n)$-approximation algorithm for this problem. We show that up to this approximation factor, the hereditary discrepancy of a matrix $A$ is characterized by the optimal value of simple geometric convex program that seeks to minimize the largest $\ell_{\infty}$ norm of any point in a ellipsoid containing the columns of $A$. This characterization promises to be a useful tool in discrepancy theory.

preprint2014arXiv

Parallel Algorithms for Geometric Graph Problems

We give algorithms for geometric graph problems in the modern parallel models inspired by MapReduce. For example, for the Minimum Spanning Tree (MST) problem over a set of points in the two-dimensional space, our algorithm computes a $(1+ε)$-approximate MST. Our algorithms work in a constant number of rounds of communication, while using total space and communication proportional to the size of the data (linear space and near linear time algorithms). In contrast, for general graphs, achieving the same result for MST (or even connectivity) remains a challenging open problem, despite drawing significant attention in recent years. We develop a general algorithmic framework that, besides MST, also applies to Earth-Mover Distance (EMD) and the transportation cost problem. Our algorithmic framework has implications beyond the MapReduce model. For example it yields a new algorithm for computing EMD cost in the plane in near-linear time, $n^{1+o_ε(1)}$. We note that while recently Sharathkumar and Agarwal developed a near-linear time algorithm for $(1+ε)$-approximating EMD, our algorithm is fundamentally different, and, for example, also solves the transportation (cost) problem, raised as an open question in their work. Furthermore, our algorithm immediately gives a $(1+ε)$-approximation algorithm with $n^δ$ space in the streaming-with-sorting model with $1/δ^{O(1)}$ passes. As such, it is tempting to conjecture that the parallel models may also constitute a concrete playground in the quest for efficient algorithms for EMD (and other similar problems) in the vanilla streaming model, a well-known open problem.

preprint2013arXiv

Efficient Algorithms for Privately Releasing Marginals via Convex Relaxations

Consider a database of $n$ people, each represented by a bit-string of length $d$ corresponding to the setting of $d$ binary attributes. A $k$-way marginal query is specified by a subset $S$ of $k$ attributes, and a $|S|$-dimensional binary vector $β$ specifying their values. The result for this query is a count of the number of people in the database whose attribute vector restricted to $S$ agrees with $β$. Privately releasing approximate answers to a set of $k$-way marginal queries is one of the most important and well-motivated problems in differential privacy. Information theoretically, the error complexity of marginal queries is well-understood: the per-query additive error is known to be at least $Ω(\min\{\sqrt{n},d^{\frac{k}{2}}\})$ and at most $\tilde{O}(\min\{\sqrt{n} d^{1/4},d^{\frac{k}{2}}\})$. However, no polynomial time algorithm with error complexity as low as the information theoretic upper bound is known for small $n$. In this work we present a polynomial time algorithm that, for any distribution on marginal queries, achieves average error at most $\tilde{O}(\sqrt{n} d^{\frac{\lceil k/2 \rceil}{4}})$. This error bound is as good as the best known information theoretic upper bounds for $k=2$. This bound is an improvement over previous work on efficiently releasing marginals when $k$ is small and when error $o(n)$ is desirable. Using private boosting we are also able to give nearly matching worst-case error bounds. Our algorithms are based on the geometric techniques of Nikolov, Talwar, and Zhang. The main new ingredients are convex relaxations and careful use of the Frank-Wolfe algorithm for constrained convex minimization. To design our relaxations, we rely on the Grothendieck inequality from functional analysis.

preprint2013arXiv

Nearly Optimal Private Convolution

We study computing the convolution of a private input $x$ with a public input $h$, while satisfying the guarantees of $(ε, δ)$-differential privacy. Convolution is a fundamental operation, intimately related to Fourier Transforms. In our setting, the private input may represent a time series of sensitive events or a histogram of a database of confidential personal information. Convolution then captures important primitives including linear filtering, which is an essential tool in time series analysis, and aggregation queries on projections of the data. We give a nearly optimal algorithm for computing convolutions while satisfying $(ε, δ)$-differential privacy. Surprisingly, we follow the simple strategy of adding independent Laplacian noise to each Fourier coefficient and bounding the privacy loss using the composition theorem of Dwork, Rothblum, and Vadhan. We derive a closed form expression for the optimal noise to add to each Fourier coefficient using convex programming duality. Our algorithm is very efficient -- it is essentially no more computationally expensive than a Fast Fourier Transform. To prove near optimality, we use the recent discrepancy lowerbounds of Muthukrishnan and Nikolov and derive a spectral lower bound using a characterization of discrepancy in terms of determinants.

preprint2013arXiv

The Komlos Conjecture Holds for Vector Colorings

The Komlos conjecture in discrepancy theory states that for some constant K and for any m by n matrix A whose columns lie in the unit ball there exists a +/- 1 vector x such that the infinity norm of Ax is bounded above by K. This conjecture also implies the Beck-Fiala conjecture on the discrepancy of bounded degree hypergraphs. Here we prove a natural relaxation of the Komlos conjecture: if the columns of A are assigned unit real vectors rather than +/- 1 then the Komlos conjecture holds with K=1. Our result rules out the possibility of a counterexample to the conjecture based on semidefinite programming. It also opens the way to proving tighter efficient (polynomial-time computable) upper bounds for the conjecture using semidefinite programming techniques.

preprint2012arXiv

Optimal Private Halfspace Counting via Discrepancy

A range counting problem is specified by a set $P$ of size $|P| = n$ of points in $\mathbb{R}^d$, an integer weight $x_p$ associated to each point $p \in P$, and a range space ${\cal R} \subseteq 2^{P}$. Given a query range $R \in {\cal R}$, the target output is $R(\vec{x}) = \sum_{p \in R}{x_p}$. Range counting for different range spaces is a central problem in Computational Geometry. We study $(ε, δ)$-differentially private algorithms for range counting. Our main results are for the range space given by hyperplanes, that is, the halfspace counting problem. We present an $(ε, δ)$-differentially private algorithm for halfspace counting in $d$ dimensions which achieves $O(n^{1-1/d})$ average squared error. This contrasts with the $Ω(n)$ lower bound established by the classical result of Dinur and Nissim [PODS 2003] for arbitrary subset counting queries. We also show a matching lower bound on average squared error for any $(ε, δ)$-differentially private algorithm for halfspace counting. Both bounds are obtained using discrepancy theory. For the lower bound, we use a modified discrepancy measure and bound approximation of $(ε, δ)$-differentially private algorithms for range counting queries in terms of this discrepancy. We also relate the modified discrepancy measure to classical combinatorial discrepancy, which allows us to exploit known discrepancy lower bounds. This approach also yields a lower bound of $Ω((\log n)^{d-1})$ for $(ε, δ)$-differentially private orthogonal range counting in $d$ dimensions, the first known superconstant lower bound for this problem. For the upper bound, we use an approach inspired by partial coloring methods for proving discrepancy upper bounds, and obtain $(ε, δ)$-differentially private algorithms for range counting with polynomially bounded shatter function range spaces.

preprint2012arXiv

Private Decayed Sum Estimation under Continual Observation

In monitoring applications, recent data is more important than distant data. How does this affect privacy of data analysis? We study a general class of data analyses - computing predicate sums - with privacy. Formally, we study the problem of estimating predicate sums {\em privately}, for sliding windows (and other well-known decay models of data, i.e. exponential and polynomial decay). We extend the recently proposed continual privacy model of Dwork et al. We present algorithms for decayed sum which are $\eps$-differentially private, and are accurate. For window and exponential decay sums, our algorithms are accurate up to additive $1/\eps$ and polylog terms in the range of the computed function; for polynomial decay sums which are technically more challenging because partial solutions do not compose easily, our algorithms incur additional relative error. Further, we show lower bounds, tight within polylog factors and tight with respect to the dependence on the probability of error.

preprint2012arXiv

The Geometry of Differential Privacy: the Sparse and Approximate Cases

In this work, we study trade-offs between accuracy and privacy in the context of linear queries over histograms. This is a rich class of queries that includes contingency tables and range queries, and has been a focus of a long line of work. For a set of $d$ linear queries over a database $x \in \R^N$, we seek to find the differentially private mechanism that has the minimum mean squared error. For pure differential privacy, an $O(\log^2 d)$ approximation to the optimal mechanism is known. Our first contribution is to give an $O(\log^2 d)$ approximation guarantee for the case of $(\eps,δ)$-differential privacy. Our mechanism is simple, efficient and adds correlated Gaussian noise to the answers. We prove its approximation guarantee relative to the hereditary discrepancy lower bound of Muthukrishnan and Nikolov, using tools from convex geometry. We next consider this question in the case when the number of queries exceeds the number of individuals in the database, i.e. when $d > n \triangleq \|x\|_1$. It is known that better mechanisms exist in this setting. Our second main contribution is to give an $(\eps,δ)$-differentially private mechanism which is optimal up to a $\polylog(d,N)$ factor for any given query set $A$ and any given upper bound $n$ on $\|x\|_1$. This approximation is achieved by coupling the Gaussian noise addition approach with a linear regression step. We give an analogous result for the $\eps$-differential privacy setting. We also improve on the mean squared error upper bound for answering counting queries on a database of size $n$ by Blum, Ligett, and Roth, and match the lower bound implied by the work of Dinur and Nissim up to logarithmic factors. The connection between hereditary discrepancy and the privacy mechanism enables us to derive the first polylogarithmic approximation to the hereditary discrepancy of a matrix $A$.

preprint2011arXiv

A counterexample to Beck's conjecture on the discrepancy of three permutations

Given three permutations on the integers 1 through n, consider the set system consisting of each interval in each of the three permutations. Jozsef Beck conjectured (c. 1987) that the discrepancy of this set system is O(1). We give a counterexample to this conjecture: for any positive integer n = 3^k, we exhibit three permutations whose corresponding set system has discrepancy Omega(log(n)). Our counterexample is based on a simple recursive construction, and our proof of the discrepancy lower bound is by induction. This example also disproves a generalization of Beck's conjecture due to Spencer, Srinivasan and Tetali, who conjectured that a set system corresponding to l permutations has discrepancy O(sqrt(l)).

preprint2010arXiv

Limits of Approximation Algorithms: PCPs and Unique Games (DIMACS Tutorial Lecture Notes)

These are the lecture notes for the DIMACS Tutorial "Limits of Approximation Algorithms: PCPs and Unique Games" held at the DIMACS Center, CoRE Building, Rutgers University on 20-21 July, 2009. This tutorial was jointly sponsored by the DIMACS Special Focus on Hardness of Approximation, the DIMACS Special Focus on Algorithmic Foundations of the Internet, and the Center for Computational Intractability with support from the National Security Agency and the National Science Foundation. The speakers at the tutorial were Matthew Andrews, Sanjeev Arora, Moses Charikar, Prahladh Harsha, Subhash Khot, Dana Moshkovitz and Lisa Zhang. The sribes were Ashkan Aazami, Dev Desai, Igor Gorodezky, Geetha Jagannathan, Alexander S. Kulikov, Darakhshan J. Mir, Alantha Newman, Aleksandar Nikolov, David Pritchard and Gwen Spencer.

preprint2010arXiv

Pan-private Algorithms: When Memory Does Not Help

Consider updates arriving online in which the $t$th input is $(i_t,d_t)$, where $i_t$'s are thought of as IDs of users. Informally, a randomized function $f$ is {\em differentially private} with respect to the IDs if the probability distribution induced by $f$ is not much different from that induced by it on an input in which occurrences of an ID $j$ are replaced with some other ID $k$ Recently, this notion was extended to {\em pan-privacy} where the computation of $f$ retains differential privacy, even if the internal memory of the algorithm is exposed to the adversary (say by a malicious break-in or by fiat by the government). This is a strong notion of privacy, and surprisingly, for basic counting tasks such as distinct counts, heavy hitters and others, Dwork et al~\cite{dwork-pan} present pan-private algorithms with reasonable accuracy. The pan-private algorithms are nontrivial, and rely on sampling. We reexamine these basic counting tasks and show improved bounds. In particular, we estimate the distinct count $\Dt$ to within $(1\pm \eps)\Dt \pm O(\polylog m)$, where $m$ is the number of elements in the universe. This uses suitably noisy statistics on sketches known in the streaming literature. We also present the first known lower bounds for pan-privacy with respect to a single intrusion. Our lower bounds show that, even if allowed to work with unbounded memory, pan-private algorithms for distinct counts can not be significantly more accurate than our algorithms. Our lower bound uses noisy decoding. For heavy hitter counts, we present a pan private streaming algorithm that is accurate to within $O(k)$ in worst case; previously known bound for this problem is arbitrarily worse. An interesting aspect of our pan-private algorithms is that, they deliberately use very small (polylogarithmic) space and tend to be streaming algorithms, even though using more space is not forbidden.

Aleksandar Nikolov

What is connected

Connect this record

See the researcher in context

Building this map preview

21 published item(s)

Private Query Release via the Johnson-Lindenstrauss Transform

Locally Private Hypothesis Selection

Maximizing Determinants under Matroid Constraints

On the Computational Complexity of Linear Discrepancy

Private Query Release Assisted by Public Data

Towards a Constructive Version of Banaszczyk's Vector Balancing Theorem

An Improved Private Mechanism for Small Databases

Factorization Norms and Hereditary Discrepancy

On The Hereditary Discrepancy of Homogeneous Arithmetic Progressions

Randomized Rounding for the Largest Simplex Problem

Approximating Hereditary Discrepancy via Small Width Ellipsoids

Parallel Algorithms for Geometric Graph Problems

Efficient Algorithms for Privately Releasing Marginals via Convex Relaxations

Nearly Optimal Private Convolution

The Komlos Conjecture Holds for Vector Colorings

Optimal Private Halfspace Counting via Discrepancy

Private Decayed Sum Estimation under Continual Observation

The Geometry of Differential Privacy: the Sparse and Approximate Cases

A counterexample to Beck's conjecture on the discrepancy of three permutations

Limits of Approximation Algorithms: PCPs and Unique Games (DIMACS Tutorial Lecture Notes)

Pan-private Algorithms: When Memory Does Not Help