Source author record

Kunal Talwar

Kunal Talwar appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Data Structures and Algorithms; Machine Learning; Cryptography and Security; math.OC; Computational Geometry; math.CO; Artificial Intelligence; Computational Complexity; Computer Science and Game Theory; Computer Vision; Discrete Mathematics; Distributed, Parallel, and Cluster Computing; math.NT

Catalog footprint

What is connected

29 works

13 topics

4 close collaborators

preprint · 2022 · arXiv

FLAIR: Federated Learning Annotated Image Repository

Cross-device federated learning is an emerging machine learning (ML) paradigm where a large population of devices collectively train an ML model while the data remains on the devices. This research field has a unique set of practical challenges, and to systematically make advances, new datasets curated to be compatible with this paradigm are needed. Existing federated learning benchmarks in the image domain do not accurately capture the scale and heterogeneity of many real-world use cases. We introduce FLAIR, a challenging large-scale annotated image dataset for multi-label classification suitable for federated learning. FLAIR has 429,078 images from 51,414 Flickr users and captures many of the intricacies typically encountered in federated learning, such as heterogeneous user data and a long-tailed label distribution. We implement multiple baselines in different learning setups for different tasks on this dataset. We believe FLAIR can serve as a challenging benchmark for advancing the state of the art in federated learning. Dataset access and the code for the benchmark are available at https://github.com/apple/ml-flair.

preprint · 2022 · arXiv

Optimal Algorithms for Mean Estimation under Local Differential Privacy

We study the problem of mean estimation of $\ell_2$-bounded vectors under the constraint of local differential privacy. While the literature has a variety of algorithms that achieve the asymptotically optimal rates for this problem, the performance of these algorithms in practice can vary significantly due to varying (and often large) hidden constants. In this work, we investigate the question of designing the protocol with the smallest variance. We show that PrivUnit (Bhowmick et al. 2018) with optimized parameters achieves the optimal variance among a large family of locally private randomizers. To prove this result, we establish some properties of local randomizers, and use symmetrization arguments that allow us to write the optimal randomizer as the optimizer of a certain linear program. These structural results, which should extend to other problems, then allow us to show that the optimal randomizer belongs to the PrivUnit family. We also develop a new variant of PrivUnit based on the Gaussian distribution which is more amenable to mathematical analysis and enjoys the same optimality guarantees. This allows us to establish several useful properties on the exact constants of the optimal error as well as to numerically estimate these constants.

preprint · 2022 · arXiv

Private Frequency Estimation via Projective Geometry

In this work, we propose a new algorithm ProjectiveGeometryResponse (PGR) for locally differentially private (LDP) frequency estimation. For a universe size of $k$ and with $n$ users, our $\varepsilon$-LDP algorithm has communication cost $\lceil\log_2k\rceil$ bits in the private coin setting and $\varepsilon\log_2 e + O(1)$ in the public coin setting, and has computation cost $O(n + k\exp(\varepsilon) \log k)$ for the server to approximately reconstruct the frequency histogram, while achieving the state-of-the-art privacy-utility tradeoff. In many parameter settings used in practice this is a significant improvement over the $ O(n+k^2)$ computation cost that is achieved by the recent PI-RAPPOR algorithm (Feldman and Talwar; 2021). Our empirical evaluation shows a speedup of over 50x over PI-RAPPOR while using approximately 75x less memory for practically relevant parameter settings. In addition, the running time of our algorithm is within an order of magnitude of HadamardResponse (Acharya, Sun, and Zhang; 2019) and RecursiveHadamardResponse (Chen, Kairouz, and Ozgur; 2020) which have significantly worse reconstruction error. The error of our algorithm essentially matches that of the communication- and time-inefficient but utility-optimal SubsetSelection (SS) algorithm (Ye and Barg; 2017). Our new algorithm is based on using Projective Planes over a finite field to define a small collection of sets that are close to being pairwise independent and a dynamic programming algorithm for approximate histogram reconstruction on the server side. We also give an extension of PGR, which we call HybridProjectiveGeometryResponse, that allows trading off computation time with utility smoothly.

preprint · 2021 · arXiv

Cops, Robbers, and Threatening Skeletons: Padded Decomposition for Minor-Free Graphs

We prove that any graph excluding $K_r$ as a minor can be partitioned into clusters of diameter at most $Δ$ while removing at most an $O(r/Δ)$ fraction of the edges. This improves over the results of Fakcharoenphol and Talwar, who, building on the work of Klein, Plotkin and Rao, gave a partitioning that required removing an $O(r^2/Δ)$ fraction of the edges. Our result is obtained by a new approach to relate the topological properties (excluding a minor) of a graph to its geometric properties (the induced shortest path metric). Specifically, we show that techniques used by Andreae in his investigation of the cops-and-robbers game on excluded-minor graphs can be used to construct padded decompositions of the metrics induced by such graphs. In particular, we get probabilistic partitions with padding parameter $O(r)$ and strong-diameter partitions with padding parameter $O(r^2)$ for $K_r$-free graphs, padding $O(k)$ for graphs with treewidth $k$, and padding $O(\log g)$ for graphs with genus $g$.

preprint · 2021 · arXiv

Lossless Compression of Efficient Private Local Randomizers

Locally Differentially Private (LDP) reports are commonly used for collection of statistics and machine learning in the federated setting. In many cases the best known LDP algorithms require sending prohibitively large messages from the client device to the server (such as when constructing histograms over a large domain or learning a high-dimensional model). This has led to significant efforts on reducing the communication cost of LDP algorithms. At the same time LDP reports are known to have relatively little information about the user's data due to randomization. Several schemes are known that exploit this fact to design low-communication versions of LDP algorithms, but all of them do so at the expense of a significant loss in utility. Here we demonstrate a general approach that, under standard cryptographic assumptions, compresses every efficient LDP algorithm with negligible loss in privacy and utility guarantees. The practical implication of our result is that in typical applications the message can be compressed to the size of the server's pseudo-random generator seed. More generally, we relate the properties of an LDP randomizer to the power of a pseudo-random generator that suffices for compressing the LDP randomizer. From this general approach we derive low-communication algorithms for the problems of frequency estimation and high-dimensional mean estimation. Our algorithms are simpler and more accurate than existing low-communication LDP algorithms for these well-studied problems.

preprint · 2021 · arXiv

Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$ Geometry

Stochastic convex optimization over an $\ell_1$-bounded domain is ubiquitous in machine learning applications such as LASSO but remains poorly understood when learning with differential privacy. We show that, up to logarithmic factors, the optimal excess population loss of any $(\varepsilon,δ)$-differentially private optimizer is $\sqrt{\log(d)/n} + \sqrt{d}/\varepsilon n$. The upper bound is based on a new algorithm that combines the iterative localization approach of Feldman et al. (2020) with a new analysis of private regularized mirror descent. It applies to $\ell_p$ bounded domains for $p\in [1,2]$ and queries at most $n^{3/2}$ gradients, improving over the best previously known algorithm for the $\ell_2$ case, which needs $n^2$ gradients. Further, we show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $\sqrt{\log(d)/n} + (\log(d)/\varepsilon n)^{2/3}$. This bound is achieved by a new variance-reduced version of the Frank-Wolfe algorithm that requires just a single pass over the data. We also show that the lower bound in this case is the minimum of the two rates mentioned above.

preprint · 2020 · arXiv

Amplification by Shuffling: From Local to Central Differential Privacy via Anonymity

Sensitive statistics are often collected across sets of users, with repeated collection of reports done over time. For example, trends in users' private preferences or software usage may be monitored via such reports. We study the collection of such statistics in the local differential privacy (LDP) model, and describe an algorithm whose privacy cost is polylogarithmic in the number of changes to a user's value. More fundamentally, by building on anonymity of the users' reports, we also demonstrate how the privacy cost of our LDP algorithm can actually be much lower when viewed in the central model of differential privacy. We show, via a new and general privacy amplification technique, that any permutation-invariant algorithm satisfying $\varepsilon$-local differential privacy will satisfy $(O(\varepsilon \sqrt{\log(1/δ)/n}), δ)$-central differential privacy. By this, we explain how the high noise and $\sqrt{n}$ overhead of LDP protocols is a consequence of them being significantly more private in the central model. As a practical corollary, our results imply that several LDP-based industrial deployments may have much lower privacy cost than their advertised $\varepsilon$ would indicate, at least if reports are anonymized.

preprint · 2020 · arXiv

Encode, Shuffle, Analyze Privacy Revisited: Formalizations and Empirical Evaluation

Recently, a number of approaches and techniques have been introduced for reporting software statistics with strong privacy guarantees. These range from abstract algorithms to comprehensive systems with varying assumptions and built upon local differential privacy mechanisms and anonymity. Based on the Encode-Shuffle-Analyze (ESA) framework, notable results formally clarified large improvements in privacy guarantees without loss of utility by making reports anonymous. However, these results either comprise systems with seemingly disparate mechanisms and attack models, or formal statements with little guidance to practitioners. Addressing this, we provide a formal treatment and offer prescriptive guidelines for privacy-preserving reporting with anonymity. We revisit the ESA framework with a simple, abstract model of attackers as well as assumptions covering it and other proposed systems of anonymity. In light of new formal privacy bounds, we examine the limitations of sketch-based encodings and ESA mechanisms such as data-dependent crowds. We also demonstrate how the ESA notion of fragmentation (reporting data aspects in separate, unlinkable messages) improves privacy/utility tradeoffs both in terms of local and central differential-privacy guarantees. Finally, to help practitioners understand the applicability and limitations of privacy-preserving reporting, we report on a large number of empirical experiments. We use real-world datasets with heavy-tailed or near-flat distributions, which pose the greatest difficulty for our techniques; in particular, we focus on data drawn from images that can be easily visualized in a way that highlights reconstruction errors. Showing the promise of the approach, and of independent interest, we also report on experiments using anonymous, privacy-preserving reporting to train high-accuracy deep neural networks on standard tasks: MNIST and CIFAR-10.

preprint · 2020 · arXiv

Private Stochastic Convex Optimization: Optimal Rates in Linear Time

We study differentially private (DP) algorithms for stochastic convex optimization: the problem of minimizing the population loss given i.i.d. samples from a distribution over convex loss functions. A recent work of Bassily et al. (2019) has established the optimal bound on the excess population loss achievable given $n$ samples. Unfortunately, their algorithm achieving this bound is relatively inefficient: it requires $O(\min\{n^{3/2}, n^{5/2}/d\})$ gradient computations, where $d$ is the dimension of the optimization problem. We describe two new techniques for deriving DP convex optimization algorithms both achieving the optimal bound on excess loss and using $O(\min\{n, n^2/d\})$ gradient computations. In particular, the algorithms match the running time of the optimal non-private algorithms. The first approach relies on the use of variable batch sizes and is analyzed using the privacy amplification by iteration technique of Feldman et al. (2018). The second approach is based on a general reduction to the problem of localizing an approximately optimal solution with differential privacy. Such localization, in turn, can be achieved using existing (non-private) uniformly stable optimization algorithms. As in the earlier work, our algorithms require a mild smoothness assumption. We also give a linear-time algorithm achieving the optimal bound on the excess loss for the strongly convex case, as well as a faster algorithm for the non-smooth case.

preprint · 2020 · arXiv

Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses

Uniform stability is a notion of algorithmic stability that bounds the worst case change in the model output by the algorithm when a single data point in the dataset is replaced. An influential work of Hardt et al. (2016) provides strong upper bounds on the uniform stability of the stochastic gradient descent (SGD) algorithm on sufficiently smooth convex losses. These results led to important progress in understanding the generalization properties of SGD and to several applications to differentially private convex optimization for smooth losses. Our work is the first to address uniform stability of SGD on nonsmooth convex losses. Specifically, we provide sharp upper and lower bounds for several forms of SGD and full-batch GD on arbitrary Lipschitz nonsmooth convex losses. Our lower bounds show that, in the nonsmooth case, (S)GD can be inherently less stable than in the smooth case. On the other hand, our upper bounds show that (S)GD is sufficiently stable for deriving new and useful bounds on generalization error. Most notably, we obtain the first dimension-independent generalization bounds for multi-pass SGD in the nonsmooth case. In addition, our bounds allow us to derive a new algorithm for differentially private nonsmooth stochastic convex optimization with optimal excess population risk. Our algorithm is simpler and more efficient than the best known algorithm for the nonsmooth case (Feldman et al., 2020).

preprint · 2016 · arXiv

LAST but not Least: Online Spanners for Buy-at-Bulk

The online (uniform) buy-at-bulk network design problem asks us to design a network, where the edge-costs exhibit economy-of-scale. Previous approaches to this problem used tree-embeddings, giving us randomized algorithms. Moreover, the optimal results with a logarithmic competitive ratio require the metric on which the network is being built to be known up-front; the competitive ratios then depend on the size of this metric (which could be much larger than the number of terminals that arrive). We consider the buy-at-bulk problem in the least restrictive model where the metric is not known in advance, but revealed in parts along with the demand points seeking connectivity arriving online. For the single sink buy-at-bulk problem, we give a deterministic online algorithm with competitive ratio that is logarithmic in k, the number of terminals that have arrived, matching the lower bound known even for the online Steiner tree problem. In the oblivious case when the buy-at-bulk function used to compute the edge-costs of the network is not known in advance (but is the same across all edges), we give a deterministic algorithm with competitive ratio polylogarithmic in k, the number of terminals. At the heart of our algorithms are optimal constructions for online Light Approximate Shortest-path Trees (LASTs) and spanners, and their variants. We give constructions that have optimal trade-offs in terms of cost and stretch. We also define and give constructions for a new notion of LASTs where the set of roots (in addition to the points) expands over time. We expect these techniques will find applications in other online network-design problems.

preprint · 2016 · arXiv

Private Empirical Risk Minimization Beyond the Worst Case: The Effect of the Constraint Set Geometry

Empirical Risk Minimization (ERM) is a standard technique in machine learning, where a model is selected by minimizing a loss function over a constraint set. When the training dataset consists of private information, it is natural to use a differentially private ERM algorithm, and this problem has been the subject of a long line of work started with Chaudhuri and Monteleoni 2008. A private ERM algorithm outputs an approximate minimizer of the loss function and its error can be measured as the difference from the optimal value of the loss function. When the constraint set is arbitrary, the required error bounds are fairly well understood (Bassily, Smith and Thakurta 2014). In this work, we show that the geometric properties of the constraint set can be used to derive significantly better results. Specifically, we show that a differentially private version of Mirror Descent leads to error bounds of the form $\tilde{O}(G_{\mathcal{C}}/n)$ for a Lipschitz loss function, improving on the $\tilde{O}(\sqrt{p}/n)$ bounds in Bassily, Smith and Thakurta 2014. Here $p$ is the dimensionality of the problem, $n$ is the number of data points in the training set, and $G_{\mathcal{C}}$ denotes the Gaussian width of the constraint set that we optimize over. We show similar improvements for strongly convex functions, and for smooth functions. In addition, we show that when the loss function is Lipschitz with respect to the $\ell_1$ norm and $\mathcal{C}$ is $\ell_1$-bounded, a differentially private version of the Frank-Wolfe algorithm gives error bounds of the form $\tilde{O}(n^{-2/3})$. This captures the important and common case of sparse linear regression (LASSO), when the data $x_i$ satisfies $|x_i|_{\infty} \leq 1$ and we optimize over the $\ell_1$ ball. We show new lower bounds for this setting, that together with known bounds, imply that all our upper bounds are tight.

preprint · 2016 · arXiv

Sketching and Neural Networks

High-dimensional sparse data present computational and statistical challenges for supervised learning. We propose compact linear sketches for reducing the dimensionality of the input, followed by a single layer neural network. We show that any sparse polynomial function can be computed, on nearly all sparse binary vectors, by a single layer neural network that takes a compact sketch of the vector as input. Consequently, when a set of sparse binary vectors is approximately separable using a sparse polynomial, there exists a single-layer neural network that takes a short sketch as input and correctly classifies nearly all the points. Previous work has proposed using sketches to reduce dimensionality while preserving the hypothesis class. However, the sketch size has an exponential dependence on the degree in the case of polynomial classifiers. In stark contrast, our approach of improper learning, which uses a larger hypothesis class, allows the sketch size to have a logarithmic dependence on the degree. Even in the linear case, our approach allows us to improve on the pesky $O({1}/{γ^2})$ dependence of random projections on the margin $γ$. We empirically show that our approach leads to more compact neural networks than related methods such as feature hashing at equal or better performance.

preprint · 2016 · arXiv

TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.

preprint · 2015 · arXiv

Factorization Norms and Hereditary Discrepancy

The $γ_2$ norm of a real $m\times n$ matrix $A$ is the minimum number $t$ such that the column vectors of $A$ are contained in a $0$-centered ellipsoid $E\subseteq\mathbb{R}^m$ which in turn is contained in the hypercube $[-t, t]^m$. We prove that this classical quantity approximates the hereditary discrepancy $\mathrm{herdisc}\ A$ as follows: $γ_2(A) = O(\log m)\cdot \mathrm{herdisc}\ A$ and $\mathrm{herdisc}\ A = O(\sqrt{\log m})\cdot γ_2(A)$. Since $γ_2$ is polynomial-time computable, this gives a polynomial-time approximation algorithm for hereditary discrepancy. Both inequalities are shown to be asymptotically tight. We then demonstrate on several examples the power of the $γ_2$ norm as a tool for proving lower and upper bounds in discrepancy theory. Most notably, we prove a new lower bound of $Ω(\log^{d-1} n)$ for the $d$-dimensional Tusnády problem, asking for the combinatorial discrepancy of an $n$-point set in $\mathbb{R}^d$ with respect to axis-parallel boxes. For $d>2$, this improves the previous best lower bound, which was of order approximately $\log^{(d-1)/2}n$, and it comes close to the best known upper bound of $O(\log^{d+1/2}n)$, for which we also obtain a new, very simple proof.

preprint · 2015 · arXiv

On The Hereditary Discrepancy of Homogeneous Arithmetic Progressions

We show that the hereditary discrepancy of homogeneous arithmetic progressions is lower bounded by $n^{1/O(\log \log n)}$. This bound is tight up to the constant in the exponent. Our lower bound goes via proving an exponential lower bound on the discrepancy of set systems of subcubes of the boolean cube $\{0, 1\}^d$.

preprint · 2015 · arXiv

Smooth Boolean functions are easy: efficient algorithms for low-sensitivity functions

A natural measure of smoothness of a Boolean function is its sensitivity (the largest number of Hamming neighbors of a point which differ from it in function value). The structure of smooth or equivalently low-sensitivity functions is still a mystery. A well-known conjecture states that every such Boolean function can be computed by a shallow decision tree. While this conjecture implies that smooth functions are easy to compute in the simplest computational model, to date no non-trivial upper bounds were known for such functions in any computational model, including unrestricted Boolean circuits. Even a bound on the description length of such functions better than the trivial $2^n$ does not seem to have been known. In this work, we establish the first computational upper bounds on smooth Boolean functions: 1) We show that every sensitivity $s$ function is uniquely specified by its values on a Hamming ball of radius $2s$. We use this to show that such functions can be computed by circuits of size $n^{O(s)}$. 2) We show that sensitivity $s$ functions satisfy a strong pointwise noise-stability guarantee for random noise of rate $O(1/s)$. We use this to show that these functions have formulas of depth $O(s \log n)$. 3) We show that sensitivity $s$ functions can be (locally) self-corrected from worst-case noise of rate $\exp(-O(s))$. All our results are simple, and follow rather directly from (variants of) the basic fact that the function values at a few points in small neighborhoods of a given point determine its function value via a majority vote. Our results confirm various consequences of the conjecture. They may be viewed as providing a new form of evidence towards its validity, as well as new directions towards attacking it.
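
The definition of sensitivity in the abstract above can be made concrete with a brute-force check. The following is a minimal sketch, assuming a Boolean function is given as a Python callable on 0/1 tuples; the function names (`sensitivity`, `parity`, `dictator`, `or_fn`) are illustrative and do not come from the paper, and the enumeration takes time exponential in $n$.

```python
from itertools import product


def sensitivity(f, n):
    """Largest number of Hamming neighbors of any point x in {0,1}^n
    whose function value differs from f(x)."""
    best = 0
    for x in product((0, 1), repeat=n):
        fx = f(x)
        differing = 0
        for i in range(n):
            # Flip coordinate i to obtain a Hamming neighbor of x.
            y = x[:i] + (1 - x[i],) + x[i + 1:]
            if f(y) != fx:
                differing += 1
        best = max(best, differing)
    return best


def parity(x):
    return sum(x) % 2


def dictator(x):
    return x[0]


def or_fn(x):
    return int(any(x))
```

On four variables, parity has sensitivity 4 (every flip changes the value), a dictator function has sensitivity 1, and OR has sensitivity 4 because all four neighbors of the all-zeros point change the output.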

preprint · 2014 · arXiv

Approximating Hereditary Discrepancy via Small Width Ellipsoids

The Discrepancy of a hypergraph is the minimum attainable value, over two-colorings of its vertices, of the maximum absolute imbalance of any hyperedge. The Hereditary Discrepancy of a hypergraph, defined as the maximum discrepancy of a restriction of the hypergraph to a subset of its vertices, is a measure of its complexity. Lovasz, Spencer and Vesztergombi (1986) related the natural extension of this quantity to matrices to rounding algorithms for linear programs, and gave a determinant-based lower bound on the hereditary discrepancy. Matousek (2011) showed that this bound is tight up to a polylogarithmic factor, leaving open the question of actually computing this bound. Recent work by Nikolov, Talwar and Zhang (2013) showed a polynomial time $\tilde{O}(\log^3 n)$-approximation to hereditary discrepancy, as a by-product of their work in differential privacy. In this paper, we give a direct simple $O(\log^{3/2} n)$-approximation algorithm for this problem. We show that up to this approximation factor, the hereditary discrepancy of a matrix $A$ is characterized by the optimal value of a simple geometric convex program that seeks to minimize the largest $\ell_{\infty}$ norm of any point in an ellipsoid containing the columns of $A$. This characterization promises to be a useful tool in discrepancy theory.
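
The two definitions at the start of the abstract above can be checked by exhaustive search on tiny hypergraphs. This is a minimal sketch under stated assumptions: hyperedges are Python sets of vertex labels, the function names are illustrative, and the search is exponential in the number of vertices. It is a definitional check only, not the approximation algorithm the paper describes.

```python
from itertools import combinations, product


def discrepancy(edges, vertices):
    """Minimum over +1/-1 colorings of `vertices` of the maximum absolute
    imbalance of any hyperedge, with each edge restricted to `vertices`."""
    vertices = list(vertices)
    best = None
    for signs in product((-1, 1), repeat=len(vertices)):
        color = dict(zip(vertices, signs))
        worst = max(
            (abs(sum(color[v] for v in e if v in color)) for e in edges),
            default=0,
        )
        best = worst if best is None else min(best, worst)
    return best


def hereditary_discrepancy(edges, vertices):
    """Maximum discrepancy over all restrictions to a vertex subset."""
    vertices = list(vertices)
    return max(
        discrepancy(edges, subset)
        for r in range(len(vertices) + 1)
        for subset in combinations(vertices, r)
    )


triangle = [{0, 1}, {1, 2}, {0, 2}]
disjoint_pairs = [{0, 1}, {2, 3}]
```

The triangle has discrepancy 2, since any two-coloring of three vertices leaves some edge monochromatic. The two disjoint pairs have discrepancy 0 but hereditary discrepancy 1, because restricting to a single vertex leaves an edge of size one.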

preprint · 2014 · arXiv

Changing Bases: Multistage Optimization for Matroids and Matchings

This paper is motivated by the fact that many systems need to be maintained continually while the underlying costs change over time. The challenge is to continually maintain near-optimal solutions to the underlying optimization problems, without creating too much churn in the solution itself. We model this as a multistage combinatorial optimization problem where the input is a sequence of cost functions (one for each time step); while we can change the solution from step to step, we incur an additional cost for every such change. We study the multistage matroid maintenance problem, where we need to maintain a base of a matroid in each time step under the changing cost functions and acquisition costs for adding new elements. The online version of this problem generalizes online paging. E.g., given a graph, we need to maintain a spanning tree $T_t$ at each step: we pay $c_t(T_t)$ for the cost of the tree at time $t$, and also $| T_t\setminus T_{t-1} |$ for the number of edges changed at this step. Our main result is an $O(\log m \log r)$-approximation, where $m$ is the number of elements/edges and $r$ is the rank of the matroid. We also give an $O(\log m)$ approximation for the offline version of the problem. These bounds hold when the acquisition costs are non-uniform, in which case both these results are the best possible unless P=NP. We also study the perfect matching version of the problem, where we must maintain a perfect matching at each step under changing cost functions and costs for adding new elements. Surprisingly, the hardness drastically increases: for any constant $ε>0$, there is no $O(n^{1-ε})$-approximation to the multistage matching maintenance problem, even in the offline case.
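
The spanning tree objective in the abstract above, $c_t(T_t) + |T_t \setminus T_{t-1}|$ summed over steps, can be evaluated directly for a given sequence of trees. The sketch below is illustrative only: the function name, the edge labels, and the cost values are made up, and it assumes $T_0$ is empty so that the first tree pays one unit per edge (the abstract does not say how the first step is charged). It does not check that each edge set is a spanning tree.

```python
def multistage_cost(trees, costs):
    """Total of c_t(T_t) + |T_t \\ T_{t-1}| over all steps.

    trees: list of edge collections, one per time step.
    costs: list of dicts mapping each edge to its cost at that step.
    Assumption: T_0 is empty, so every edge of the first tree is acquired.
    """
    total = 0
    prev = frozenset()
    for tree, cost in zip(trees, costs):
        tree = frozenset(tree)
        total += sum(cost[e] for e in tree)  # c_t(T_t)
        total += len(tree - prev)            # newly acquired edges
        prev = tree
    return total


# Triangle on vertices u, v, w; any two edges form a spanning tree.
a, b, c = ("u", "v"), ("v", "w"), ("u", "w")
costs = [{a: 1, b: 1, c: 5}, {a: 5, b: 1, c: 1}]

stay = multistage_cost([{a, b}, {a, b}], costs)    # keep the same tree
switch = multistage_cost([{a, b}, {b, c}], costs)  # swap edge a for edge c
```

Here keeping the tree costs 10 in total while switching one edge costs 7, so paying one unit of acquisition cost is worthwhile when the edge costs shift.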

preprint · 2014 · arXiv

Consistent Weighted Sampling Made Fast, Small, and Easy

Document sketching using Jaccard similarity has been a workable, effective technique in reducing near-duplicates in Web page and image search results, and has also proven useful in file system synchronization, compression and learning applications. Min-wise sampling can be used to derive an unbiased estimator for Jaccard similarity, and taking a few hundred independent consistent samples leads to compact sketches which provide good estimates of pairwise similarity. Subsequent works extended this technique to weighted sets and showed how to produce samples with only a constant number of hash evaluations for any element, independent of its weight. Another improvement by Li et al. shows how to speed up sketch computations by computing many (near-)independent samples in one shot. Unfortunately this latter improvement works only for the unweighted case. In this paper we give a simple, fast and accurate procedure which reduces weighted sets to unweighted sets with small impact on the Jaccard similarity. This leads to compact sketches consisting of many (near-)independent weighted samples which can be computed with just a small constant number of hash function evaluations per weighted element. The size of the produced unweighted set is furthermore a tunable parameter which enables us to run the unweighted scheme of Li et al. in the regime where it is most efficient. Even when the sets involved are unweighted, our approach gives a simple solution to the densification problem that other works attempted to address. Unlike previously known schemes, ours does not result in an unbiased estimator. However, we prove that the bias introduced by our reduction is negligible and that the standard deviation is comparable to the unweighted case. We also empirically evaluate our scheme and show that it gives significant gains in computational efficiency, without any measurable loss in accuracy.
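
The abstract above builds on classical min-wise sampling for unweighted sets. As background, here is a minimal sketch of that baseline only, not the weighted reduction the paper proposes. The helper names, the use of seeded BLAKE2b as a stand-in for independent hash functions, and the choice of 256 samples are all assumptions made for illustration.

```python
import hashlib


def _hash(seed, item):
    """64-bit hash of (seed, item); a stand-in for a family of hash functions."""
    data = f"{seed}:{item}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_sketch(items, num_samples=256):
    """One min-wise sample per seed: the smallest hash value in the set.
    `items` must be non-empty."""
    return [min(_hash(seed, x) for x in items) for seed in range(num_samples)]


def estimate_jaccard(sketch_a, sketch_b):
    """Fraction of positions where the two sketches agree."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


def jaccard(a, b):
    """Exact Jaccard similarity, for comparison."""
    return len(a & b) / len(a | b)


A = set(range(0, 100))
B = set(range(50, 150))
estimate = estimate_jaccard(minhash_sketch(A), minhash_sketch(B))
exact = jaccard(A, B)  # 50 / 150
```

Each sketch position agrees with probability equal to the Jaccard similarity, so with a few hundred samples the estimate should land close to the exact value of 1/3 for these two sets.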

preprint · 2014 · arXiv

Vertex Sparsifiers: New Results from Old Techniques

Given a capacitated graph $G = (V,E)$ and a set of terminals $K \subseteq V$, how should we produce a graph $H$ only on the terminals $K$ so that every (multicommodity) flow between the terminals in $G$ could be supported in $H$ with low congestion, and vice versa? (Such a graph $H$ is called a flow-sparsifier for $G$.) What if we want $H$ to be a "simple" graph? What if we allow $H$ to be a convex combination of simple graphs? Improving on results of Moitra [FOCS 2009] and Leighton and Moitra [STOC 2010], we give efficient algorithms for constructing: (a) a flow-sparsifier $H$ that maintains congestion up to a factor of $O(\log k/\log \log k)$, where $k = |K|$, (b) a convex combination of trees over the terminals $K$ that maintains congestion up to a factor of $O(\log k)$, and (c) for a planar graph $G$, a convex combination of planar graphs that maintains congestion up to a constant factor. This requires us to give a new algorithm for the 0-extension problem, the first one in which the preimages of each terminal are connected in $G$. Moreover, this result extends to minor-closed families of graphs. Our improved bounds immediately imply improved approximation guarantees for several terminal-based cut and ordering problems.

preprint2013arXiv

Balanced Allocations: A Simple Proof for the Heavily Loaded Case

We provide a relatively simple proof that the expected gap between the maximum load and the average load in the two-choice process is bounded by $(1+o(1))\log \log n$, irrespective of the number of balls thrown. The theorem was first proven by Berenbrink et al. Their proof uses heavy machinery from Markov chain theory and some of the calculations are done using computers. In this manuscript we provide a significantly simpler proof that is not aided by computers and is self-contained. The simplification comes at a cost of weaker bounds on the low-order terms and a weaker tail bound for the probability of deviating from the expectation.
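The two-choice process itself is easy to simulate, which gives a feel for the gap the theorem bounds. The following is a small illustrative simulation, not part of the paper; the function names and the tie-breaking rule (prefer the first sampled bin) are assumptions.

```python
import random


def two_choice_gap(num_bins, num_balls, rng):
    """Throw balls one at a time into the less loaded of two uniformly random bins.

    Returns the gap between the maximum load and the average load.
    """
    loads = [0] * num_bins
    for _ in range(num_balls):
        i = rng.randrange(num_bins)
        j = rng.randrange(num_bins)
        target = i if loads[i] <= loads[j] else j  # ties go to the first choice
        loads[target] += 1
    return max(loads) - num_balls / num_bins


def one_choice_gap(num_bins, num_balls, rng):
    """Baseline: each ball goes into a single uniformly random bin."""
    loads = [0] * num_bins
    for _ in range(num_balls):
        loads[rng.randrange(num_bins)] += 1
    return max(loads) - num_balls / num_bins


if __name__ == "__main__":
    rng = random.Random(0)
    n, m = 200, 20000  # heavily loaded case: many more balls than bins
    print("two-choice gap:", two_choice_gap(n, m, rng))
    print("one-choice gap:", one_choice_gap(n, m, rng))
```

In the heavily loaded regime, where the number of balls far exceeds the number of bins, the two-choice gap typically stays small while the one-choice gap keeps growing with the number of balls.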

preprint2013arXiv

Efficient Algorithms for Privately Releasing Marginals via Convex Relaxations

Consider a database of $n$ people, each represented by a bit-string of length $d$ corresponding to the setting of $d$ binary attributes. A $k$-way marginal query is specified by a subset $S$ of $k$ attributes, and a $|S|$-dimensional binary vector $β$ specifying their values. The result for this query is a count of the number of people in the database whose attribute vector restricted to $S$ agrees with $β$. Privately releasing approximate answers to a set of $k$-way marginal queries is one of the most important and well-motivated problems in differential privacy. Information theoretically, the error complexity of marginal queries is well-understood: the per-query additive error is known to be at least $Ω(\min\{\sqrt{n},d^{\frac{k}{2}}\})$ and at most $\tilde{O}(\min\{\sqrt{n} d^{1/4},d^{\frac{k}{2}}\})$. However, no polynomial time algorithm with error complexity as low as the information theoretic upper bound is known for small $n$. In this work we present a polynomial time algorithm that, for any distribution on marginal queries, achieves average error at most $\tilde{O}(\sqrt{n} d^{\frac{\lceil k/2 \rceil}{4}})$. This error bound is as good as the best known information theoretic upper bounds for $k=2$. This bound is an improvement over previous work on efficiently releasing marginals when $k$ is small and when error $o(n)$ is desirable. Using private boosting we are also able to give nearly matching worst-case error bounds. Our algorithms are based on the geometric techniques of Nikolov, Talwar, and Zhang. The main new ingredients are convex relaxations and careful use of the Frank-Wolfe algorithm for constrained convex minimization. To design our relaxations, we rely on the Grothendieck inequality from functional analysis.
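To make the query definition concrete, here is a minimal, non-private sketch that evaluates a single $k$-way marginal exactly. It shows only what is being counted and involves no privacy mechanism; the function name and the tuple-of-bits representation of the database are assumptions.

```python
def marginal_count(database, attributes, values):
    """Count rows whose bits at positions `attributes` equal `values`.

    database:   iterable of equal-length tuples of 0/1 (one row per person)
    attributes: the subset S, given as a sequence of column indices
    values:     the binary vector beta, one bit per index in `attributes`
    """
    if len(attributes) != len(values):
        raise ValueError("attributes and values must have the same length")
    return sum(
        all(row[a] == v for a, v in zip(attributes, values))
        for row in database
    )


if __name__ == "__main__":
    db = [(1, 0, 1), (1, 1, 1), (0, 0, 1), (1, 0, 0)]
    # 2-way marginal on attributes {0, 2} with beta = (1, 1)
    print(marginal_count(db, (0, 2), (1, 1)))  # prints 2
```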

preprint2013arXiv

Random Rates for 0-Extension and Low-Diameter Decompositions

Consider the problem of partitioning an arbitrary metric space into pieces of diameter at most Δ, such that every pair of points is separated with relatively low probability. We propose a rate-based algorithm inspired by multiplicatively-weighted Voronoi diagrams, and prove it has optimal trade-offs. This also gives us another logarithmic approximation algorithm for the 0-extension problem.

preprint2013arXiv

Sparsest Cut on Bounded Treewidth Graphs: Algorithms and Hardness Results

We give a 2-approximation algorithm for Non-Uniform Sparsest Cut that runs in time $n^{O(k)}$, where $k$ is the treewidth of the graph. This improves on the previous $2^{2^k}$-approximation in time $\poly(n) 2^{O(k)}$ due to Chlamtáč et al. To complement this algorithm, we show the following hardness results: If the Non-Uniform Sparsest Cut problem has a $ρ$-approximation for series-parallel graphs (where $ρ\geq 1$), then the Max Cut problem has an algorithm with approximation factor arbitrarily close to $1/ρ$. Hence, even for such restricted graphs (which have treewidth 2), the Sparsest Cut problem is NP-hard to approximate better than $17/16 - ε$ for $ε> 0$; assuming the Unique Games Conjecture the hardness becomes $1/α_{GW} - ε$. For graphs with large (but constant) treewidth, we show a hardness result of $2 - ε$ assuming the Unique Games Conjecture. Our algorithm rounds a linear program based on (a subset of) the Sherali-Adams lift of the standard Sparsest Cut LP. We show that even for treewidth-2 graphs, the LP has an integrality gap close to 2 even after polynomially many rounds of Sherali-Adams. Hence our approach cannot be improved even on such restricted graphs without using a stronger relaxation.

preprint2012arXiv

On Privacy-Preserving Histograms

We advance the approach initiated by Chawla et al. for sanitizing (census) data so as to preserve the privacy of respondents while simultaneously extracting "useful" statistical information. First, we extend the scope of their techniques to a broad and rich class of distributions, specifically, mixtures of high-dimensional balls, spheres, Gaussians, and other "nice" distributions. Second, we randomize the histogram constructions to preserve spatial characteristics of the data, allowing us to approximate various quantities of interest, e.g., cost of the minimum spanning tree on the data, in a privacy-preserving fashion.

preprint2012arXiv

The Geometry of Differential Privacy: the Sparse and Approximate Cases

In this work, we study trade-offs between accuracy and privacy in the context of linear queries over histograms. This is a rich class of queries that includes contingency tables and range queries, and has been a focus of a long line of work. For a set of $d$ linear queries over a database $x \in \R^N$, we seek to find the differentially private mechanism that has the minimum mean squared error. For pure differential privacy, an $O(\log^2 d)$ approximation to the optimal mechanism is known. Our first contribution is to give an $O(\log^2 d)$ approximation guarantee for the case of $(\eps,δ)$-differential privacy. Our mechanism is simple, efficient and adds correlated Gaussian noise to the answers. We prove its approximation guarantee relative to the hereditary discrepancy lower bound of Muthukrishnan and Nikolov, using tools from convex geometry. We next consider this question in the case when the number of queries exceeds the number of individuals in the database, i.e. when $d > n \triangleq \|x\|_1$. It is known that better mechanisms exist in this setting. Our second main contribution is to give an $(\eps,δ)$-differentially private mechanism which is optimal up to a $\polylog(d,N)$ factor for any given query set $A$ and any given upper bound $n$ on $\|x\|_1$. This approximation is achieved by coupling the Gaussian noise addition approach with a linear regression step. We give an analogous result for the $\eps$-differential privacy setting. We also improve on the mean squared error upper bound for answering counting queries on a database of size $n$ by Blum, Ligett, and Roth, and match the lower bound implied by the work of Dinur and Nissim up to logarithmic factors. The connection between hereditary discrepancy and the privacy mechanism enables us to derive the first polylogarithmic approximation to the hereditary discrepancy of a matrix $A$.
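For orientation, the simplest way to answer linear queries over a histogram with Gaussian noise can be sketched as follows. This is the textbook Gaussian mechanism with independent noise, not the correlated-noise mechanism of the paper. It assumes neighboring databases differ by one individual (one histogram coordinate changes by 1) and uses the standard calibration $σ = Δ_2 \sqrt{2\ln(1.25/δ)}/ε$, which is stated for $ε < 1$; all names are illustrative.

```python
import math
import random


def gaussian_mechanism(A, x, eps, delta, rng):
    """Answer the linear queries A @ x with independent Gaussian noise.

    A: list of query rows (d rows, N columns); x: histogram of length N.
    Returns (noisy_answers, sigma).
    """
    if not (0 < eps < 1 and 0 < delta < 1):
        raise ValueError("this calibration assumes 0 < eps < 1 and 0 < delta < 1")
    num_cols = len(x)
    # Assumption: changing one individual changes one coordinate of x by 1,
    # so the L2 sensitivity of x -> A x is the largest column norm of A.
    sensitivity = max(
        math.sqrt(sum(row[j] ** 2 for row in A)) for j in range(num_cols)
    )
    sigma = sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / eps
    exact = [sum(a * v for a, v in zip(row, x)) for row in A]
    return [ans + rng.gauss(0, sigma) for ans in exact], sigma


if __name__ == "__main__":
    A = [[1, 1, 0], [0, 1, 1]]  # two range-style counting queries
    x = [40, 25, 35]            # histogram over N = 3 types
    noisy, sigma = gaussian_mechanism(A, x, eps=0.5, delta=1e-5, rng=random.Random(0))
    print("sigma:", sigma)
    print("noisy answers:", noisy)
```

The paper's contribution is to shape the noise to the geometry of the query set rather than using one noise scale for all queries, which this baseline does not attempt.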

preprint2010arXiv

Constrained Non-Monotone Submodular Maximization: Offline and Secretary Algorithms

Constrained submodular maximization problems have long been studied, with near-optimal results known under a variety of constraints when the submodular function is monotone. The case of non-monotone submodular maximization is less understood: the first approximation algorithms even for the unconstrained setting were given by Feige et al. (FOCS '07). More recently, Lee et al. (STOC '09, APPROX '09) show how to approximately maximize non-monotone submodular functions when the constraints are given by the intersection of p matroid constraints; their algorithm is based on local-search procedures that consider p-swaps, and hence the running time may be $n^{\Omega(p)}$, implying their algorithm is polynomial-time only for constantly many matroids. In this paper, we give algorithms that work for p-independence systems (which generalize constraints given by the intersection of p matroids), where the running time is poly(n,p). Our algorithm essentially reduces the non-monotone maximization problem to multiple runs of the greedy algorithm previously used in the monotone case. Our idea of using existing algorithms for monotone functions to solve the non-monotone case also works for maximizing a submodular function with respect to a knapsack constraint: we get a simple greedy-based constant-factor approximation for this problem. With these simpler algorithms, we are able to adapt our approach to constrained non-monotone submodular maximization to the (online) secretary setting, where elements arrive one at a time in random order, and the algorithm must make irrevocable decisions about whether or not to select each element as it arrives. We give constant approximations in this secretary setting when the algorithm is constrained subject to a uniform matroid or a partition matroid, and give an O(log k) approximation when it is constrained by a general matroid of rank k.
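The greedy building block referred to here can be sketched generically. The code below is only the classical greedy rule (repeatedly add the feasible element with the largest positive marginal gain), shown on a graph cut function with a cardinality constraint; it is not the paper's full reduction, and the oracle names `f` and `is_independent` are assumptions.

```python
def greedy(ground_set, f, is_independent):
    """Classical greedy: add the feasible element with the largest positive gain.

    ground_set:     set of comparable elements
    f:              value oracle mapping a set to a number
    is_independent: feasibility oracle for the independence system
    """
    chosen = set()
    current = f(chosen)
    while True:
        best_gain, best_elem = 0, None
        for e in sorted(ground_set - chosen):  # sorted for deterministic ties
            candidate = chosen | {e}
            if not is_independent(candidate):
                continue
            gain = f(candidate) - current
            if gain > best_gain:
                best_gain, best_elem = gain, e
        if best_elem is None:  # no feasible element improves the value
            return chosen
        chosen.add(best_elem)
        current += best_gain


# Example objective: the cut function of the path 0-1-2-3, which is
# submodular but not monotone.
EDGES = [(0, 1), (1, 2), (2, 3)]


def cut_value(s):
    """Number of edges with exactly one endpoint in s."""
    return sum((u in s) != (v in s) for u, v in EDGES)


if __name__ == "__main__":
    result = greedy({0, 1, 2, 3}, cut_value, lambda s: len(s) <= 2)
    print(sorted(result), cut_value(result))  # prints [1, 3] 3
```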

preprint2010arXiv

Lower Bounds on Near Neighbor Search via Metric Expansion

In this paper we show how the complexity of performing nearest neighbor search (NNS) on a metric space is related to the expansion of the metric space. Given a metric space we look at the graph obtained by connecting every pair of points within a certain distance $r$. We then look at various notions of expansion in this graph, relating them to the cell probe complexity of NNS for randomized and deterministic, exact and approximate algorithms. For example, if the graph has node expansion $Φ$ then we show that any deterministic $t$-probe data structure for $n$ points must use space $S$ where $(St/n)^t > Φ$. We show similar results for randomized algorithms as well. These relationships can be used to derive most of the known lower bounds in the well-known metric spaces such as $l_1$, $l_2$, $l_\infty$ by simply computing their expansion. In the process, we strengthen and generalize our previous results (FOCS 2008). Additionally, we unify the approach in that work and the communication complexity based approach. Our work reduces the problem of proving cell probe lower bounds of near neighbor search to computing the appropriate expansion parameter. In our results, as in all previous results, the dependence on $t$ is weak; that is, the bound drops exponentially in $t$. We show a much stronger (tight) time-space tradeoff for the class of dynamic low contention data structures. These are data structures that support updates in the data set and that do not look up any single cell too often.
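Rearranging the deterministic bound quoted in this abstract makes the space requirement explicit. The short derivation below uses only the stated inequality and assumes $S$, $t$, $n$ and $Φ$ are positive.

```latex
\[
\left(\frac{S t}{n}\right)^{t} > \Phi
\;\Longrightarrow\;
\frac{S t}{n} > \Phi^{1/t}
\;\Longrightarrow\;
S > \frac{n}{t}\,\Phi^{1/t}.
\]
```

For $t = 1$ this reads $S > nΦ$, and the factor $Φ^{1/t}$ shrinks quickly as the number of probes $t$ grows.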

29 published item(s)

FLAIR: Federated Learning Annotated Image Repository

Optimal Algorithms for Mean Estimation under Local Differential Privacy

Private Frequency Estimation via Projective Geometry

Cops, Robbers, and Threatening Skeletons: Padded Decomposition for Minor-Free Graphs

Lossless Compression of Efficient Private Local Randomizers

Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$ Geometry

Amplification by Shuffling: From Local to Central Differential Privacy via Anonymity

Encode, Shuffle, Analyze Privacy Revisited: Formalizations and Empirical Evaluation

Private Stochastic Convex Optimization: Optimal Rates in Linear Time

Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses

LAST but not Least: Online Spanners for Buy-at-Bulk

Private Empirical Risk Minimization Beyond the Worst Case: The Effect of the Constraint Set Geometry

Sketching and Neural Networks

TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

Factorization Norms and Hereditary Discrepancy

On The Hereditary Discrepancy of Homogeneous Arithmetic Progressions

Smooth Boolean functions are easy: efficient algorithms for low-sensitivity functions

Approximating Hereditary Discrepancy via Small Width Ellipsoids

Changing Bases: Multistage Optimization for Matroids and Matchings

Consistent Weighted Sampling Made Fast, Small, and Easy

Vertex Sparsifiers: New Results from Old Techniques

Balanced Allocations: A Simple Proof for the Heavily Loaded Case

Efficient Algorithms for Privately Releasing Marginals via Convex Relaxations

Random Rates for 0-Extension and Low-Diameter Decompositions

Sparsest Cut on Bounded Treewidth Graphs: Algorithms and Hardness Results

On Privacy-Preserving Histograms

The Geometry of Differential Privacy: the Sparse and Approximate Cases

Constrained Non-Monotone Submodular Maximization: Offline and Secretary Algorithms

Lower Bounds on Near Neighbor Search via Metric Expansion