Source author record

Adam Smith

Adam Smith appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

29 works

20 topics

4 close collaborators


preprint · 2026 · arXiv

Variational quantum eigensolver for chemical molecules

Solving interacting multi-particle systems is a central challenge in quantum chemistry and condensed matter physics. In this work, we investigate the computation of ground states and ground-state energies for the He-H+ and H2O molecules using quantum computing techniques. We employ the variational quantum eigensolver (VQE), implemented both on a quantum computer simulator and on an IBM quantum device. The resulting energies are benchmarked against exact ground-state energies obtained via classical methods. Simulations of the H2O molecule were performed on Nottingham's High Performance Computing (HPC) facilities.
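The variational principle behind VQE can be illustrated with a toy classical simulation. This is a minimal sketch and not the authors' setup: the single-qubit Hamiltonian H = Z + 0.5 X, the one-parameter ansatz Ry(theta)|0>, and the grid-search optimizer are all assumptions chosen for illustration, and the energy is computed analytically rather than measured on a simulator or device.

```python
import math


def energy(theta: float) -> float:
    """Expectation value <psi(theta)|H|psi(theta)> for the toy H = Z + 0.5*X.

    For |psi(theta)> = Ry(theta)|0> = (cos(theta/2), sin(theta/2)),
    <Z> = cos(theta) and <X> = sin(theta).
    """
    return math.cos(theta) + 0.5 * math.sin(theta)


def vqe_grid_search(steps: int = 3600):
    """Classical outer loop: scan the ansatz parameter, keep the lowest energy."""
    best_theta, best_energy = 0.0, energy(0.0)
    for k in range(steps):
        theta = 2.0 * math.pi * k / steps
        e = energy(theta)
        if e < best_energy:
            best_theta, best_energy = theta, e
    return best_theta, best_energy


# Exact ground-state energy of Z + 0.5*X is -sqrt(1 + 0.5**2).
exact = -math.sqrt(1.0 + 0.5 ** 2)
theta_opt, e_opt = vqe_grid_search()
print(f"variational estimate: {e_opt:.6f}, exact: {exact:.6f}")
```

The variational estimate can never fall below the exact ground-state energy, which is what makes benchmarking against classical exact values meaningful.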

preprint · 2023 · arXiv

Improved Differential Privacy for SGD via Optimal Private Linear Operators on Adaptive Streams

Motivated by recent applications requiring differential privacy over adaptive streams, we investigate the question of optimal instantiations of the matrix mechanism in this setting. We prove fundamental theoretical results on the applicability of matrix factorizations to adaptive streams, and provide a parameter-free fixed-point algorithm for computing optimal factorizations. We instantiate this framework with respect to concrete matrices which arise naturally in machine learning, and train user-level differentially private models with the resulting optimal mechanisms, yielding significant improvements in a notable problem in federated learning with user-level differential privacy.

preprint · 2022 · arXiv

Crossing a topological phase transition with a quantum computer

Quantum computers promise to perform computations beyond the reach of modern computers with profound implications for scientific research. Due to remarkable technological advances, small scale devices are now becoming available for use. One of the most apparent applications for such a device is the study of complex many-body quantum systems, where classical computers are unable to deal with the generic exponential complexity of quantum states. Even zero-temperature equilibrium phases of matter and the transitions between them have yet to be fully classified, with topologically protected phases presenting major difficulties. We construct and measure a continuously parametrized family of states crossing a symmetry protected topological phase transition on the IBM Q quantum computers. We present two complementary methods for measuring string order parameters that reveal the transition, and additionally analyse the effects of noise in the device using simple error models. The simulation that we perform is easily scalable and is a practical demonstration of the utility of near-term quantum computers for the study of quantum phases of matter and their transitions.

preprint · 2022 · arXiv

Finite-depth scaling of infinite quantum circuits for quantum critical points

The scaling of the entanglement entropy at a quantum critical point allows us to extract universal properties of the state, e.g., the central charge of a conformal field theory. With the rapid improvement of noisy intermediate-scale quantum (NISQ) devices, these quantum computers present themselves as a powerful tool to study critical many-body systems. We use finite-depth quantum circuits suitable for NISQ devices as a variational ansatz to represent ground states of critical, infinite systems. We find universal finite-depth scaling relations for these circuits and verify them numerically at two different critical points, i.e., the critical Ising model with an additional symmetry-preserving term and the critical XXZ model.

preprint · 2022 · arXiv

Strong Memory Lower Bounds for Learning Natural Models

We give lower bounds on the amount of memory required by one-pass streaming algorithms for solving several natural learning problems. In a setting where examples lie in $\{0,1\}^d$ and the optimal classifier can be encoded using $κ$ bits, we show that algorithms which learn using a near-minimal number of examples, $\tilde{O}(κ)$, must use $\tilde{Ω}(dκ)$ bits of space. Our space bounds match the dimension of the ambient space of the problem's natural parametrization, even when it is quadratic in the size of examples and the final classifier. For instance, in the setting of $d$-sparse linear classifiers over degree-2 polynomial features, for which $κ=Θ(d\log d)$, our space lower bound is $\tilde{Ω}(d^2)$. Our bounds degrade gracefully with the stream length $N$, generally having the form $\tilde{Ω}\left(dκ\cdot \frac{κ}{N}\right)$. Bounds of the form $Ω(dκ)$ were known for learning parity and other problems defined over finite fields. Bounds that apply in a narrow range of sample sizes are also known for linear regression. Ours are the first such bounds for problems of the type commonly seen in recent learning applications that apply for a large range of input sizes.

preprint · 2022 · arXiv

The Price of Differential Privacy under Continual Observation

We study the accuracy of differentially private mechanisms in the continual release model. A continual release mechanism receives a sensitive dataset as a stream of $T$ inputs and produces, after receiving each input, an accurate output on the obtained inputs. In contrast, a batch algorithm receives the data as one batch and produces a single output. We provide the first strong lower bounds on the error of continual release mechanisms. In particular, for two fundamental problems that are widely studied and used in the batch model, we show that the worst case error of every continual release algorithm is $\tilde Ω(T^{1/3})$ times larger than that of the best batch algorithm. Previous work shows only a polylogarithmic (in $T$) gap between the worst case error achievable in these two models; further, for many problems, including the summation of binary attributes, the polylogarithmic gap is tight (Dwork et al., 2010; Chan et al., 2010). Our results show that problems closely related to summation, specifically those that require selecting the largest of a set of sums, are fundamentally harder in the continual release model than in the batch model. Our lower bounds assume only that privacy holds for streams fixed in advance (the "nonadaptive" setting). However, we provide matching upper bounds that hold in a model where privacy is required even for adaptively selected streams. This model may be of independent interest.

preprint · 2021 · arXiv

Identifying Correlation Clusters in Many-Body Localized Systems

We introduce techniques for analysing the structure of quantum states of many-body localized (MBL) spin chains by identifying correlation clusters from pairwise correlations. These techniques proceed by interpreting pairwise correlations in the state as a weighted graph, which we analyse using an established graph theoretic clustering algorithm. We validate our approach by studying the eigenstates of a disordered XXZ spin chain across the MBL to ergodic transition, as well as the non-equilibrium dynamics in the MBL phase following a global quantum quench. We successfully reproduce theoretical predictions about the MBL transition obtained from renormalization group schemes. Furthermore, we identify a clear signature of many-body dynamics analogous to the logarithmic growth of entanglement. The techniques that we introduce are computationally inexpensive and in combination with matrix product state methods allow for the study of large scale localized systems. Moreover, the correlation functions we use are directly accessible in a range of experimental settings including cold atoms.

preprint · 2020 · arXiv

Differentially Private Simple Linear Regression

Economics and social science research often require analyzing datasets of sensitive personal information at fine granularity, with models fit to small subsets of the data. Unfortunately, such fine-grained analysis can easily reveal sensitive individual information. We study algorithms for simple linear regression that satisfy differential privacy, a constraint which guarantees that an algorithm's output reveals little about any individual input data record, even to an attacker with arbitrary side information about the dataset. We consider the design of differentially private algorithms for simple linear regression for small datasets, with tens to hundreds of datapoints, which is a particularly challenging regime for differential privacy. Focusing on a particular application to small-area analysis in economics research, we study the performance of a spectrum of algorithms we adapt to the setting. We identify key factors that affect their performance, showing through a range of experiments that algorithms based on robust estimators (in particular, the Theil-Sen estimator) perform well on the smallest datasets, but that other more standard algorithms do better as the dataset size increases.
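For orientation, the classical (non-private) Theil-Sen estimator that the abstract refers to can be sketched as follows. This is the textbook estimator only, assuming the common variant that takes the median of all pairwise slopes and then the median residual as intercept; it adds no noise and therefore gives no differential privacy guarantee. The function name `theil_sen` is a hypothetical helper, not taken from the paper.

```python
from itertools import combinations
from statistics import median


def theil_sen(xs, ys):
    """Non-private Theil-Sen fit: median of pairwise slopes, then median intercept.

    NOTE: illustrative baseline only; it is NOT differentially private.
    """
    slopes = [
        (ys[j] - ys[i]) / (xs[j] - xs[i])
        for i, j in combinations(range(len(xs)), 2)
        if xs[j] != xs[i]  # skip pairs with identical x to avoid division by zero
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x values")
    slope = median(slopes)
    intercept = median([y - slope * x for x, y in zip(xs, ys)])
    return slope, intercept


# Four points on y = 2x + 1 plus one gross outlier.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 100]
slope, intercept = theil_sen(xs, ys)
print(slope, intercept)
```

Because the fit depends on medians rather than means, a single outlier barely moves it, which is one intuition for why robust estimators are natural starting points on very small datasets.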

preprint · 2020 · arXiv

Guaranteed Validity for Empirical Approaches to Adaptive Data Analysis

We design a general framework for answering adaptive statistical queries that focuses on providing explicit confidence intervals along with point estimates. Prior work in this area has either focused on providing tight confidence intervals for specific analyses, or providing general worst-case bounds for point estimates. Unfortunately, as we observe, these worst-case bounds are loose in many settings, often not even beating simple baselines like sample splitting. Our main contribution is to design a framework for providing valid, instance-specific confidence intervals for point estimates that can be generated by heuristics. When paired with good heuristics, this method gives guarantees that are orders of magnitude better than the best worst-case bounds. We provide a Python library implementing our method.

preprint · 2020 · arXiv

Noninteractive Locally Private Learning of Linear Models via Polynomial Approximations

Minimizing a convex risk function is the main step in many basic learning algorithms. We study protocols for convex optimization which provably leak very little about the individual data points that constitute the loss function. Specifically, we consider differentially private algorithms that operate in the local model, where each data record is stored on a separate user device and randomization is performed locally by those devices. We give new protocols for noninteractive LDP convex optimization, i.e., protocols that require only a single randomized report from each user to an untrusted aggregator. We study our algorithms' performance with respect to expected loss, either over the data set at hand (empirical risk) or a larger population from which our data set is assumed to be drawn. Our error bounds depend on the form of individuals' contribution to the expected loss. For the case of generalized linear losses (such as hinge and logistic losses), we give an LDP algorithm whose sample complexity is only linear in the dimensionality $p$ and quasipolynomial in other terms (the privacy parameters $ε$ and $δ$, and the desired excess risk $α$). This is the first algorithm for nonsmooth losses with sub-exponential dependence on $p$. For the Euclidean median problem, where the loss is given by the Euclidean distance to a given data point, we give a protocol whose sample complexity grows quasipolynomially in $p$. This is the first protocol with sub-exponential dependence on $p$ for a loss that is not a generalized linear loss. Our result for the hinge loss is based on a technique, dubbed polynomial of inner product approximation, which may be applicable to other problems. Our results for generalized linear losses and the Euclidean median are based on new reductions to the case of hinge loss.

preprint · 2016 · arXiv

Differentially Private Model Selection with Penalized and Constrained Likelihood

In statistical disclosure control, the goal of data analysis is twofold: The released information must provide accurate and useful statistics about the underlying population of interest, while minimizing the potential for an individual record to be identified. In recent years, the notion of differential privacy has received much attention in theoretical computer science, machine learning, and statistics. It provides a rigorous and strong notion of protection for individuals' sensitive information. A fundamental question is how to incorporate differential privacy into traditional statistical inference procedures. In this paper we study model selection in multivariate linear regression under the constraint of differential privacy. We show that model selection procedures based on penalized least squares or likelihood can be made differentially private by a combination of regularization and randomization, and propose two algorithms to do so. We show that our private procedures are consistent under essentially the same conditions as the corresponding non-private procedures. We also find that under differential privacy, the procedure becomes more sensitive to the tuning parameters. We illustrate and evaluate our method using simulation studies and two real data examples.

preprint · 2016 · arXiv

Max-Information, Differential Privacy, and Post-Selection Hypothesis Testing

In this paper, we initiate a principled study of how the generalization properties of approximate differential privacy can be used to perform adaptive hypothesis testing, while giving statistically valid $p$-value corrections. We do this by observing that the guarantees of algorithms with bounded approximate max-information are sufficient to correct the $p$-values of adaptively chosen hypotheses, and then by proving that algorithms that satisfy $(ε,δ)$-differential privacy have bounded approximate max information when their inputs are drawn from a product distribution. This substantially extends the known connection between differential privacy and max-information, which previously was only known to hold for (pure) $(ε,0)$-differential privacy. It also extends our understanding of max-information as a partially unifying measure controlling the generalization properties of adaptive data analyses. We also show a lower bound, proving that (despite the strong composition properties of max-information), when data is drawn from a product distribution, $(ε,δ)$-differentially private algorithms can come first in a composition with other algorithms satisfying max-information bounds, but not necessarily second if the composition is required to itself satisfy a nontrivial max-information bound. This, in particular, implies that the connection between $(ε,δ)$-differential privacy and max-information holds only for inputs drawn from product distributions, unlike the connection between $(ε,0)$-differential privacy and max-information.

preprint · 2016 · arXiv

When is Nontrivial Estimation Possible for Graphons and Stochastic Block Models?

Block graphons (also called stochastic block models) are an important and widely-studied class of models for random networks. We provide a lower bound on the accuracy of estimators for block graphons with a large number of blocks. We show that, given only the number $k$ of blocks and an upper bound $ρ$ on the values (connection probabilities) of the graphon, every estimator incurs error at least on the order of $\min(ρ, \sqrt{ρk^2/n^2})$ in the $δ_2$ metric with constant probability, in the worst case over graphons. In particular, our bound rules out any nontrivial estimation (that is, with $δ_2$ error substantially less than $ρ$) when $k\geq n\sqrt{ρ}$. Combined with previous upper and lower bounds, our results characterize, up to logarithmic terms, the minimax accuracy of graphon estimation in the $δ_2$ metric. A similar lower bound to ours was obtained independently by Klopp, Tsybakov and Verzelen (2016).

preprint · 2015 · arXiv

Algorithmic Stability for Adaptive Data Analysis

Adaptivity is an important feature of data analysis---the choice of questions to ask about a dataset often depends on previous interactions with the same dataset. However, statistical validity is typically studied in a nonadaptive model, where all questions are specified before the dataset is drawn. Recent work by Dwork et al. (STOC, 2015) and Hardt and Ullman (FOCS, 2014) initiated the formal study of this problem, and gave the first upper and lower bounds on the achievable generalization error for adaptive data analysis. Specifically, suppose there is an unknown distribution $\mathbf{P}$ and a set of $n$ independent samples $\mathbf{x}$ is drawn from $\mathbf{P}$. We seek an algorithm that, given $\mathbf{x}$ as input, accurately answers a sequence of adaptively chosen queries about the unknown distribution $\mathbf{P}$. How many samples $n$ must we draw from the distribution, as a function of the type of queries, the number of queries, and the desired level of accuracy? In this work we make two new contributions: (i) We give upper bounds on the number of samples $n$ that are needed to answer statistical queries. The bounds improve and simplify the work of Dwork et al. (STOC, 2015), and have been applied in subsequent work by those authors (Science, 2015, NIPS, 2015). (ii) We prove the first upper bounds on the number of samples required to answer more general families of queries. These include arbitrary low-sensitivity queries and an important class of optimization queries. As in Dwork et al., our algorithms are based on a connection with algorithmic stability in the form of differential privacy. We extend their work by giving a quantitatively optimal, more general, and simpler proof of their main theorem that stability implies low generalization error. We also study weaker stability guarantees such as bounded KL divergence and total variation distance.

preprint · 2015 · arXiv

Classical Cryptographic Protocols in a Quantum World

Cryptographic protocols, such as protocols for secure function evaluation (SFE), have played a crucial role in the development of modern cryptography. The extensive theory of these protocols, however, deals almost exclusively with classical attackers. If we accept that quantum information processing is the most realistic model of physically feasible computation, then we must ask: what classical protocols remain secure against quantum attackers? Our main contribution is showing the existence of classical two-party protocols for the secure evaluation of any polynomial-time function under reasonable computational assumptions (for example, it suffices that the learning with errors problem be hard for quantum polynomial time). Our result shows that the basic two-party feasibility picture from classical cryptography remains unchanged in a quantum world.

preprint · 2015 · arXiv

Efficient Lipschitz Extensions for High-Dimensional Graph Statistics and Node Private Degree Distributions

Lipschitz extensions were recently proposed as a tool for designing node differentially private algorithms. However, efficiently computable Lipschitz extensions were known only for 1-dimensional functions (that is, functions that output a single real value). In this paper, we study efficiently computable Lipschitz extensions for multi-dimensional (that is, vector-valued) functions on graphs. We show that, unlike for 1-dimensional functions, Lipschitz extensions of higher-dimensional functions on graphs do not always exist, even with a non-unit stretch. We design Lipschitz extensions with small stretch for the sorted degree list and for the degree distribution of a graph. Crucially, our extensions are efficiently computable. We also develop new tools for employing Lipschitz extensions in the design of differentially private algorithms. Specifically, we generalize the exponential mechanism, a widely used tool in data privacy. The exponential mechanism is given a collection of score functions that map datasets to real values. It attempts to return the name of the function with nearly minimum value on the data set. Our generalized exponential mechanism provides better accuracy when the sensitivity of an optimal score function is much smaller than the maximum sensitivity of score functions. We use our Lipschitz extension and the generalized exponential mechanism to design a node-differentially private algorithm for releasing an approximation to the degree distribution of a graph. Our algorithm is much more accurate than algorithms from previous work.
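The standard exponential mechanism described in the abstract (pick the name of a score function with nearly minimum value) can be sketched as below. This is the classical mechanism, not the generalized version the paper develops; it assumes a single global sensitivity bound shared by all score functions, and the function names are hypothetical.

```python
import math
import random


def exponential_mechanism_probs(scores, epsilon, sensitivity):
    """Selection probabilities for the minimizing exponential mechanism.

    Index i is chosen with probability proportional to
    exp(-epsilon * scores[i] / (2 * sensitivity)).
    `sensitivity` is an assumed upper bound on how much any single score
    can change between neighbouring datasets.
    """
    lowest = min(scores)  # shift scores for numerical stability; ratios are unchanged
    weights = [math.exp(-epsilon * (s - lowest) / (2.0 * sensitivity)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def exponential_mechanism(scores, epsilon, sensitivity, rng):
    """Sample one index according to the probabilities above."""
    probs = exponential_mechanism_probs(scores, epsilon, sensitivity)
    return rng.choices(range(len(scores)), weights=probs, k=1)[0]


scores = [0.0, 2.0, 4.0]  # values of three score functions on some dataset
probs = exponential_mechanism_probs(scores, epsilon=1.0, sensitivity=1.0)
choice = exponential_mechanism(scores, epsilon=1.0, sensitivity=1.0, rng=random.Random(0))
print(probs, choice)
```

The accuracy of this baseline is governed by the largest sensitivity among the score functions, which is the limitation the generalized mechanism in the abstract is designed to relax.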

preprint · 2015 · arXiv

Local, Private, Efficient Protocols for Succinct Histograms

We give efficient protocols and matching accuracy lower bounds for frequency estimation in the local model for differential privacy. In this model, individual users randomize their data themselves, sending differentially private reports to an untrusted server that aggregates them. We study protocols that produce a succinct histogram representation of the data. A succinct histogram is a list of the most frequent items in the data (often called "heavy hitters") along with estimates of their frequencies; the frequency of all other items is implicitly estimated as 0. If there are $n$ users whose items come from a universe of size $d$, our protocols run in time polynomial in $n$ and $\log(d)$. With high probability, they estimate the frequency of every item up to error $O\left(\sqrt{\log(d)/(ε^2n)}\right)$ where $ε$ is the privacy parameter. Moreover, we show that this much error is necessary, regardless of computational efficiency, and even for the simple setting where only one item appears with significant frequency in the data set. Previous protocols (Mishra and Sandler, 2006; Hsu, Khanna and Roth, 2012) for this task either ran in time $Ω(d)$ or had much worse error (about $\sqrt[6]{\log(d)/(ε^2n)}$), and the only known lower bound on error was $Ω(1/\sqrt{n})$. We also adapt a result of McGregor et al. (2010) to the local setting. In a model with public coins, we show that each user need only send 1 bit to the server. For all known local protocols (including ours), the transformation preserves computational efficiency.
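To make the local model concrete, here is a minimal sketch of frequency estimation with generalized (k-ary) randomized response. It is a well-known baseline and not the succinct-histogram protocol of the paper: the server-side loop below takes time linear in the universe size $d$, which is exactly what the paper's protocols avoid. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def krr_probs(epsilon, d):
    """Return (p, q): probability of reporting the true item, and of each other item."""
    denom = math.exp(epsilon) + d - 1
    return math.exp(epsilon) / denom, 1.0 / denom


def randomize(item, epsilon, d, rng):
    """User side: keep the true item with probability p, else a uniform other item."""
    p, _ = krr_probs(epsilon, d)
    if rng.random() < p:
        return item
    other = rng.randrange(d - 1)  # uniform over the d - 1 items different from `item`
    return other if other < item else other + 1


def debias(observed_fraction, epsilon, d):
    """Invert E[observed] = f * p + (1 - f) * q to get an unbiased estimate of f."""
    p, q = krr_probs(epsilon, d)
    return (observed_fraction - q) / (p - q)


def estimate_frequencies(reports, epsilon, d):
    """Server side: debias the empirical frequency of every item in the universe."""
    n = len(reports)
    counts = Counter(reports)
    return [debias(counts[v] / n, epsilon, d) for v in range(d)]


rng = random.Random(1)
epsilon, d = 2.0, 4
true_items = [0] * 20000  # every user holds item 0 (a single heavy hitter)
reports = [randomize(x, epsilon, d, rng) for x in true_items]
estimates = estimate_frequencies(reports, epsilon, d)
print(estimates)
```

Each report satisfies $ε$-local differential privacy because the ratio between the probabilities of any output under two different inputs is at most p/q = e^ε.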

preprint · 2015 · arXiv

More General Queries and Less Generalization Error in Adaptive Data Analysis

Adaptivity is an important feature of data analysis: typically the choice of questions asked about a dataset depends on previous interactions with the same dataset. However, generalization error is typically bounded in a non-adaptive model, where all questions are specified before the dataset is drawn. Recent work by Dwork et al. (STOC '15) and Hardt and Ullman (FOCS '14) initiated the formal study of this problem, and gave the first upper and lower bounds on the achievable generalization error for adaptive data analysis. Specifically, suppose there is an unknown distribution $\mathcal{P}$ and a set of $n$ independent samples $x$ is drawn from $\mathcal{P}$. We seek an algorithm that, given $x$ as input, "accurately" answers a sequence of adaptively chosen "queries" about the unknown distribution $\mathcal{P}$. How many samples $n$ must we draw from the distribution, as a function of the type of queries, the number of queries, and the desired level of accuracy? In this work we make two new contributions towards resolving this question:
- We give upper bounds on the number of samples $n$ that are needed to answer statistical queries that improve over the bounds of Dwork et al.
- We prove the first upper bounds on the number of samples required to answer more general families of queries. These include arbitrary low-sensitivity queries and the important class of convex risk minimization queries.
As in Dwork et al., our algorithms are based on a connection between differential privacy and generalization error, but we feel that our analysis is simpler and more modular, which may be useful for studying these questions in the future.

preprint · 2015 · arXiv

Private Graphon Estimation for Sparse Graphs

We design algorithms for fitting a high-dimensional statistical model to a large, sparse network without revealing sensitive information of individual members. Given a sparse input graph $G$, our algorithms output a node-differentially-private nonparametric block model approximation. By node-differentially-private, we mean that our output hides the insertion or removal of a vertex and all its adjacent edges. If $G$ is an instance of the network obtained from a generative nonparametric model defined in terms of a graphon $W$, our model guarantees consistency, in the sense that as the number of vertices tends to infinity, the output of our algorithm converges to $W$ in an appropriate version of the $L_2$ norm. In particular, this means we can estimate the sizes of all multi-way cuts in $G$. Our results hold as long as $W$ is bounded, the average degree of $G$ grows at least like the log of the number of vertices, and the number of blocks goes to infinity at an appropriate rate. We give explicit error bounds in terms of the parameters of the model; in several settings, our bounds improve on or match known nonprivate results.

preprint · 2014 · arXiv

Causal Erasure Channels

We consider the communication problem over binary causal adversarial erasure channels. Such a channel maps $n$ input bits to $n$ output symbols in $\{0,1,\wedge\}$, where $\wedge$ denotes erasure. The channel is causal if, for every $i$, the channel adversarially decides whether to erase the $i$th bit of its input based on inputs $1,...,i$, before it observes bits $i+1$ to $n$. Such a channel is $p$-bounded if it can erase at most a $p$ fraction of the input bits over the whole transmission duration. Causal channels provide a natural model for channels that obey basic physical restrictions but are otherwise unpredictable or highly variable. For a given erasure rate $p$, our goal is to understand the optimal rate (the capacity) at which a randomized (stochastic) encoder/decoder can transmit reliably across all causal $p$-bounded erasure channels. In this paper, we introduce the causal erasure model and provide new upper bounds and lower bounds on the achievable rate. Our bounds separate the achievable rate in the causal erasures setting from the rates achievable in two related models: random erasure channels (strictly weaker) and fully adversarial erasure channels (strictly stronger). Specifically, we show:
- A strict separation between random and causal erasures for all constant erasure rates $p\in(0,1)$.
- A strict separation between causal and fully adversarial erasures for $p\in(0,ϕ)$ where $ϕ\approx 0.348$.
- For $p\in[ϕ,1/2)$, we show codes for causal erasures that have higher rate than the best known constructions for fully adversarial channels.
Our results contrast with existing results on correcting causal bit-flip errors (as opposed to erasures) [Dey et al. 2008, 2009], [Haviv-Langberg 2011]. For the separations we provide, the analogous separations for bit-flip models are either not known at all or much weaker.

preprint · 2014 · arXiv

Differentially Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds

In this paper, we initiate a systematic investigation of differentially private algorithms for convex empirical risk minimization. Various instantiations of this problem have been studied before. We provide new algorithms and matching lower bounds for private ERM assuming only that each data point's contribution to the loss function is Lipschitz bounded and that the domain of optimization is bounded. We provide a separate set of algorithms and matching lower bounds for the setting in which the loss functions are known to also be strongly convex. Our algorithms run in polynomial time, and in some cases even match the optimal non-private running time (as measured by oracle complexity). We give separate algorithms (and lower bounds) for $(ε,0)$- and $(ε,δ)$-differential privacy; perhaps surprisingly, the techniques used for designing optimal algorithms in the two cases are completely different. Our lower bounds apply even to very simple, smooth function families, such as linear and quadratic functions. This implies that algorithms from previous work can be used to obtain optimal error rates, under the additional assumption that the contribution of each data point to the loss function is smooth. We show that simple approaches to smoothing arbitrary loss functions (in order to apply previous techniques) do not yield optimal error rates. In particular, optimal algorithms were not previously known for problems such as training support vector machines and the high-dimensional median.

preprint · 2014 · arXiv

Privacy-Preserving Public Information for Sequential Games

In settings with incomplete information, players can find it difficult to coordinate to find states with good social welfare. For example, in financial settings, if a collection of financial firms have limited information about each other's strategies, some large number of them may choose the same high-risk investment in hopes of high returns. While this might be acceptable in some cases, the economy can be hurt badly if many firms make investments in the same risky market segment and it fails. One reason why many firms might end up choosing the same segment is that they do not have information about other firms' investments (imperfect information may lead to `bad' game states). Directly reporting all players' investments, however, raises confidentiality concerns for both individuals and institutions. In this paper, we explore whether information about the game-state can be publicly announced in a manner that maintains the privacy of the actions of the players, and still suffices to deter players from reaching bad game-states. We show that in many games of interest, it is possible for players to avoid these bad states with the help of privacy-preserving, publicly-announced information. We model behavior of players in this imperfect information setting in two ways -- greedy and undominated strategic behaviours, and we prove guarantees on social welfare that certain kinds of privacy-preserving information can help attain. Furthermore, we design a counter with improved privacy guarantees under continual observation.

preprint · 2013 · arXiv

Optimal-Rate Code Constructions for Computationally Simple Channels

We consider coding schemes for computationally bounded channels, which can introduce an arbitrary set of errors as long as (a) the fraction of errors is bounded with high probability by a parameter $p$ and (b) the process which adds the errors can be described by a sufficiently simple circuit. Codes for such channel models are attractive since, like codes for standard adversarial errors, they can handle channels whose true behavior is unknown or varying over time. For two classes of channels, we provide explicit, efficiently encodable/decodable codes of optimal rate where only inefficiently decodable codes were previously known. In each case, we provide one encoder/decoder that works for every channel in the class. The encoders are randomized, and probabilities are taken over the (local, unknown to the decoder) coins of the encoder and those of the channel. (1) Unique decoding for additive errors: We give the first construction of a polynomial-time encodable/decodable code for additive (a.k.a. oblivious) channels that achieves the Shannon capacity $1-H(p)$. These channels add an arbitrary error vector $e\in\{0,1\}^N$ of weight at most $pN$ to the transmitted word; the vector $e$ can depend on the code but not on the particular transmitted word. (2) List-decoding for polynomial-time channels: For every constant $c>0$, we give a Monte Carlo construction of a code with optimal rate (arbitrarily close to $1-H(p)$) that efficiently recovers a short list containing the correct message with high probability for channels describable by circuits of size at most $N^c$. We justify the relaxation to list-decoding by showing that even with bounded channels, uniquely decodable codes cannot have positive rate for $p>1/4$.

preprint2012arXiv

The Power of Linear Reconstruction Attacks

We consider the power of linear reconstruction attacks in statistical data privacy, showing that they can be applied to a much wider range of settings than previously understood. Linear attacks have been studied before (Dinur and Nissim PODS'03, Dwork, McSherry and Talwar STOC'07, Kasiviswanathan, Rudelson, Smith and Ullman STOC'10, De TCC'12, Muthukrishnan and Nikolov STOC'12) but have so far been applied only in settings with releases that are obviously linear. Consider a database curator who manages a database of sensitive information but wants to release statistics about how a sensitive attribute (say, disease) in the database relates to some nonsensitive attributes (e.g., postal code, age, gender, etc). We show one can mount linear reconstruction attacks based on any release that gives: a) the fraction of records that satisfy a given non-degenerate boolean function. Such releases include contingency tables (previously studied by Kasiviswanathan et al., STOC'10) as well as more complex outputs like the error rate of classifiers such as decision trees; b) any one of a large class of M-estimators (that is, the output of empirical risk minimization algorithms), including the standard estimators for linear and logistic regression. We make two contributions: first, we show how these types of releases can be transformed into a linear format, making them amenable to existing polynomial-time reconstruction algorithms. This is already perhaps surprising, since many of the above releases (like M-estimators) are obtained by solving highly nonlinear formulations. Second, we show how to analyze the resulting attacks under various distributional assumptions on the data. Specifically, we consider a setting in which the same statistic (either a) or b) above) is released about how the sensitive attribute relates to all subsets of size k (out of a total of d) nonsensitive boolean attributes.
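
A generic way to write the linear format the abstract refers to is sketched below. The notation is illustrative rather than the paper's, and the choice of norm and rounding step is an assumption.

```latex
% Generic linear reconstruction setup (illustrative notation, not the paper's).
% x \in \{0,1\}^n : sensitive attribute of each of the n records
% A \in \mathbb{R}^{m \times n} : known matrix derived from the nonsensitive attributes
% e \in \mathbb{R}^m : error in the released statistics
\[
  y = A x + e,
  \qquad
  \hat{x} = \operatorname{round}\Bigl( \arg\min_{z \in [0,1]^n} \lVert A z - y \rVert \Bigr).
\]
```

The attacker observes $y$ and $A$, solves the minimization in polynomial time (for example by least squares or linear programming), and rounds each coordinate to recover most of $x$ when the error $e$ is small enough.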

preprint2010arXiv

Explicit Capacity-achieving Codes for Worst-Case Additive Errors

For every p in (0,1/2), we give an explicit construction of binary codes of rate approaching "capacity" 1-H(p) that enable reliable communication in the presence of worst-case additive errors, caused by a channel oblivious to the codeword (but not necessarily the message). Formally, we give an efficient "stochastic" encoding E(\cdot,\cdot) of messages combined with a small number of auxiliary random bits, such that for every message m and every error vector e (that could depend on m) that contains at most a fraction p of ones, w.h.p over the random bits r chosen by the encoder, m can be efficiently recovered from the corrupted codeword E(m,r) + e by a decoder without knowledge of the encoder's randomness r. Our construction for additive errors also yields explicit deterministic codes of rate approaching 1-H(p) for the "average error" criterion: for every error vector e of at most p fraction 1's, most messages m can be efficiently (uniquely) decoded from the corrupted codeword C(m)+e. Note that such codes cannot be linear, as the bad error patterns for all messages are the same in a linear code. We also give a new proof of the existence of such codes based on list decoding and certain algebraic manipulation detection codes. Our proof is simpler than the previous proofs from the literature on arbitrarily varying channels.

preprint2010arXiv

Leftover Hashing Against Quantum Side Information

The Leftover Hash Lemma states that the output of a two-universal hash function applied to an input with sufficiently high entropy is almost uniformly random. In its standard formulation, the lemma refers to a notion of randomness that is (usually implicitly) defined with respect to classical side information. Here, we prove a (strictly) more general version of the Leftover Hash Lemma that is valid even if side information is represented by the state of a quantum system. Furthermore, our result applies to arbitrary delta-almost two-universal families of hash functions. The generalized Leftover Hash Lemma has applications in cryptography, e.g., for key agreement in the presence of an adversary who is not restricted to classical information processing.
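
For reference, the classical statement that the abstract generalizes can be written as follows. This is the textbook form with illustrative notation, not the quantum version proved in the paper.

```latex
% Classical Leftover Hash Lemma, textbook form (notation is illustrative).
% X : random variable with min-entropy H_{\min}(X) \ge k
% F : function drawn uniformly from a two-universal family mapping into \{0,1\}^{\ell}
% U_{\ell} : uniform distribution on \{0,1\}^{\ell}
% \Delta : statistical (total variation) distance
\[
  \Delta\bigl( (F(X), F),\; (U_{\ell}, F) \bigr)
  \;\le\; \tfrac{1}{2} \cdot 2^{-(k - \ell)/2}.
\]
```

In particular, extracting $\ell = k - 2\log_2(1/\varepsilon)$ bits leaves the output within distance $\varepsilon/2$ of uniform, even given the choice of hash function.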

preprint2010arXiv

What Can We Learn Privately?

Learning problems form an important category of computational tasks that generalizes many of the computations researchers apply to large real-life data sets. We ask: what concept classes can be learned privately, namely, by an algorithm whose output does not depend too heavily on any one input or specific training example? More precisely, we investigate learning algorithms that satisfy differential privacy, a notion that provides strong confidentiality guarantees in contexts where aggregate information is released about a database containing sensitive information about individuals. We demonstrate that, ignoring computational constraints, it is possible to privately agnostically learn any concept class using a sample size approximately logarithmic in the cardinality of the concept class. Therefore, almost anything learnable is learnable privately: specifically, if a concept class is learnable by a (non-private) algorithm with polynomial sample complexity and output size, then it can be learned privately using a polynomial number of samples. We also present a computationally efficient private PAC learner for the class of parity functions. Local (or randomized response) algorithms are a practical class of private algorithms that have received extensive investigation. We provide a precise characterization of local private learning algorithms. We show that a concept class is learnable by a local algorithm if and only if it is learnable in the statistical query (SQ) model. Finally, we present a separation between the power of interactive and noninteractive local learning algorithms.
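
The local (randomized response) model mentioned above can be illustrated with the classic single-bit mechanism. This is a minimal sketch of standard randomized response, not the paper's learning algorithms; the function names and the debiasing helper are illustrative assumptions.

```python
import math
import random

def keep_probability(epsilon: float) -> float:
    """Probability of reporting the true bit under epsilon-local DP."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return math.exp(epsilon) / (math.exp(epsilon) + 1.0)

def randomized_response(bit: int, epsilon: float, rng: random.Random) -> int:
    """Report the true bit with probability e^eps / (e^eps + 1), else flip it."""
    return bit if rng.random() < keep_probability(epsilon) else 1 - bit

def debiased_mean(reports, epsilon: float) -> float:
    """Unbiased estimate of the true mean of the bits from noisy reports.

    Uses E[report] = (1 - p) + mean * (2p - 1), where p = keep_probability.
    """
    p = keep_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)
```

Each participant randomizes their own bit before sending it, so the collector never sees raw data. The ratio between the probabilities of any report under the two possible inputs is exactly e^epsilon, which is what the local differential privacy guarantee requires.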

preprint2008arXiv

Secure Multiparty Quantum Computation with (Only) a Strict Honest Majority

Secret sharing and multiparty computation (also called "secure function evaluation") are fundamental primitives in modern cryptography, allowing a group of mutually distrustful players to perform correct, distributed computations under the sole assumption that some number of them will follow the protocol honestly. This paper investigates how much trust is necessary -- that is, how many players must remain honest -- in order for distributed quantum computations to be possible. We present a verifiable quantum secret sharing (VQSS) protocol, and a general secure multiparty quantum computation (MPQC) protocol, which can tolerate any (n-1)/2 (rounded down) cheaters among n players. Previous protocols for these tasks tolerated (n-1)/4 (rounded down) and (n-1)/6 (rounded down) cheaters, respectively. The threshold we achieve is tight: even in the classical case, "fair" multiparty computation is not possible if any set of n/2 players can cheat. Our protocols rely on approximate quantum error-correcting codes, which can tolerate a larger fraction of errors than traditional, exact codes. We introduce new families of authentication schemes and approximate codes tailored to the needs of our protocols, as well as new state purification techniques along the lines of those used in fault-tolerant quantum circuits.
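
The thresholds quoted above are simple floor expressions, so a small helper makes the improvement concrete. The function name is illustrative.

```python
def max_cheaters(n: int, denominator: int) -> int:
    """Largest tolerated number of cheaters, floor((n - 1) / denominator)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) // denominator

n = 13
new_threshold = max_cheaters(n, 2)        # this paper: VQSS and MPQC
previous_vqss = max_cheaters(n, 4)        # earlier VQSS protocols
previous_mpqc = max_cheaters(n, 6)        # earlier MPQC protocols
```

With 13 players, the new protocols tolerate 6 cheaters, compared with 3 for the earlier VQSS bound and 2 for the earlier MPQC bound.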

preprint2002arXiv

Authentication of Quantum Messages

Authentication is a well-studied area of classical cryptography: a sender S and a receiver R sharing a classical private key want to exchange a classical message with the guarantee that the message has not been modified by any third party with control of the communication line. In this paper we define and investigate the authentication of messages composed of quantum states. Assuming S and R have access to an insecure quantum channel and share a private, classical random key, we provide a non-interactive scheme that enables S both to encrypt and to authenticate (with unconditional security) an m qubit message by encoding it into m+s qubits, where the failure probability decreases exponentially in the security parameter s. The classical private key is 2m+O(s) bits. To achieve this, we give a highly efficient protocol for testing the purity of shared EPR pairs. We also show that any scheme to authenticate quantum messages must also encrypt them. (In contrast, one can authenticate a classical message while leaving it publicly readable.) This has two important consequences: On one hand, it allows us to give a lower bound of 2m key bits for authenticating m qubits, which makes our protocol asymptotically optimal. On the other hand, we use it to show that digitally signing quantum states is impossible, even with only computational security.

29 published item(s)

Variational quantum eigensolver for chemical molecules

Improved Differential Privacy for SGD via Optimal Private Linear Operators on Adaptive Streams

Crossing a topological phase transition with a quantum computer

Finite-depth scaling of infinite quantum circuits for quantum critical points

Strong Memory Lower Bounds for Learning Natural Models

The Price of Differential Privacy under Continual Observation

Identifying Correlation Clusters in Many-Body Localized Systems

Differentially Private Simple Linear Regression

Guaranteed Validity for Empirical Approaches to Adaptive Data Analysis

Noninteractive Locally Private Learning of Linear Models via Polynomial Approximations

Differentially Private Model Selection with Penalized and Constrained Likelihood

Max-Information, Differential Privacy, and Post-Selection Hypothesis Testing

When is Nontrivial Estimation Possible for Graphons and Stochastic Block Models?

Algorithmic Stability for Adaptive Data Analysis

Classical Cryptographic Protocols in a Quantum World

Efficient Lipschitz Extensions for High-Dimensional Graph Statistics and Node Private Degree Distributions

Local, Private, Efficient Protocols for Succinct Histograms

More General Queries and Less Generalization Error in Adaptive Data Analysis

Private Graphon Estimation for Sparse Graphs

Causal Erasure Channels

Differentially Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds

Privacy-Preserving Public Information for Sequential Games

Optimal-Rate Code Constructions for Computationally Simple Channels

The Power of Linear Reconstruction Attacks

Explicit Capacity-achieving Codes for Worst-Case Additive Errors

Leftover Hashing Against Quantum Side Information

What Can We Learn Privately?

Secure Multiparty Quantum Computation with (Only) a Strict Honest Majority

Authentication of Quantum Messages