Source author record

Emmanuel J. Candes

Emmanuel J. Candes appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Information Theory, math.IT, math.NA, math.ST, Methodology, Statistics Theory, Machine Learning, math.OC, Applications, math.CA

Catalog footprint

What is connected

16 works

10 topics

4 close collaborators


preprint, 2022, arXiv

An Adaptively Resized Parametric Bootstrap for Inference in High-dimensional Generalized Linear Models

Accurate statistical inference in logistic regression models remains a critical challenge when the ratio between the number of parameters and sample size is not negligible. This is because approximations based on either classical asymptotic theory or bootstrap calculations are grossly off the mark. This paper introduces a resized bootstrap method to infer model parameters in arbitrary dimensions. As in the parametric bootstrap, we resample observations from a distribution, which depends on an estimated regression coefficient sequence. The novelty is that this estimate is actually far from the maximum likelihood estimate (MLE). This estimate is informed by recent theory studying properties of the MLE in high dimensions, and is obtained by appropriately shrinking the MLE towards the origin. We demonstrate that the resized bootstrap method yields valid confidence intervals in both simulated and real data examples. Our methods extend to other high-dimensional generalized linear models.

preprint, 2020, arXiv

Conformal Prediction Under Covariate Shift

We extend conformal prediction methodology beyond the case of exchangeable data. In particular, we show that a weighted version of conformal prediction can be used to compute distribution-free prediction intervals for problems in which the test and training covariate distributions differ, but the likelihood ratio between these two distributions is known---or, in practice, can be estimated accurately with access to a large set of unlabeled data (test covariate points). Our weighted extension of conformal prediction also applies more generally, to settings in which the data satisfies a certain weighted notion of exchangeability. We discuss other potential applications of our new conformal methodology, including latent variable and missing data problems.
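The weighted construction can be illustrated in its split-conformal form. This is a minimal sketch, assuming absolute-residual scores and a likelihood-ratio function that is already known; the names `weighted_quantile`, `weighted_conformal_interval`, `predict`, and `weight_fn` are hypothetical and do not come from the paper.

```python
import numpy as np

def weighted_quantile(scores, weights, test_weight, level):
    """Quantile of the discrete distribution that puts mass proportional to
    weights[i] on scores[i] and mass proportional to test_weight on +infinity."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(scores)
    # Normalised cumulative mass over the sorted calibration scores.
    cdf = np.cumsum(weights[order]) / (weights.sum() + test_weight)
    # First sorted score whose cumulative mass reaches the requested level.
    idx = int(np.searchsorted(cdf, level, side="left"))
    return scores[order][idx] if idx < len(scores) else np.inf

def weighted_conformal_interval(predict, X_cal, y_cal, x_test, weight_fn, alpha=0.1):
    """Prediction interval at x_test. `predict` is a fitted model (assumed to be
    trained on separate data) and `weight_fn` returns the test/training
    likelihood ratio for each row of its input (assumed known here)."""
    scores = np.abs(y_cal - predict(X_cal))
    w_cal = weight_fn(X_cal)
    w_test = weight_fn(x_test[None, :])[0]
    q = weighted_quantile(scores, w_cal, w_test, 1.0 - alpha)
    center = predict(x_test[None, :])[0]
    return center - q, center + q
```

With a constant `weight_fn`, the sketch reduces to ordinary split conformal prediction; an infinite interval is returned when the test point carries too much of the total weight.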

preprint, 2020, arXiv

Predictive inference with the jackknife+

This paper introduces the jackknife+, which is a novel method for constructing predictive confidence intervals. Whereas the jackknife outputs an interval centered at the predicted response of a test point, with the width of the interval determined by the quantiles of leave-one-out residuals, the jackknife+ also uses the leave-one-out predictions at the test point to account for the variability in the fitted regression function. Assuming exchangeable training samples, we prove that this crucial modification permits rigorous coverage guarantees regardless of the distribution of the data points, for any algorithm that treats the training points symmetrically. Such guarantees are not possible for the original jackknife and we demonstrate examples where the coverage rate may actually vanish. Our theoretical and empirical analysis reveals that the jackknife and the jackknife+ intervals achieve nearly exact coverage and have similar lengths whenever the fitting algorithm obeys some form of stability. Further, we extend the jackknife+ to K-fold cross validation and similarly establish rigorous coverage properties. Our methods are related to cross-conformal prediction proposed by Vovk [2015] and we discuss connections.
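The jackknife+ construction described above can be sketched as follows. This is an illustrative sketch, assuming a least-squares base learner; `fit_least_squares` and `jackknife_plus_interval` are hypothetical names, and any fitting routine that treats the training points symmetrically could be passed as `fit`.

```python
import math
import numpy as np

def fit_least_squares(X, y):
    """Fit ordinary least squares with an intercept; return a prediction function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef

def jackknife_plus_interval(X, y, x_test, alpha=0.1, fit=fit_least_squares):
    """Jackknife+ interval at x_test, built from n leave-one-out fits."""
    n = len(y)
    lower_vals = np.empty(n)
    upper_vals = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        predict = fit(X[keep], y[keep])                    # model fitted without point i
        residual = abs(y[i] - predict(X[i:i + 1])[0])      # leave-one-out residual
        pred_test = predict(x_test[None, :])[0]            # leave-one-out prediction at x_test
        lower_vals[i] = pred_test - residual
        upper_vals[i] = pred_test + residual
    lower_vals.sort()
    upper_vals.sort()
    k_lo = math.floor(alpha * (n + 1))         # 1-indexed rank for the lower endpoint
    k_hi = math.ceil((1 - alpha) * (n + 1))    # 1-indexed rank for the upper endpoint
    lo = lower_vals[k_lo - 1] if k_lo >= 1 else -np.inf
    hi = upper_vals[k_hi - 1] if k_hi <= n else np.inf
    return lo, hi
```

Unlike the plain jackknife, the endpoints here shift with each leave-one-out prediction at the test point instead of being centered on a single full-data prediction.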

preprint, 2018, arXiv

A modern maximum-likelihood theory for high-dimensional logistic regression

Every student in statistics or data science learns early on that when the sample size largely exceeds the number of variables, fitting a logistic model produces estimates that are approximately unbiased. Every student also learns that there are formulas to predict the variability of these estimates which are used for the purpose of statistical inference; for instance, to produce p-values for testing the significance of regression coefficients. Although these formulas come from large sample asymptotics, we are often told that we are on reasonably safe grounds when $n$ is large in such a way that $n \ge 5p$ or $n \ge 10p$. This paper shows that this is far from the case, and consequently, inferences routinely produced by common software packages are often unreliable. Consider a logistic model with independent features in which $n$ and $p$ become increasingly large in a fixed ratio. Then we show that (1) the MLE is biased, (2) the variability of the MLE is far greater than classically predicted, and (3) the commonly used likelihood-ratio test (LRT) is not distributed as a chi-square. The bias of the MLE is extremely problematic as it yields completely wrong predictions for the probability of a case based on observed values of the covariates. We develop a new theory, which asymptotically predicts (1) the bias of the MLE, (2) the variability of the MLE, and (3) the distribution of the LRT. We empirically also demonstrate that these predictions are extremely accurate in finite samples. Further, an appealing feature is that these novel predictions depend on the unknown sequence of regression coefficients only through a single scalar, the overall strength of the signal. This suggests very concrete procedures to adjust inference; we describe one such procedure learning a single parameter from data and producing accurate inference.

preprint, 2016, arXiv

Solving Random Quadratic Systems of Equations Is Nearly as Easy as Solving Linear Systems

We consider the fundamental problem of solving quadratic systems of equations in $n$ variables, where $y_i = |\langle \boldsymbol{a}_i, \boldsymbol{x} \rangle|^2$, $i = 1, \ldots, m$ and $\boldsymbol{x} \in \mathbb{R}^n$ is unknown. We propose a novel method, which starting with an initial guess computed by means of a spectral method, proceeds by minimizing a nonconvex functional as in the Wirtinger flow approach. There are several key distinguishing features, most notably, a distinct objective functional and novel update rules, which operate in an adaptive fashion and drop terms bearing too much influence on the search direction. These careful selection rules provide a tighter initial guess, better descent directions, and thus enhanced practical performance. On the theoretical side, we prove that for certain unstructured models of quadratic systems, our algorithms return the correct solution in linear time, i.e. in time proportional to reading the data $\{\boldsymbol{a}_i\}$ and $\{y_i\}$ as soon as the ratio $m/n$ between the number of equations and unknowns exceeds a fixed numerical constant. We extend the theory to deal with noisy systems in which we only have $y_i \approx |\langle \boldsymbol{a}_i, \boldsymbol{x} \rangle|^2$ and prove that our algorithms achieve a statistical accuracy, which is nearly un-improvable. We complement our theoretical study with numerical examples showing that solving random quadratic systems is both computationally and statistically not much harder than solving linear systems of the same size---hence the title of this paper. For instance, we demonstrate empirically that the computational cost of our algorithm is about four times that of solving a least-squares problem of the same size.

preprint, 2015, arXiv

A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights

We derive a second-order ordinary differential equation (ODE) which is the limit of Nesterov's accelerated gradient method. This ODE exhibits approximate equivalence to Nesterov's scheme and thus can serve as a tool for analysis. We show that the continuous time ODE allows for a better understanding of Nesterov's scheme. As a byproduct, we obtain a family of schemes with similar convergence rates. The ODE interpretation also suggests restarting Nesterov's scheme leading to an algorithm, which can be rigorously proven to converge at a linear rate whenever the objective is strongly convex.
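For orientation, the scheme and its limiting ODE can be written out as below. The notation (step size $s$, iterates $x_k$ and $y_k$, trajectory $X(t)$, and the time scaling $t \approx k\sqrt{s}$) is assumed here, since the abstract does not fix any symbols.

```latex
% Nesterov's accelerated gradient scheme with step size s
x_k = y_{k-1} - s \nabla f(y_{k-1}), \qquad
y_k = x_k + \frac{k-1}{k+2}\,(x_k - x_{k-1})

% Limiting second-order ODE as s \to 0, under the identification t \approx k\sqrt{s}
\ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \nabla f\bigl(X(t)\bigr) = 0,
\qquad X(0) = x_0, \quad \dot{X}(0) = 0
```

The momentum coefficient $(k-1)/(k+2)$ is what produces the vanishing damping term $3/t$ in continuous time.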

preprint, 2015, arXiv

Super-Resolution of Positive Sources: the Discrete Setup

In single-molecule microscopy it is necessary to locate with high precision point sources from noisy observations of the spectrum of the signal at frequencies capped by $f_c$, which is just about the frequency of natural light. This paper rigorously establishes that this super-resolution problem can be solved via linear programming in a stable manner. We prove that the quality of the reconstruction crucially depends on the Rayleigh regularity of the support of the signal; that is, on the maximum number of sources that can occur within a square of side length about $1/f_c$. The theoretical performance guarantee is complemented with a converse result showing that our simple convex program is nearly optimal. Finally, numerical experiments illustrate our methods.

preprint, 2012, arXiv

On the Fundamental Limits of Adaptive Sensing

Suppose we can sequentially acquire arbitrary linear measurements of an n-dimensional vector x resulting in the linear model y = Ax + z, where z represents measurement noise. If the signal is known to be sparse, one would expect the following folk theorem to be true: choosing an adaptive strategy which cleverly selects the next row of A based on what has been previously observed should do far better than a nonadaptive strategy which sets the rows of A ahead of time, thus not trying to learn anything about the signal in between observations. This paper shows that the folk theorem is false. We prove that the advantages offered by clever adaptive strategies and sophisticated estimation procedures---no matter how intractable---over classical compressed acquisition/recovery schemes are, in general, minimal.

preprint, 2012, arXiv

Solving Quadratic Equations via PhaseLift when There Are About As Many Equations As Unknowns

This note shows that we can recover a complex vector x in C^n exactly from on the order of n quadratic equations of the form |<a_i, x>|^2 = b_i, i = 1, ..., m, by using a semidefinite program known as PhaseLift. This improves upon earlier bounds in [3], which required the number of equations to be at least on the order of n log n. We also demonstrate optimal recovery results from noisy quadratic measurements; these results are much sharper than previously known results.

preprint, 2012, arXiv

Unbiased Risk Estimates for Singular Value Thresholding and Spectral Estimators

In an increasing number of applications, it is of interest to recover an approximately low-rank data matrix from noisy observations. This paper develops an unbiased risk estimate---holding in a Gaussian model---for any spectral estimator obeying some mild regularity assumptions. In particular, we give an unbiased risk estimate formula for singular value thresholding (SVT), a popular estimation strategy which applies a soft-thresholding rule to the singular values of the noisy observations. Among other things, our formulas offer a principled and automated way of selecting regularization parameters in a variety of problems. In particular, we demonstrate the utility of the unbiased risk estimation for SVT-based denoising of real clinical cardiac MRI series data. We also give new results concerning the differentiability of certain matrix-valued functions.
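The soft-thresholding rule that defines SVT is short enough to sketch directly. This is an illustrative sketch only: the function name `singular_value_threshold` and the parameter name `lam` are assumptions, and the unbiased risk estimate itself is not reproduced here.

```python
import numpy as np

def singular_value_threshold(Y, lam):
    """Soft-threshold the singular values of Y at level lam and rebuild the matrix."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)   # shrink each singular value toward zero
    return (U * s_shrunk) @ Vt            # same as U @ diag(s_shrunk) @ Vt
```

Singular values below `lam` are set to zero, so a larger threshold yields a lower-rank estimate; choosing `lam` is the regularization problem that the risk estimate is meant to address.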

preprint, 2011, arXiv

Phase Retrieval via Matrix Completion

This paper develops a novel framework for phase retrieval, a problem which arises in X-ray crystallography, diffraction imaging, astronomical imaging and many other applications. Our approach combines multiple structured illuminations together with ideas from convex programming to recover the phase from intensity measurements, typically from the modulus of the diffracted wave. We demonstrate empirically that any complex-valued object can be recovered from the knowledge of the magnitude of just a few diffracted patterns by solving a simple convex optimization problem inspired by the recent literature on matrix completion. More importantly, we also demonstrate that our noise-aware algorithms are stable in the sense that the reconstruction degrades gracefully as the signal-to-noise ratio decreases. Finally, we introduce some theory showing that one can design very simple structured illumination patterns such that three diffracted figures uniquely determine the phase of the object we wish to recover.

preprint, 2011, arXiv

PhaseLift: Exact and Stable Signal Recovery from Magnitude Measurements via Convex Programming

Suppose we wish to recover a signal x in C^n from m intensity measurements of the form |<x,z_i>|^2, i = 1, 2,..., m; that is, from data in which phase information is missing. We prove that if the vectors z_i are sampled independently and uniformly at random on the unit sphere, then the signal x can be recovered exactly (up to a global phase factor) by solving a convenient semidefinite program---a trace-norm minimization problem; this holds with large probability provided that m is on the order of n log n, and without any assumption about the signal whatsoever. This novel result demonstrates that in some instances, the combinatorial phase retrieval problem can be solved by convex programming techniques. Finally, we also prove that our methodology is robust vis a vis additive noise.
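The semidefinite program rests on a lifting step, which can be written out as follows. The symbols $X$ for the lifted matrix and $b_i$ for the observed intensities are assumed notation; for positive semidefinite $X$ the trace norm equals the trace.

```latex
% Lifting: with X = x x^*, each quadratic measurement is linear in X
|\langle x, z_i \rangle|^2 = z_i^* x x^* z_i = \operatorname{Tr}(z_i z_i^* X),
\qquad i = 1, \ldots, m

% Trace minimization over positive semidefinite matrices
\text{minimize } \operatorname{Tr}(X)
\quad \text{subject to } \operatorname{Tr}(z_i z_i^* X) = b_i, \; i = 1, \ldots, m,
\quad X \succeq 0
```

A rank-one solution $X = x x^*$ determines $x$ only up to a global phase factor, which matches the caveat in the abstract.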

preprint, 2010, arXiv

A probabilistic and RIPless theory of compressed sensing

This paper introduces a simple and very general theory of compressive sensing. In this theory, the sensing mechanism simply selects sensing vectors independently at random from a probability distribution F; it includes all models - e.g. Gaussian, frequency measurements - discussed in the literature, but also provides a framework for new measurement strategies as well. We prove that if the probability distribution F obeys a simple incoherence property and an isotropy property, one can faithfully recover approximately sparse signals from a minimal number of noisy measurements. The novelty is that our recovery results do not require the restricted isometry property (RIP) - they make use of a much weaker notion - or a random model for the signal. As an example, the paper shows that a signal with s nonzero entries can be faithfully recovered from about s log n Fourier coefficients that are contaminated with noise.

preprint, 2010, arXiv

Compressed Sensing with Coherent and Redundant Dictionaries

This article presents novel results concerning the recovery of signals from undersampled data in the common situation where such signals are not sparse in an orthonormal basis or incoherent dictionary, but in a truly redundant dictionary. This work thus bridges a gap in the literature and shows not only that compressed sensing is viable in this context, but also that accurate recovery is possible via an L1-analysis optimization problem. We introduce a condition on the measurement/sensing matrix, which is a natural generalization of the now well-known restricted isometry property, and which guarantees accurate recovery of signals that are nearly sparse in (possibly) highly overcomplete and coherent dictionaries. This condition imposes no incoherence restriction on the dictionary and our results may be the first of this kind. We discuss practical examples and the implications of our results on those applications, and complement our study by demonstrating the potential of L1-analysis for such problems.

preprint, 2010, arXiv

Dense Error Correction for Low-Rank Matrices via Principal Component Pursuit

We consider the problem of recovering a low-rank matrix when some of its entries, whose locations are not known a priori, are corrupted by errors of arbitrarily large magnitude. It has recently been shown that this problem can be solved efficiently and effectively by a convex program named Principal Component Pursuit (PCP), provided that the fraction of corrupted entries and the rank of the matrix are both sufficiently small. In this paper, we extend that result to show that the same convex program, with a slightly improved weighting parameter, exactly recovers the low-rank matrix even if "almost all" of its entries are arbitrarily corrupted, provided the signs of the errors are random. We corroborate our result with simulations on randomly generated matrices and errors.
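For reference, the convex program in question has the following standard form. The symbols are assumed notation: $M$ is the observed matrix, $L$ the low-rank part, $S$ the sparse error part, $\|\cdot\|_*$ the nuclear norm, $\|\cdot\|_1$ the entrywise $\ell_1$ norm, and $\lambda$ the weighting parameter, whose adjusted value is not given in the abstract.

```latex
\text{minimize } \|L\|_* + \lambda \|S\|_1
\quad \text{subject to } L + S = M
```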

preprint, 2010, arXiv

Tight oracle bounds for low-rank matrix recovery from a minimal number of random measurements

This paper presents several novel theoretical results regarding the recovery of a low-rank matrix from just a few measurements consisting of linear combinations of the matrix entries. We show that properly constrained nuclear-norm minimization stably recovers a low-rank matrix from a constant number of noisy measurements per degree of freedom; this seems to be the first result of this nature. Further, the recovery error from noisy data is within a constant of three targets: 1) the minimax risk, 2) an oracle error that would be available if the column space of the matrix were known, and 3) a more adaptive oracle error which would be available with the knowledge of the column space corresponding to the part of the matrix that stands above the noise. Lastly, the error bounds regarding low-rank matrices are extended to provide an error bound when the matrix has full rank with decaying singular values. The analysis in this paper is based on the restricted isometry property (RIP) introduced in [6] for vectors, and in [22] for matrices.
