
Afonso S. Bandeira

Afonso S. Bandeira appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

35 works

23 topics

4 close collaborators

preprint · 2022 · arXiv

A remark on Kashin's discrepancy argument and partial coloring in the Komlós conjecture

In this expository note, we discuss an early partial coloring result of B. Kashin [C. R. Acad. Bulgare Sci., 1985]. Although this result implies Spencer's "six standard deviations" theorem [Trans. Amer. Math. Soc., 1985] only up to a $\log\log n$ factor, Kashin's argument gives a simple proof of the existence of a constant-discrepancy partial coloring in the setting of the Komlós conjecture.

preprint · 2022 · arXiv

Community Detection with a Subsampled Semidefinite Program

Semidefinite programming is an important tool for tackling several problems in data science and signal processing, including clustering and community detection. However, semidefinite programs are often slow in practice, so speed-up techniques such as sketching are frequently considered. In the context of community detection in the stochastic block model, Mixon and Xie [mixon2020sketching] recently proposed a sketching framework in which a semidefinite program is solved only on a subsampled subgraph of the network, yielding significant computational savings. In this short paper, we give a positive answer to a conjecture of Mixon and Xie about the statistical limits of this technique for the stochastic block model with two balanced communities.

preprint · 2022 · arXiv

Dual bounds for the positive definite functions approach to mutually unbiased bases

A long-standing open problem asks if there can exist 7 mutually unbiased bases (MUBs) in $\mathbb{C}^6$, or, more generally, $d + 1$ MUBs in $\mathbb{C}^d$ for any $d$ that is not a prime power. The recent work of Kolountzakis, Matolcsi, and Weiner (2016) proposed an application of the method of positive definite functions (a relative of Delsarte's method in coding theory and Lovász's semidefinite programming relaxation of the independent set problem) as a means of answering this question in the negative. Namely, they ask whether there exists a polynomial of a unitary matrix input satisfying various properties which, through the method of positive definite functions, would show the non-existence of 7 MUBs in $\mathbb{C}^6$. Using a convex duality argument, we prove that such a polynomial of degree at most 6 cannot exist. We also propose a general dual certificate which we conjecture to certify that this method can never show that there exist strictly fewer than $d + 1$ MUBs in $\mathbb{C}^d$.
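For reference, two orthonormal bases of $\mathbb{C}^d$ are mutually unbiased when every cross inner product has squared modulus $1/d$. The Python sketch below only checks that definition numerically; it is not the positive definite functions method discussed above, and the helper names `fourier_basis` and `are_mutually_unbiased` are hypothetical. The standard basis and the normalized discrete Fourier basis form a classical unbiased pair in every dimension, including $d = 6$.

```python
import numpy as np

def fourier_basis(d: int) -> np.ndarray:
    """Columns are the normalized DFT vectors of C^d (an orthonormal basis)."""
    j, k = np.meshgrid(np.arange(d), np.arange(d), indexing="ij")
    return np.exp(2j * np.pi * j * k / d) / np.sqrt(d)

def are_mutually_unbiased(U: np.ndarray, V: np.ndarray, tol: float = 1e-9) -> bool:
    """True if |<u_a, v_b>|^2 == 1/d for every column u_a of U and v_b of V."""
    d = U.shape[0]
    overlaps = np.abs(U.conj().T @ V) ** 2   # matrix of squared overlaps
    return bool(np.allclose(overlaps, 1.0 / d, atol=tol))

d = 6
print(are_mutually_unbiased(np.eye(d), fourier_basis(d)))  # True
```

Finding 7 such pairwise unbiased bases in $\mathbb{C}^6$ is the open problem; the check above only verifies a single pair.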

preprint · 2022 · arXiv

Subexponential-Time Algorithms for Sparse PCA

We study the computational cost of recovering a unit-norm sparse principal component $x \in \mathbb{R}^n$ planted in a random matrix, in either the Wigner or Wishart spiked model (observing either $W + \lambda xx^\top$ with $W$ drawn from the Gaussian orthogonal ensemble, or $N$ independent samples from $\mathcal{N}(0, I_n + \beta xx^\top)$, respectively). Prior work has shown that when the signal-to-noise ratio ($\lambda$ or $\beta\sqrt{N/n}$, respectively) is a small constant and the fraction of nonzero entries in the planted vector is $\|x\|_0 / n = \rho$, it is possible to recover $x$ in polynomial time if $\rho \lesssim 1/\sqrt{n}$. While it is possible to recover $x$ in exponential time under the weaker condition $\rho \ll 1$, it is believed that polynomial-time recovery is impossible unless $\rho \lesssim 1/\sqrt{n}$. We investigate the precise amount of time required for recovery in the "possible but hard" regime $1/\sqrt{n} \ll \rho \ll 1$ by exploring the power of subexponential-time algorithms, i.e., algorithms running in time $\exp(n^\delta)$ for some constant $\delta \in (0,1)$. For any $1/\sqrt{n} \ll \rho \ll 1$, we give a recovery algorithm with runtime roughly $\exp(\rho^2 n)$, demonstrating a smooth tradeoff between sparsity and runtime. Our family of algorithms interpolates smoothly between two existing algorithms: the polynomial-time diagonal thresholding algorithm and the $\exp(\rho n)$-time exhaustive search algorithm. Furthermore, by analyzing the low-degree likelihood ratio, we give rigorous evidence suggesting that the tradeoff achieved by our algorithms is optimal.
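The polynomial-time diagonal thresholding baseline named above is simple enough to sketch. A minimal Python version for the Wishart model, assuming the sparsity level $k = \rho n$ is known: keep the $k$ coordinates with the largest sample variance and take the top eigenvector of the corresponding covariance sub-block. This top-$k$ variant and the name `diagonal_thresholding` are illustrative assumptions; the sketch is not the subexponential-time algorithm of the paper, and the planted parameters below are chosen so that recovery is easy.

```python
import numpy as np

def diagonal_thresholding(Y: np.ndarray, k: int) -> np.ndarray:
    """Estimate a k-sparse spike from Wishart samples Y of shape (N, n)."""
    N, n = Y.shape
    cov = Y.T @ Y / N                            # sample covariance (mean-zero model)
    support = np.argsort(np.diag(cov))[-k:]      # k coordinates with largest variance
    sub = cov[np.ix_(support, support)]
    _, eigvecs = np.linalg.eigh(sub)             # eigenvalues in ascending order
    x_hat = np.zeros(n)
    x_hat[support] = eigvecs[:, -1]              # top eigenvector of the sub-block
    return x_hat

# Synthetic spiked Wishart data: N samples from N(0, I_n + beta * x x^T).
rng = np.random.default_rng(0)
n, k, beta, N = 200, 5, 3.0, 2000
x = np.zeros(n)
x[:k] = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)
Y = rng.standard_normal((N, n)) + np.sqrt(beta) * rng.standard_normal((N, 1)) * x
x_hat = diagonal_thresholding(Y, k)
print(abs(x_hat @ x))   # overlap with the planted spike
```

In the hard regime $\rho \gg 1/\sqrt{n}$ at small signal-to-noise ratio, the per-coordinate variance bump is too small for this diagonal test to isolate the support, which is where the abstract's subexponential algorithms come in.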

preprint · 2020 · arXiv

Spectral Planting and the Hardness of Refuting Cuts, Colorability, and Communities in Random Graphs

We study the problem of efficiently refuting the k-colorability of a graph, or equivalently, certifying a lower bound on its chromatic number. We give formal evidence of average-case computational hardness for this problem in sparse random regular graphs, showing optimality of a simple spectral certificate. This evidence takes the form of a computationally quiet planting: we construct a distribution of d-regular graphs that has significantly smaller chromatic number than a typical regular graph drawn uniformly at random, while providing evidence that these two distributions are indistinguishable by a large class of algorithms. We extend our results to the more general problem of certifying an upper bound on the maximum k-cut. This quiet planting is achieved by minimizing the effect of the planted structure (e.g., colorings or cuts) on the graph spectrum. Specifically, the planted structure corresponds exactly to eigenvectors of the adjacency matrix. This avoids the pushout effect of random matrix theory and delays the point at which the planting becomes visible in the spectrum or local statistics. To illustrate this further, we give similar results for a Gaussian analogue of this problem: a quiet version of the spiked model, where we plant an eigenspace rather than adding a generic low-rank perturbation. Our evidence for computational hardness of distinguishing the two distributions is based on three different heuristics: stability of belief propagation, the local statistics hierarchy, and the low-degree likelihood ratio. Of independent interest, our results include general-purpose bounds on the low-degree likelihood ratio for multi-spiked matrix models and an improved low-degree analysis of the stochastic block model.

preprint · 2020 · arXiv

Spurious Valleys in Two-layer Neural Network Optimization Landscapes

Neural networks provide a rich class of high-dimensional, non-convex optimization problems. Despite their non-convexity, gradient-descent methods often successfully optimize these models. This has motivated a recent surge of research attempting to characterize properties of their loss surface that may explain such success. In this paper, we address this phenomenon by studying a key topological property of the loss: the presence or absence of spurious valleys, defined as connected components of sub-level sets that do not include a global minimum. Focusing on a class of two-layer neural networks defined by smooth (but generally non-linear) activation functions, we identify a notion of intrinsic dimension and show that it provides necessary and sufficient conditions for the absence of spurious valleys. More concretely, finite intrinsic dimension guarantees that for sufficiently overparametrised models no spurious valleys exist, independently of the data distribution. Conversely, infinite intrinsic dimension implies that spurious valleys do exist for certain data distributions, independently of model overparametrisation. Besides these positive and negative results, we show that, although spurious valleys may exist in general, they are confined to low risk levels and avoided with high probability on overparametrised models.

preprint · 2016 · arXiv

A polynomial-time relaxation of the Gromov-Hausdorff distance

The Gromov-Hausdorff distance provides a metric on the set of isometry classes of compact metric spaces. Unfortunately, computing this metric directly is believed to be computationally intractable. Motivated by applications in shape matching and point-cloud comparison, we study a semidefinite programming relaxation of the Gromov-Hausdorff metric. This relaxation can be computed in polynomial time and, somewhat surprisingly, is itself a pseudometric. We describe the induced topology on the set of compact metric spaces. Finally, we demonstrate the numerical performance of various algorithms for computing the relaxed distance and apply these algorithms to several relevant data sets. In particular, we propose a greedy algorithm for finding the best correspondence between finite metric spaces that can handle hundreds of points.

preprint · 2016 · arXiv

On the low-rank approach for semidefinite programs arising in synchronization and community detection

To address difficult optimization problems, convex relaxations based on semidefinite programming are now commonplace in many fields. Although solvable in polynomial time, large semidefinite programs tend to be computationally challenging. Over a decade ago, exploiting the fact that in many applications of interest the desired solutions are low rank, Burer and Monteiro proposed a heuristic to solve such semidefinite programs by restricting the search space to low-rank matrices. The accompanying theory does not explain the extent of the empirical success. We focus on synchronization and community detection problems and provide theoretical guarantees that shed light on the remarkable efficiency of this heuristic.
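To make the low-rank idea concrete: the SDP $\max \langle C, X\rangle$ subject to $X \succeq 0$ and $\operatorname{diag}(X) = 1$, which arises in $\mathbb{Z}_2$ synchronization, can be reparametrized as $X = YY^\top$ with $Y \in \mathbb{R}^{n\times p}$ having unit-norm rows. The Python sketch below searches over such $Y$ with a simple projected power iteration (multiply by $C$, which is proportional to the gradient of $\langle C, YY^\top\rangle$, then renormalize the rows). This update rule and the name `burer_monteiro_power` are assumptions chosen for brevity, not the specific method analyzed in the paper, and the sketch assumes no row of $CY$ vanishes.

```python
import numpy as np

def normalize_rows(Y: np.ndarray) -> np.ndarray:
    """Project each row of Y onto the unit sphere."""
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)

def burer_monteiro_power(C: np.ndarray, p: int = 2, iters: int = 50, seed: int = 0) -> np.ndarray:
    """Low-rank search for max <C, Y Y^T> over Y of shape (n, p) with unit-norm rows."""
    rng = np.random.default_rng(seed)
    Y = normalize_rows(rng.standard_normal((C.shape[0], p)))
    for _ in range(iters):
        Y = normalize_rows(C @ Y)   # gradient-direction step, then row projection
    return Y

# Noiseless Z2 synchronization: C = x x^T for a sign vector x.
rng = np.random.default_rng(1)
x = rng.choice([-1.0, 1.0], size=10)
C = np.outer(x, x)
Y = burer_monteiro_power(C)
X = Y @ Y.T
print(np.allclose(X, C))   # True: the rank-2 factorization recovers x x^T
```

The search space has $np$ variables instead of the $n^2$ entries of $X$, which is the source of the computational savings the abstract refers to.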

preprint · 2016 · arXiv

Optimality and Sub-optimality of PCA for Spiked Random Matrices and Synchronization

A central problem of random matrix theory is to understand the eigenvalues of spiked random matrix models, in which a prominent eigenvector is planted into a random matrix. These distributions form natural statistical models for principal component analysis (PCA) problems throughout the sciences. Baik, Ben Arous and Péché showed that the spiked Wishart ensemble exhibits a sharp phase transition asymptotically: when the signal strength is above a critical threshold, it is possible to detect the presence of a spike based on the top eigenvalue, and below the threshold the top eigenvalue provides no information. Such results form the basis of our understanding of when PCA can detect a low-rank signal in the presence of noise. However, not all the information about the spike is necessarily contained in the spectrum. We study the fundamental limitations of statistical methods, including non-spectral ones. Our results include: I) For the Gaussian Wigner ensemble, we show that PCA achieves the optimal detection threshold for a variety of benign priors for the spike. We extend previous work on the spherically symmetric and i.i.d. Rademacher priors through an elementary, unified analysis. II) For any non-Gaussian Wigner ensemble, we show that PCA is always suboptimal for detection. However, a variant of PCA achieves the optimal threshold (for benign priors) by pre-transforming the matrix entries according to a carefully designed function. This approach has been stated before, and we give a rigorous and general analysis. III) For both the Gaussian Wishart ensemble and various synchronization problems over groups, we show that inefficient procedures can work below the threshold where PCA succeeds, whereas no known efficient algorithm achieves this. This conjectural gap between what is statistically possible and what can be done efficiently remains open.
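The spectral phase transition referred to above can be observed numerically. The Python sketch below uses the spiked Wigner analogue rather than the Wishart ensemble: with a GOE matrix normalized so that its bulk spectrum fills $[-2, 2]$ and a unit-norm spike of strength $\lambda$, the top eigenvalue is known to stay near $2$ for $\lambda < 1$ and to separate to about $\lambda + 1/\lambda$ for $\lambda > 1$. The normalization, the Rademacher spike, and the function names are illustrative choices; finite-$n$ values fluctuate around these limits.

```python
import numpy as np

def spiked_wigner_top_eig(n: int, lam: float, seed: int = 0) -> float:
    """Largest eigenvalue of W + lam * x x^T, with W ~ GOE scaled to the bulk [-2, 2]."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((n, n))
    W = (G + G.T) / np.sqrt(2 * n)                      # off-diagonal variance 1/n
    x = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)    # unit-norm Rademacher spike
    return float(np.linalg.eigvalsh(W + lam * np.outer(x, x))[-1])

def predicted_top_eig(lam: float) -> float:
    """Asymptotic location of the top eigenvalue for this normalization."""
    return lam + 1.0 / lam if lam > 1 else 2.0

for lam in (0.5, 2.0):
    print(lam, round(spiked_wigner_top_eig(1000, lam), 3), predicted_top_eig(lam))
```

Below $\lambda = 1$ the top eigenvalue carries no information about the spike, which is why the abstract's question of whether non-spectral tests can do better is meaningful.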

preprint · 2016 · arXiv

Sharp nonasymptotic bounds on the norm of random matrices with independent entries

We obtain nonasymptotic bounds on the spectral norm of random matrices with independent entries that improve significantly on earlier results. If $X$ is the $n\times n$ symmetric matrix with $X_{ij}\sim N(0,b_{ij}^2)$, we show that \[\mathbf{E}\Vert X\Vert \lesssim\max_i\sqrt{\sum_jb_{ij}^2}+\max _{ij}\vert b_{ij}\vert \sqrt{\log n}.\] This bound is optimal in the sense that a matching lower bound holds under mild assumptions, and the constants are sufficiently sharp that we can often capture the precise edge of the spectrum. Analogous results are obtained for rectangular matrices and for more general sub-Gaussian or heavy-tailed distributions of the entries, and we derive tail bounds in addition to bounds on the expected norm. The proofs are based on a combination of the moment method and geometric functional analysis techniques. As an application, we show that our bounds immediately yield the correct phase transition behavior of the spectral edge of random band matrices and of sparse Wigner matrices. We also recover a result of Seginer on the norm of Rademacher matrices.
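The right-hand side of the bound is easy to evaluate for a given variance profile. The Python sketch below computes it (without the unspecified universal constant hidden in $\lesssim$) and samples a symmetric Gaussian matrix with $X_{ij} \sim N(0, b_{ij}^2)$ for comparison with its spectral norm, using a band profile as in the application mentioned above. The function names and the band width are hypothetical, and the factor `3` in the accompanying check is an arbitrary illustrative margin, not a constant from the paper.

```python
import numpy as np

def norm_bound_rhs(b: np.ndarray) -> float:
    """max_i sqrt(sum_j b_ij^2) + max_ij |b_ij| * sqrt(log n)."""
    n = b.shape[0]
    row_term = np.sqrt((b ** 2).sum(axis=1)).max()
    entry_term = np.abs(b).max() * np.sqrt(np.log(n))
    return float(row_term + entry_term)

def sample_symmetric_gaussian(b: np.ndarray, seed: int = 0) -> np.ndarray:
    """Symmetric X with independent X_ij ~ N(0, b_ij^2) for i <= j (b symmetric)."""
    rng = np.random.default_rng(seed)
    upper = np.triu(rng.standard_normal(b.shape) * b)   # diagonal and upper triangle
    return upper + np.triu(upper, 1).T                  # mirror the strict upper part

# Band profile: unit variance within distance w of the diagonal, zero elsewhere.
n, w = 400, 20
i, j = np.indices((n, n))
b = (np.abs(i - j) <= w).astype(float)
X = sample_symmetric_gaussian(b)
print(np.linalg.norm(X, 2), norm_bound_rhs(b))
```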

preprint · 2016 · arXiv

Tightness of the maximum likelihood semidefinite relaxation for angular synchronization

Maximum likelihood estimation problems are, in general, intractable optimization problems. As a result, it is common to approximate the maximum likelihood estimator (MLE) using convex relaxations. In some cases, the relaxation is tight: it recovers the true MLE. Most tightness proofs only apply to situations where the MLE exactly recovers a planted solution (known to the analyst). It is then sufficient to establish that the optimality conditions hold at the planted signal. In this paper, we study an estimation problem (angular synchronization) for which the MLE is not a simple function of the planted solution, yet for which the convex relaxation is tight. To establish tightness in this context, the proof is less direct because the point at which to verify optimality conditions is not known explicitly. Angular synchronization consists of estimating a collection of $n$ phases, given noisy measurements of the pairwise relative phases. The MLE for angular synchronization is the solution of a (hard) non-bipartite Grothendieck problem over the complex numbers. We consider a stochastic model for the data: a planted signal (that is, a ground truth set of phases) is corrupted with non-adversarial random noise. Even though the MLE does not coincide with the planted signal, we show that the classical semidefinite relaxation for it is tight, with high probability. This holds even for high levels of noise.

preprint · 2015 · arXiv

A note on Probably Certifiably Correct algorithms

Many optimization problems of interest are known to be intractable, and while there are often heuristics that are known to work on typical instances, it is usually not easy to determine a posteriori whether the optimal solution was found. In this short note, we discuss algorithms that not only solve the problem on typical instances but also provide a posteriori certificates of optimality, namely probably certifiably correct (PCC) algorithms. As an illustrative example, we present a fast PCC algorithm for minimum bisection under the stochastic block model and briefly discuss other examples.

preprint · 2015 · arXiv

Approximating the Little Grothendieck Problem over the Orthogonal and Unitary Groups

The little Grothendieck problem consists of maximizing $\sum_{ij}C_{ij}x_ix_j$ over binary variables $x_i\in\{\pm1\}$, where $C$ is a positive semidefinite matrix. In this paper we focus on a natural generalization of this problem, the little Grothendieck problem over the orthogonal group. Given a $dn\times dn$ positive semidefinite matrix $C$, the objective is to maximize $\sum_{ij}\operatorname{Tr}(C_{ij}^TO_iO_j^T)$ with each $O_i$ restricted to the group of orthogonal matrices, where $C_{ij}$ denotes the $(i,j)$-th $d\times d$ block of $C$. We propose an approximation algorithm, which we refer to as Orthogonal-Cut, to solve this problem and show a constant approximation ratio. Our method is based on semidefinite programming. For a given $d\geq 1$, we show a constant approximation ratio of $\alpha_{R}(d)^2$, where $\alpha_{R}(d)$ is the expected average singular value of a $d\times d$ matrix with i.i.d. Gaussian $N(0,1/d)$ entries. For $d=1$ we recover the known $\alpha_{R}(1)^2=2/\pi$ approximation guarantee for the classical little Grothendieck problem. Our algorithm and analysis naturally extend to the complex-valued case, also providing a constant approximation ratio for the analogous problem over the unitary group. Orthogonal-Cut also serves as an approximation algorithm for several applications, including the Procrustes problem, where it improves over the best previously known approximation ratio of $\frac{1}{2\sqrt{2}}$. The little Grothendieck problem falls under the class of problems approximated by a recent algorithm proposed in the context of the non-commutative Grothendieck inequality. Nonetheless, our approach is simpler and provides a more efficient algorithm with better approximation ratios and matching integrality gaps. Finally, we also provide an improved approximation algorithm for the more general little Grothendieck problem over the orthogonal (or unitary) group with rank constraints.
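The constant $\alpha_{R}(d)$ is defined as an expectation, so it can be estimated by Monte Carlo. The Python sketch below (the function name and sample sizes are illustrative) averages the singular values of random $d\times d$ matrices with i.i.d. $N(0,1/d)$ entries. For $d=1$ the singular value is $|g|$ with $g\sim N(0,1)$, so $\alpha_{R}(1)=\sqrt{2/\pi}$ and the squared estimate should be close to $2/\pi$.

```python
import numpy as np

def alpha_R(d: int, samples: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[average singular value] of a d x d matrix
    with i.i.d. N(0, 1/d) entries."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((samples, d, d)) / np.sqrt(d)
    s = np.linalg.svd(G, compute_uv=False)   # batched singular values, shape (samples, d)
    return float(s.mean())

print(alpha_R(1) ** 2, 2 / np.pi)   # both approximately 0.6366
for d in (2, 3, 4):
    print(d, alpha_R(d, samples=50_000) ** 2)
```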

preprint · 2015 · arXiv

Linear Boolean classification, coding and "the critical problem"

The problem of constructing a minimal-rank matrix over GF(2) whose kernel does not intersect a given set $S$ is considered. In the case where $S$ is a Hamming ball centered at 0, this is equivalent to finding linear codes of largest dimension. For a general set, this is an instance of "the critical problem" posed by Crapo and Rota in 1970. This work focuses on the case where $S$ is an annulus. In contrast to balls, it is shown that an optimal kernel is composed not only of dense but also of sparse vectors, and the optimal mixture is identified in various cases. These findings corroborate a proposed conjecture that for an annulus of inner and outer radii $nq$ and $np$, respectively, the optimal relative rank is given by $(1-q)H(p/(1-q))$, an extension of the Gilbert-Varshamov bound $H(p)$ conjectured for Hamming balls of radius $np$.
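The conjectured formula is straightforward to evaluate. A minimal Python sketch with hypothetical function names, assuming $H$ is the binary entropy function with base-2 logarithms: setting $q = 0$ (a Hamming ball of radius $np$) recovers the conjectured Gilbert-Varshamov value $H(p)$.

```python
import math

def binary_entropy(t: float) -> float:
    """H(t) = -t log2 t - (1 - t) log2(1 - t), with H(0) = H(1) = 0."""
    if t in (0.0, 1.0):
        return 0.0
    return -t * math.log2(t) - (1 - t) * math.log2(1 - t)

def conjectured_relative_rank(p: float, q: float) -> float:
    """Conjectured optimal relative rank (1 - q) H(p / (1 - q)) for an annulus
    with inner radius n*q and outer radius n*p."""
    t = p / (1 - q)
    if not 0.0 <= t <= 1.0:
        raise ValueError("need 0 <= p / (1 - q) <= 1")
    return (1 - q) * binary_entropy(t)

print(conjectured_relative_rank(0.1, 0.0) == binary_entropy(0.1))  # True: q = 0 gives H(p)
print(round(conjectured_relative_rank(0.2, 0.05), 4))
```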

preprint · 2015 · arXiv

Multisection in the Stochastic Block Model using Semidefinite Programming

We consider the problem of identifying underlying community-like structures in graphs. Towards this end, we study the Stochastic Block Model (SBM) on $k$ clusters: a random model on $n=km$ vertices, partitioned into $k$ equal-sized clusters, with edges sampled independently across clusters with probability $q$ and within clusters with probability $p$, $p>q$. The goal is to recover the initial "hidden" partition of $[n]$. We study semidefinite programming (SDP) based algorithms in this context. In the regime $p = \frac{\alpha\log(m)}{m}$ and $q = \frac{\beta\log(m)}{m}$, we show that a certain natural SDP-based algorithm solves the problem of exact recovery in the $k$-community SBM, with high probability, whenever $\sqrt{\alpha} - \sqrt{\beta} > 1$, as long as $k=o(\log n)$. This threshold is known to be information-theoretically optimal. We also study the case $k=\Theta(\log(n))$. In this case, however, the recovery guarantees we achieve no longer match the optimal condition $\sqrt{\alpha} - \sqrt{\beta} > 1$, leaving optimality in this range as an open question.

preprint · 2015 · arXiv

Non-unique games over compact groups and orientation estimation in cryo-EM

Let $\mathcal{G}$ be a compact group and let $f_{ij} \in L^2(\mathcal{G})$. We define the Non-Unique Games (NUG) problem as finding $g_1,\dots,g_n \in \mathcal{G}$ that minimize $\sum_{i,j=1}^n f_{ij} \left( g_i g_j^{-1}\right)$. We devise a relaxation of the NUG problem to a semidefinite program (SDP) by taking the Fourier transform of $f_{ij}$ over $\mathcal{G}$, which can then be solved efficiently. The NUG framework can be seen as a generalization of the little Grothendieck problem over the orthogonal group and of the Unique Games problem, and it includes many practically relevant problems, such as the maximum likelihood estimator for registering bandlimited functions over the unit sphere in $d$ dimensions and orientation estimation in cryo-Electron Microscopy.

preprint · 2015 · arXiv

Random Laplacian matrices and convex relaxations

The largest eigenvalue of a symmetric matrix is always greater than or equal to its largest diagonal entry. We show that for a large class of random Laplacian matrices, this bound is essentially tight: the largest eigenvalue is, up to lower-order terms, often the size of the largest diagonal entry. Besides being a simple tool for obtaining precise estimates on the largest eigenvalue of a large class of random Laplacian matrices, our main result settles a number of open problems related to the tightness of certain convex relaxation-based algorithms. It easily implies the optimality of the semidefinite relaxation approaches to problems such as $\mathbb{Z}_2$ Synchronization and Stochastic Block Model recovery. Interestingly, this result readily implies the connectivity threshold for Erdős-Rényi graphs and suggests that these three phenomena are manifestations of the same underlying principle. The main tool is a recent estimate on the spectral norm of matrices with independent entries by van Handel and the author.

preprint · 2015 · arXiv

Relax, no need to round: integrality of clustering formulations

We study exact recovery conditions for convex relaxations of point cloud clustering problems, focusing on two of the most common optimization problems for unsupervised clustering: $k$-means and $k$-median clustering. Motivations for focusing on convex relaxations are: (a) they come with a certificate of optimality, and (b) they are generic tools which are relatively parameter-free, not tailored to specific assumptions over the input. More precisely, we consider the distributional setting where there are $k$ clusters in $\mathbb{R}^m$ and data from each cluster consists of $n$ points sampled from a symmetric distribution within a ball of unit radius. We ask: what is the minimal separation distance between cluster centers needed for convex relaxations to exactly recover these $k$ clusters as the optimal integral solution? For the $k$-median linear programming relaxation we show a tight bound: exact recovery is obtained given arbitrarily small pairwise separation $\varepsilon > 0$ between the balls. In other words, the pairwise center separation is $\Delta > 2+\varepsilon$. Under the same distributional model, the $k$-means LP relaxation fails to recover such clusters at separation as large as $\Delta = 4$. Yet, if we enforce PSD constraints on the $k$-means LP, we get exact cluster recovery at center separation $\Delta > 2\sqrt2(1+\sqrt{1/m})$. In contrast, common heuristics such as Lloyd's algorithm (a.k.a. the $k$-means algorithm) can fail to recover clusters in this setting; even with arbitrarily large cluster separation, k-means++ with overseeding by any constant factor fails with high probability at exact cluster recovery. To complement the theoretical analysis, we provide an experimental study of the recovery guarantees for these various methods, and discuss several open problems which these experiments suggest.

preprint · 2014 · arXiv

A conditional construction of restricted isometries

We study the restricted isometry property of a matrix that is built from the discrete Fourier transform matrix by collecting rows indexed by quadratic residues. We find an $\varepsilon>0$ such that, conditioned on a folklore conjecture in number theory, this matrix satisfies the restricted isometry property with sparsity parameter $K=\Omega(M^{1/2+\varepsilon})$, where $M$ is the number of rows.
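The matrix in question can be built directly. A Python sketch under stated assumptions: the modulus is an odd prime $p$ (the abstract does not specify it), the DFT matrix is $p\times p$, and we keep the rows indexed by the nonzero quadratic residues mod $p$, which gives $M=(p-1)/2$ rows. Scaling every column to unit $\ell_2$ norm, customary in RIP statements, is also an assumption, as is the function name.

```python
import numpy as np

def quadratic_residue_dft(p: int) -> np.ndarray:
    """Rows of the p x p DFT matrix indexed by the nonzero quadratic residues
    mod an odd prime p, scaled so every column has unit l2 norm."""
    residues = sorted({(k * k) % p for k in range(1, p)})
    cols = np.arange(p)
    F = np.exp(-2j * np.pi * np.outer(residues, cols) / p)
    return F / np.sqrt(len(residues))

A = quadratic_residue_dft(13)
print(A.shape)                                     # (6, 13)
print(np.allclose(np.linalg.norm(A, axis=0), 1))   # True
```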

preprint · 2014 · arXiv

Compressive classification and the rare eclipse problem

This paper addresses the fundamental question of when convex sets remain disjoint after random projection. We provide an analysis using ideas from high-dimensional convex geometry. For ellipsoids, we provide a bound in terms of the distance between these ellipsoids and simple functions of their polynomial coefficients. As an application, this theorem provides bounds for compressive classification of convex sets. Rather than assuming that the data to be classified is sparse, our results show that the data can be acquired via very few measurements yet will remain linearly separable. We demonstrate the feasibility of this approach in the context of hyperspectral imaging.

preprint · 2014 · arXiv

Decoding binary node labels from censored edge measurements: Phase transition and efficient recovery

We consider the problem of clustering a graph $G$ into two communities by observing a subset of the vertex correlations. Specifically, we consider the inverse problem with observed variables $Y=B_G x \oplus Z$, where $B_G$ is the incidence matrix of a graph $G$, $x$ is the vector of unknown vertex variables (with a uniform prior) and $Z$ is a noise vector with Bernoulli$(\varepsilon)$ i.i.d. entries. All variables and operations are Boolean. This model is motivated by coding, synchronization, and community detection problems. In particular, it corresponds to a stochastic block model or a correlation clustering problem with two communities and censored edges. Without noise, exact recovery (up to global flip) of $x$ is possible if and only if the graph $G$ is connected, with a sharp threshold at the edge probability $\log(n)/n$ for Erdős-Rényi random graphs. The first goal of this paper is to determine how the edge probability $p$ needs to scale to allow exact recovery in the presence of noise. Defining the degree (oversampling) rate of the graph by $\alpha=np/\log(n)$, it is shown that exact recovery is possible if and only if $\alpha>2/(1-2\varepsilon)^2+ o(1/(1-2\varepsilon)^2)$. In other words, $2/(1-2\varepsilon)^2$ is the information-theoretic threshold for exact recovery at low SNR. In addition, an efficient recovery algorithm based on semidefinite programming is proposed and shown to succeed in the threshold regime up to twice the optimal rate. For a deterministic graph $G$, defining the degree rate as $\alpha=d/\log(n)$, where $d$ is the minimum degree of the graph, it is shown that the proposed method achieves the rate $\alpha> 4((1+\lambda)/(1-\lambda)^2)/(1-2\varepsilon)^2+ o(1/(1-2\varepsilon)^2)$, where $1-\lambda$ is the spectral gap of the graph $G$.
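The observation model and the leading-order threshold can be written out directly. In the Python sketch below (function names hypothetical), edges of an Erdős-Rényi graph are kept with probability $p$, each kept edge $(i,j)$ reports $x_i \oplus x_j \oplus z_{ij}$ with $z_{ij}\sim\mathrm{Bernoulli}(\varepsilon)$, and `exact_recovery_threshold` returns $2/(1-2\varepsilon)^2$, ignoring the $o(\cdot)$ correction.

```python
import numpy as np

def censored_measurements(n: int, p: float, eps: float, seed: int = 0):
    """Sample x, the observed edges (i, j), and y = x_i XOR x_j XOR z on each edge."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=n)                    # uniform Boolean vertex labels
    iu, ju = np.triu_indices(n, k=1)
    keep = rng.random(iu.size) < p                    # Erdős-Rényi edge selection
    i, j = iu[keep], ju[keep]
    z = (rng.random(i.size) < eps).astype(x.dtype)    # Bernoulli(eps) noise
    return x, i, j, x[i] ^ x[j] ^ z

def exact_recovery_threshold(eps: float) -> float:
    """Leading-order information-theoretic threshold on alpha = n p / log n."""
    return 2.0 / (1.0 - 2.0 * eps) ** 2

n, eps = 1000, 0.1
alpha = 1.5 * exact_recovery_threshold(eps)           # 50% above the threshold
x, i, j, y = censored_measurements(n, alpha * np.log(n) / n, eps)
print(exact_recovery_threshold(0.25))                 # 8.0
```

Per the abstract, the proposed SDP is guaranteed to succeed in this regime up to twice the threshold, i.e. for $\alpha$ above roughly $4/(1-2\varepsilon)^2$.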

preprint · 2014 · arXiv

Derandomizing restricted isometries via the Legendre symbol

The restricted isometry property (RIP) is an important matrix condition in compressed sensing, but the best matrix constructions to date use randomness. This paper leverages pseudorandom properties of the Legendre symbol to reduce the number of random bits in an RIP matrix with Bernoulli entries. In this regard, the Legendre symbol is not special: our main result naturally generalizes to any small-bias sample space. We also conjecture that no random bits are necessary for our Legendre-symbol-based construction.
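For an odd prime $p$, the Legendre symbol can be computed with Euler's criterion, $a^{(p-1)/2} \equiv \left(\frac{a}{p}\right) \pmod p$. The Python sketch below (function name hypothetical) produces the $\pm 1$ sequence whose pseudorandom properties the paper exploits; how those values are arranged into an RIP matrix is not specified in the abstract and is not reproduced here.

```python
def legendre(a: int, p: int) -> int:
    """Legendre symbol (a / p) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)        # r is 0, 1, or p - 1
    return -1 if r == p - 1 else r

p = 7
signs = [legendre(a, p) for a in range(1, p)]
print(signs)        # [1, 1, -1, 1, -1, -1]
print(sum(signs))   # 0: exactly half of the nonzero residues are squares
```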

preprint · 2014 · arXiv

Exact Recovery in the Stochastic Block Model

The stochastic block model (SBM) with two communities, or equivalently the planted bisection model, is a popular random graph model exhibiting cluster behaviour. In the symmetric case, the graph has two equally sized clusters and vertices connect with probability $p$ within clusters and $q$ across clusters. In the past two decades, a large body of literature in statistics and computer science has focused on providing lower bounds on the scaling of $|p-q|$ to ensure exact recovery. In this paper, we identify a sharp threshold phenomenon for exact recovery: if $\alpha=pn/\log(n)$ and $\beta=qn/\log(n)$ are constant (with $\alpha>\beta$), recovering the communities with high probability is possible if $\frac{\alpha+\beta}{2} - \sqrt{\alpha\beta}>1$ and impossible if $\frac{\alpha+\beta}{2} - \sqrt{\alpha\beta}<1$. In particular, this improves the existing bounds. This also sets a new line of sight for efficient clustering algorithms. While maximum likelihood (ML) achieves the optimal threshold (by definition), it is NP-hard in the worst case. This paper proposes an efficient algorithm based on a semidefinite programming relaxation of ML, which is proved to succeed in recovering the communities close to the threshold, while numerical experiments suggest it may achieve the threshold. An efficient algorithm that succeeds all the way down to the threshold is also obtained using a partial recovery algorithm combined with a local improvement procedure.
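The threshold condition has a compact equivalent form: $\frac{\alpha+\beta}{2} - \sqrt{\alpha\beta} = \frac{(\sqrt{\alpha}-\sqrt{\beta})^2}{2}$, so exact recovery is possible when $\sqrt{\alpha}-\sqrt{\beta} > \sqrt{2}$ and impossible when $\sqrt{\alpha}-\sqrt{\beta} < \sqrt{2}$. A small Python helper (hypothetical name) that evaluates the condition; the boundary case, where the quantity equals exactly $1$, is not covered by the statement above and returns `None`.

```python
import math
from typing import Optional

def sbm_exact_recovery(alpha: float, beta: float) -> Optional[bool]:
    """Classify (alpha, beta), with alpha = p n / log n and beta = q n / log n."""
    gap = (alpha + beta) / 2 - math.sqrt(alpha * beta)   # = (sqrt(a) - sqrt(b))^2 / 2
    if gap > 1:
        return True    # exact recovery possible with high probability
    if gap < 1:
        return False   # exact recovery impossible
    return None        # exactly at the threshold: not covered by the statement

print(sbm_exact_recovery(9.0, 1.0))   # True  (gap = 2)
print(sbm_exact_recovery(4.0, 1.0))   # False (gap = 0.5)
```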

preprint · 2014 · arXiv

Open problem: Tightness of maximum likelihood semidefinite relaxations

We have observed an interesting, yet unexplained, phenomenon: Semidefinite programming (SDP) based relaxations of maximum likelihood estimators (MLE) tend to be tight in recovery problems with noisy data, even when MLE cannot exactly recover the ground truth. Several results establish tightness of SDP based relaxations in the regime where exact recovery from MLE is possible. However, to the best of our knowledge, their tightness is not understood beyond this regime. As an illustrative example, we focus on the generalized Procrustes problem.

preprint · 2013 · arXiv

A Cheeger Inequality for the Graph Connection Laplacian

The $O(d)$ synchronization problem consists of estimating a set of unknown orthogonal transformations $O_i$ from noisy measurements of a subset of the pairwise ratios $O_iO_j^{-1}$. We formulate and prove a Cheeger-type inequality that relates a measure of how well it is possible to solve the $O(d)$ synchronization problem to the spectrum of an operator, the graph Connection Laplacian. We also show how this inequality provides a worst-case performance guarantee for a spectral method to solve this problem.
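For $d = 1$, $O(1) = \{\pm 1\}$ and the connection Laplacian reduces to $L = D - A\circ\rho$, where $A$ is the adjacency matrix of the measurement graph, $D$ its degree matrix, and $\rho_{ij}\in\{\pm1\}$ the measured ratios. A common spectral method, sketched below in Python under these assumptions (the function name is hypothetical, and general $d$ is not handled), takes the eigenvector of the smallest eigenvalue of $L$ and rounds its entries to signs. In the noiseless case $L = \operatorname{diag}(x)(D-A)\operatorname{diag}(x)$, so on a connected graph that eigenvector is proportional to $x$.

```python
import numpy as np

def spectral_sign_sync(rho: np.ndarray, adj: np.ndarray) -> np.ndarray:
    """Estimate signs x_i (up to a global flip) from measurements rho_ij ~ x_i x_j
    on the edges of a graph with symmetric 0/1 adjacency matrix adj."""
    L = np.diag(adj.sum(axis=1)) - adj * rho     # connection Laplacian for d = 1
    _, eigvecs = np.linalg.eigh(L)               # eigenvalues in ascending order
    v = eigvecs[:, 0]                            # eigenvector of the smallest eigenvalue
    return np.where(v >= 0, 1, -1)

rng = np.random.default_rng(0)
n = 20
x = rng.choice([-1, 1], size=n)
adj = np.ones((n, n)) - np.eye(n)                # complete measurement graph
rho = np.outer(x, x)                             # noiseless pairwise ratios
x_hat = spectral_sign_sync(rho, adj)
print(abs(x_hat @ x) == n)                       # True: recovered up to global sign
```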

preprint · 2013 · arXiv

Computation of sparse low degree interpolating polynomials and their application to derivative-free optimization

Interpolation-based trust-region methods are an important class of algorithms for derivative-free optimization which rely on locally approximating an objective function by quadratic polynomial interpolation models, frequently built from fewer points than there are basis components. Often, in practical applications, the contribution of the problem variables to the objective function is such that many pairwise correlations between variables are negligible, implying, in the smooth case, a sparse structure in the Hessian matrix. To exploit Hessian sparsity, existing optimization approaches require knowledge of the sparsity structure. The goal of this paper is to develop and analyze a method in which the sparse models are constructed automatically. The sparse recovery theory developed recently in the field of compressed sensing characterizes conditions under which a sparse vector can be accurately recovered from few random measurements. Such recovery is achieved by minimizing the $\ell_1$-norm of a vector subject to the measurement constraints. We suggest an approach for building sparse quadratic polynomial interpolation models by minimizing the $\ell_1$-norm of the entries of the model Hessian subject to the interpolation conditions. We show that this procedure recovers accurate models when the function Hessian is sparse, using relatively few randomly selected sample points. Motivated by this result, we developed a practical interpolation-based trust-region method using deterministic sample sets and minimum $\ell_1$-norm quadratic models. Our computational results show that the new approach exhibits promising numerical performance both in the general case and in the sparse one.

preprint · 2013 · arXiv

Convergence of trust-region methods based on probabilistic models

In this paper we consider the use of probabilistic or random models within a classical trust-region framework for optimization of deterministic smooth general nonlinear functions. Our method and setting differ from many stochastic optimization approaches in two principal ways. Firstly, we assume that the value of the function itself can be computed without noise, in other words, that the function is deterministic. Secondly, we use random models of higher quality than those produced by usual stochastic gradient methods. In particular, a first order model based on random approximation of the gradient is required to provide sufficient quality of approximation with probability greater than or equal to 1/2. This is in contrast with stochastic gradient approaches, where the model is assumed to be "correct" only in expectation. As a result of this particular setting, we are able to prove convergence, with probability one, of a trust-region method which is almost identical to the classical method. Hence we show that a standard optimization framework can be used in cases when models are random and may or may not provide good approximations, as long as "good" models are more likely than "bad" models. Our results are based on the use of properties of martingales. Our motivation comes from using random sample sets and interpolation models in derivative-free optimization. However, our framework is general and can be applied with any source of uncertainty in the model. We discuss various applications for our methods in the paper.

preprint2013arXiv

Multireference Alignment using Semidefinite Programming

The multireference alignment problem consists of estimating a signal from multiple noisy shifted observations. Inspired by existing Unique-Games approximation algorithms, we provide a semidefinite program (SDP) based relaxation which approximates the maximum likelihood estimator (MLE) for the multireference alignment problem. Although we show that the MLE problem is Unique-Games hard to approximate within any constant, we observe that our poly-time approximation algorithm for the MLE appears to perform quite well in typical instances, outperforming existing methods. In an attempt to explain this behavior we provide stability guarantees for our SDP under a random noise model on the observations. This case is more challenging to analyze than traditional semi-random instances of Unique-Games: the noise model is on vertices of a graph and translates into dependent noise on the edges. Interestingly, we show that if certain positivity constraints in the SDP are dropped, its solution becomes equivalent to performing phase correlation, a popular method used for pairwise alignment in imaging applications. Finally, we show how symmetry reduction techniques from matrix representation theory can simplify the analysis and computation of the SDP, greatly decreasing its computational cost.
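
The abstract notes that dropping certain positivity constraints makes the SDP equivalent to phase correlation. For reference, a minimal pure-Python sketch of phase correlation for two cyclically shifted 1-D signals; it uses a naive $O(n^2)$ DFT only to stay self-contained (a practical implementation would use an FFT), and the function names are hypothetical.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def phase_correlation_shift(a, b, eps=1e-12):
    """Estimate the cyclic shift s such that b[t] is close to a[(t - s) mod n]."""
    A, B = dft(a), dft(b)
    cross = []
    for Ak, Bk in zip(A, B):
        c = Bk * Ak.conjugate()
        cross.append(c / max(abs(c), eps))  # keep only the phase of each frequency
    r = idft(cross)                         # ideally a spike at t = s
    return max(range(len(r)), key=lambda t: r[t].real)


a = [1.0, 3.0, -2.0, 0.5, 4.0, -1.0, 2.5, 0.0]
shift = 3
b = [a[(t - shift) % len(a)] for t in range(len(a))]
print(phase_correlation_shift(a, b))  # 3
```

Normalizing the cross-power spectrum discards magnitudes and keeps only the phase difference $e^{-2\pi i k s/n}$ at each frequency, which is why its inverse transform peaks at the shift $s$.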

preprint2013arXiv

Near-optimal phase retrieval of sparse vectors

In many areas of imaging science, it is difficult to measure the phase of linear measurements. As such, one often wishes to reconstruct a signal from intensity measurements, that is, perform phase retrieval. In several applications the signal in question is believed to be sparse. In this paper, we use ideas from the recently developed polarization method for phase retrieval and provide an algorithm that is guaranteed to recover a sparse signal from a number of phaseless linear measurements that scales linearly with the sparsity of the signal (up to logarithmic factors). This is particularly remarkable since it is known that a certain popular class of convex methods is not able to perform recovery unless the number of measurements scales with the square of the sparsity of the signal. This is a shorter version of a more complete publication that will appear elsewhere.

preprint2013arXiv

On partial sparse recovery

We consider the problem of recovering a partially sparse solution of an underdetermined system of linear equations by minimizing the $\ell_1$-norm of the part of the solution vector which is known to be sparse. Such a problem is closely related to a classical problem in Compressed Sensing where the $\ell_1$-norm of the whole solution vector is minimized. We introduce analogues of restricted isometry and null space properties for the recovery of partially sparse vectors and show that these new properties are implied by their original counterparts. We show also how to extend recovery under noisy measurements to the partially sparse case.
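
A compact statement of the problems being compared, in notation chosen here for illustration (the paper's own symbols may differ): split the unknown as $x = (u, v)$ with $u \in \mathbb{R}^{n_1}$ carrying no sparsity assumption and $v \in \mathbb{R}^{n_2}$ known to be sparse, and partition $A = [A_u \;\; A_v]$ conformally. Partial sparse recovery solves the first problem below, classical compressed sensing solves the second, and one common way to model noisy measurements with error bounded by $\epsilon$ is the third.

```latex
\begin{aligned}
&\text{(partial)}        && \min_{u,\,v} \ \|v\|_1 \quad \text{s.t.} \quad A_u u + A_v v = b, \\
&\text{(classical)}      && \min_{x} \ \|x\|_1 \quad \text{s.t.} \quad A x = b, \\
&\text{(noisy, partial)} && \min_{u,\,v} \ \|v\|_1 \quad \text{s.t.} \quad \|A_u u + A_v v - b\|_2 \le \epsilon.
\end{aligned}
```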

preprint2013arXiv

Phase retrieval from power spectra of masked signals

In diffraction imaging, one is tasked with reconstructing a signal from its power spectrum. To resolve the ambiguity in this inverse problem, one might invoke prior knowledge about the signal, but phase retrieval algorithms in this vein have found limited success. One alternative is to create redundancy in the measurement process by illuminating the signal multiple times, distorting the signal each time with a different mask. Despite several recent advances in phase retrieval, the community has yet to construct an ensemble of masks which uniquely determines all signals and admits an efficient reconstruction algorithm. In this paper, we leverage the recently proposed polarization method to construct such an ensemble. We also present numerical simulations to illustrate the stability of the polarization method in this setting. In comparison to a state-of-the-art phase retrieval algorithm known as PhaseLift, we find that polarization is much faster with comparable stability.

preprint2013arXiv

Phase retrieval with polarization

In many areas of imaging science, it is difficult to measure the phase of linear measurements. As such, one often wishes to reconstruct a signal from intensity measurements, that is, perform phase retrieval. In this paper, we provide a novel measurement design which is inspired by interferometry and exploits certain properties of expander graphs. We also give an efficient phase retrieval procedure, and use recent results in spectral graph theory to produce a stable performance guarantee which rivals the guarantee for PhaseLift in [Candes et al. 2011]. We use numerical simulations to illustrate the performance of our phase retrieval procedure, and we compare reconstruction error and runtime with a common alternating-projections-type procedure.
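
The polarization idea can be illustrated in a noiseless toy setting. For complex numbers $a, b$, one standard form of the polarization identity, $a\bar{b} = \tfrac{1}{4} \sum_{k=0}^{3} i^k |a + i^k b|^2$, recovers the relative phase of two entries from intensity measurements alone; the paper's exact choice of measurement vectors may differ. The sketch below assumes a path graph in place of the paper's expander-graph design and simply propagates phases along the path, so it requires every entry to be nonzero and makes no stability claim. All function names are hypothetical.

```python
import math


def intensity_measurements(x):
    """Noiseless intensities: |x_i|^2 for every i, and |x_i + i^k x_{i+1}|^2
    for k = 0..3 along the edges of a path graph (illustrative design)."""
    norms = [abs(xi) ** 2 for xi in x]
    edges = [[abs(x[i] + (1j ** k) * x[i + 1]) ** 2 for k in range(4)]
             for i in range(len(x) - 1)]
    return norms, edges


def recover_from_polarization(norms, edges):
    # Polarization identity: a * conj(b) = (1/4) * sum_k i^k |a + i^k b|^2.
    rel = [sum((1j ** k) * m for k, m in enumerate(ms)) / 4 for ms in edges]
    y = [complex(math.sqrt(norms[0]), 0.0)]  # fix the global phase at entry 0
    for i, w in enumerate(rel):
        # w = x_i * conj(x_{i+1})  =>  x_{i+1} = conj(w) * x_i / |x_i|^2
        y.append(w.conjugate() * y[i] / norms[i])
    return y


x = [1 + 2j, -0.5 + 1j, 3 - 1j, 2j]
norms, edges = intensity_measurements(x)
y = recover_from_polarization(norms, edges)
c = y[0] / x[0]  # the unrecoverable global phase factor
print(max(abs(yi - c * xi) for yi, xi in zip(y, x)))  # close to 0
```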

preprint2013arXiv

Saving phase: Injectivity and stability for phase retrieval

Recent advances in convex optimization have led to new strides in the phase retrieval problem over finite-dimensional vector spaces. However, certain fundamental questions remain: What sorts of measurement vectors uniquely determine every signal up to a global phase factor, and how many are needed to do so? Furthermore, which measurement ensembles lend stability? This paper presents several results that address each of these questions. We begin by characterizing injectivity, and we identify that the complement property is indeed a necessary condition in the complex case. We then pose a conjecture that $4M-4$ generic measurement vectors are both necessary and sufficient for injectivity in $M$ dimensions, and we prove this conjecture in the special cases where $M = 2, 3$. Next, we shift our attention to stability, both in the worst and average cases. Here, we characterize worst-case stability in the real case by introducing a numerical version of the complement property. This new property bears some resemblance to the restricted isometry property of compressed sensing and can be used to derive a sharp lower Lipschitz bound on the intensity measurement mapping. Localized frames are shown to lack this property (suggesting instability), whereas Gaussian random measurements are shown to satisfy this property with high probability. We conclude by presenting results that use a stochastic noise model in both the real and complex cases, and we leverage Cramér-Rao lower bounds to identify stability with stronger versions of the injectivity characterizations.
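
In the real case, the complement property is known to be both necessary and sufficient for injectivity (a result of Balan, Casazza and Edidin): an ensemble $\{\varphi_n\}_{n=1}^N \subset \mathbb{R}^M$ has it when, for every index subset $S$, either $\{\varphi_n\}_{n \in S}$ or $\{\varphi_n\}_{n \notin S}$ spans $\mathbb{R}^M$. A brute-force checker for small real ensembles is sketched below; it enumerates all $2^N$ subsets and uses exact rational arithmetic, so it is only meant for toy examples, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def rank(vectors, dim):
    """Rank of a list of real vectors, by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    r = 0
    for col in range(dim):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def has_complement_property(frame, dim):
    """True if, for every subset S of indices, S or its complement spans R^dim."""
    n = len(frame)
    for size in range(n + 1):
        for S in combinations(range(n), size):
            inside = [frame[i] for i in S]
            outside = [frame[i] for i in range(n) if i not in S]
            if rank(inside, dim) < dim and rank(outside, dim) < dim:
                return False
    return True


print(has_complement_property([[1, 0], [0, 1]], 2))          # False
print(has_complement_property([[1, 0], [0, 1], [1, 1]], 2))  # True
```

The two examples reflect the known real-case count: $2M-1$ vectors in general position in $\mathbb{R}^M$ satisfy the complement property, while fewer than $2M-1$ cannot.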

preprint2012arXiv

The road to deterministic matrices with the restricted isometry property

The restricted isometry property (RIP) is a well-known matrix condition that provides state-of-the-art reconstruction guarantees for compressed sensing. While random matrices are known to satisfy this property with high probability, deterministic constructions have found less success. In this paper, we consider various techniques for demonstrating RIP deterministically, some popular and some novel, and we evaluate their performance. In evaluating some techniques, we apply random matrix theory and inadvertently find a simple alternative proof that certain random matrices are RIP. Later, we propose a particular class of matrices as candidates for being RIP, namely, equiangular tight frames (ETFs). Using the known correspondence between real ETFs and strongly regular graphs, we investigate certain combinatorial implications of a real ETF being RIP. Specifically, we give probabilistic intuition for a new bound on the clique number of Paley graphs of prime order, and we conjecture that the corresponding ETFs are RIP in a manner similar to random matrices.
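
A common deterministic route to RIP bounds the restricted isometry constant through the worst-case coherence $\mu$ and Gershgorin's circle theorem: for unit-norm columns, $\delta_K \le (K-1)\mu$. ETFs are natural candidates because they meet the Welch lower bound $\mu \ge \sqrt{(N-M)/(M(N-1))}$ for $N$ unit vectors in $\mathbb{R}^M$ with equality. The sketch below computes these quantities for the smallest nontrivial real ETF, the three-vector "Mercedes-Benz" frame in $\mathbb{R}^2$; it illustrates only this standard coherence argument, not the paper's specific analysis, and the function names are hypothetical.

```python
import math
from itertools import combinations


def normalize(v):
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]


def coherence(columns):
    """Worst-case coherence: max |<phi_i, phi_j>| over distinct unit-normalized columns."""
    cols = [normalize(c) for c in columns]
    return max(abs(sum(a * b for a, b in zip(u, v)))
               for u, v in combinations(cols, 2))


def welch_bound(m, n):
    """Lower bound on the coherence of n unit vectors in R^m (n > m)."""
    return math.sqrt((n - m) / (m * (n - 1)))


def gershgorin_rip_bound(mu, k):
    """delta_K <= (K - 1) * mu, from Gershgorin's theorem on K x K Gram submatrices."""
    return (k - 1) * mu


# Mercedes-Benz frame: 3 unit vectors in R^2 at 120-degree spacing, a real ETF.
mb = [[math.cos(2 * math.pi * t / 3), math.sin(2 * math.pi * t / 3)] for t in range(3)]
mu = coherence(mb)
print(mu, welch_bound(2, 3), gershgorin_rip_bound(mu, 2))  # all equal 1/2 up to rounding
```

Since the Welch bound is of order $1/\sqrt{M}$ when $N$ is much larger than $M$, this argument can only certify RIP for sparsity levels $K$ up to roughly $\sqrt{M}$, a limitation often called the square-root bottleneck.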

preprint2011arXiv

Landau's necessary density conditions for the Hankel transform

We will prove an analogue of Landau's necessary conditions [Necessary density conditions for sampling and interpolation of certain entire functions, Acta Math. 117 (1967)] for spaces of functions whose Hankel transform is supported in a measurable subset S of the positive semi-axis. As a special case, necessary density conditions for the existence of Fourier-Bessel frames are obtained.


35 published items

A remark on Kashin's discrepancy argument and partial coloring in the Komlós conjecture

Community Detection with a Subsampled Semidefinite Program

Dual bounds for the positive definite functions approach to mutually unbiased bases

Subexponential-Time Algorithms for Sparse PCA

Spectral Planting and the Hardness of Refuting Cuts, Colorability, and Communities in Random Graphs

Spurious Valleys in Two-layer Neural Network Optimization Landscapes

A polynomial-time relaxation of the Gromov-Hausdorff distance

On the low-rank approach for semidefinite programs arising in synchronization and community detection

Optimality and Sub-optimality of PCA for Spiked Random Matrices and Synchronization

Sharp nonasymptotic bounds on the norm of random matrices with independent entries

Tightness of the maximum likelihood semidefinite relaxation for angular synchronization

A note on Probably Certifiably Correct algorithms

Approximating the Little Grothendieck Problem over the Orthogonal and Unitary Groups

Linear Boolean classification, coding and "the critical problem"

Multisection in the Stochastic Block Model using Semidefinite Programming

Non-unique games over compact groups and orientation estimation in cryo-EM

Random Laplacian matrices and convex relaxations

Relax, no need to round: integrality of clustering formulations

A conditional construction of restricted isometries

Compressive classification and the rare eclipse problem

Decoding binary node labels from censored edge measurements: Phase transition and efficient recovery

Derandomizing restricted isometries via the Legendre symbol

Exact Recovery in the Stochastic Block Model

Open problem: Tightness of maximum likelihood semidefinite relaxations

A Cheeger Inequality for the Graph Connection Laplacian

Computation of sparse low degree interpolating polynomials and their application to derivative-free optimization

Convergence of trust-region methods based on probabilistic models

Multireference Alignment using Semidefinite Programming

Near-optimal phase retrieval of sparse vectors

On partial sparse recovery

Phase retrieval from power spectra of masked signals

Phase retrieval with polarization

Saving phase: Injectivity and stability for phase retrieval

The road to deterministic matrices with the restricted isometry property

Landau's necessary density conditions for the Hankel transform