Source author record

Armin Eftekhari

Armin Eftekhari appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Information Theory, math.IT, math.OC, Machine Learning, math.NA, math.ST, Statistics Theory, Computational Complexity, Computer Vision, eess.IV, math.DS, math.PR, Numerical Analysis

Catalog footprint

What is connected

17 works

13 topics

4 close collaborators


preprint 2022 arXiv

An Inexact Augmented Lagrangian Framework for Nonconvex Optimization with Nonlinear Constraints

We propose a practical inexact augmented Lagrangian method (iALM) for nonconvex problems with nonlinear constraints. We characterize the total computational complexity of our method subject to a verifiable geometric condition, which is closely related to the Polyak-Lojasiewicz and Mangasarian-Fromowitz conditions. In particular, when a first-order solver is used for the inner iterates, we prove that iALM finds a first-order stationary point with $\tilde{\mathcal{O}}(1/\epsilon^4)$ calls to the first-order oracle. If, in addition, the problem is smooth and a second-order solver is used for the inner iterates, iALM finds a second-order stationary point with $\tilde{\mathcal{O}}(1/\epsilon^5)$ calls to the second-order oracle, which matches the known theoretical complexity result in the literature. We also provide strong numerical evidence on large-scale machine learning problems, including the Burer-Monteiro factorization of semidefinite programs, and a novel nonconvex relaxation of the standard basis pursuit template. For these examples, we also show how to verify our geometric condition.
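To make the structure of an inexact augmented Lagrangian loop concrete, here is a minimal sketch on a toy problem: minimize x0 + x1 subject to x0^2 + x1^2 - 2 = 0. Everything in it is an assumption made for illustration and is not taken from the paper: the toy problem, the fixed penalty `beta`, plain gradient descent as the inner first-order solver, the classical multiplier update, and all function names. The paper's iALM uses its own penalty and dual step-size schedules.

```python
import math

def objective_grad(x):
    # Gradient of the hypothetical objective f(x) = x0 + x1.
    return [1.0, 1.0]

def constraint(x):
    # Hypothetical nonlinear equality constraint A(x) = x0^2 + x1^2 - 2 = 0.
    return x[0] ** 2 + x[1] ** 2 - 2.0

def constraint_grad(x):
    return [2.0 * x[0], 2.0 * x[1]]

def inexact_alm(x, beta=10.0, step=0.005, outer_iters=10,
                inner_tol=1e-6, max_inner=5000):
    """Basic augmented Lagrangian loop with an inexact inner solve.

    Augmented Lagrangian: f(x) + y * A(x) + (beta / 2) * A(x)^2.
    """
    y = 0.0  # dual variable (Lagrange multiplier estimate)
    for _ in range(outer_iters):
        # Inner loop: gradient descent on the augmented Lagrangian in x,
        # stopped early once the gradient norm drops below inner_tol.
        for _ in range(max_inner):
            scale = y + beta * constraint(x)
            g = [df + scale * da
                 for df, da in zip(objective_grad(x), constraint_grad(x))]
            if math.hypot(g[0], g[1]) <= inner_tol:
                break
            x = [xi - step * gi for xi, gi in zip(x, g)]
        # Outer step: classical multiplier update.
        y += beta * constraint(x)
    return x, y

x_hat, y_hat = inexact_alm([-1.5, -0.5])
print(x_hat, y_hat)
```

On this toy problem the constrained minimizer is (-1, -1) with multiplier 0.5, so the printed values should land close to those.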

preprint 2022 arXiv

The Forward-Backward Envelope for Sampling with the Overdamped Langevin Algorithm

In this paper, we analyse a proximal method based on the idea of forward-backward splitting for sampling from distributions with densities that are not necessarily smooth. In particular, we study the non-asymptotic properties of the Euler-Maruyama discretization of the Langevin equation, where the forward-backward envelope is used to deal with the non-smooth part of the dynamics. An advantage of this envelope, when compared to the widely used Moreau-Yosida envelope and the MYULA algorithm, is that it maintains the MAP estimator of the original non-smooth distribution. We also present a number of numerical experiments that support our theoretical findings.

preprint 2022 arXiv

The Nonconvex Geometry of Linear Inverse Problems

The gauge function, closely related to the atomic norm, measures the complexity of a statistical model, and has found broad applications in machine learning and statistical signal processing. In a high-dimensional learning problem, the gauge function attempts to safeguard against overfitting by promoting a sparse (concise) representation within the learning alphabet. In this work, within the context of linear inverse problems, we pinpoint the source of its success, but also argue that the applicability of the gauge function is inherently limited by its convexity, and showcase several learning problems where the classical gauge function theory fails. We then introduce a new notion of statistical complexity, the gauge$_p$ function, which overcomes the limitations of the gauge function. The gauge$_p$ function is a simple generalization of the gauge function that can tightly control the sparsity of a statistical model within the learning alphabet and, perhaps surprisingly, draws further inspiration from the Burer-Monteiro factorization in computational mathematics. We also propose a new learning machine, built on the gauge$_p$ function, and arm this machine with a number of statistical guarantees. The potential of the proposed gauge$_p$ function theory is then studied for two stylized applications. Finally, we discuss the computational aspects and, in particular, suggest a tractable numerical algorithm for implementing the new learning machine.

preprint 2021 arXiv

Over-Parametrized Matrix Factorization in the Presence of Spurious Stationary Points

Motivated by the emerging role of interpolating machines in signal processing and machine learning, this work considers the computational aspects of over-parametrized matrix factorization. In this context, the optimization landscape may contain spurious stationary points (SSPs), which are proved to be full-rank matrices. The presence of these SSPs means that it is impossible to hope for any global guarantees in over-parametrized matrix factorization. For example, when initialized at an SSP, the gradient flow will be trapped there forever. Nevertheless, despite these SSPs, we establish in this work that the gradient flow of the corresponding merit function converges to a global minimizer, provided that its initialization is rank-deficient and sufficiently close to the feasible set of the optimization problem. We numerically observe that a heuristic discretization of the proposed gradient flow, inspired by primal-dual algorithms, is successful when initialized randomly. Our result is in sharp contrast with the local refinement methods which require an initialization close to the optimal set of the optimization problem. More specifically, we successfully avoid the traps set by the SSPs because the gradient flow remains rank-deficient at all times, and not because there are no SSPs nearby. The latter is the case for the local refinement methods. Moreover, the widely-used restricted isometry property plays no role in our main result.

preprint 2020 arXiv

Double-Loop Unadjusted Langevin Algorithm

A well-known first-order method for sampling from log-concave probability distributions is the Unadjusted Langevin Algorithm (ULA). This work proposes a new annealing step-size schedule for ULA, which allows us to prove new convergence guarantees for sampling from a smooth log-concave distribution that are not covered by existing state-of-the-art convergence guarantees. To establish this result, we derive a new theoretical bound that relates the Wasserstein distance to the total variation distance between any two log-concave distributions, complementing the reach of Talagrand's T2 inequality. Moreover, applying this new step-size schedule to an existing constrained sampling algorithm, we show state-of-the-art convergence rates for sampling from a constrained log-concave distribution, as well as improved dimension dependence.
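For readers unfamiliar with ULA, the basic iteration is x_{k+1} = x_k - step * grad U(x_k) + sqrt(2 * step) * noise, where U is the negative log-density. The sketch below runs it on a one-dimensional standard Gaussian target (U(x) = x^2 / 2). It uses a constant step size, not the annealing schedule proposed in the paper; the target, step size, burn-in length and function names are assumptions chosen for illustration.

```python
import math
import random

def unadjusted_langevin(grad_potential, x0, step, num_steps, seed=0):
    """Run ULA with a constant step size and return all iterates."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(num_steps):
        noise = rng.gauss(0.0, 1.0)
        # Euler-Maruyama step of the overdamped Langevin diffusion.
        x = x - step * grad_potential(x) + math.sqrt(2.0 * step) * noise
        samples.append(x)
    return samples

# Target: standard Gaussian, so grad U(x) = x (illustrative choice).
samples = unadjusted_langevin(lambda x: x, x0=3.0, step=0.05, num_steps=50000)
kept = samples[1000:]  # discard a burn-in prefix
mean = sum(kept) / len(kept)
variance = sum((s - mean) ** 2 for s in kept) / len(kept)
print(mean, variance)
```

Because ULA skips the Metropolis correction, a constant step size leaves a small bias: for this target the stationary variance of the chain is 1 / (1 - step / 2), slightly above 1.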

preprint 2020 arXiv

Explicit Stabilised Gradient Descent for Faster Strongly Convex Optimisation

This paper introduces the Runge-Kutta Chebyshev descent method (RKCD) for strongly convex optimisation problems. This new algorithm is based on explicit stabilised integrators for stiff differential equations, a powerful class of numerical schemes that avoid the severe step size restriction faced by standard explicit integrators. For optimising quadratic and strongly convex functions, this paper proves that RKCD nearly achieves the optimal convergence rate of the conjugate gradient algorithm, and the suboptimality of RKCD diminishes as the condition number of the quadratic function worsens. It is established that this optimal rate is obtained also for a partitioned variant of RKCD applied to perturbations of quadratic functions. In addition, numerical experiments on general strongly convex problems show that RKCD outperforms Nesterov's accelerated gradient descent.

preprint 2020 arXiv

MOSES: A Streaming Algorithm for Linear Dimensionality Reduction

This paper introduces the Memory-limited Online Subspace Estimation Scheme (MOSES) for both estimating the principal components of streaming data and reducing its dimension. More specifically, in various applications such as sensor networks, the data vectors are presented sequentially to a user who has limited storage and processing time available. Applied to such problems, MOSES can provide a running estimate of the leading principal components of the data that has arrived so far and also reduce its dimension. MOSES generalises the popular incremental Singular Value Decomposition (iSVD) to handle thin blocks of data, rather than just vectors. This minor generalisation in part allows us to complement MOSES with a comprehensive statistical analysis, thus providing the first theoretically-sound variant of iSVD, which has been lacking despite the empirical success of this method. This generalisation also enables us to concretely interpret MOSES as an approximate solver for the underlying non-convex optimisation program. We find that MOSES consistently surpasses the state of the art in our numerical experiments with both synthetic and real-world datasets, while being computationally inexpensive.

preprint 2020 arXiv

Scalable Learning-Based Sampling Optimization for Compressive Dynamic MRI

Compressed sensing applied to magnetic resonance imaging (MRI) reduces the scanning time by enabling images to be reconstructed from highly undersampled data. In this paper, we tackle the problem of designing a sampling mask for an arbitrary reconstruction method and a limited acquisition budget. Namely, we look for an optimal probability distribution from which a mask with a fixed cardinality is drawn. We demonstrate that this problem admits a compactly supported solution, which leads to a deterministic optimal sampling mask. We then propose a stochastic greedy algorithm that (i) provides an approximate solution to this problem, and (ii) resolves the scaling issues of [1,2]. We validate its performance on in vivo dynamic MRI with retrospective undersampling, showing that our method preserves the performance of [1,2] while reducing the computational burden by a factor close to 200.

preprint 2020 arXiv

Stable Super-Resolution of Images: A Theoretical Study

We study the ubiquitous super-resolution problem, in which one aims at localizing positive point sources in an image, blurred by the point spread function of the imaging device. To recover the point sources, we propose to solve a convex feasibility program, which simply finds a nonnegative Borel measure that agrees with the observations collected by the imaging device. In the absence of imaging noise, we show that solving this convex program uniquely retrieves the point sources, provided that the imaging device collects enough observations. This result holds true if the point spread function of the imaging device can be decomposed into horizontal and vertical components, and if the translations of these components form a Chebyshev system, i.e., a system of continuous functions that loosely behave like algebraic polynomials. Building upon recent results for one-dimensional signals [1], we prove that this super-resolution algorithm is stable, in the generalized Wasserstein metric, to model mismatch (i.e., when the image is not sparse) and to additive imaging noise. In particular, the recovery error depends on the noise level and how well the image can be approximated with well-separated point sources. As an example, we verify these claims for the important case of a Gaussian point spread function. The proofs rely on the construction of novel interpolating polynomials, which are the main technical contribution of this paper, and partially resolve the question raised in [2] about the extension of the standard machinery to higher dimensions.

preprint 2020 arXiv

Training Linear Neural Networks: Non-Local Convergence and Complexity Results

Linear networks provide valuable insights into the workings of neural networks in general. This paper identifies conditions under which the gradient flow provably trains a linear network, in spite of the non-strict saddle points present in the optimization landscape. This paper also provides the computational complexity of training linear networks with gradient flow. To achieve these results, this work develops machinery to provably identify the stable set of gradient flow, which then enables us to improve over the state of the art in the literature of linear networks (Bah et al., 2019; Arora et al., 2018a). Crucially, our results appear to be the first to break away from the lazy training regime which has dominated the literature of neural networks. This work requires the network to have a layer with one neuron, which subsumes the networks with a scalar output, but extending the results of this theoretical work to all linear networks remains a challenging open problem.

preprint 2016 arXiv

What Happens to a Manifold Under a Bi-Lipschitz Map?

We study geometric and topological properties of the image of a smooth submanifold of $\mathbb{R}^{n}$ under a bi-Lipschitz map to $\mathbb{R}^{m}$. In particular, we characterize how the dimension, diameter, volume, and reach of the embedded manifold relate to the original. Our main result establishes a lower bound on the reach of the embedded manifold in the case where $m \le n$ and the bi-Lipschitz map is linear. We discuss implications of this work in signal processing and machine learning, where bi-Lipschitz maps on low-dimensional manifolds have been constructed using randomized linear operators.

preprint 2015 arXiv

Computing Active Subspaces Efficiently with Gradient Sketching

Active subspaces are an emerging set of tools for identifying and exploiting the most important directions in the space of a computer simulation's input parameters; these directions depend on the simulation's quantity of interest, which we treat as a function from inputs to outputs. To identify a function's active subspace, one must compute the eigenpairs of a matrix derived from the function's gradient, which presents challenges when the gradient is not available as a subroutine. We numerically study two methods for estimating the necessary eigenpairs using only linear measurements of the function's gradient. In practice, these measurements can be estimated by finite differences using only two function evaluations, regardless of the dimension of the function's input space.
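The last claim of the abstract, that one linear measurement of the gradient costs only two function evaluations whatever the input dimension, follows from the forward-difference approximation <a, grad f(x)> ≈ (f(x + h a) - f(x)) / h. The sketch below illustrates this; the test function, the measurement vector `a`, the step `h` and the helper name are hypothetical choices, not taken from the paper.

```python
def directional_measurement(f, x, a, h=1e-6):
    """Estimate the linear measurement <a, grad f(x)> by a forward difference.

    Uses exactly two evaluations of f, independent of len(x).
    """
    x_shifted = [xi + h * ai for xi, ai in zip(x, a)]
    return (f(x_shifted) - f(x)) / h

def f(x):
    # Hypothetical quantity of interest: f(x) = sum_i (i + 1) * x_i^2.
    return sum((i + 1) * xi * xi for i, xi in enumerate(x))

x = [1.0, -2.0, 0.5]
a = [0.3, 0.1, -0.7]  # one measurement vector (illustrative)

estimate = directional_measurement(f, x, a)
# Exact value from the analytic gradient, grad f(x)_i = 2 * (i + 1) * x_i.
exact = sum(ai * 2 * (i + 1) * xi for i, (ai, xi) in enumerate(zip(a, x)))
print(estimate, exact)
```

The forward difference carries an error of order `h`, so the two printed values agree only approximately.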

preprint 2015 arXiv

Greed is Super: A Fast Algorithm for Super-Resolution

We present a fast two-phase algorithm for super-resolution with strong theoretical guarantees. Given the low-frequency part of the spectrum of a sequence of impulses, Phase I consists of a greedy algorithm that roughly estimates the impulse positions. These estimates are then refined by local optimization in Phase II. In contrast to the convex relaxation proposed by Candès et al., our approach has a low computational complexity but requires the impulses to be separated by an additional logarithmic factor to succeed. The backbone of our work is the fundamental work of Slepian et al. involving discrete prolate spheroidal wave functions and their unique properties.

preprint 2014 arXiv

A First Analysis of the Stability of Takens' Embedding

Takens' Embedding Theorem asserts that when the states of a hidden dynamical system are confined to a low-dimensional attractor, complete information about the states can be preserved in the observed time-series output through the delay coordinate map. However, the conditions for the theorem to hold ignore the effects of noise, and time-series analysis in practice requires a careful empirical determination of the sampling time and number of delays, resulting in a number of delay coordinates larger than the minimum prescribed by Takens' theorem. In this paper, we use tools and ideas in Compressed Sensing to provide a first theoretical justification for the choice of the number of delays in noisy conditions. In particular, we show that under certain conditions on the dynamical system, measurement function, number of delays and sampling time, the delay-coordinate map can be a stable embedding of the dynamical system's attractor.
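The delay-coordinate map mentioned above stacks lagged copies of a scalar time series into vectors. A minimal sketch follows; the function name, the ordering of the coordinates (newest first) and the integer `lag` standing in for the sampling time are assumptions made for illustration.

```python
def delay_coordinate_map(series, num_delays, lag=1):
    """Return delay vectors [s[t], s[t - lag], ..., s[t - (num_delays - 1) * lag]]."""
    start = (num_delays - 1) * lag  # first index with a full history
    return [
        [series[t - k * lag] for k in range(num_delays)]
        for t in range(start, len(series))
    ]

series = [0, 1, 2, 3, 4, 5]
vectors = delay_coordinate_map(series, num_delays=3, lag=2)
print(vectors)  # [[4, 2, 0], [5, 3, 1]]
```

Each output vector is one point of the reconstructed trajectory; `num_delays` and `lag` are the two quantities whose choice the abstract discusses.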

preprint 2014 arXiv

New Analysis of Manifold Embeddings and Signal Recovery from Compressive Measurements

Compressive Sensing (CS) exploits the surprising fact that the information contained in a sparse signal can be preserved in a small number of compressive, often random linear measurements of that signal. Strong theoretical guarantees have been established concerning the embedding of a sparse signal family under a random measurement operator and on the accuracy to which sparse signals can be recovered from noisy compressive measurements. In this paper, we address similar questions in the context of a different modeling framework. Instead of sparse models, we focus on the broad class of manifold models, which can arise in both parametric and non-parametric signal families. Using tools from the theory of empirical processes, we improve upon previous results concerning the embedding of low-dimensional manifolds under random measurement operators. We also establish both deterministic and probabilistic instance-optimal bounds in $\ell_2$ for manifold-based signal recovery and parameter estimation from noisy compressive measurements. In line with analogous results for sparsity-based CS, we conclude that much stronger bounds are possible in the probabilistic setting. Our work supports the growing evidence that manifold-based models can be used with high accuracy in compressive signal processing.

preprint 2014 arXiv

The Restricted Isometry Property for Random Block Diagonal Matrices

In Compressive Sensing, the Restricted Isometry Property (RIP) ensures that robust recovery of sparse vectors is possible from noisy, undersampled measurements via computationally tractable algorithms. It is by now well-known that Gaussian (or, more generally, sub-Gaussian) random matrices satisfy the RIP under certain conditions on the number of measurements. Their use can be limited in practice, however, due to storage limitations, computational considerations, or the mismatch of such matrices with certain measurement architectures. These issues have recently motivated considerable effort towards studying the RIP for structured random matrices. In this paper, we study the RIP for block diagonal measurement matrices where each block on the main diagonal is itself a sub-Gaussian random matrix. Our main result states that such matrices can indeed satisfy the RIP but that the requisite number of measurements depends on certain properties of the basis in which the signals are sparse. In the best case, these matrices perform nearly as well as dense Gaussian random matrices, despite having many fewer nonzero entries.

preprint 2012 arXiv

Matched Filtering from Limited Frequency Samples

In this paper, we study a simple correlation-based strategy for estimating the unknown delay and amplitude of a signal based on a small number of noisy, randomly chosen frequency-domain samples. We model the output of this "compressive matched filter" as a random process whose mean equals the scaled, shifted autocorrelation function of the template signal. Using tools from the theory of empirical processes, we prove that the expected maximum deviation of this process from its mean decreases sharply as the number of measurements increases, and we also derive a probabilistic tail bound on the maximum deviation. Putting all of this together, we bound the minimum number of measurements required to guarantee that the empirical maximum of this random process occurs sufficiently close to the true peak of its mean function. We conclude that for broad classes of signals, this compressive matched filter will successfully estimate the unknown delay (with high probability, and within a prescribed tolerance) using a number of random frequency-domain samples that scales inversely with the signal-to-noise ratio and only logarithmically in the observation bandwidth and the possible range of delays.
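The correlation strategy described above can be sketched in a discrete, noise-free setting: take a few randomly chosen DFT coefficients of the observed signal, correlate them against the same coefficients of the template for every candidate delay, and pick the peak. The discrete circular-shift model, the exponential template, the prime signal length and all names below are assumptions for illustration; the paper's setting includes noise, which this sketch omits.

```python
import cmath
import math
import random

def dft_at(signal, k):
    """Single DFT coefficient of `signal` at frequency index k."""
    n_total = len(signal)
    return sum(v * cmath.exp(-2j * math.pi * k * n / n_total)
               for n, v in enumerate(signal))

def compressive_matched_filter(template, observed, freqs):
    """Estimate circular delay and amplitude from the DFT samples in `freqs`."""
    n_total = len(template)
    s_hat = [dft_at(template, k) for k in freqs]
    y_hat = [dft_at(observed, k) for k in freqs]

    def correlation(d):
        # Real part of the correlation restricted to the sampled frequencies.
        return sum(
            (y * s.conjugate() * cmath.exp(2j * math.pi * k * d / n_total)).real
            for k, s, y in zip(freqs, s_hat, y_hat)
        )

    delay = max(range(n_total), key=correlation)
    energy = sum(abs(s) ** 2 for s in s_hat)
    return delay, correlation(delay) / energy

N = 61  # prime length, so any nonzero frequency separates all delays
template = [0.7 ** n for n in range(N)]  # hypothetical template signal
true_delay, true_amplitude = 17, 2.5
observed = [true_amplitude * template[(n - true_delay) % N] for n in range(N)]

freqs = random.Random(0).sample(range(1, N), 12)  # 12 random frequency indices
delay_hat, amplitude_hat = compressive_matched_filter(template, observed, freqs)
print(delay_hat, amplitude_hat)
```

Without noise the correlation peaks exactly at the true delay, and the peak height divided by the sampled template energy recovers the amplitude.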
