Researcher profile

Armin Eftekhari

Armin Eftekhari contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
10works
0followers
11topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

10 published item(s)

preprint2022arXiv

An Inexact Augmented Lagrangian Framework for Nonconvex Optimization with Nonlinear Constraints

We propose a practical inexact augmented Lagrangian method (iALM) for nonconvex problems with nonlinear constraints. We characterize the total computational complexity of our method subject to a verifiable geometric condition, which is closely related to the Polyak-Lojasiewicz and Mangasarian-Fromowitz conditions. In particular, when a first-order solver is used for the inner iterates, we prove that iALM finds a first-order stationary point with $\tilde{\mathcal{O}}(1/ε^4)$ calls to the first-order oracle. If, in addition, the problem is smooth and a second-order solver is used for the inner iterates, iALM finds a second-order stationary point with $\tilde{\mathcal{O}}(1/ε^5)$ calls to the second-order oracle, which matches the known theoretical complexity result in the literature. We also provide strong numerical evidence on large-scale machine learning problems, including the Burer-Monteiro factorization of semidefinite programs, and a novel nonconvex relaxation of the standard basis pursuit template. For these examples, we also show how to verify our geometric condition.

preprint2022arXiv

The Forward-Backward Envelope for Sampling with the Overdamped Langevin Algorithm

In this paper, we analyse a proximal method based on the idea of forward-backward splitting for sampling from distributions with densities that are not necessarily smooth. In particular, we study the non-asymptotic properties of the Euler-Maruyama discretization of the Langevin equation, where the forward-backward envelope is used to deal with the non-smooth part of the dynamics. An advantage of this envelope, when compared to widely-used Moreu-Yoshida one and the MYULA algorithm, is that it maintains the MAP estimator of the original non-smooth distribution. We also study a number of numerical experiments that corroborate that support our theoretical findings.

preprint2022arXiv

The Nonconvex Geometry of Linear Inverse Problems

The gauge function, closely related to the atomic norm, measures the complexity of a statistical model, and has found broad applications in machine learning and statistical signal processing. In a high-dimensional learning problem, the gauge function attempts to safeguard against overfitting by promoting a sparse (concise) representation within the learning alphabet. In this work, within the context of linear inverse problems, we pinpoint the source of its success, but also argue that the applicability of the gauge function is inherently limited by its convexity, and showcase several learning problems where the classical gauge function theory fails. We then introduce a new notion of statistical complexity, gauge$_p$ function, which overcomes the limitations of the gauge function. The gauge$_p$ function is a simple generalization of the gauge function that can tightly control the sparsity of a statistical model within the learning alphabet and, perhaps surprisingly, draws further inspiration from the Burer-Monteiro factorization in computational mathematics. We also propose a new learning machine, with the building block of gauge$_p$ function, and arm this machine with a number of statistical guarantees. The potential of the proposed gauge$_p$ function theory is then studied for two stylized applications. Finally, we discuss the computational aspects and, in particular, suggest a tractable numerical algorithm for implementing the new learning machine.

preprint2021arXiv

Over-Parametrized Matrix Factorization in the Presence of Spurious Stationary Points

Motivated by the emerging role of interpolating machines in signal processing and machine learning, this work considers the computational aspects of over-parametrized matrix factorization. In this context, the optimization landscape may contain spurious stationary points (SSPs), which are proved to be full-rank matrices. The presence of these SSPs means that it is impossible to hope for any global guarantees in over-parametrized matrix factorization. For example, when initialized at an SSP, the gradient flow will be trapped there forever. Nevertheless, despite these SSPs, we establish in this work that the gradient flow of the corresponding merit function converges to a global minimizer, provided that its initialization is rank-deficient and sufficiently close to the feasible set of the optimization problem. We numerically observe that a heuristic discretization of the proposed gradient flow, inspired by primal-dual algorithms, is successful when initialized randomly. Our result is in sharp contrast with the local refinement methods which require an initialization close to the optimal set of the optimization problem. More specifically, we successfully avoid the traps set by the SSPs because the gradient flow remains rank-deficient at all times, and not because there are no SSPs nearby. The latter is the case for the local refinement methods. Moreover, the widely-used restricted isometry property plays no role in our main result.

preprint2020arXiv

Double-Loop Unadjusted Langevin Algorithm

A well-known first-order method for sampling from log-concave probability distributions is the Unadjusted Langevin Algorithm (ULA). This work proposes a new annealing step-size schedule for ULA, which allows to prove new convergence guarantees for sampling from a smooth log-concave distribution, which are not covered by existing state-of-the-art convergence guarantees. To establish this result, we derive a new theoretical bound that relates the Wasserstein distance to total variation distance between any two log-concave distributions that complements the reach of Talagrand T2 inequality. Moreover, applying this new step size schedule to an existing constrained sampling algorithm, we show state-of-the-art convergence rates for sampling from a constrained log-concave distribution, as well as improved dimension dependence.

preprint2020arXiv

Explicit Stabilised Gradient Descent for Faster Strongly Convex Optimisation

This paper introduces the Runge-Kutta Chebyshev descent method (RKCD) for strongly convex optimisation problems. This new algorithm is based on explicit stabilised integrators for stiff differential equations, a powerful class of numerical schemes that avoid the severe step size restriction faced by standard explicit integrators. For optimising quadratic and strongly convex functions, this paper proves that RKCD nearly achieves the optimal convergence rate of the conjugate gradient algorithm, and the suboptimality of RKCD diminishes as the condition number of the quadratic function worsens. It is established that this optimal rate is obtained also for a partitioned variant of RKCD applied to perturbations of quadratic functions. In addition, numerical experiments on general strongly convex problems show that RKCD outperforms Nesterov's accelerated gradient descent.

preprint2020arXiv

MOSES: A Streaming Algorithm for Linear Dimensionality Reduction

This paper introduces Memory-limited Online Subspace Estimation Scheme (MOSES) for both estimating the principal components of streaming data and reducing its dimension. More specifically, in various applications such as sensor networks, the data vectors are presented sequentially to a user who has limited storage and processing time available. Applied to such problems, MOSES can provide a running estimate of leading principal components of the data that has arrived so far and also reduce its dimension. MOSES generalises the popular incremental Singular Vale Decomposition (iSVD) to handle thin blocks of data, rather than just vectors. This minor generalisation in part allows us to complement MOSES with a comprehensive statistical analysis, thus providing the first theoretically-sound variant of iSVD, which has been lacking despite the empirical success of this method. This generalisation also enables us to concretely interpret MOSES as an approximate solver for the underlying non-convex optimisation program. We find that MOSES consistently surpasses the state of the art in our numerical experiments with both synthetic and real-world datasets, while being computationally inexpensive.

preprint2020arXiv

Scalable Learning-Based Sampling Optimization for Compressive Dynamic MRI

Compressed sensing applied to magnetic resonance imaging (MRI) allows to reduce the scanning time by enabling images to be reconstructed from highly undersampled data. In this paper, we tackle the problem of designing a sampling mask for an arbitrary reconstruction method and a limited acquisition budget. Namely, we look for an optimal probability distribution from which a mask with a fixed cardinality is drawn. We demonstrate that this problem admits a compactly supported solution, which leads to a deterministic optimal sampling mask. We then propose a stochastic greedy algorithm that (i) provides an approximate solution to this problem, and (ii) resolves the scaling issues of [1,2]. We validate its performance on in vivo dynamic MRI with retrospective undersampling, showing that our method preserves the performance of [1,2] while reducing the computational burden by a factor close to 200.

preprint2020arXiv

Stable Super-Resolution of Images: A Theoretical Study

We study the ubiquitous super-resolution problem, in which one aims at localizing positive point sources in an image, blurred by the point spread function of the imaging device. To recover the point sources, we propose to solve a convex feasibility program, which simply finds a nonnegative Borel measure that agrees with the observations collected by the imaging device. In the absence of imaging noise, we show that solving this convex program uniquely retrieves the point sources, provided that the imaging device collects enough observations. This result holds true if the point spread function of the imaging device can be decomposed into horizontal and vertical components, and if the translations of these components form a Chebyshev system, i.e., a system of continuous functions that loosely behave like algebraic polynomials. Building upon recent results for one-dimensional signals [1], we prove that this super-resolution algorithm is stable, in the generalized Wasserstein metric, to model mismatch (i.e., when the image is not sparse) and to additive imaging noise. In particular, the recovery error depends on the noise level and how well the image can be approximated with well-separated point sources. As an example, we verify these claims for the important case of a Gaussian point spread function. The proofs rely on the construction of novel interpolating polynomials---which are the main technical contribution of this paper---and partially resolve the question raised in [2] about the extension of the standard machinery to higher dimensions.

preprint2020arXiv

Training Linear Neural Networks: Non-Local Convergence and Complexity Results

Linear networks provide valuable insights into the workings of neural networks in general. This paper identifies conditions under which the gradient flow provably trains a linear network, in spite of the non-strict saddle points present in the optimization landscape. This paper also provides the computational complexity of training linear networks with gradient flow. To achieve these results, this work develops a machinery to provably identify the stable set of gradient flow, which then enables us to improve over the state of the art in the literature of linear networks (Bah et al., 2019;Arora et al., 2018a). Crucially, our results appear to be the first to break away from the lazy training regime which has dominated the literature of neural networks. This work requires the network to have a layer with one neuron, which subsumes the networks with a scalar output, but extending the results of this theoretical work to all linear networks remains a challenging open problem.