Source author record

Arian Maleki

Arian Maleki appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Information Theory, math.IT, math.ST, Statistics Theory, Machine Learning, Computation, Methodology, Computer Vision, cond-mat.dis-nn, math.OC, math.PR, Mathematical Software, Numerical Analysis

Catalog footprint

What is connected

32 works

13 topics

4 close collaborators

preprint 2026 arXiv

High-Dimensional Statistics: Reflections on Progress and Open Problems

Over the past two decades, the field of high-dimensional statistics has experienced substantial progress, driven largely by technological advances that have dramatically reduced the cost and effort for data collection and storage across a broad range of domains, including biology, medicine, astronomy, and the social and environmental sciences. Modern datasets are increasingly complex, often exhibiting rich dependency, heterogeneity, and other features that challenge traditional statistical methods. In response, high-dimensional statistics has evolved to address more sophisticated estimation and inference problems. This evolution has, in turn, fostered deep connections with and contributions to a wide range of research areas, including optimization, concentration of measure, random matrix theory, information theory, and theoretical computer science. Given the rapid pace of recent developments in high-dimensional statistics, our goal is to synthesize representative advances, highlight common themes and open problems, and point to important works that offer entry points into the field.

preprint 2023 arXiv

Signal-to-noise ratio aware minimaxity and higher-order asymptotics

Since its development, the minimax framework has been one of the cornerstones of theoretical statistics, and has contributed to the popularity of many well-known estimators, such as the regularized M-estimators for high-dimensional problems. In this paper, we first show, through the example of the sparse Gaussian sequence model, that the theoretical results under the classical minimax framework are insufficient for explaining empirical observations. In particular, both hard and soft thresholding estimators are (asymptotically) minimax; in practice, however, they often exhibit sub-optimal performance at various signal-to-noise ratio (SNR) levels. The first contribution of this paper is to demonstrate that this issue can be resolved if the signal-to-noise ratio is taken into account in the construction of the parameter space. We call the resulting minimax framework the signal-to-noise ratio aware minimaxity. The second contribution of this paper is to showcase how one can use higher-order asymptotics to obtain accurate approximations of the SNR-aware minimax risk and discover minimax estimators. The theoretical findings obtained from this refined minimax framework provide new insights and practical guidance for the estimation of sparse signals.
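For reference, the hard and soft thresholding estimators named in this abstract can be sketched as follows. This is a minimal illustration of the two standard rules applied coordinate-wise to a noisy sequence; the function names and the example values are assumptions made here, not taken from the paper.

```python
def soft_threshold(y, tau):
    """Shrink y toward zero by tau; inputs with |y| <= tau map to 0."""
    if y > tau:
        return y - tau
    if y < -tau:
        return y + tau
    return 0.0


def hard_threshold(y, tau):
    """Keep y unchanged when |y| > tau; otherwise return 0."""
    return y if abs(y) > tau else 0.0


def estimate(observations, tau, rule):
    """Apply a thresholding rule to every coordinate of a noisy sequence."""
    return [rule(y, tau) for y in observations]


if __name__ == "__main__":
    noisy = [3.0, -0.4, 0.2, -2.5]  # hypothetical observations
    print(estimate(noisy, 1.0, soft_threshold))
    print(estimate(noisy, 1.0, hard_threshold))
```

Soft thresholding biases every surviving coordinate by `tau`, while hard thresholding leaves survivors untouched, which is one intuitive reason the two rules can behave differently across SNR levels.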

preprint 2022 arXiv

Sharp Concentration Results for Heavy-Tailed Distributions

We obtain concentration and large deviation results for sums of independent and identically distributed random variables with heavy-tailed distributions. Our concentration results are concerned with random variables whose distributions satisfy $\mathbb{P}(X>t) \leq {\rm e}^{- I(t)}$, where $I: \mathbb{R} \rightarrow \mathbb{R}$ is an increasing function and $I(t)/t \rightarrow α\in [0, \infty)$ as $t \rightarrow \infty$. Our main theorem not only recovers some of the existing results, such as the concentration of the sum of subWeibull random variables, but also produces new results for the sum of random variables with heavier tails. We show that the concentration inequalities we obtain are sharp enough to offer large deviation results for the sums of independent random variables as well. Our analyses, which are based on standard truncation arguments, simplify, unify and generalize the existing results on the concentration and large deviation of heavy-tailed random variables.

preprint 2021 arXiv

Consistent Risk Estimation in Moderately High-Dimensional Linear Regression

Risk estimation is at the core of many learning systems. The importance of this problem has motivated researchers to propose different schemes, such as cross validation, generalized cross validation, and Bootstrap. The theoretical properties of such estimates have been extensively studied in the low-dimensional settings, where the number of predictors $p$ is much smaller than the number of observations $n$. However, a unifying methodology accompanied by a rigorous theory is lacking in high-dimensional settings. This paper studies the problem of risk estimation under the moderately high-dimensional asymptotic setting $n,p \rightarrow \infty$ and $n/p \rightarrow δ>1$ ($δ$ is a fixed number), and proves the consistency of three risk estimates that have been successful in numerical studies, i.e., leave-one-out cross validation (LOOCV), approximate leave-one-out (ALO), and approximate message passing (AMP)-based techniques. A cornerstone of our analysis is a bound that we obtain on the discrepancy of the "residuals" obtained from AMP and LOOCV. This connection not only enables us to obtain more refined information on the estimates of AMP, ALO, and LOOCV, but also offers an upper bound on the convergence rate of each estimate.

preprint 2021 arXiv

Minimax Linear Estimation of the Retargeted Mean

Evaluating treatments received by one population for application to a different target population of scientific interest is a central problem in causal inference from observational studies. We study the minimax linear estimator of the treatment-specific mean outcome on a target population and provide a theoretical basis for inference based on it. In particular, we provide a justification for the common practice of ignoring bias when building confidence intervals with these linear estimators. Focusing on the case that the class of the unknown outcome function is the unit ball of a reproducing kernel Hilbert space, we show that the resulting linear estimator is asymptotically optimal under conditions only marginally stronger than those used with augmented estimators. We establish bounds attesting to the estimator's good finite sample properties. In an extensive simulation study, we observe promising performance of the estimator throughout a wide range of sample sizes, noise levels, and levels of overlap between the covariate distributions of the treated and target populations.

preprint 2020 arXiv

A scalable estimate of the extra-sample prediction error via approximate leave-one-out

The paper considers the problem of out-of-sample risk estimation in high-dimensional settings where standard techniques such as $K$-fold cross validation suffer from large biases. Motivated by the low bias of the leave-one-out cross validation (LO) method, we propose a computationally efficient closed-form approximate leave-one-out formula (ALO) for a large class of regularized estimators. Given the regularized estimate, calculating ALO requires minor computational overhead. With minor assumptions about the data generating process, we obtain a finite-sample upper bound for $|\text{LO} - \text{ALO}|$. Our theoretical analysis illustrates that $|\text{LO} - \text{ALO}| \rightarrow 0$ with overwhelming probability, when $n,p \rightarrow \infty$, where the dimension $p$ of the feature vectors may be comparable with or even greater than the number of observations, $n$. Despite the high dimensionality of the problem, our theoretical results do not require any sparsity assumption on the vector of regression coefficients. Our extensive numerical experiments show that $|\text{LO} - \text{ALO}|$ decreases as $n,p$ increase, revealing the excellent finite sample performance of ALO. We further illustrate the usefulness of our proposed out-of-sample risk estimation method by an example of real recordings from spatially sensitive neurons (grid cells) in the medial entorhinal cortex of a rat.
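To give a feel for the "single fit instead of $n$ refits" idea behind leave-one-out shortcuts, here is a toy sketch for ridge regression with one predictor and no intercept. In this special case the shortcut (residual divided by one minus the leverage) is exact rather than approximate. It is not the paper's ALO formula, which covers a much larger class of regularized estimators; all function names below are illustrative assumptions.

```python
def ridge_fit_1d(x, y, lam):
    """Ridge coefficient for a single predictor without intercept."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return sxy / (sxx + lam)


def loo_residuals_brute_force(x, y, lam):
    """Refit n times, each time leaving one observation out."""
    residuals = []
    for i in range(len(x)):
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        beta_minus_i = ridge_fit_1d(x_rest, y_rest, lam)
        residuals.append(y[i] - x[i] * beta_minus_i)
    return residuals


def loo_residuals_shortcut(x, y, lam):
    """Fit once, then divide each residual by (1 - leverage). Requires lam > 0."""
    sxx = sum(xi * xi for xi in x)
    beta = ridge_fit_1d(x, y, lam)
    out = []
    for xi, yi in zip(x, y):
        leverage = xi * xi / (sxx + lam)  # diagonal entry of the hat matrix
        out.append((yi - xi * beta) / (1.0 - leverage))
    return out


def loo_risk(residuals):
    """Average squared leave-one-out residual, an out-of-sample error estimate."""
    return sum(r * r for r in residuals) / len(residuals)
```

Comparing `loo_risk(loo_residuals_shortcut(x, y, lam))` with the brute-force version on any small dataset shows the two agree up to floating-point error, while the shortcut needs only one fit.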

preprint 2020 arXiv

Analysis of Spectral Methods for Phase Retrieval with Random Orthogonal Matrices

Phase retrieval refers to algorithmic methods for recovering a signal from its phaseless measurements. Local search algorithms that work directly on the non-convex formulation of the problem have been very popular recently. Due to the nonconvexity of the problem, the success of these local search algorithms depends heavily on their starting points. The most widely used initialization scheme is the spectral method, in which the leading eigenvector of a data-dependent matrix is used as a starting point. Recently, the performance of the spectral initialization was characterized accurately for measurement matrices with independent and identically distributed entries. This paper aims to obtain the same level of knowledge for isotropically random column-orthogonal matrices, which are substantially better models for practical phase retrieval systems. Towards this goal, we consider the asymptotic setting in which the number of measurements $m$, and the dimension of the signal, $n$, diverge to infinity with $m/n = δ\in(1,\infty)$, and obtain a simple expression for the overlap between the spectral estimator and the true signal vector.
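The spectral initialization described in this abstract (leading eigenvector of a data-dependent matrix) can be sketched in a few lines. The sketch below is a simplification under assumptions made here: real-valued i.i.d. Gaussian measurement vectors rather than the column-orthogonal model the paper analyzes, squared-magnitude measurements, the untrimmed matrix $D = \frac{1}{m}\sum_i y_i a_i a_i^{T}$, and plain power iteration. All names and parameter values are illustrative.

```python
import math
import random


def spectral_estimate(rows, y, iters=200):
    """Leading eigenvector of D = (1/m) * sum_i y_i a_i a_i^T via power iteration."""
    m, n = len(rows), len(rows[0])
    d = [[0.0] * n for _ in range(n)]
    for a, yi in zip(rows, y):
        for j in range(n):
            for k in range(n):
                d[j][k] += yi * a[j] * a[k] / m
    v = [1.0 / math.sqrt(n)] * n  # deterministic starting vector
    for _ in range(iters):
        w = [sum(d[j][k] * v[k] for k in range(n)) for j in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


def overlap(u, v):
    """Absolute cosine similarity; the global sign cannot be recovered."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return abs(dot) / (nu * nv)


def demo(n=4, m=2000, seed=0):
    """Simulate phaseless measurements and return the estimator's overlap."""
    rng = random.Random(seed)
    x_true = [1.0] + [0.0] * (n - 1)
    rows = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    y = [sum(aj * xj for aj, xj in zip(a, x_true)) ** 2 for a in rows]
    return overlap(spectral_estimate(rows, y), x_true)
```

With many more measurements than unknowns, as in `demo()`, the overlap is close to one; the asymptotic question studied in the paper is how this overlap behaves when $m/n$ stays fixed.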

preprint 2020 arXiv

Error bounds in estimating the out-of-sample prediction error using leave-one-out cross validation in high-dimensions

We study the problem of out-of-sample risk estimation in the high dimensional regime where both the sample size $n$ and number of features $p$ are large, and $n/p$ can be less than one. Extensive empirical evidence confirms the accuracy of leave-one-out cross validation (LO) for out-of-sample risk estimation. Yet, a unifying theoretical evaluation of the accuracy of LO in high-dimensional problems has remained an open problem. This paper aims to fill this gap for penalized regression in the generalized linear family. With minor assumptions about the data generating process, and without any sparsity assumptions on the regression coefficients, our theoretical analysis obtains finite sample upper bounds on the expected squared error of LO in estimating the out-of-sample error. Our bounds show that the error goes to zero as $n,p \rightarrow \infty$, even when the dimension $p$ of the feature vectors is comparable with or greater than the sample size $n$. One technical advantage of the theory is that it can be used to clarify and connect some results from the recent literature on scalable approximate LO.

preprint 2020 arXiv

Information Theoretic Limits for Phase Retrieval with Subsampled Haar Sensing Matrices

We study information theoretic limits of recovering an unknown $n$-dimensional complex signal vector $\mathbf{x}_\star$ with unit norm from $m$ magnitude-only measurements of the form $y_i = |(\mathbf{A} \mathbf{x}_\star)_i|^2, \; i = 1,2 \dots , m$, where $\mathbf{A}$ is the sensing matrix. This is known as the Phase Retrieval problem and models practical imaging systems where measuring the phase of the observations is difficult. Since the sensing matrix has orthogonal columns in a number of applications, we model the sensing matrix as a subsampled Haar matrix formed by picking $n$ columns of a uniformly random $m \times m$ unitary matrix. We study this problem in the high dimensional asymptotic regime, where $m,n \rightarrow \infty$, while $m/n \rightarrow δ$ with $δ$ being a fixed number, and show that if $m < (2-o_n(1))\cdot n$, then any estimator is asymptotically orthogonal to the true signal vector $\mathbf{x}_\star$. This lower bound is sharp since when $m > (2+o_n(1)) \cdot n $, estimators that achieve a nontrivial asymptotic correlation with the signal vector are known from previous works.

preprint 2020 arXiv

Spectral Method for Phase Retrieval: an Expectation Propagation Perspective

Phase retrieval refers to the problem of recovering a signal $\mathbf{x}_{\star}\in\mathbb{C}^n$ from its phaseless measurements $y_i=|\mathbf{a}_i^{\mathrm{H}}\mathbf{x}_{\star}|$, where $\{\mathbf{a}_i\}_{i=1}^m$ are the measurement vectors. Many popular phase retrieval algorithms are based on the following two-step procedure: (i) initialize the algorithm based on a spectral method, (ii) refine the initial estimate by a local search algorithm (e.g., gradient descent). The quality of the spectral initialization step can have a major impact on the performance of the overall algorithm. In this paper, we focus on the model where the measurement matrix $\mathbf{A}=[\mathbf{a}_1,\ldots,\mathbf{a}_m]^{\mathrm{H}}$ has orthonormal columns, and study the spectral initialization under the asymptotic setting $m,n\to\infty$ with $m/n\to δ\in(1,\infty)$. We use the expectation propagation framework to characterize the performance of spectral initialization for Haar distributed matrices. Our numerical results confirm that the predictions of the EP method are accurate not only for Haar distributed matrices, but also for realistic Fourier based models (e.g. the coded diffraction model). The main findings of this paper are the following: (1) There exists a threshold on $δ$ (denoted as $δ_{\mathrm{weak}}$) below which the spectral method cannot produce a meaningful estimate. We show that $δ_{\mathrm{weak}}=2$ for the column-orthonormal model. In contrast, previous results by Mondelli and Montanari show that $δ_{\mathrm{weak}}=1$ for the i.i.d. Gaussian model. (2) The optimal design for the spectral method coincides with that for the i.i.d. Gaussian model, where the latter was recently introduced by Luo, Alghamdi and Lu.

preprint 2020 arXiv

Using Black-box Compression Algorithms for Phase Retrieval

Compressive phase retrieval refers to the problem of recovering a structured $n$-dimensional complex-valued vector from its phase-less under-determined linear measurements. The non-linearity of measurements makes designing theoretically-analyzable efficient phase retrieval algorithms challenging. As a result, to a great extent, algorithms designed in this area are developed to take advantage of simple structures such as sparsity and its convex generalizations. The goal of this paper is to move beyond simple models through employing compression codes. Such codes are typically developed to take advantage of complex signal models to represent the signals as efficiently as possible. In this work, it is shown how an existing compression code can be treated as a black box and integrated into an efficient solution for phase retrieval. First, COmpressive PhasE Retrieval (COPER) optimization, a computationally-intensive compression-based phase retrieval method, is proposed. COPER provides a theoretical framework for studying compression-based phase retrieval. The number of measurements required by COPER is connected to $κ$, the $α$-dimension (closely related to the rate-distortion dimension) of the given family of compression codes. To find the solution of COPER, an efficient iterative algorithm called gradient descent for COPER (GD-COPER) is proposed. It is proven that under some mild conditions on the initialization, if the number of measurements is larger than $ C κ^2 \log^2 n$, where $C$ is a constant, GD-COPER obtains an accurate estimate of the input vector in polynomial time. In the simulation results, JPEG2000 is integrated in GD-COPER to confirm the superb performance of the resulting algorithm on real-world images.

preprint 2020 arXiv

Which bridge estimator is optimal for variable selection?

We study the problem of variable selection for linear models under the high-dimensional asymptotic setting, where the number of observations $n$ grows at the same rate as the number of predictors $p$. We consider two-stage variable selection techniques (TVS) in which the first stage uses bridge estimators to obtain an estimate of the regression coefficients, and the second stage simply thresholds this estimate to select the "important" predictors. The asymptotic false discovery proportion (AFDP) and true positive proportion (ATPP) of these TVS are evaluated. We prove that for a fixed ATPP, in order to obtain a smaller AFDP, one should pick a bridge estimator with smaller asymptotic mean square error in the first stage of TVS. Based on such principled discovery, we present a sharp comparison of different TVS, via an in-depth investigation of the estimation properties of bridge estimators. Rather than "order-wise" error bounds with loose constants, our analysis focuses on precise error characterization. Various interesting signal-to-noise ratio and sparsity settings are studied. Our results offer new and thorough insights into high-dimensional variable selection. For instance, we prove that a TVS with Ridge in its first stage outperforms TVS with other bridge estimators in large noise settings; two-stage LASSO becomes inferior when the signal is rare and weak. As a by-product, we show that two-stage methods outperform some standard variable selection techniques, such as LASSO and Sure Independence Screening, under certain conditions.

preprint 2016 arXiv

From Denoising to Compressed Sensing

A denoising algorithm seeks to remove noise, errors, or perturbations from a signal. Extensive research has been devoted to this arena over the last several decades, and as a result, today's denoisers can effectively remove large amounts of additive white Gaussian noise. A compressed sensing (CS) reconstruction algorithm seeks to recover a structured signal acquired using a small number of randomized measurements. Typical CS reconstruction algorithms can be cast as iteratively estimating a signal from a perturbed observation. This paper answers a natural question: How can one effectively employ a generic denoiser in a CS reconstruction algorithm? In response, we develop an extension of the approximate message passing (AMP) framework, called Denoising-based AMP (D-AMP), that can integrate a wide class of denoisers within its iterations. We demonstrate that, when used with a high performance denoiser for natural images, D-AMP offers state-of-the-art CS recovery performance while operating tens of times faster than competing methods. We explain the exceptional performance of D-AMP by analyzing some of its theoretical features. A key element in D-AMP is the use of an appropriate Onsager correction term in its iterations, which coerces the signal perturbation at each iteration to be very close to the white Gaussian noise that denoisers are typically designed to remove.

preprint 2016 arXiv

Global analysis of Expectation Maximization for mixtures of two Gaussians

Expectation Maximization (EM) is among the most popular algorithms for estimating parameters of statistical models. However, EM, which is an iterative algorithm based on the maximum likelihood principle, is generally only guaranteed to find stationary points of the likelihood objective, and these points may be far from any maximizer. This article addresses this disconnect between the statistical principles behind EM and its algorithmic properties. Specifically, it provides a global analysis of EM for specific models in which the observations comprise an i.i.d. sample from a mixture of two Gaussians. This is achieved by (i) studying the sequence of parameters from idealized execution of EM in the infinite sample limit, and fully characterizing the limit points of the sequence in terms of the initial parameters; and then (ii) based on this convergence analysis, establishing statistical consistency (or lack thereof) for the actual sequence of parameters produced by EM.
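As a concrete instance of the iteration discussed in this abstract, consider the simplest symmetric case: a balanced mixture $\frac{1}{2}N(θ,1) + \frac{1}{2}N(-θ,1)$ in one dimension with known unit variance. This specific model is an assumption chosen here for illustration. The E-step posterior weights and the M-step average then collapse into the single update $θ_{t+1} = \frac{1}{n}\sum_i \tanh(θ_t x_i)\, x_i$. The helper names and parameter values are hypothetical.

```python
import math
import random


def em_step(theta, data):
    """One EM update for the mixture 0.5*N(theta, 1) + 0.5*N(-theta, 1)."""
    return sum(math.tanh(theta * x) * x for x in data) / len(data)


def run_em(theta0, data, iters=100):
    """Iterate the EM map starting from theta0."""
    theta = theta0
    for _ in range(iters):
        theta = em_step(theta, data)
    return theta


def sample_mixture(theta, n, seed=0):
    """Draw n points from the balanced two-component mixture."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) * theta + rng.gauss(0.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    data = sample_mixture(2.0, 2000)
    print(run_em(0.5, data))   # positive start
    print(run_em(-0.5, data))  # negative start
    print(run_em(0.0, data))   # start exactly at zero
```

Starting exactly at zero, the iteration never moves, because zero is a fixed point of the update. This gives a small illustration of how the limit of the sequence can depend on the initial parameters.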

preprint 2015 arXiv

Consistent Parameter Estimation for LASSO and Approximate Message Passing

We consider the problem of recovering a vector $β_o \in \mathbb{R}^p$ from $n$ random and noisy linear observations $y= Xβ_o + w$, where $X$ is the measurement matrix and $w$ is noise. The LASSO estimate is given by the solution to the optimization problem $\hatβ_λ = \arg \min_β \frac{1}{2} \|y-Xβ\|_2^2 + λ\| β\|_1$. Among the iterative algorithms that have been proposed for solving this optimization problem, approximate message passing (AMP) has attracted attention for its fast convergence. Despite significant progress in the theoretical analysis of the estimates of LASSO and AMP, little is known about their behavior as a function of the regularization parameter $λ$, or the threshold parameters $τ^t$. For instance, the following basic questions have not yet been studied in the literature: (i) How does the size of the active set $\|\hatβ_λ\|_0/p$ behave as a function of $λ$? (ii) How does the mean square error $\|\hatβ_λ - β_o\|_2^2/p$ behave as a function of $λ$? (iii) How does $\|β^t - β_o \|_2^2/p$ behave as a function of $τ^1, \ldots, τ^{t-1}$? Answering these questions will help in addressing practical challenges regarding the optimal tuning of $λ$ or $τ^1, τ^2, \ldots$. This paper answers these questions in the asymptotic setting and shows how these results can be employed in deriving simple and theoretically optimal approaches for tuning the parameters $τ^1, \ldots, τ^t$ for AMP or $λ$ for LASSO. It also explores the connection between the optimal tuning of the parameters of AMP and the optimal tuning of LASSO.

preprint 2015 arXiv

Optimal Large-MIMO Data Detection with Transmit Impairments

Real-world transceiver designs for multiple-input multiple-output (MIMO) wireless communication systems are affected by a number of hardware impairments that already appear at the transmit side, such as amplifier non-linearities, quantization artifacts, and phase noise. While such transmit-side impairments are routinely ignored in the data-detection literature, they often limit reliable communication in practical systems. In this paper, we present a novel data-detection algorithm, referred to as large-MIMO approximate message passing with transmit impairments (short LAMA-I), which takes into account a broad range of transmit-side impairments in wireless systems with a large number of transmit and receive antennas. We provide conditions in the large-system limit for which LAMA-I achieves the error-rate performance of the individually-optimal (IO) data detector. We furthermore demonstrate that LAMA-I achieves near-IO performance at low computational complexity in realistic, finite dimensional large-MIMO systems.

preprint 2015 arXiv

Optimality of Large MIMO Detection via Approximate Message Passing

Optimal data detection in multiple-input multiple-output (MIMO) communication systems with a large number of antennas at both ends of the wireless link entails prohibitive computational complexity. In order to reduce the computational complexity, a variety of sub-optimal detection algorithms have been proposed in the literature. In this paper, we analyze the optimality of a novel data-detection method for large MIMO systems that relies on approximate message passing (AMP). We show that our algorithm, referred to as individually-optimal (IO) large-MIMO AMP (short IO-LAMA), is able to perform IO data detection provided that certain conditions on the MIMO system and the constellation set (e.g., QAM or PSK) are met.

preprint 2013 arXiv

Asymptotic Analysis of Complex LASSO via Complex Approximate Message Passing (CAMP)

Recovering a sparse signal from an undersampled set of random linear measurements is the main problem of interest in compressed sensing. In this paper, we consider the case where both the signal and the measurements are complex. We study the popular reconstruction method of $\ell_1$-regularized least squares or LASSO. While several studies have shown that the LASSO algorithm offers desirable solutions under certain conditions, the precise asymptotic performance of this algorithm in the complex setting is not yet known. In this paper, we extend the approximate message passing (AMP) algorithm to the complex signals and measurements and obtain the complex approximate message passing algorithm (CAMP). We then generalize the state evolution framework recently introduced for the analysis of AMP, to the complex setting. Using the state evolution, we derive accurate formulas for the phase transition and noise sensitivity of both LASSO and CAMP.

preprint 2013 arXiv

Asymptotic Analysis of LASSOs Solution Path with Implications for Approximate Message Passing

This paper concerns the performance of the LASSO (also known as basis pursuit denoising) for recovering sparse signals from undersampled, randomized, noisy measurements. We consider the recovery of the signal $x_o \in \mathbb{R}^N$ from $n$ random and noisy linear observations $y= Ax_o + w$, where $A$ is the measurement matrix and $w$ is the noise. The LASSO estimate of $x_o$ is given by the solution to the optimization problem $\hat{x}_λ = \arg \min_x \frac{1}{2} \|y-Ax\|_2^2 + λ\|x\|_1$. Despite major progress in the theoretical analysis of the LASSO solution, little is known about its behavior as a function of the regularization parameter $λ$. In this paper we study two questions in the asymptotic setting (i.e., where $N \rightarrow \infty$, $n \rightarrow \infty$ while the ratio $n/N$ converges to a fixed number in $(0,1)$): (i) How does the size of the active set $\|\hat{x}_λ\|_0/N$ behave as a function of $λ$, and (ii) How does the mean square error $\|\hat{x}_λ - x_o\|_2^2/N$ behave as a function of $λ$? We then employ these results in a new, reliable algorithm for solving LASSO based on approximate message passing (AMP).
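To experiment with how the LASSO solution and its active set change with $λ$, a small solver is enough. The sketch below uses plain iterative soft thresholding (ISTA, a proximal gradient method) rather than the AMP-based algorithm proposed in the paper. The step size, iteration count and function names are assumptions made for this illustration.

```python
def soft(v, t):
    """Soft thresholding: the proximal operator of t * |.|."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_objective(A, y, x, lam):
    """0.5 * ||y - A x||_2^2 + lam * ||x||_1 for A given as a list of rows."""
    resid = [yi - sum(a * xj for a, xj in zip(row, x)) for row, yi in zip(A, y)]
    return 0.5 * sum(r * r for r in resid) + lam * sum(abs(xj) for xj in x)


def ista(A, y, lam, iters=500):
    """Minimize the LASSO objective by proximal gradient steps from x = 0."""
    n_rows, n_cols = len(A), len(A[0])
    # The squared Frobenius norm upper-bounds the largest eigenvalue of A^T A,
    # so this step size is conservative but guarantees descent.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n_cols
    for _ in range(iters):
        resid = [sum(a * xj for a, xj in zip(row, x)) - yi for row, yi in zip(A, y)]
        grad = [sum(A[i][j] * resid[i] for i in range(n_rows)) for j in range(n_cols)]
        x = [soft(xj - step * g, step * lam) for xj, g in zip(x, grad)]
    return x


def active_fraction(x):
    """Fraction of nonzero coordinates, i.e. ||x||_0 / N."""
    return sum(1 for xj in x if xj != 0.0) / len(x)
```

Calling `active_fraction(ista(A, y, lam))` over a grid of `lam` values traces an empirical version of question (i). Once `lam` is at least the largest absolute entry of $A^{T}y$, the estimate is identically zero.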

preprint 2013 arXiv

From compression to compressed sensing

Can compression algorithms be employed for recovering signals from their underdetermined set of linear measurements? Addressing this question is the first step towards applying compression algorithms for compressed sensing (CS). In this paper, we consider a family of compression algorithms $\mathcal{C}_r$, parametrized by rate $r$, for a compact class of signals $\mathcal{Q} \subset \mathbb{R}^n$. The set of natural images and JPEG at different rates are examples of $\mathcal{Q}$ and $\mathcal{C}_r$, respectively. We establish a connection between the rate-distortion performance of $\mathcal{C}_r$, and the number of linear measurements required for successful recovery in CS. We then propose the compressible signal pursuit (CSP) algorithm and prove that, with high probability, it accurately and robustly recovers signals from an underdetermined set of linear measurements. We also explore the performance of CSP in the recovery of infinite dimensional signals.

preprint 2013 arXiv

Maximin Analysis of Message Passing Algorithms for Recovering Block Sparse Signals

We consider the problem of recovering a block (or group) sparse signal from an underdetermined set of random linear measurements, which appear in compressed sensing applications such as radar and imaging. Recent results of Donoho, Johnstone, and Montanari have shown that approximate message passing (AMP) in combination with Stein's shrinkage outperforms group LASSO for large block sizes. In this paper, we prove that, for a fixed block size and in the strong undersampling regime (i.e., having very few measurements compared to the ambient dimension), AMP cannot improve upon group LASSO, thereby complementing the results of Donoho et al.

preprint 2013 arXiv

Minimum Complexity Pursuit for Universal Compressed Sensing

The nascent field of compressed sensing is founded on the fact that high-dimensional signals with "simple structure" can be recovered accurately from just a small number of randomized samples. Several specific kinds of structures have been explored in the literature, from sparsity and group sparsity to low-rankness. However, two fundamental questions have been left unanswered, namely: What are the general abstract meanings of "structure" and "simplicity"? And do there exist universal algorithms for recovering such simple structured objects from fewer samples than their ambient dimension? In this paper, we address these two questions. Using algorithmic information theory tools such as the Kolmogorov complexity, we provide a unified definition of structure and simplicity. Leveraging this new definition, we develop and analyze an abstract algorithm for signal recovery motivated by Occam's Razor. Minimum complexity pursuit (MCP) requires just $O(3κ)$ randomized samples to recover a signal of complexity $κ$ and ambient dimension $n$. We also discuss the performance of MCP in the presence of measurement noise and with approximately simple signals.

preprint 2013 arXiv

Parameterless Optimal Approximate Message Passing

Iterative thresholding algorithms are well-suited for high-dimensional problems in sparse recovery and compressive sensing. The performance of this class of algorithms depends heavily on the tuning of certain threshold parameters. In particular, both the final reconstruction error and the convergence rate of the algorithm crucially rely on how the threshold parameter is set at each step of the algorithm. In this paper, we propose a parameter-free approximate message passing (AMP) algorithm that sets the threshold parameter at each iteration in a fully automatic way, without requiring any information about the signal to be reconstructed or any tuning from the user. We show that the proposed method attains both the minimum reconstruction error and the highest convergence rate. Our method is based on applying the Stein unbiased risk estimate (SURE) along with a modified gradient descent to find the optimal threshold in each iteration. Motivated by the connections between AMP and LASSO, it could be employed to find the solution of the LASSO for the optimal regularization parameter. To the best of our knowledge, this is the first work concerning parameter tuning that obtains the fastest convergence rate with theoretical guarantees.
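The SURE ingredient mentioned in this abstract has a simple closed form for soft thresholding: for $y \sim N(μ, σ^2 I_n)$ and threshold $τ$, $\mathrm{SURE}(τ) = nσ^2 + \sum_i \min(y_i^2, τ^2) - 2σ^2\,\#\{i : |y_i| \le τ\}$. The sketch below evaluates this formula and picks the best threshold from a candidate grid. The grid search is a stand-in chosen here for simplicity; the paper describes a modified gradient descent, which is not reproduced. Function names are illustrative.

```python
def sure_soft(y, tau, sigma):
    """Stein unbiased risk estimate for soft thresholding at level tau."""
    n = len(y)
    clipped = sum(min(v * v, tau * tau) for v in y)
    below = sum(1 for v in y if abs(v) <= tau)
    return n * sigma ** 2 + clipped - 2.0 * sigma ** 2 * below


def best_threshold(y, sigma, grid):
    """Return the grid value with the smallest SURE (grid search, not gradient descent)."""
    return min(grid, key=lambda tau: sure_soft(y, tau, sigma))


if __name__ == "__main__":
    observations = [3.0, -0.5, 0.1]  # hypothetical noisy coordinates
    candidates = [0.0, 0.5, 1.0, 1.5, 2.0]
    print(best_threshold(observations, 1.0, candidates))
```

Because SURE depends only on the observations and the noise level, the threshold can be chosen without knowing the underlying signal, which is the property that makes a parameter-free scheme possible.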

preprint 2012 arXiv

Anisotropic Nonlocal Means Denoising

It has recently been proved that the popular nonlocal means (NLM) denoising algorithm does not optimally denoise images with sharp edges. Its weakness lies in the isotropic nature of the neighborhoods it uses to set its smoothing weights. In response, in this paper we introduce several theoretical and practical anisotropic nonlocal means (ANLM) algorithms and prove that they are near minimax optimal for edge-dominated images from the Horizon class. On real-world test images, an ANLM algorithm that adapts to the underlying image gradients outperforms NLM by a significant margin.

preprint 2012 arXiv

Iterative Thresholding Algorithm for Sparse Inverse Covariance Estimation

The L1-regularized maximum likelihood estimation problem has recently become a topic of great interest within the machine learning, statistics, and optimization communities as a method for producing sparse inverse covariance estimators. In this paper, a proximal gradient method (G-ISTA) for performing L1-regularized covariance matrix estimation is presented. Although numerous algorithms have been proposed for solving this problem, this simple proximal gradient method is found to have attractive theoretical and numerical properties. G-ISTA has a linear rate of convergence, resulting in an $O(\log ε)$ iteration complexity to reach a tolerance of $ε$. This paper gives eigenvalue bounds for the G-ISTA iterates, providing a closed-form linear convergence rate. The rate is shown to be closely related to the condition number of the optimal point. Numerical convergence results and timing comparisons for the proposed method are presented. G-ISTA is shown to perform very well, especially when the optimal point is well-conditioned.

preprint 2012 arXiv

Minimum Complexity Pursuit: Stability Analysis

A host of problems involve the recovery of structured signals from a dimensionality reduced representation such as a random projection; examples include sparse signals (compressive sensing) and low-rank matrices (matrix completion). Given the wide range of different recovery algorithms developed to date, it is natural to ask whether there exist "universal" algorithms for recovering "structured" signals from their linear projections. We recently answered this question in the affirmative in the noise-free setting. In this paper, we extend our results to the case of noisy measurements.

preprint2011arXiv

Compressed Sensing over $\ell_p$-balls: Minimax Mean Square Error

We consider the compressed sensing problem, where the object $x_0 \in \mathbb{R}^N$ is to be recovered from incomplete measurements $y = Ax_0 + z$; here the sensing matrix $A$ is an $n \times N$ random matrix with iid Gaussian entries and $n < N$. A popular method of sparsity-promoting reconstruction is $\ell^1$-penalized least-squares reconstruction (aka LASSO, Basis Pursuit). It is currently popular to consider the strict sparsity model, where the object $x_0$ is nonzero in only a small fraction of entries. In this paper, we instead consider the much more broadly applicable $\ell_p$-sparsity model, where $x_0$ is sparse in the sense of having $\ell_p$ norm bounded by $ξ \cdot N^{1/p}$ for some fixed $0 < p \leq 1$ and $ξ > 0$. We study an asymptotic regime in which $n$ and $N$ both tend to infinity with limiting ratio $n/N = δ \in (0,1)$, both in the noisy ($z \neq 0$) and noiseless ($z=0$) cases. Under weak assumptions on $x_0$, we are able to precisely evaluate the worst-case asymptotic minimax mean-squared reconstruction error (AMSE) for $\ell^1$ penalized least-squares: min over penalization parameters, max over $\ell_p$-sparse objects $x_0$. We exhibit the asymptotically least-favorable object (hardest sparse signal to recover) and the maximin penalization. Our explicit formulas unexpectedly involve quantities appearing classically in statistical decision theory. Occurring in the present setting, they reflect a deeper connection between penalized $\ell^1$ minimization and scalar soft thresholding. This connection, which follows from earlier work of the authors and collaborators on the AMP iterative thresholding algorithm, is carefully explained. Our approach also gives precise results under weak-$\ell_p$ ball coefficient constraints, as we show here.
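
For readers unfamiliar with the term, the scalar soft-thresholding operator named in this abstract has a standard closed form, and it is itself the solution of a one-dimensional $\ell^1$-penalized least-squares problem. The symbols $\eta$, $u$, $v$ and $\tau$ below are notation chosen here for illustration and are not taken from the abstract.

```latex
\eta(u;\tau)
  \;=\; \operatorname{sign}(u)\,\max\bigl(|u|-\tau,\;0\bigr)
  \;=\; \arg\min_{v\in\mathbb{R}} \Bigl\{ \tfrac{1}{2}(u-v)^2 + \tau\,|v| \Bigr\},
  \qquad \tau \ge 0 .
```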

preprint2011arXiv

Minimum Complexity Pursuit

The fast-growing field of compressed sensing is founded on the fact that if a signal is 'simple' and has some 'structure', then it can be reconstructed accurately with far fewer samples than its ambient dimension. Many different plausible structures have been explored in this field, ranging from sparsity to low-rankness and to finite rate of innovation. However, there are important abstract questions that are yet to be answered. For instance, what are the general abstract meanings of 'structure' and 'simplicity'? Do there exist universal algorithms for recovering such simple structured objects from fewer samples than their ambient dimension? In this paper, we aim to address these two questions. Using algorithmic information theory tools such as Kolmogorov complexity, we provide a unified method of describing 'simplicity' and 'structure'. We then explore the performance of an algorithm motivated by Occam's Razor (called MCP for minimum complexity pursuit) and show that it requires $O(k\log n)$ samples to recover a signal, where $k$ and $n$ represent its complexity and ambient dimension, respectively. Finally, we discuss more general classes of signals and provide guarantees on the performance of MCP.

preprint2011arXiv

Suboptimality of Nonlocal Means for Images with Sharp Edges

We conduct an asymptotic risk analysis of the nonlocal means image denoising algorithm for the Horizon class of images that are piecewise constant with a sharp edge discontinuity. We prove that the mean square risk of an optimally tuned nonlocal means algorithm decays according to $n^{-1}\log^{1/2+ε} n$, for an $n$-pixel image with $ε > 0$. This decay rate is an improvement over some of the predecessors of this algorithm, including the linear convolution filter, median filter, and the SUSAN filter, each of which provides a rate of only $n^{-2/3}$. It is also within a logarithmic factor of optimally tuned wavelet thresholding. However, it is still substantially slower than the optimal minimax rate of $n^{-4/3}$.

preprint2010arXiv

The Noise-Sensitivity Phase Transition in Compressed Sensing

Consider the noisy underdetermined system of linear equations: y = Ax0 + z0, with n x N measurement matrix A, n < N, and Gaussian white noise z0 ~ N(0, σ^2 I). Both y and A are known, both x0 and z0 are unknown, and we seek an approximation to x0. When x0 has few nonzeros, useful approximations are obtained by l1-penalized l2 minimization, in which the reconstruction \hat{x}^{1,λ} solves min ||y - Ax||^2/2 + λ||x||_1. Evaluate performance by mean-squared error (MSE = E ||\hat{x}^{1,λ} - x0||_2^2/N). Consider matrices A with iid Gaussian entries and a large-system limit in which n, N → ∞ with n/N → δ and k/n → ρ. Call the ratio MSE/σ^2 the noise sensitivity. We develop formal expressions for the MSE of \hat{x}^{1,λ}, and evaluate its worst-case formal noise sensitivity over all types of k-sparse signals. The phase space 0 < δ, ρ < 1 is partitioned by the curve ρ = ρ_MSE(δ) into two regions. Formal noise sensitivity is bounded throughout the region ρ < ρ_MSE(δ) and is unbounded throughout the region ρ > ρ_MSE(δ). The phase boundary ρ = ρ_MSE(δ) is identical to the previously known phase transition curve for equivalence of l1 - l0 minimization in the k-sparse noiseless case. Hence a single phase boundary describes the fundamental phase transitions both for the noiseless and noisy cases. Extensive computational experiments validate the predictions of this formalism, including the existence of game-theoretical structures underlying it. Underlying our formalism is the AMP algorithm introduced earlier by the authors. Other papers by the authors detail expressions for the formal MSE of AMP and its close connection to l1-penalized reconstruction. Here we derive the minimax formal MSE of AMP and then read out results for l1-penalized reconstruction.

preprint2009arXiv

Message Passing Algorithms for Compressed Sensing

Compressed sensing aims to undersample certain high-dimensional signals, yet accurately reconstruct them by exploiting signal characteristics. Accurate reconstruction is possible when the object to be recovered is sufficiently sparse in a known basis. Currently, the best known sparsity-undersampling tradeoff is achieved when reconstructing by convex optimization, which is expensive in important large-scale applications. Fast iterative thresholding algorithms have been intensively studied as alternatives to convex optimization for large-scale problems. Unfortunately, known fast algorithms offer substantially worse sparsity-undersampling tradeoffs than convex optimization. We introduce a simple, costless modification to iterative thresholding, making the sparsity-undersampling tradeoff of the new algorithms equivalent to that of the corresponding convex optimization procedures. The new iterative-thresholding algorithms are inspired by belief propagation in graphical models. Our empirical measurements of the sparsity-undersampling tradeoff for the new algorithms agree with theoretical calculations. We show that a state evolution formalism correctly derives the true sparsity-undersampling tradeoff. There is a surprising agreement between earlier calculations based on random convex polytopes and this new, apparently very different theoretical formalism.
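
To make the "modification to iterative thresholding" concrete, the following is a minimal sketch of an approximate message passing (AMP) style iteration with soft thresholding. It is an illustration under stated assumptions, not the authors' code. The threshold rule tau = alpha * ||z|| / sqrt(n) and the default alpha = 1.5 are choices made here, the matrix A is assumed to have iid Gaussian entries with variance 1/n, and the function names are hypothetical. The extra term added to the residual is the correction that separates this iteration from plain iterative soft thresholding.

```python
import numpy as np

def soft_threshold(v, tau):
    # Scalar soft thresholding applied entrywise.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def amp(A, y, alpha=1.5, iters=50):
    n, N = A.shape
    x = np.zeros(N)
    z = y.copy()  # residual for the all-zero starting estimate
    for _ in range(iters):
        # Assumed threshold policy: a multiple of the estimated noise level.
        tau = alpha * np.linalg.norm(z) / np.sqrt(n)
        x = soft_threshold(x + A.T @ z, tau)
        # Correction term: (1/delta) * z * average derivative of the
        # threshold function, which equals z * (#nonzeros of x) / n.
        correction = z * (np.count_nonzero(x) / n)
        z = y - A @ x + correction
    return x
```

A small noiseless usage example, with problem sizes chosen here only for illustration:

```python
rng = np.random.default_rng(1)
n, N, k = 250, 500, 25
A = rng.standard_normal((n, N)) / np.sqrt(n)
x0 = np.zeros(N)
x0[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
x_hat = amp(A, A @ x0)
rel_err = np.linalg.norm(x_hat - x0) / np.linalg.norm(x0)
```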

preprint2009arXiv

Optimally Tuned Iterative Reconstruction Algorithms for Compressed Sensing

We conducted an extensive computational experiment, lasting multiple CPU-years, to optimally select parameters for two important classes of algorithms for finding sparse solutions of underdetermined systems of linear equations. We make the optimally tuned implementations available at sparselab.stanford.edu; they run 'out of the box' with no user tuning: it is not necessary to select thresholds or know the likely degree of sparsity. Our class of algorithms includes iterative hard and soft thresholding with or without relaxation, as well as CoSaMP, subspace pursuit and some natural extensions. As a result, our optimally tuned algorithms dominate such proposals. Our notion of optimality is defined in terms of phase transitions, i.e. we maximize the number of nonzeros at which the algorithm can successfully operate. We show that the phase transition is a well-defined quantity with our suite of random underdetermined linear systems. Our tuning gives the highest transition possible within each class of algorithms.
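
For orientation, one member of the algorithm family named above, iterative hard thresholding, can be sketched in a few lines. This is a plain textbook variant and not the tuned implementation the abstract refers to. Unlike the tuned versions, it assumes the sparsity level k (with k >= 1) is supplied by the user, and it uses a conservative step size of 1/||A||_2^2, chosen here so that the residual norm cannot increase. The function names are hypothetical.

```python
import numpy as np

def hard_threshold(v, k):
    # Keep the k largest-magnitude entries of v and zero out the rest.
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-k:]
    out[keep] = v[keep]
    return out

def iht(A, y, k, iters=300):
    # Conservative step size (an assumption of this sketch): 1 / ||A||_2^2.
    mu = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        # Gradient step on 0.5 * ||y - A x||^2, then project onto k-sparse vectors.
        x = hard_threshold(x + mu * (A.T @ (y - A @ x)), k)
    return x
```

The step size and the need to know k are exactly the kind of parameters the abstract says its tuned implementations select automatically. This sketch leaves them to the caller.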

32 published item(s)

High-Dimensional Statistics: Reflections on Progress and Open Problems

Signal-to-noise ratio aware minimaxity and higher-order asymptotics

Sharp Concentration Results for Heavy-Tailed Distributions

Consistent Risk Estimation in Moderately High-Dimensional Linear Regression

Minimax Linear Estimation of the Retargeted Mean

A scalable estimate of the extra-sample prediction error via approximate leave-one-out

Analysis of Spectral Methods for Phase Retrieval with Random Orthogonal Matrices

Error bounds in estimating the out-of-sample prediction error using leave-one-out cross validation in high-dimensions

Information Theoretic Limits for Phase Retrieval with Subsampled Haar Sensing Matrices

Spectral Method for Phase Retrieval: an Expectation Propagation Perspective

Using Black-box Compression Algorithms for Phase Retrieval

Which bridge estimator is optimal for variable selection?

From Denoising to Compressed Sensing

Global analysis of Expectation Maximization for mixtures of two Gaussians

Consistent Parameter Estimation for LASSO and Approximate Message Passing

Optimal Large-MIMO Data Detection with Transmit Impairments

Optimality of Large MIMO Detection via Approximate Message Passing

Asymptotic Analysis of Complex LASSO via Complex Approximate Message Passing (CAMP)

Asymptotic Analysis of LASSOs Solution Path with Implications for Approximate Message Passing

From compression to compressed sensing

Maximin Analysis of Message Passing Algorithms for Recovering Block Sparse Signals

Minimum Complexity Pursuit for Universal Compressed Sensing

Parameterless Optimal Approximate Message Passing

Anisotropic Nonlocal Means Denoising

Iterative Thresholding Algorithm for Sparse Inverse Covariance Estimation

Minimum Complexity Pursuit: Stability Analysis

Compressed Sensing over $\ell_p$-balls: Minimax Mean Square Error

Minimum Complexity Pursuit

Suboptimality of Nonlocal Means for Images with Sharp Edges

The Noise-Sensitivity Phase Transition in Compressed Sensing

Message Passing Algorithms for Compressed Sensing

Optimally Tuned Iterative Reconstruction Algorithms for Compressed Sensing