Source author record

Christos Thrampoulidis

Christos Thrampoulidis appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning · Information Theory · math.IT · math.ST · Statistics Theory · eess.SP · math.OC · Computation · math.PR · Methodology · Systems and Control

Catalog footprint

What is connected

25 works

11 topics

4 close collaborators

preprint · 2026 · arXiv

High-Dimensional Statistics: Reflections on Progress and Open Problems

Over the past two decades, the field of high-dimensional statistics has experienced substantial progress, driven largely by technological advances that have dramatically reduced the cost and effort for data collection and storage across a broad range of domains, including biology, medicine, astronomy, and the social and environmental sciences. Modern datasets are increasingly complex, often exhibiting rich dependency, heterogeneity, and other features that challenge traditional statistical methods. In response, high-dimensional statistics has evolved to address more sophisticated estimation and inference problems. This evolution has, in turn, fostered deep connections with and contributions to a wide range of research areas, including optimization, concentration of measure, random matrix theory, information theory, and theoretical computer science. Given the rapid pace of recent developments in high-dimensional statistics, our goal is to synthesize representative advances, highlight common themes and open problems, and point to important works that offer entry points into the field.

preprint · 2026 · arXiv

Why Loss Re-weighting Works If You Stop Early: Training Dynamics of Unconstrained Features

The application of loss reweighting in modern deep learning presents a nuanced picture. While it fails to alter the terminal learning phase in overparameterized deep neural networks (DNNs) trained on high-dimensional datasets, empirical evidence consistently shows it offers significant benefits early in training. To transparently demonstrate and analyze this phenomenon, we introduce a small-scale model (SSM). This model is specifically designed to abstract the inherent complexities of both the DNN architecture and the input data, while maintaining key information about the structure of imbalance within its spectral components. On the one hand, the SSM reveals how vanilla empirical risk minimization preferentially learns to distinguish majority classes over minorities early in training, consequently delaying minority learning. In stark contrast, reweighting restores balanced learning dynamics, enabling the simultaneous learning of features associated with both majorities and minorities.

preprint · 2022 · arXiv

AutoBalance: Optimized Loss Functions for Imbalanced Data

Imbalanced datasets are commonplace in modern machine learning problems. The presence of under-represented classes or groups with sensitive attributes results in concerns about generalization and fairness. Such concerns are further exacerbated by the fact that large-capacity deep nets can perfectly fit the training data and appear to achieve perfect accuracy and fairness during training, but perform poorly at test time. To address these challenges, we propose AutoBalance, a bi-level optimization framework that automatically designs a training loss function to optimize a blend of accuracy and fairness-seeking objectives. Specifically, a lower-level problem trains the model weights, and an upper-level problem tunes the loss function by monitoring and optimizing the desired objective over the validation data. Our loss design enables personalized treatment for classes/groups by employing a parametric cross-entropy loss and individualized data augmentation schemes. We evaluate the benefits and performance of our approach for the application scenarios of imbalanced and group-sensitive classification. Extensive empirical evaluations demonstrate the benefits of AutoBalance over state-of-the-art approaches. Our experimental findings are complemented with theoretical insights on loss function design and the benefits of a train-validation split. All code is available open-source.

preprint · 2022 · arXiv

FedNest: Federated Bilevel, Minimax, and Compositional Optimization

Standard federated optimization methods successfully apply to stochastic problems with single-level structure. However, many contemporary ML problems -- including adversarial robustness, hyperparameter tuning, and actor-critic -- fall under nested bilevel programming that subsumes minimax and compositional optimization. In this work, we propose FedNest: a federated alternating stochastic gradient method to address general nested problems. We establish provable convergence rates for FedNest in the presence of heterogeneous data and introduce variations for bilevel, minimax, and compositional optimization. FedNest introduces multiple innovations, including federated hypergradient computation and variance reduction to address inner-level heterogeneity. We complement our theory with experiments on hyperparameter and hyper-representation learning and minimax optimization that demonstrate the benefits of our method in practice. Code is available at https://github.com/ucr-optml/FedNest.

preprint · 2022 · arXiv

Imbalance Trouble: Revisiting Neural-Collapse Geometry

Neural Collapse refers to the remarkable structural properties characterizing the geometry of class embeddings and classifier weights, found by deep nets when trained beyond zero training error. However, this characterization only holds for balanced data. Here we thus ask whether it can be made invariant to class imbalances. Towards this end, we adopt the unconstrained-features model (UFM), a recent theoretical model for studying neural collapse, and introduce Simplex-Encoded-Labels Interpolation (SELI) as an invariant characterization of the neural collapse phenomenon. Specifically, we prove for the UFM with cross-entropy loss and vanishing regularization that, irrespective of class imbalances, the embeddings and classifiers always interpolate a simplex-encoded label matrix and that their individual geometries are determined by the SVD factors of this same label matrix. We then present extensive experiments on synthetic and real datasets that confirm convergence to the SELI geometry. However, we caution that convergence worsens with increasing imbalances. We theoretically support this finding by showing that unlike the balanced case, when minorities are present, ridge-regularization plays a critical role in tweaking the geometry. This defines new questions and motivates further investigations into the impact of class imbalances on the rates at which first-order methods converge to their asymptotically preferred solutions.
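
To make the "simplex-encoded label matrix" concrete, here is a minimal sketch. It assumes the convention used in the unconstrained-features literature, where the k x n one-hot label matrix Y is multiplied by (I_k - (1/k) 1 1^T), so each one-hot column has 1/k subtracted from every entry; please check this against the paper's own definition. The function name `sel_matrix` is hypothetical. The abstract's claim concerns the SVD factors of this matrix; computing them (for example with a linear-algebra library) is left out to keep the sketch dependency-free.

```python
from fractions import Fraction


def sel_matrix(labels, k):
    """Return the k x n matrix (I_k - (1/k) 1 1^T) Y as nested lists of Fractions.

    `labels` holds integer class indices in range(k); Y is the k x n one-hot
    label matrix, so column j of the result is e_{y_j} - (1/k) * 1.
    """
    n = len(labels)
    z = [[Fraction(0)] * n for _ in range(k)]
    for j, y in enumerate(labels):
        for c in range(k):
            # 1 - 1/k on the true class, -1/k on every other class.
            z[c][j] = (1 if c == y else 0) - Fraction(1, k)
    return z


# An imbalanced toy label vector: class 0 is the majority.
labels = [0, 0, 0, 0, 1, 2]
Z = sel_matrix(labels, k=3)
for row in Z:
    print([str(v) for v in row])
```

Note that every column sums to zero regardless of how imbalanced the labels are; imbalance only changes how many columns of each type appear.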

preprint · 2022 · arXiv

Multi-Environment Meta-Learning in Stochastic Linear Bandits

In this work we investigate meta-learning (or learning-to-learn) approaches in multi-task linear stochastic bandit problems that can originate from multiple environments. Inspired by the work of [1] on meta-learning in a sequence of linear bandit problems whose parameters are sampled from a single distribution (i.e., a single environment), here we consider the feasibility of meta-learning when task parameters are drawn from a mixture distribution instead. For this problem, we propose a regularized version of the OFUL algorithm that, when trained on tasks with labeled environments, achieves low regret on a new task without requiring knowledge of the environment from which the new task originates. Specifically, our regret bound for the new algorithm captures the effect of environment misclassification and highlights the benefits over learning each task separately or meta-learning without recognition of the distinct mixture components.

preprint · 2022 · arXiv

On how to avoid exacerbating spurious correlations when models are overparameterized

Overparameterized models fail to generalize well in the presence of data imbalance even when combined with traditional techniques for mitigating imbalances. This paper focuses on imbalanced classification datasets, in which a small subset of the population -- a minority -- may contain features that correlate spuriously with the class label. For a parametric family of cross-entropy loss modifications and a representative Gaussian mixture model, we derive non-asymptotic generalization bounds on the worst-group error that shed light on the role of different hyper-parameters. Specifically, we prove that, when appropriately tuned, the recently proposed VS-loss learns a model that is fair towards minorities even when spurious features are strong. On the other hand, alternative heuristics, such as the weighted CE and the LA-loss, can fail dramatically. Compared to previous works, our bounds hold for more general models, are non-asymptotic, and apply even in scenarios of extreme imbalance.
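
The loss family discussed above can be sketched as one parametric cross-entropy. This sketch assumes the VS-loss form as it is commonly written, -omega_y * log softmax(Delta * z + iota)_y, with per-class multiplicative logit scalings `delta`, additive offsets `iota` and weights `omega`; weighted CE (only `omega` varies) and LA-loss (offsets tau * log(prior)) are then particular settings. The function names, the priors and tau = 1 are illustrative assumptions, not the tuned values from the paper.

```python
import math


def vs_loss(logits, y, delta, iota, omega):
    """Parametric cross-entropy in the VS-loss form.

    Computes -omega[y] * log softmax(delta * logits + iota)[y], where the
    products and sums are taken class by class.
    """
    adjusted = [d * z + b for z, d, b in zip(logits, delta, iota)]
    top = max(adjusted)  # shift by the max for a numerically stable log-sum-exp
    log_norm = top + math.log(sum(math.exp(a - top) for a in adjusted))
    return -omega[y] * (adjusted[y] - log_norm)


def la_offsets(priors, tau=1.0):
    """Additive logit adjustments iota_c = tau * log(pi_c) (LA-loss choice)."""
    return [tau * math.log(p) for p in priors]


# Two classes with priors 0.9 / 0.1 and a sample from the minority class.
logits, y = [2.0, 1.0], 1
ones, zeros = [1.0, 1.0], [0.0, 0.0]
priors = [0.9, 0.1]
plain_ce = vs_loss(logits, y, ones, zeros, ones)
weighted_ce = vs_loss(logits, y, ones, zeros, [1.0 / p for p in priors])
la_loss = vs_loss(logits, y, ones, la_offsets(priors), ones)
print(plain_ce, weighted_ce, la_loss)
```

Multiplicative weights rescale the per-sample loss, whereas additive offsets change which logit configuration minimizes it; the abstract's point is that these knobs behave very differently once the model can interpolate the training data.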

preprint · 2020 · arXiv

A Model of Double Descent for High-dimensional Binary Linear Classification

We consider a model for logistic regression where only a subset of features of size $p$ is used for training a linear classifier over $n$ training samples. The classifier is obtained by running gradient descent (GD) on logistic loss. For this model, we investigate the dependence of the classification error on the overparameterization ratio $κ=p/n$. First, building on known deterministic results on the implicit bias of GD, we uncover a phase-transition phenomenon for the case of Gaussian features: the classification error of GD is the same as that of the maximum-likelihood (ML) solution when $κ<κ_\star$, and that of the max-margin (SVM) solution when $κ>κ_\star$. Next, using the convex Gaussian min-max theorem (CGMT), we sharply characterize the performance of both the ML and the SVM solutions. Combining these results, we obtain curves that explicitly characterize the classification error for varying values of $κ$. The numerical results validate the theoretical predictions and unveil double-descent phenomena that complement similar recent findings in linear regression settings as well as empirical observations in more complex learning scenarios.

preprint · 2020 · arXiv

Analytic Study of Double Descent in Binary Classification: The Impact of Loss

Extensive empirical evidence reveals that, for a wide range of different learning methods and datasets, the risk curve exhibits a double-descent (DD) trend as a function of the model size. In a recent paper [Zeyu,Kammoun,Thrampoulidis,2019] the authors studied binary linear classification models and showed that the test error of gradient descent (GD) with logistic loss undergoes a DD. In this paper, we complement these results by extending them to GD with square loss. We show that the DD phenomenon persists, but we also identify several differences compared to logistic loss. This emphasizes that crucial features of DD curves (such as their transition threshold and global minima) depend both on the training data and on the learning algorithm. We further study the dependence of DD curves on the size of the training set. Similar to our earlier work, our results are analytic: we plot the DD curves by first deriving sharp asymptotics for the test error under Gaussian features. Albeit simple, the models permit a principled study of DD features, the outcomes of which theoretically corroborate related empirical findings occurring in more complex learning tasks.

preprint · 2020 · arXiv

Exploring Weight Importance and Hessian Bias in Model Pruning

Model pruning is an essential procedure for building compact and computationally-efficient machine learning models. A key feature of a good pruning algorithm is that it accurately quantifies the relative importance of the model weights. While model pruning has a rich history, we still don't have a full grasp of the pruning mechanics even for relatively simple problems involving linear models or shallow neural nets. In this work, we provide a principled exploration of pruning by building on a natural notion of importance. For linear models, we show that this notion of importance is captured by covariance scaling which connects to the well-known Hessian-based pruning. We then derive asymptotic formulas that allow us to precisely compare the performance of different pruning methods. For neural networks, we demonstrate that the importance can be at odds with larger magnitudes and proper initialization is critical for magnitude-based pruning. Specifically, we identify settings in which weights become more important despite becoming smaller, which in turn leads to a catastrophic failure of magnitude-based pruning. Our results also elucidate that implicit regularization in the form of Hessian structure has a catalytic role in identifying the important weights, which dictate the pruning performance.
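
The contrast between magnitude-based and Hessian-based importance can be sketched on a linear model. The sketch assumes an Optimal-Brain-Damage-style saliency 0.5 * H_ii * w_i^2 as the Hessian-based score; for a least-squares loss (1/2n)||y - Xw||^2 with centered features, H_ii is the empirical variance of feature i, which is one way to read "covariance scaling". The paper's own importance notion may be defined differently, and the weights and variances below are hypothetical.

```python
def prune(weights, scores, keep):
    """Keep the `keep` weights with the largest scores and zero out the rest."""
    order = sorted(range(len(weights)), key=lambda i: scores[i], reverse=True)
    kept = set(order[:keep])
    return [w if i in kept else 0.0 for i, w in enumerate(weights)]


def magnitude_scores(weights):
    """Magnitude-based importance: |w_i|."""
    return [abs(w) for w in weights]


def hessian_scores(weights, hess_diag):
    """OBD-style saliency 0.5 * H_ii * w_i^2 from the Hessian diagonal.

    For a least-squares loss with centered features, H_ii is the empirical
    variance of feature i, so this is w_i^2 scaled by the feature covariance.
    """
    return [0.5 * h * w * w for w, h in zip(weights, hess_diag)]


# Hypothetical linear model: the largest weight sits on a low-variance feature.
weights = [1.0, 0.2, 0.5]
feature_var = [0.01, 100.0, 1.0]
print(prune(weights, magnitude_scores(weights), keep=1))
print(prune(weights, hessian_scores(weights, feature_var), keep=1))
```

In this toy case magnitude pruning keeps the large weight on an almost-constant feature, while the Hessian-scaled score keeps the small weight on the high-variance feature, illustrating how importance can be at odds with magnitude.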

preprint · 2020 · arXiv

Fundamental Limits of Ridge-Regularized Empirical Risk Minimization in High Dimensions

Empirical Risk Minimization (ERM) algorithms are widely used in a variety of estimation and prediction tasks in signal processing and machine learning applications. Despite their popularity, a theory that explains their statistical properties in modern regimes where both the number of measurements and the number of unknown parameters are large is only recently emerging. In this paper, we characterize for the first time the fundamental limits on the statistical accuracy of convex ERM for inference in high-dimensional generalized linear models. For a stylized setting with Gaussian features and problem dimensions that grow large at a proportional rate, we start with sharp performance characterizations and then derive tight lower bounds on the estimation and prediction error that hold over a wide class of loss functions and for any value of the regularization parameter. Our precise analysis has several attributes. First, it leads to a recipe for optimally tuning the loss function and the regularization parameter. Second, it allows one to precisely quantify the sub-optimality of popular heuristic choices: for instance, we show that optimally-tuned least-squares is (perhaps surprisingly) approximately optimal for standard logistic data, but the sub-optimality gap grows drastically as the signal strength increases. Third, we use the bounds to precisely assess the merits of ridge-regularization as a function of the over-parameterization ratio. Notably, our bounds are expressed in terms of the Fisher Information of random variables that are simple functions of the data distribution, thus making ties to corresponding bounds in classical statistics.

preprint · 2020 · arXiv

Regret Bounds for Safe Gaussian Process Bandit Optimization

Many applications require a learner to make sequential decisions given uncertainty regarding both the system's payoff function and safety constraints. In safety-critical systems, it is paramount that the learner's actions do not violate the safety constraints at any stage of the learning process. In this paper, we study a stochastic bandit optimization problem where the unknown payoff and constraint functions are sampled from Gaussian Processes (GPs), a setting first considered in [Srinivas et al., 2010]. We develop a safe variant of GP-UCB called SGP-UCB, with necessary modifications to respect safety constraints at every round. The algorithm has two distinct phases. The first phase seeks to estimate the set of safe actions in the decision set, while the second phase follows the GP-UCB decision rule. Our main contribution is to derive the first sub-linear regret bounds for this problem. We numerically compare SGP-UCB against existing safe Bayesian GP optimization algorithms.
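
A minimal sketch of the second-phase decision rule, restricted to an estimated safe set, might look as follows. It takes the GP posterior means and standard deviations at a finite set of candidate actions as given inputs (fitting the GPs and the first, safe-set-exploration phase are omitted). The pessimistic safety test mu_g + sqrt(beta) * sd_g <= h is an assumed, common choice for certifying safety and is not necessarily SGP-UCB's exact rule; the function name and all numbers are hypothetical.

```python
import math


def safe_ucb_choice(mu_f, sd_f, mu_g, sd_g, beta, h):
    """Pick an action by the GP-UCB rule within a pessimistically estimated safe set.

    mu_f, sd_f: posterior mean / std of the payoff at each candidate action.
    mu_g, sd_g: posterior mean / std of the constraint function at each action.
    An action counts as safe when its constraint upper bound is at most h.
    Returns the chosen index, or None if no action is certified safe.
    """
    b = math.sqrt(beta)
    safe = [i for i in range(len(mu_f)) if mu_g[i] + b * sd_g[i] <= h]
    if not safe:
        return None
    return max(safe, key=lambda i: mu_f[i] + b * sd_f[i])


# Action 1 has the best mean payoff but cannot be certified safe.
mu_f, sd_f = [1.0, 2.0, 0.5], [0.1, 0.5, 1.0]
mu_g, sd_g = [0.0, 0.9, 0.1], [0.1, 0.2, 0.1]
print(safe_ucb_choice(mu_f, sd_f, mu_g, sd_g, beta=4.0, h=1.0))
```

Using the upper confidence bound for the constraint but the upper confidence bound for the payoff reflects the asymmetry of the problem: optimism drives exploration of the payoff, while pessimism keeps every played action safe with high probability.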

preprint · 2020 · arXiv

Safe Linear Thompson Sampling with Side Information

The design and performance analysis of bandit algorithms in the presence of stage-wise safety or reliability constraints has recently garnered significant interest. In this work, we consider the linear stochastic bandit problem under additional linear safety constraints that need to be satisfied at each round. We provide a new safe algorithm based on linear Thompson Sampling (TS) for this problem and show a frequentist regret of order $\mathcal{O} (d^{3/2}\log^{1/2}d \cdot T^{1/2}\log^{3/2}T)$, which remarkably matches the result of (Abeille et al., 2017) for the standard linear TS algorithm in the absence of safety constraints. We compare the performance of our algorithm with UCB-based safe algorithms and highlight how the inherently randomized nature of TS leads to superior performance in expanding the set of safe actions the algorithm has access to at each round.

preprint · 2020 · arXiv

Sharp Asymptotics and Optimal Performance for Inference in Binary Models

We study convex empirical risk minimization for high-dimensional inference in binary models. Our first result sharply predicts the statistical performance of such estimators in the linear asymptotic regime under isotropic Gaussian features. Importantly, the predictions hold for a wide class of convex loss functions, which we exploit in order to prove a bound on the best achievable performance among them. Notably, we show that the proposed bound is tight for popular binary models (such as Signed, Logistic or Probit) by constructing appropriate loss functions that achieve it. More interestingly, for binary linear classification under the Logistic and Probit models, we prove that the performance of least-squares is no worse than 0.997 and 0.98 times the optimal one, respectively. Numerical simulations corroborate our theoretical findings and suggest they are accurate even for relatively small problem dimensions.

preprint · 2020 · arXiv

Sharp Guarantees for Solving Random Equations with One-Bit Information

We study the performance of a wide class of convex optimization-based estimators for recovering a signal from corrupted one-bit measurements in high dimensions. Our general result sharply predicts the performance of such estimators in the linear asymptotic regime when the measurement vectors have IID Gaussian entries. This includes, as a special case, the previously studied least-squares estimator, and yields various novel results for other popular estimators such as least-absolute deviations, hinge-loss and logistic-loss. Importantly, we exploit the fact that our analysis holds for generic convex loss functions to prove a bound on the best achievable performance across the entire class of estimators. Numerical simulations corroborate our theoretical findings and suggest they are accurate even for relatively small problem dimensions.

preprint · 2016 · arXiv

Phaseless super-resolution in the continuous domain

Phaseless super-resolution refers to the problem of super-resolving a signal from only its low-frequency Fourier magnitude measurements. In this paper, we consider the phaseless super-resolution problem of recovering a sparse sum of Dirac delta functions, which can be located anywhere in the continuous time domain. For such signals in the continuous domain, we propose a novel Semidefinite Programming (SDP)-based signal recovery method to achieve phaseless super-resolution. This work extends the recent work of Jaganathan et al. [1], which considered phaseless super-resolution for discrete signals on the grid.

preprint · 2016 · arXiv

Precise Error Analysis of Regularized M-estimators in High-dimensions

A popular approach for estimating an unknown signal from noisy, linear measurements is to solve a so-called regularized M-estimator, which minimizes a weighted combination of a convex loss function and a convex (typically non-smooth) regularizer. We accurately predict the squared-error performance of such estimators in the high-dimensional proportional regime. The random measurement matrix is assumed to have iid Gaussian entries, while only minimal and rather mild regularity conditions are imposed on the loss function, the regularizer, and the noise and signal distributions. We show that the error converges in probability to a nontrivial limit that is given as the solution to a minimax convex-concave optimization problem in four scalar optimization variables. We identify a new summary parameter, termed the Expected Moreau envelope, that plays a central role in the error characterization. The precise nature of the results permits an accurate performance comparison between different instances of regularized M-estimators and allows one to optimally tune the involved parameters (e.g. the regularizer parameter and the number of measurements). The key ingredient of our proof is the Convex Gaussian Min-max Theorem (CGMT), which is a tight and strengthened version of a classical Gaussian comparison inequality proved by Gordon in 1988.
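
The Moreau envelope behind this error characterization can be illustrated for the absolute-value (l1) regularizer: its proximal map is soft thresholding and its envelope, min_v (x - v)^2 / (2 tau) + |v|, is the Huber function. The abstract does not state how the Expected Moreau envelope is defined (which arguments are random, and how they are scaled), so `mc_average`, a Monte Carlo average of the envelope at a scaled standard normal, is only an illustrative assumption; all function names are hypothetical.

```python
import math
import random


def soft_threshold(x, tau):
    """Proximal map of |.| with parameter tau: argmin_v (x - v)^2 / (2 tau) + |v|."""
    return math.copysign(max(abs(x) - tau, 0.0), x)


def moreau_abs(x, tau):
    """Moreau envelope of |.|: min_v (x - v)^2 / (2 tau) + |v|, i.e. the Huber function."""
    v = soft_threshold(x, tau)  # the minimizing v
    return (x - v) ** 2 / (2.0 * tau) + abs(v)


def mc_average(c, tau, samples=100_000, seed=0):
    """Monte Carlo estimate of E[moreau_abs(c * G, tau)] with G ~ N(0, 1) (illustrative)."""
    rng = random.Random(seed)
    return sum(moreau_abs(c * rng.gauss(0.0, 1.0), tau) for _ in range(samples)) / samples


print(moreau_abs(0.5, 1.0), moreau_abs(3.0, 1.0))  # quadratic vs. linear branch
print(mc_average(1.0, 1.0))
```

The envelope is a smooth surrogate of the non-smooth regularizer, which is part of why scalar quantities of this kind can summarize the error of the full high-dimensional estimator.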

preprint · 2015 · arXiv

Asymptotically Exact Error Analysis for the Generalized $\ell_2^2$-LASSO

Given an unknown signal $\mathbf{x}_0\in\mathbb{R}^n$ and linear noisy measurements $\mathbf{y}=\mathbf{A}\mathbf{x}_0+σ\mathbf{v}\in\mathbb{R}^m$, the generalized $\ell_2^2$-LASSO solves $\hat{\mathbf{x}}:=\arg\min_{\mathbf{x}}\frac{1}{2}\|\mathbf{y}-\mathbf{A}\mathbf{x}\|_2^2 + σλf(\mathbf{x})$. Here, $f$ is a convex regularization function (e.g. $\ell_1$-norm, nuclear-norm) aiming to promote the structure of $\mathbf{x}_0$ (e.g. sparse, low-rank), and $λ\geq 0$ is the regularizer parameter. A related optimization problem, though not as popular or well-known, is often referred to as the generalized $\ell_2$-LASSO and takes the form $\hat{\mathbf{x}}:=\arg\min_{\mathbf{x}}\|\mathbf{y}-\mathbf{A}\mathbf{x}\|_2 + λf(\mathbf{x})$; it has been analyzed in [1], which further made conjectures about the performance of the generalized $\ell_2^2$-LASSO. This paper establishes these conjectures rigorously. We measure performance with the normalized squared error $\mathrm{NSE}(σ):=\|\hat{\mathbf{x}}-\mathbf{x}_0\|_2^2/σ^2$. Assuming the entries of $\mathbf{A}$ and $\mathbf{v}$ are i.i.d. standard normal, we precisely characterize the "asymptotic NSE" $\mathrm{aNSE}:=\lim_{σ\rightarrow 0}\mathrm{NSE}(σ)$ when the problem dimensions $m,n$ tend to infinity in a proportional manner. The role of $λ,f$ and $\mathbf{x}_0$ is explicitly captured in the derived expression by means of a single geometric quantity, the Gaussian distance to the subdifferential. We conjecture that $\mathrm{aNSE} = \sup_{σ>0}\mathrm{NSE}(σ)$. We include detailed discussions on the interpretation of our result, make connections to relevant literature and perform computational experiments that validate our theoretical findings.
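
As a hands-on companion to the definitions above, the following sketch solves the generalized $\ell_2^2$-LASSO with $f = \ell_1$ by ISTA (proximal gradient) and reports $\mathrm{NSE}(σ)$ on one synthetic instance. ISTA, its fixed step size $1/\|\mathbf{A}\|_F^2$ and the problem sizes are assumptions chosen for simplicity; the abstract does not specify a solver, and a single finite instance only loosely approximates the asymptotic quantities studied in the paper. Function names are hypothetical.

```python
import math
import random


def soft_threshold(v, t):
    """Proximal map of t * |.|."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def l22_lasso_ista(A, y, lam, sigma, iters=1000):
    """ISTA for min_x 0.5 * ||y - A x||_2^2 + sigma * lam * ||x||_1."""
    m, n = len(A), len(A[0])
    # 1 / ||A||_F^2 <= 1 / ||A||_2^2, so this fixed step is always valid.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(iters):
        resid = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        grad = [sum(A[i][j] * resid[i] for i in range(m)) for j in range(n)]
        x = [soft_threshold(x[j] - step * grad[j], step * sigma * lam) for j in range(n)]
    return x


def nse(x_hat, x0, sigma):
    """Normalized squared error ||x_hat - x0||_2^2 / sigma^2."""
    return sum((a - b) ** 2 for a, b in zip(x_hat, x0)) / sigma ** 2


# Small synthetic instance with i.i.d. standard normal A and v, as in the abstract.
rng = random.Random(0)
m, n, sigma, lam = 40, 20, 0.1, 2.0
A = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(m)]
x0 = [1.0, -1.0] + [0.0] * (n - 2)
y = [sum(A[i][j] * x0[j] for j in range(n)) + sigma * rng.gauss(0, 1) for i in range(m)]
x_hat = l22_lasso_ista(A, y, lam, sigma)
print(nse(x_hat, x0, sigma))
```

Scaling the penalty by $σ$, as in the problem statement, is what lets $\mathrm{NSE}(σ)$ have a finite limit as $σ\rightarrow 0$; repeating the experiment for decreasing $σ$ is a simple way to probe the aNSE numerically.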

preprint · 2015 · arXiv

BER Analysis of the box relaxation for BPSK Signal Recovery

We study the problem of recovering an $n$-dimensional vector of $\{\pm1\}^n$ (BPSK) signals from $m$ noise-corrupted measurements $\mathbf{y}=\mathbf{A}\mathbf{x}_0+\mathbf{z}$. In particular, we consider the box relaxation method, which relaxes the discrete set $\{\pm1\}^n$ to the convex set $[-1,1]^n$ to obtain a convex optimization algorithm followed by hard thresholding. When the noise $\mathbf{z}$ and measurement matrix $\mathbf{A}$ have iid standard normal entries, we obtain an exact expression for the bit-wise probability of error $P_e$ in the limit of $n$ and $m$ growing with $\frac{m}{n}$ fixed. At high SNR, our result shows that the $P_e$ of box relaxation is within 3dB of the matched filter bound (MFB) for square systems, and that it approaches the MFB as $m$ grows large compared to $n$. Our results also indicate that as $m,n\rightarrow\infty$, for any fixed set of size $k$, the error events of the corresponding $k$ bits in the box relaxation method are independent.
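
A minimal sketch of the box-relaxation detector follows, assuming the squared-error objective $\frac{1}{2}\|\mathbf{y}-\mathbf{A}\mathbf{x}\|_2^2$ over $[-1,1]^n$ solved by projected gradient descent; the solver, its fixed step $1/\|\mathbf{A}\|_F^2$, the noise level and the sizes are assumptions, since the abstract does not specify them. `bit_error_rate` gives an empirical counterpart of $P_e$ on one instance; it does not reproduce the paper's closed-form expression. Function names are hypothetical.

```python
import random


def box_relaxation_detect(A, y, iters=1000):
    """Projected gradient on min 0.5*||y - A x||^2 over [-1, 1]^n, then sign."""
    m, n = len(A), len(A[0])
    step = 1.0 / sum(a * a for row in A for a in row)  # safe fixed step
    x = [0.0] * n
    for _ in range(iters):
        resid = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        grad = [sum(A[i][j] * resid[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by projection (clipping) onto the box.
        x = [min(1.0, max(-1.0, x[j] - step * grad[j])) for j in range(n)]
    # Hard thresholding back to BPSK symbols (ties at 0 mapped to +1).
    return [1 if v >= 0 else -1 for v in x]


def bit_error_rate(x_hat, x0):
    """Fraction of symbols that were decoded incorrectly."""
    return sum(a != b for a, b in zip(x_hat, x0)) / len(x0)


rng = random.Random(0)
m, n, noise_std = 30, 10, 0.5
A = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(m)]
x0 = [rng.choice([-1, 1]) for _ in range(n)]
y = [sum(A[i][j] * x0[j] for j in range(n)) + noise_std * rng.gauss(0, 1) for i in range(m)]
print(bit_error_rate(box_relaxation_detect(A, y), x0))
```

Clipping is the Euclidean projection onto the box, which is what makes the relaxed problem cheap to solve compared with searching over all $2^n$ sign patterns.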

preprint · 2015 · arXiv

Isotropically Random Orthogonal Matrices: Performance of LASSO and Minimum Conic Singular Values

Recently, the precise performance of the Generalized LASSO algorithm for recovering structured signals from compressed noisy measurements, obtained via i.i.d. Gaussian matrices, has been characterized. The analysis is based on a framework introduced by Stojnic and heavily relies on the use of Gordon's Gaussian min-max theorem (GMT), a comparison principle on Gaussian processes. As a result, corresponding characterizations for other ensembles of measurement matrices have not been developed. In this work, we analyze the corresponding performance of the ensemble of isotropically random orthogonal (i.r.o.) measurements. We consider the constrained version of the Generalized LASSO and derive a sharp characterization of its normalized squared error in the large-system limit. When compared to its Gaussian counterpart, our result analytically confirms the superiority in performance of the i.r.o. ensemble. Our second result derives an asymptotic lower bound on the minimum conic singular values of i.r.o. matrices. This bound is larger than the corresponding bound on Gaussian matrices. To prove our results, we express i.r.o. matrices in terms of Gaussians and show that, with some modifications, the GMT framework is still applicable.

preprint · 2015 · arXiv

The Gaussian min-max theorem in the Presence of Convexity

Gaussian comparison theorems are useful tools in probability theory; they are essential ingredients in the classical proofs of many results in empirical processes and extreme value theory. More recently, they have been used extensively in the analysis of non-smooth optimization problems that arise in the recovery of structured signals from noisy linear observations. We refer to such problems as Primary Optimization (PO) problems. A prominent role in the study of the (PO) problems is played by Gordon's Gaussian min-max theorem (GMT), which provides probabilistic lower bounds on the optimal cost via a simpler Auxiliary Optimization (AO) problem. Motivated by recent work of M. Stojnic, we show that under appropriate convexity assumptions the (AO) problem allows one to tightly bound both the optimal cost and the norm of the solution of the (PO). As an application, we use our result to develop a general framework to tightly characterize the performance (e.g. squared-error) of a wide class of convex optimization algorithms used in the context of noisy signal recovery.

preprint · 2014 · arXiv

Optimal Placement of Distributed Energy Storage in Power Networks

We formulate the optimal placement, sizing and control of storage devices in a power network to minimize generation costs with the intent of load shifting. We assume deterministic demand, a linearized DC approximated power flow model and a fixed available storage budget. Our main result proves that when the generation costs are convex and nondecreasing, there always exists an optimal storage capacity allocation that places zero storage at generation-only buses that connect to the rest of the network via single links. This holds regardless of the demand profiles, generation capacities, line-flow limits and characteristics of the storage technologies. Through a counterexample, we illustrate that this result is not generally true for generation buses with multiple connections. For specific network topologies, we also characterize the dependence of the optimal generation cost on the available storage budget, generation capacities and flow constraints.

preprint · 2014 · arXiv

Simple Error Bounds for Regularized Noisy Linear Inverse Problems

Consider estimating a structured signal $\mathbf{x}_0$ from linear, underdetermined and noisy measurements $\mathbf{y}=\mathbf{A}\mathbf{x}_0+\mathbf{z}$, via solving a variant of the lasso algorithm: $\hat{\mathbf{x}}=\arg\min_\mathbf{x}\{ \|\mathbf{y}-\mathbf{A}\mathbf{x}\|_2+λf(\mathbf{x})\}$. Here, $f$ is a convex function aiming to promote the structure of $\mathbf{x}_0$, say $\ell_1$-norm to promote sparsity or nuclear norm to promote low-rankness. We assume that the entries of $\mathbf{A}$ are independent and normally distributed and make no assumptions on the noise vector $\mathbf{z}$, other than it being independent of $\mathbf{A}$. Under this generic setup, we derive a general, non-asymptotic and rather tight upper bound on the $\ell_2$-norm of the estimation error $\|\hat{\mathbf{x}}-\mathbf{x}_0\|_2$. Our bound is geometric in nature and obeys a simple formula; the roles of $λ$, $f$ and $\mathbf{x}_0$ are all captured by a single summary parameter $δ(λ\partial f(\mathbf{x}_0))$, termed the Gaussian squared distance to the scaled subdifferential. We connect our result to the literature and verify its validity through simulations.

preprint · 2013 · arXiv

Simple Bounds for Noisy Linear Inverse Problems with Exact Side Information

This paper considers the linear inverse problem where we wish to estimate a structured signal $x$ from its corrupted observations. When the problem is ill-posed, it is natural to make use of a convex function $f(\cdot)$ that exploits the structure of the signal. For example, the $\ell_1$ norm can be used for sparse signals. To carry out the estimation, we consider two well-known convex programs: 1) the Second-order cone program (SOCP), and 2) Lasso. Assuming Gaussian measurements, we show that, if precise information about the value $f(x)$ or the $\ell_2$-norm of the noise is available, one can do a particularly good job at estimation. In particular, the reconstruction error becomes proportional to the "sparsity" of the signal rather than the ambient dimension of the noise vector. We connect our results to existing works and provide a discussion on the relation of our results to the standard least-squares problem. Our error bounds are non-asymptotic and sharp; they apply to arbitrary convex functions and do not assume any distribution on the noise.

preprint · 2013 · arXiv

The Squared-Error of Generalized LASSO: A Precise Analysis

We consider the problem of estimating an unknown signal $x_0$ from noisy linear observations $y = Ax_0 + z\in \mathbb{R}^m$. In many practical instances, $x_0$ has a certain structure that can be captured by a structure-inducing convex function $f(\cdot)$. For example, the $\ell_1$ norm can be used to encourage a sparse solution. To estimate $x_0$ with the aid of $f(\cdot)$, we consider the well-known LASSO method and provide a sharp characterization of its performance. We assume the entries of the measurement matrix $A$ and the noise vector $z$ have zero-mean normal distributions with variances $1$ and $σ^2$ respectively. For the LASSO estimator $x^*$, we attempt to calculate the Normalized Square Error (NSE) defined as $\frac{\|x^*-x_0\|_2^2}{σ^2}$ as a function of the noise level $σ$, the number of observations $m$ and the structure of the signal. We show that the structure of the signal $x_0$ and the choice of the function $f(\cdot)$ enter the error formulae through the summary parameters $D(cone)$ and $D(λ)$, which are defined as the Gaussian squared-distances to the subdifferential cone and to the $λ$-scaled subdifferential, respectively. The first LASSO estimator assumes a priori knowledge of $f(x_0)$ and is given by $\arg\min_{x}\{{\|y-Ax\|_2}~\text{subject to}~f(x)\leq f(x_0)\}$. We prove that its worst-case NSE is achieved when $σ\rightarrow 0$ and concentrates around $\frac{D(cone)}{m-D(cone)}$. Secondly, we consider $\arg\min_{x}\{\|y-Ax\|_2+λf(x)\}$, for some $λ\geq 0$. This time the NSE formula depends on the choice of $λ$ and is given by $\frac{D(λ)}{m-D(λ)}$. We then establish a mapping between this and the third estimator $\arg\min_{x}\{\frac{1}{2}\|y-Ax\|_2^2+ λf(x)\}$. Finally, for a number of important structured signal classes, we translate our abstract formulae to closed-form upper bounds on the NSE.
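
For $f = \ell_1$ and a $k$-sparse $x_0 \in \mathbb{R}^n$, $D(λ)$ has a closed form, and the abstract's formula $\frac{D(λ)}{m-D(λ)}$ can then be evaluated directly. On the support, the $λ$-scaled subdifferential is the single point $λ\,\mathrm{sign}(x_{0,i})$, contributing $k(1+λ^2)$; off the support it is $[-λ,λ]$, contributing $(n-k)\,\mathbb{E}[(|g|-λ)_+^2] = 2(n-k)[(1+λ^2)Q(λ) - λφ(λ)]$, with $Q$ the standard normal tail and $φ$ its density. This is a standard calculation done here rather than taken from the abstract, and the sketch returns only the leading-order prediction, which holds under the paper's conditions (not restated here). Function names and the sweep values are hypothetical.

```python
import math


def gaussian_tail(t):
    """Q(t) = P(G > t) for G ~ N(0, 1)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def gaussian_pdf(t):
    """Standard normal density phi(t)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def l1_distance_sq(lam, n, k):
    """D(lam) = E[dist^2(g, lam * subdifferential of ||.||_1 at x0)], g ~ N(0, I_n).

    x0 is any k-sparse vector in R^n; only n and k matter.
    """
    # On the support the subdifferential is the point lam * sign(x0_i): E(g_i - lam*s_i)^2.
    on_support = k * (1.0 + lam * lam)
    # Off the support it is [-lam, lam]: E(|g_i| - lam)_+^2 = 2[(1+lam^2)Q(lam) - lam*phi(lam)].
    per_coord = 2.0 * ((1.0 + lam * lam) * gaussian_tail(lam) - lam * gaussian_pdf(lam))
    return on_support + (n - k) * per_coord


def predicted_nse(lam, n, k, m):
    """Leading-order NSE prediction D(lam) / (m - D(lam)); requires D(lam) < m."""
    d = l1_distance_sq(lam, n, k)
    if d >= m:
        raise ValueError("prediction requires D(lam) < m")
    return d / (m - d)


# Sweep lam for a 10-sparse signal in R^200 observed through m = 150 measurements.
for lam in (0.5, 1.0, 1.5, 2.0):
    print(lam, round(l1_distance_sq(lam, 200, 10), 2), round(predicted_nse(lam, 200, 10, 150), 3))
```

At $λ = 0$ the subdifferential term vanishes and $D(0) = n$, so the prediction reduces to $\frac{n}{m-n}$; increasing $λ$ first shrinks $D(λ)$ by exploiting sparsity, then grows it again through the $k(1+λ^2)$ term, which is why an intermediate $λ$ minimizes the predicted NSE.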
