Researcher profile

Christos Thrampoulidis

Christos Thrampoulidis contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
15works
0followers
9topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

15 published item(s)

preprint2026arXiv

High-Dimensional Statistics: Reflections on Progress and Open Problems

Over the past two decades, the field of high-dimensional statistics has experienced substantial progress, driven largely by technological advances that have dramatically reduced the cost and effort for data collection and storage across a broad range of domains, including biology, medicine, astronomy, and the social and environmental sciences. Modern datasets are increasingly complex, often exhibiting rich dependency, heterogeneity, and other features that challenge traditional statistical methods. In response, high-dimensional statistics has evolved to address more sophisticated estimation and inference problems. This evolution has, in turn, fostered deep connections with and contributions to a wide range of research areas, including optimization, concentration of measure, random matrix theory, information theory, and theoretical computer science. Given the rapid pace of recent developments in high-dimensional statistics, our goal is to synthesize representative advances, highlight common themes and open problems, and point to important works that offer entry points into the field.

preprint2026arXiv

Why Loss Re-weighting Works If You Stop Early: Training Dynamics of Unconstrained Features

The application of loss reweighting in modern deep learning presents a nuanced picture. While it fails to alter the terminal learning phase in overparameterized deep neural networks (DNNs) trained on high-dimensional datasets, empirical evidence consistently shows it offers significant benefits early in training. To transparently demonstrate and analyze this phenomenon, we introduce a small-scale model (SSM). This model is specifically designed to abstract the inherent complexities of both the DNN architecture and the input data, while maintaining key information about the structure of imbalance within its spectral components. On the one hand, the SSM reveals how vanilla empirical risk minimization preferentially learns to distinguish majority classes over minorities early in training, consequently delaying minority learning. In stark contrast, reweighting restores balanced learning dynamics, enabling the simultaneous learning of features associated with both majorities and minorities.

preprint2022arXiv

AutoBalance: Optimized Loss Functions for Imbalanced Data

Imbalanced datasets are commonplace in modern machine learning problems. The presence of under-represented classes or groups with sensitive attributes results in concerns about generalization and fairness. Such concerns are further exacerbated by the fact that large capacity deep nets can perfectly fit the training data and appear to achieve perfect accuracy and fairness during training, but perform poorly during test. To address these challenges, we propose AutoBalance, a bi-level optimization framework that automatically designs a training loss function to optimize a blend of accuracy and fairness-seeking objectives. Specifically, a lower-level problem trains the model weights, and an upper-level problem tunes the loss function by monitoring and optimizing the desired objective over the validation data. Our loss design enables personalized treatment for classes/groups by employing a parametric cross-entropy loss and individualized data augmentation schemes. We evaluate the benefits and performance of our approach for the application scenarios of imbalanced and group-sensitive classification. Extensive empirical evaluations demonstrate the benefits of AutoBalance over state-of-the-art approaches. Our experimental findings are complemented with theoretical insights on loss function design and the benefits of train-validation split. All code is available open-source.

preprint2022arXiv

FedNest: Federated Bilevel, Minimax, and Compositional Optimization

Standard federated optimization methods successfully apply to stochastic problems with single-level structure. However, many contemporary ML problems -- including adversarial robustness, hyperparameter tuning, and actor-critic -- fall under nested bilevel programming that subsumes minimax and compositional optimization. In this work, we propose \fedblo: A federated alternating stochastic gradient method to address general nested problems. We establish provable convergence rates for \fedblo in the presence of heterogeneous data and introduce variations for bilevel, minimax, and compositional optimization. \fedblo introduces multiple innovations including federated hypergradient computation and variance reduction to address inner-level heterogeneity. We complement our theory with experiments on hyperparameter \& hyper-representation learning and minimax optimization that demonstrate the benefits of our method in practice. Code is available at https://github.com/ucr-optml/FedNest.

preprint2022arXiv

Imbalance Trouble: Revisiting Neural-Collapse Geometry

Neural Collapse refers to the remarkable structural properties characterizing the geometry of class embeddings and classifier weights, found by deep nets when trained beyond zero training error. However, this characterization only holds for balanced data. Here we thus ask whether it can be made invariant to class imbalances. Towards this end, we adopt the unconstrained-features model (UFM), a recent theoretical model for studying neural collapse, and introduce Simplex-Encoded-Labels Interpolation (SELI) as an invariant characterization of the neural collapse phenomenon. Specifically, we prove for the UFM with cross-entropy loss and vanishing regularization that, irrespective of class imbalances, the embeddings and classifiers always interpolate a simplex-encoded label matrix and that their individual geometries are determined by the SVD factors of this same label matrix. We then present extensive experiments on synthetic and real datasets that confirm convergence to the SELI geometry. However, we caution that convergence worsens with increasing imbalances. We theoretically support this finding by showing that unlike the balanced case, when minorities are present, ridge-regularization plays a critical role in tweaking the geometry. This defines new questions and motivates further investigations into the impact of class imbalances on the rates at which first-order methods converge to their asymptotically preferred solutions.

preprint2022arXiv

Multi-Environment Meta-Learning in Stochastic Linear Bandits

In this work we investigate meta-learning (or learning-to-learn) approaches in multi-task linear stochastic bandit problems that can originate from multiple environments. Inspired by the work of [1] on meta-learning in a sequence of linear bandit problems whose parameters are sampled from a single distribution (i.e., a single environment), here we consider the feasibility of meta-learning when task parameters are drawn from a mixture distribution instead. For this problem, we propose a regularized version of the OFUL algorithm that, when trained on tasks with labeled environments, achieves low regret on a new task without requiring knowledge of the environment from which the new task originates. Specifically, our regret bound for the new algorithm captures the effect of environment misclassification and highlights the benefits over learning each task separately or meta-learning without recognition of the distinct mixture components.

preprint2022arXiv

On how to avoid exacerbating spurious correlations when models are overparameterized

Overparameterized models fail to generalize well in the presence of data imbalance even when combined with traditional techniques for mitigating imbalances. This paper focuses on imbalanced classification datasets, in which a small subset of the population -- a minority -- may contain features that correlate spuriously with the class label. For a parametric family of cross-entropy loss modifications and a representative Gaussian mixture model, we derive non-asymptotic generalization bounds on the worst-group error that shed light on the role of different hyper-parameters. Specifically, we prove that, when appropriately tuned, the recently proposed VS-loss learns a model that is fair towards minorities even when spurious features are strong. On the other hand, alternative heuristics, such as the weighted CE and the LA-loss, can fail dramatically. Compared to previous works, our bounds hold for more general models, they are non-asymptotic, and, they apply even at scenarios of extreme imbalance.

preprint2020arXiv

A Model of Double Descent for High-dimensional Binary Linear Classification

We consider a model for logistic regression where only a subset of features of size $p$ is used for training a linear classifier over $n$ training samples. The classifier is obtained by running gradient descent (GD) on logistic loss. For this model, we investigate the dependence of the classification error on the overparameterization ratio $κ=p/n$. First, building on known deterministic results on the implicit bias of GD, we uncover a phase-transition phenomenon for the case of Gaussian features: the classification error of GD is the same as that of the maximum-likelihood (ML) solution when $κ<κ_\star$, and that of the max-margin (SVM) solution when $κ>κ_\star$. Next, using the convex Gaussian min-max theorem (CGMT), we sharply characterize the performance of both the ML and the SVM solutions. Combining these results, we obtain curves that explicitly characterize the classification error for varying values of $κ$. The numerical results validate the theoretical predictions and unveil double-descent phenomena that complement similar recent findings in linear regression settings as well as empirical observations in more complex learning scenarios.

preprint2020arXiv

Analytic Study of Double Descent in Binary Classification: The Impact of Loss

Extensive empirical evidence reveals that, for a wide range of different learning methods and datasets, the risk curve exhibits a double-descent (DD) trend as a function of the model size. In a recent paper [Zeyu,Kammoun,Thrampoulidis,2019] the authors studied binary linear classification models and showed that the test error of gradient descent (GD) with logistic loss undergoes a DD. In this paper, we complement these results by extending them to GD with square loss. We show that the DD phenomenon persists, but we also identify several differences compared to logistic loss. This emphasizes that crucial features of DD curves (such as their transition threshold and global minima) depend both on the training data and on the learning algorithm. We further study the dependence of DD curves on the size of the training set. Similar to our earlier work, our results are analytic: we plot the DD curves by first deriving sharp asymptotics for the test error under Gaussian features. Albeit simple, the models permit a principled study of DD features, the outcomes of which theoretically corroborate related empirical findings occurring in more complex learning tasks.

preprint2020arXiv

Exploring Weight Importance and Hessian Bias in Model Pruning

Model pruning is an essential procedure for building compact and computationally-efficient machine learning models. A key feature of a good pruning algorithm is that it accurately quantifies the relative importance of the model weights. While model pruning has a rich history, we still don&#39;t have a full grasp of the pruning mechanics even for relatively simple problems involving linear models or shallow neural nets. In this work, we provide a principled exploration of pruning by building on a natural notion of importance. For linear models, we show that this notion of importance is captured by covariance scaling which connects to the well-known Hessian-based pruning. We then derive asymptotic formulas that allow us to precisely compare the performance of different pruning methods. For neural networks, we demonstrate that the importance can be at odds with larger magnitudes and proper initialization is critical for magnitude-based pruning. Specifically, we identify settings in which weights become more important despite becoming smaller, which in turn leads to a catastrophic failure of magnitude-based pruning. Our results also elucidate that implicit regularization in the form of Hessian structure has a catalytic role in identifying the important weights, which dictate the pruning performance.

preprint2020arXiv

Fundamental Limits of Ridge-Regularized Empirical Risk Minimization in High Dimensions

Empirical Risk Minimization (ERM) algorithms are widely used in a variety of estimation and prediction tasks in signal-processing and machine learning applications. Despite their popularity, a theory that explains their statistical properties in modern regimes where both the number of measurements and the number of unknown parameters is large is only recently emerging. In this paper, we characterize for the first time the fundamental limits on the statistical accuracy of convex ERM for inference in high-dimensional generalized linear models. For a stylized setting with Gaussian features and problem dimensions that grow large at a proportional rate, we start with sharp performance characterizations and then derive tight lower bounds on the estimation and prediction error that hold over a wide class of loss functions and for any value of the regularization parameter. Our precise analysis has several attributes. First, it leads to a recipe for optimally tuning the loss function and the regularization parameter. Second, it allows to precisely quantify the sub-optimality of popular heuristic choices: for instance, we show that optimally-tuned least-squares is (perhaps surprisingly) approximately optimal for standard logistic data, but the sub-optimality gap grows drastically as the signal strength increases. Third, we use the bounds to precisely assess the merits of ridge-regularization as a function of the over-parameterization ratio. Notably, our bounds are expressed in terms of the Fisher Information of random variables that are simple functions of the data distribution, thus making ties to corresponding bounds in classical statistics.

preprint2020arXiv

Regret Bounds for Safe Gaussian Process Bandit Optimization

Many applications require a learner to make sequential decisions given uncertainty regarding both the system&#39;s payoff function and safety constraints. In safety-critical systems, it is paramount that the learner&#39;s actions do not violate the safety constraints at any stage of the learning process. In this paper, we study a stochastic bandit optimization problem where the unknown payoff and constraint functions are sampled from Gaussian Processes (GPs) first considered in [Srinivas et al., 2010]. We develop a safe variant of GP-UCB called SGP-UCB, with necessary modifications to respect safety constraints at every round. The algorithm has two distinct phases. The first phase seeks to estimate the set of safe actions in the decision set, while the second phase follows the GP-UCB decision rule. Our main contribution is to derive the first sub-linear regret bounds for this problem. We numerically compare SGP-UCB against existing safe Bayesian GP optimization algorithms.

preprint2020arXiv

Safe Linear Thompson Sampling with Side Information

The design and performance analysis of bandit algorithms in the presence of stage-wise safety or reliability constraints has recently garnered significant interest. In this work, we consider the linear stochastic bandit problem under additional \textit{linear safety constraints} that need to be satisfied at each round. We provide a new safe algorithm based on linear Thompson Sampling (TS) for this problem and show a frequentist regret of order $\mathcal{O} (d^{3/2}\log^{1/2}d \cdot T^{1/2}\log^{3/2}T)$, which remarkably matches the results provided by (Abeille et al., 2017) for the standard linear TS algorithm in the absence of safety constraints. We compare the performance of our algorithm with UCB-based safe algorithms and highlight how the inherently randomized nature of TS leads to a superior performance in expanding the set of safe actions the algorithm has access to at each round.

preprint2020arXiv

Sharp Asymptotics and Optimal Performance for Inference in Binary Models

We study convex empirical risk minimization for high-dimensional inference in binary models. Our first result sharply predicts the statistical performance of such estimators in the linear asymptotic regime under isotropic Gaussian features. Importantly, the predictions hold for a wide class of convex loss functions, which we exploit in order to prove a bound on the best achievable performance among them. Notably, we show that the proposed bound is tight for popular binary models (such as Signed, Logistic or Probit), by constructing appropriate loss functions that achieve it. More interestingly, for binary linear classification under the Logistic and Probit models, we prove that the performance of least-squares is no worse than 0.997 and 0.98 times the optimal one. Numerical simulations corroborate our theoretical findings and suggest they are accurate even for relatively small problem dimensions.

preprint2020arXiv

Sharp Guarantees for Solving Random Equations with One-Bit Information

We study the performance of a wide class of convex optimization-based estimators for recovering a signal from corrupted one-bit measurements in high-dimensions. Our general result predicts sharply the performance of such estimators in the linear asymptotic regime when the measurement vectors have entries IID Gaussian. This includes, as a special case, the previously studied least-squares estimator and various novel results for other popular estimators such as least-absolute deviations, hinge-loss and logistic-loss. Importantly, we exploit the fact that our analysis holds for generic convex loss functions to prove a bound on the best achievable performance across the entire class of estimators. Numerical simulations corroborate our theoretical findings and suggest they are accurate even for relatively small problem dimensions.