Source author record

Maryam Fazel

Maryam Fazel appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher

Unclaimed source record

math.OC, Machine Learning, Systems and Control, math.ST, Computer Science and Game Theory, Data Structures and Algorithms, eess.SY, Information Theory, math.DS, math.IT, Computation, Methodology, quant-ph, Statistics Theory

Catalog footprint

What is connected

23 works

14 topics

4 close collaborators


preprint 2026 arXiv

Average Gradient Outer Product in kernel regression provably recovers the central subspace for multi-index models

We study a prototypical situation when a learned predictor can discover useful low-dimensional structure in data, while using fewer samples than are needed for accurate prediction. Specifically, we consider the problem of recovering a multi-index polynomial $f^*(x)=h(Ux)$, with $U\in\mathbb{R}^{r\times d}$ and $r\ll d$, from finitely many data/label pairs. Importantly, the target function depends on input $x$ only through the projection onto an unknown $r$-dimensional central subspace. The algorithm we analyze is appealingly simple: fit kernel ridge regression (KRR) to the data and compute the Average Gradient Outer Product (AGOP) from the fitted predictor. Our main results show that under reasonable assumptions the top $r$-dimensional eigenspace of AGOP provably recovers the central subspace, even in regimes when the prediction error remains large. Specifically, if the target function $f^*$ has degree $p^*$, it is known that $n\asymp d^{p^*}$ samples are necessary for KRR to achieve accurate prediction. In contrast, we show that if a low degree $p$ component of $f^*$ already carries all relevant directions for prediction, subspace recovery occurs in the much lower sample regime $n\asymp d^{p+\delta}$ for any $\delta\in(0,1)$. Our results thus demonstrate a separation between prediction and representation, and provide an explanation for why iterative kernel methods such as Recursive Feature Machines (RFM) can be sample-efficient in practice.
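The two-step procedure in this abstract (fit KRR, then form the AGOP of the fitted predictor) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Gaussian kernel, the `bandwidth` and `ridge` parameters, and the function names `gaussian_kernel` and `krr_agop` are assumptions chosen for the example.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth):
    # Pairwise Gaussian kernel k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2)).
    # The kernel choice is an assumption made for this sketch.
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def krr_agop(X, y, bandwidth, ridge):
    """Fit kernel ridge regression, then return the d x d AGOP matrix
    (1/n) * sum_i grad f(x_i) grad f(x_i)^T of the fitted predictor f."""
    n, _ = X.shape
    K = gaussian_kernel(X, X, bandwidth)
    alpha = np.linalg.solve(K + ridge * np.eye(n), y)
    # f(x) = sum_j alpha_j k(x, x_j), so for the Gaussian kernel
    # grad f(x_i) = sum_j alpha_j k(x_i, x_j) (x_j - x_i) / bandwidth^2.
    W = K * alpha[None, :]                      # W[i, j] = alpha_j k(x_i, x_j)
    grads = (W @ X - W.sum(1)[:, None] * X) / bandwidth**2
    return grads.T @ grads / n

def top_eigenspace(M, r):
    # Eigenvectors for the r largest eigenvalues of a symmetric matrix.
    _, vecs = np.linalg.eigh(M)
    return vecs[:, -r:]
```

Calling `top_eigenspace(krr_agop(X, y, bandwidth, ridge), r)` gives an estimate of the central subspace as the span of the returned columns.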

preprint 2026 arXiv

High-dimensional Limit of SGD for Diagonal Linear Networks

Understanding the behavior of stochastic gradient methods is a central problem in modern machine learning. Recent work has highlighted diagonal linear networks as a simplified yet expressive setting for analyzing the optimization and generalization properties of neural models. In this work, we show that in the high-dimensional regime, stochastic gradient descent on diagonal linear networks is well-approximated by continuous dynamics governed by a stochastic differential equation (SDE), which explicitly decouples the drift from the gradient noise. We further derive a deterministic partial differential equation whose solution propagates the relevant state of the iterates and characterizes the time evolution of a broad class of observable statistics, including the risk, curvature, and other metrics for optimality. Finally, we show that, under a suitable parametrization, the stochastic dynamics are globally well posed and converge exponentially fast to zero risk with high probability, yielding a fully explicit non-asymptotic description of their long-time behavior. Numerical simulations corroborate our theoretical findings.

preprint 2026 arXiv

Unregularized Linear Convergence in Zero-Sum Game from Preference Feedback

Aligning large language models (LLMs) with human preferences has proven effective for enhancing model capabilities, yet standard preference modeling using the Bradley-Terry model assumes transitivity, overlooking the inherent complexity of human population preferences. Nash learning from human feedback (NLHF) addresses this by framing non-transitive preferences as a two-player zero-sum game, where alignment reduces to finding the Nash equilibrium (NE). However, existing algorithms typically rely on regularization, incurring unavoidable bias when computing the duality gap in the original game. In this work, we provide the first convergence guarantee for Optimistic Multiplicative Weights Update ($\mathtt{OMWU}$) in NLHF, showing that it achieves last-iterate linear convergence after a burn-in phase whenever an NE with full support exists, with an instance-dependent linear convergence rate to the original NE, measured by duality gaps. Compared to prior results in Wei et al. (2020), we do not require the assumption of NE uniqueness. Our analysis identifies a novel marginal convergence behavior, where the probability of rarely played actions grows exponentially from exponentially small values, enabling exponentially better dependence on instance-dependent constants than prior results. Experiments corroborate the theoretical strengths of $\mathtt{OMWU}$ in both tabular and neural policy classes, demonstrating its potential for LLM applications.
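For readers unfamiliar with OMWU, the update can be sketched on a small matrix game. This is a generic textbook form of optimistic multiplicative weights for a two-player zero-sum game, not the paper's NLHF setup: the payoff matrix `A`, the step size `eta`, and the function names are assumptions for illustration. Player `x` minimizes `x @ A @ y` and player `y` maximizes it.

```python
import numpy as np

def omwu(A, x0, y0, eta, steps):
    """Optimistic multiplicative weights update on the game min_x max_y x^T A y.
    Each step uses the optimistic gradient 2 * g_t - g_{t-1}."""
    x, y = x0.copy(), y0.copy()
    gx_prev, gy_prev = A @ y, A.T @ x
    for _ in range(steps):
        gx, gy = A @ y, A.T @ x
        x = x * np.exp(-eta * (2.0 * gx - gx_prev))   # minimizing player
        y = y * np.exp(eta * (2.0 * gy - gy_prev))    # maximizing player
        x, y = x / x.sum(), y / y.sum()               # renormalize onto the simplex
        gx_prev, gy_prev = gx, gy
    return x, y

def duality_gap(A, x, y):
    # Best-response value against x minus best-response value against y; zero at a NE.
    return float(np.max(A.T @ x) - np.min(A @ y))

# Rock-paper-scissors: a non-transitive game whose unique NE is uniform (full support).
RPS = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])
```

The last iterate, not an average, is what gets returned, which matches the last-iterate notion of convergence the abstract refers to.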

preprint 2022 arXiv

Decision-Dependent Risk Minimization in Geometrically Decaying Dynamic Environments

This paper studies the problem of expected loss minimization given a data distribution that is dependent on the decision-maker's action and evolves dynamically in time according to a geometric decay process. Novel algorithms for both the information setting in which the decision-maker has a first order gradient oracle and the setting in which they have simply a loss function oracle are introduced. The algorithms operate on the same underlying principle: the decision-maker repeatedly deploys a fixed decision over the length of an epoch, thereby allowing the dynamically changing environment to sufficiently mix before updating the decision. The iteration complexity in each of the settings is shown to match existing rates for first and zero order stochastic gradient methods up to logarithmic factors. The algorithms are evaluated on a "semi-synthetic" example using real world data from the SFpark dynamic pricing pilot study; it is shown that the announced prices result in an improvement for the institution's objective (target occupancy), while achieving an overall reduction in parking rates.

preprint 2022 arXiv

Escaping High-order Saddles in Policy Optimization for Linear Quadratic Gaussian (LQG) Control

First order policy optimization has been widely used in reinforcement learning. It is guaranteed to find the optimal policy for the state-feedback linear quadratic regulator (LQR). However, the performance of policy optimization remains unclear for linear quadratic Gaussian (LQG) control, where the LQG cost has spurious suboptimal stationary points. In this paper, we introduce a novel perturbed policy gradient (PGD) method to escape a large class of bad stationary points (including high-order saddles). In particular, based on the specific structure of LQG, we introduce a novel reparameterization procedure which converts the iterate from a high-order saddle to a strict saddle, from which standard random perturbations in PGD can escape efficiently. We further characterize the high-order saddles that can be escaped by our algorithm.

preprint 2022 arXiv

Fast First-Order Methods for Monotone Strongly DR-Submodular Maximization

Continuous DR-submodular functions are a class of functions that satisfy the Diminishing Returns (DR) property, which implies that they are concave along non-negative directions. Existing works have studied monotone continuous DR-submodular maximization subject to a convex constraint and have proposed efficient algorithms with approximation guarantees. However, in many applications, e.g., computing the stability number of a graph and mean-field inference for probabilistic log-submodular models, the DR-submodular function has the additional property of being strongly concave along non-negative directions that could be utilized for obtaining faster convergence rates. In this paper, we first introduce and characterize the class of strongly DR-submodular functions and show how such a property implies strong concavity along non-negative directions. Then, we study $L$-smooth monotone strongly DR-submodular functions that have bounded curvature, and we show how to exploit such additional structure to obtain algorithms with improved approximation guarantees and faster convergence rates for the maximization problem. In particular, we propose the SDRFW algorithm that matches the provably optimal $1-\frac{c}{e}$ approximation ratio after only $\lceil\frac{L}{\mu}\rceil$ iterations, where $c\in[0,1]$ and $\mu\geq 0$ are the curvature and the strong DR-submodularity parameter. Furthermore, we study the Projected Gradient Ascent (PGA) method for this problem and provide a refined analysis of the algorithm with an improved $\frac{1}{1+c}$ approximation ratio and a linear convergence rate. Given that both algorithms require knowledge of the smoothness parameter $L$, we provide a novel characterization of $L$ for DR-submodular functions showing that in many cases, computing $L$ could be formulated as a convex problem, i.e., a geometric program, that could be solved efficiently.

preprint 2022 arXiv

Improved Rates for Derivative Free Gradient Play in Strongly Monotone Games

The influential work of Bravo et al. (2018) shows that derivative free play in strongly monotone games has complexity $O(d^2/\varepsilon^3)$, where $\varepsilon$ is the target accuracy on the expected squared distance to the solution. This note shows that the efficiency estimate is actually $O(d^2/\varepsilon^2)$, which reduces to the known efficiency guarantee for the method in unconstrained optimization. The argument we present simply interprets the method as stochastic gradient play on a slightly perturbed strongly monotone game.
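Derivative free play replaces each player's gradient with an estimate built from loss values alone. A common single-point estimator of this kind is sketched below; it is a generic construction and is not claimed to be the exact scheme analyzed in the note. The names `one_point_gradient`, `loss`, and `delta` are assumptions for illustration.

```python
import numpy as np

def one_point_gradient(loss, x, delta, rng):
    """Single-point gradient estimate (d / delta) * loss(x + delta * u) * u,
    with u drawn uniformly from the unit sphere. In expectation this equals the
    gradient of a smoothed version of `loss`, i.e. of a slightly perturbed problem."""
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)          # normalized Gaussian is uniform on the sphere
    return (d / delta) * loss(x + delta * u) * u

def average_estimate(loss, x, delta, rng, samples):
    # Monte Carlo average of the estimator, used here only to check its mean.
    return np.mean([one_point_gradient(loss, x, delta, rng) for _ in range(samples)], axis=0)
```

For a linear loss the smoothing leaves the gradient unchanged, so the averaged estimate should approach the true gradient as the number of samples grows.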

preprint 2022 arXiv

Multiplayer Performative Prediction: Learning in Decision-Dependent Games

Learning problems commonly exhibit an interesting feedback mechanism wherein the population data reacts to competing decision makers' actions. This paper formulates a new game theoretic framework for this phenomenon, called "multi-player performative prediction". We focus on two distinct solution concepts, namely (i) performatively stable equilibria and (ii) Nash equilibria of the game. The latter equilibria are arguably more informative, but can be found efficiently only when the game is monotone. We show that under mild assumptions, the performatively stable equilibria can be found efficiently by a variety of algorithms, including repeated retraining and the repeated (stochastic) gradient method. We then establish transparent sufficient conditions for strong monotonicity of the game and use them to develop algorithms for finding Nash equilibria. We investigate derivative free methods and adaptive gradient algorithms wherein each player alternates between learning a parametric description of their distribution and gradient steps on the empirical risk. Synthetic and semi-synthetic numerical experiments illustrate the results.

preprint 2022 arXiv

System Identification via Nuclear Norm Regularization

This paper studies the problem of identifying low-order linear systems via Hankel nuclear norm regularization. Hankel regularization encourages the low-rankness of the Hankel matrix, which maps to the low-orderness of the system. We provide novel statistical analysis for this regularization and carefully contrast it with the unregularized ordinary least-squares (OLS) estimator. Our analysis leads to new bounds on estimating the impulse response and the Hankel matrix associated with the linear system. We first design an input excitation and show that Hankel regularization enables one to recover the system using an optimal number of observations in the true system order and achieve strong statistical estimation rates. Surprisingly, we demonstrate that the input design indeed matters, by showing that intuitive choices such as i.i.d. Gaussian input lead to provably sub-optimal sample complexity. To better understand the benefits of regularization, we also revisit the OLS estimator. Besides refining existing bounds, we experimentally identify when the regularized approach improves over OLS: (1) for low-order systems with slow impulse-response decay, the OLS method performs poorly in terms of sample complexity; (2) the Hankel matrix returned by regularization has a clearer singular value gap that eases identification of the system order; (3) Hankel regularization is less sensitive to hyperparameter choice. Finally, we establish model selection guarantees through a joint train-validation procedure where we tune the regularization parameter for near-optimal estimation.
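The link between Hankel rank and system order can be checked numerically. The sketch below builds the Hankel matrix of an impulse response and reports its rank and nuclear norm; it illustrates only the object being regularized, not the paper's estimator. The example system and the names `impulse_response` and `hankel_matrix` are assumptions for illustration.

```python
import numpy as np

def impulse_response(A, B, C, length):
    """Markov parameters h_k = C A^k B, k = 0..length-1, of a single-input
    single-output system x_{t+1} = A x_t + B u_t, y_t = C x_t."""
    h, AkB = [], B.copy()
    for _ in range(length):
        h.append((C @ AkB).item())
        AkB = A @ AkB
    return np.array(h)

def hankel_matrix(h, rows):
    # H[i, j] = h[i + j]; constant along anti-diagonals.
    cols = len(h) - rows + 1
    return np.array([[h[i + j] for j in range(cols)] for i in range(rows)])

# Hypothetical order-2 system, chosen only for this example.
A = np.diag([0.9, 0.5])
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 1.0]])

H = hankel_matrix(impulse_response(A, B, C, 10), rows=5)
order = np.linalg.matrix_rank(H, tol=1e-8)   # rank of H matches the system order here
nuc = np.linalg.norm(H, "nuc")               # nuclear norm: convex surrogate for rank
```

Penalizing `nuc` rather than the rank itself is what keeps the regularized estimation problem convex.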

preprint 2022 arXiv

Towards Sample-efficient Overparameterized Meta-learning

An overarching goal in machine learning is to build a generalizable model with few samples. To this end, overparameterization has been the subject of immense interest to explain the generalization ability of deep nets even when the size of the dataset is smaller than that of the model. While the prior literature focuses on the classical supervised setting, this paper aims to demystify overparameterization for meta-learning. Here we have a sequence of linear-regression tasks and we ask: (1) Given earlier tasks, what is the optimal linear representation of features for a new downstream task? and (2) How many samples do we need to build this representation? This work shows that surprisingly, overparameterization arises as a natural answer to these fundamental meta-learning questions. Specifically, for (1), we first show that learning the optimal representation coincides with the problem of designing a task-aware regularization to promote inductive bias. We leverage this inductive bias to explain how the downstream task actually benefits from overparameterization, in contrast to prior works on few-shot learning. For (2), we develop a theory to explain how feature covariance can implicitly help reduce the sample complexity well below the degrees of freedom and lead to small estimation error. We then integrate these findings to obtain an overall performance guarantee for our meta-learning algorithm. Numerical experiments on real and synthetic data verify our insights on overparameterized meta-learning.

preprint 2021 arXiv

Sample Efficient Subspace-based Representations for Nonlinear Meta-Learning

Constructing good representations is critical for learning complex tasks in a sample efficient manner. In the context of meta-learning, representations can be constructed from common patterns of previously seen tasks so that a future task can be learned quickly. While recent works show the benefit of subspace-based representations, such results are limited to linear-regression tasks. This work explores a more general class of nonlinear tasks with applications ranging from binary classification and generalized linear models to neural nets. We prove that subspace-based representations can be learned in a sample-efficient manner and provably benefit future tasks in terms of sample complexity. Numerical results verify the theoretical predictions in classification and neural-network regression tasks.

preprint 2016 arXiv

Decomposable Norm Minimization with Proximal-Gradient Homotopy Algorithm

We study the convergence rate of the proximal-gradient homotopy algorithm applied to norm-regularized linear least squares problems, for a general class of norms. The homotopy algorithm reduces the regularization parameter in a series of steps, and uses a proximal-gradient algorithm to solve the problem at each step. The proximal-gradient algorithm has a linear rate of convergence given that the objective function is strongly convex, and the gradient of the smooth component of the objective function is Lipschitz continuous. In many applications, the objective function in this type of problem is not strongly convex, especially when the problem is high-dimensional and regularizers are chosen that induce sparsity or low-dimensionality. We show that if the linear sampling matrix satisfies certain assumptions and the regularizing norm is decomposable, the proximal-gradient homotopy algorithm converges with a linear rate even though the objective function is not strongly convex. Our result generalizes results on the linear convergence of the homotopy algorithm for $\ell_1$-regularized least squares problems. Numerical experiments are presented that support the theoretical convergence rate analysis.
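The homotopy scheme described here can be sketched for the special case of the $\ell_1$ norm, where the proximal operator is soft thresholding. This is a minimal sketch under stated assumptions: the geometric `shrink` factor, the fixed `inner_iters` count, and the function names are illustrative choices, and `lam_target` is assumed to be positive.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_grad_homotopy(A, b, lam_target, shrink=0.5, inner_iters=100):
    """Approximately minimize 0.5 * ||A x - b||^2 + lam_target * ||x||_1 by
    solving a sequence of problems with decreasing lam, each warm-started
    from the previous solution. Assumes lam_target > 0."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1/L with L = ||A||_2^2
    x = np.zeros(A.shape[1])
    lam = np.max(np.abs(A.T @ b))               # for lam at or above this, x = 0 is optimal
    while True:
        lam = max(lam * shrink, lam_target)     # reduce the regularization parameter
        for _ in range(inner_iters):            # proximal-gradient steps at this lam
            x = soft_threshold(x - step * A.T @ (A @ x - b), step * lam)
        if lam <= lam_target:
            return x
```

Warm starting keeps the iterates sparse along the path, which is the usual motivation for reducing the regularization parameter in stages rather than solving at the target value directly.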

preprint 2016 arXiv

Worst Case Competitive Analysis of Online Algorithms for Conic Optimization

Online optimization covers problems such as online resource allocation, online bipartite matching, adwords (a central problem in e-commerce and advertising), and adwords with separable concave returns. We analyze the worst case competitive ratio of two primal-dual algorithms for a class of online convex (conic) optimization problems that contains the previous examples as special cases defined on the positive orthant. We derive a sufficient condition on the objective function that guarantees a constant worst case competitive ratio (greater than or equal to $\frac{1}{2}$) for monotone objective functions. We provide new examples of online problems on the positive orthant and the positive semidefinite cone that satisfy the sufficient condition. We show how smoothing can improve the competitive ratio of these algorithms, and in particular for separable functions, we show that the optimal smoothing can be derived by solving a convex optimization problem. This result allows us to directly optimize the competitive ratio bound over a class of smoothing functions, and hence design effective smoothing customized for a given cost function.

preprint 2015 arXiv

Exponentiated Subgradient Algorithm for Online Optimization under the Random Permutation Model

Online optimization problems arise in many resource allocation tasks, where the future demands for each resource and the associated utility functions change over time and are not known a priori, yet resources need to be allocated at every point in time despite the future uncertainty. In this paper, we consider online optimization problems with general concave utilities. We modify and extend an online optimization algorithm proposed by Devanur et al. for linear programming to this general setting. The model we use for the arrival of the utilities and demands is known as the random permutation model, where a fixed collection of utilities and demands are presented to the algorithm in random order. We prove that under this model the algorithm achieves a competitive ratio of $1-O(\epsilon)$ under a near-optimal assumption that the bid to budget ratio is $O\left(\frac{\epsilon^2}{\log(m/\epsilon)}\right)$, where $m$ is the number of resources, while enjoying a significantly lower computational cost than the optimal algorithm proposed by Kesselheim et al. We draw a connection between the proposed algorithm and subgradient methods used in convex optimization. In addition, we present numerical experiments that demonstrate the performance and speed of this algorithm in comparison to existing algorithms.

preprint 2015 arXiv

Relative Density and Exact Recovery in Heterogeneous Stochastic Block Models

The Stochastic Block Model (SBM) is a widely used random graph model for networks with communities. Despite the recent burst of interest in recovering communities in the SBM from statistical and computational points of view, there are still gaps in understanding the fundamental information theoretic and computational limits of recovery. In this paper, we consider the SBM in its full generality, where there is no restriction on the number and sizes of communities or how they grow with the number of nodes, as well as on the connection probabilities inside or across communities. This generality allows us to move past the artifacts of homogeneous SBM, and understand the right parameters (such as the relative densities of communities) that define the various recovery thresholds. We outline the implications of our generalizations via a set of illustrative examples. For instance, $\log n$ is considered to be the standard lower bound on the cluster size for exact recovery via convex methods, for homogeneous SBM. We show that it is possible, in the right circumstances (when sizes are spread and the smaller the cluster, the denser), to recover very small clusters (up to $\sqrt{\log n}$ size), if there are just a few of them (at most polylogarithmic in $n$).

preprint 2014 arXiv

Learning Graphical Models With Hubs

We consider the problem of learning a high-dimensional graphical model in which certain hub nodes are highly connected to many other nodes. Many authors have studied the use of an $\ell_1$ penalty in order to learn a sparse graph in the high-dimensional setting. However, the $\ell_1$ penalty implicitly assumes that each edge is equally likely and independent of all other edges. We propose a general framework to accommodate more realistic networks with hub nodes, using a convex formulation that involves a row-column overlap norm penalty. We apply this general framework to three widely-used probabilistic graphical models: the Gaussian graphical model, the covariance graph model, and the binary Ising model. An alternating direction method of multipliers algorithm is used to solve the corresponding convex optimization problems. On synthetic data, we demonstrate that our proposed framework outperforms competitors that do not explicitly model hub nodes. We illustrate our proposal on a webpage data set and a gene expression data set.

preprint 2014 arXiv

Node-Based Learning of Multiple Gaussian Graphical Models

We consider the problem of estimating high-dimensional Gaussian graphical models corresponding to a single set of variables under several distinct conditions. This problem is motivated by the task of recovering transcriptional regulatory networks on the basis of gene expression data containing heterogeneous samples, such as different disease states, multiple species, or different developmental stages. We assume that most aspects of the conditional dependence networks are shared, but that there are some structured differences between them. Rather than assuming that similarities and differences between networks are driven by individual edges, we take a node-based approach, which in many cases provides a more intuitive interpretation of the network differences. We consider estimation under two distinct assumptions: (1) differences between the K networks are due to individual nodes that are perturbed across conditions, or (2) similarities among the K networks are due to the presence of common hub nodes that are shared across all K networks. Using a row-column overlap norm penalty function, we formulate two convex optimization problems that correspond to these two assumptions. We solve these problems using an alternating direction method of multipliers algorithm, and we derive a set of necessary and sufficient conditions that allows us to decompose the problem into independent subproblems so that our algorithm can be scaled to high-dimensional settings. Our proposal is illustrated on synthetic data, a webpage data set, and a brain cancer gene expression data set.

preprint 2014 arXiv

Simultaneously Structured Models with Application to Sparse and Low-rank Matrices

The topic of recovery of a structured model given a small number of linear observations has been well-studied in recent years. Examples include recovering sparse or group-sparse vectors, low-rank matrices, and the sum of sparse and low-rank matrices, among others. In various applications in signal processing and machine learning, the model of interest is known to be structured in several ways at the same time, for example, a matrix that is simultaneously sparse and low-rank. Often norms that promote each individual structure are known, and allow for recovery using an order-wise optimal number of measurements (e.g., $\ell_1$ norm for sparsity, nuclear norm for matrix rank). Hence, it is reasonable to minimize a combination of such norms. We show that, surprisingly, if we use multi-objective optimization with these norms, then we can do no better, order-wise, than an algorithm that exploits only one of the present structures. This result suggests that to fully exploit the multiple structures, we need an entirely new convex relaxation, i.e. not one that is a function of the convex relaxations used for each structure. We then specialize our results to the case of sparse and low-rank matrices. We show that a nonconvex formulation of the problem can recover the model from very few measurements, which is on the order of the degrees of freedom of the matrix, whereas the convex problem obtained from a combination of the $\ell_1$ and nuclear norms requires many more measurements. This proves an order-wise gap between the performance of the convex and nonconvex recovery problems in this case. Our framework applies to arbitrary structure-inducing norms as well as to a wide range of measurement ensembles. This allows us to give performance bounds for problems such as sparse phase retrieval and low-rank tensor completion.

preprint 2014 arXiv

Universal Convexification via Risk-Aversion

We develop a framework for convexifying a fairly general class of optimization problems. Under additional assumptions, we analyze the suboptimality of the solution to the convexified problem relative to the original nonconvex problem and prove additive approximation guarantees. We then develop algorithms based on stochastic gradient methods to solve the resulting optimization problems and show bounds on convergence rates. We show a simple application of this framework to supervised learning, where one can perform integration explicitly and can use standard (non-stochastic) optimization algorithms with better convergence guarantees. We then extend this framework to apply to a general class of discrete-time dynamical systems. In this context, our convexification approach falls under the well-studied paradigm of risk-sensitive Markov Decision Processes. We derive the first known model-based and model-free policy gradient optimization algorithms with guaranteed convergence to the optimal solution. Finally, we present numerical results validating our formulation in different applications.

preprint 2013 arXiv

Convex Structured Controller Design

We consider the problem of synthesizing optimal linear feedback policies subject to arbitrary convex constraints on the feedback matrix. This is known to be a hard problem in the usual formulations ($\mathcal{H}_2$, $\mathcal{H}_\infty$, LQR) and previous works have focused on characterizing classes of structural constraints that allow efficient solution through convex optimization or dynamic programming techniques. In this paper, we propose a new control objective and show that this formulation makes the problem of computing optimal linear feedback matrices convex under arbitrary convex constraints on the feedback matrix. This allows us to solve problems in decentralized control (sparsity in the feedback matrices), control with delays and variable impedance control. Although the control objective is nonstandard, we present theoretical and empirical evidence that it agrees well with standard notions of control. We also present an extension to nonlinear control affine systems. We present numerical experiments validating our approach.

preprint 2011 arXiv

A Simplified Approach to Recovery Conditions for Low Rank Matrices

Recovering sparse vectors and low-rank matrices from noisy linear measurements has been the focus of much recent research. Various reconstruction algorithms have been studied, including $\ell_1$ and nuclear norm minimization as well as $\ell_p$ minimization with $p<1$. These algorithms are known to succeed if certain conditions on the measurement map are satisfied. Proofs of robust recovery for matrices have so far been much more involved than in the vector case. In this paper, we show how several robust classes of recovery conditions can be extended from vectors to matrices in a simple and transparent way, leading to the best known restricted isometry and nullspace conditions for matrix recovery. Our results rely on the ability to "vectorize" matrices through the use of a key singular value inequality.

preprint 2007 arXiv

Computational approach to quantum encoder design for purity optimization

In this paper, we address the problem of designing a quantum encoder that maximizes the minimum output purity of a given decohering channel, where the minimum is taken over all possible pure inputs. This problem is cast as a max-min optimization problem with a rank constraint on an appropriately defined matrix variable. The problem is computationally very hard because it is non-convex with respect to both the objective function (output purity) and the rank constraint. Despite this difficulty, we provide a tractable computational algorithm that produces the exact optimal solution for codespaces of dimension two. Moreover, this algorithm is easily extended to cover the general class of codespaces, in which case the solution is suboptimal in the sense that the suboptimized output purity serves as a lower bound on the exact optimal purity. The algorithm consists of a sequence of semidefinite programs and can be performed easily. Two typical quantum error channels are investigated to illustrate the effectiveness of our method.

preprint 2007 arXiv

Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization

The affine rank minimization problem consists of finding a matrix of minimum rank that satisfies a given system of linear equality constraints. Such problems have appeared in the literature of a diverse set of fields including system identification and control, Euclidean embedding, and collaborative filtering. Although specific instances can often be solved with specialized algorithms, the general affine rank minimization problem is NP-hard. In this paper, we show that if a certain restricted isometry property holds for the linear transformation defining the constraints, the minimum rank solution can be recovered by solving a convex optimization problem, namely the minimization of the nuclear norm over the given affine space. We present several random ensembles of equations where the restricted isometry property holds with overwhelming probability. The techniques used in our analysis have strong parallels in the compressed sensing framework. We discuss how affine rank minimization generalizes this pre-existing concept and outline a dictionary relating concepts from cardinality minimization to those of rank minimization.
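Written out, the problem and its convex relaxation take the following form. The symbols $\mathcal{A}$, $b$, $\sigma_i$ and $\delta_r$ are notation introduced here for illustration; the abstract itself does not fix any.

```latex
% Affine rank minimization (NP-hard in general):
\min_{X \in \mathbb{R}^{m \times n}} \ \operatorname{rank}(X)
\quad \text{subject to} \quad \mathcal{A}(X) = b .

% Convex relaxation: minimize the nuclear norm (sum of singular values)
% over the same affine space.
\min_{X \in \mathbb{R}^{m \times n}} \ \|X\|_* = \sum_{i} \sigma_i(X)
\quad \text{subject to} \quad \mathcal{A}(X) = b .

% Restricted isometry property of order r for the linear map \mathcal{A}:
(1 - \delta_r)\,\|X\|_F \ \le\ \|\mathcal{A}(X)\|_2 \ \le\ (1 + \delta_r)\,\|X\|_F
\quad \text{for all } X \text{ with } \operatorname{rank}(X) \le r .
```

The parallel with compressed sensing is that rank plays the role of cardinality and the nuclear norm plays the role of the $\ell_1$ norm.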

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint

Fields this researcher appears in

Source provenance

Where this author record came from

arxiv confidence 95%

external id: arxiv:2512.24818:author:4:maryam-fazel

Imported May 21, 2026; Synced May 21, 2026

arxiv confidence 95%

external id: arxiv:2605.17177:author:3:maryam-fazel

Imported May 20, 2026; Synced May 21, 2026

arxiv confidence 95%

external id: arxiv:2605.15082:author:4:maryam-fazel

Imported May 20, 2026; Synced May 21, 2026

5 works

Dmitriy Drusvyatskiy

Researcher

Dmitriy Drusvyatskiy contributes to research discovery and scholarly infrastructure.

Open to collaborate

5 works

Samet Oymak

Researcher

Samet Oymak contributes to research discovery and scholarly infrastructure.

Open to collaborate

4 works

Yue Sun

Researcher

Yue Sun contributes to research discovery and scholarly infrastructure.

Open to collaborate

3 works

Karthik Mohan

Researcher

Karthik Mohan contributes to research discovery and scholarly infrastructure.

Open to collaborate
