Source author record

Necdet Serhat Aybat

Necdet Serhat Aybat appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

math.OC Machine Learning Computation Systems and Control

Catalog footprint

What is connected

16works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Accelerated Gradient Methods with Biased Gradient Estimates: Risk Sensitivity, High-Probability Guarantees, and Large Deviation Bounds

We study trade-offs between convergence rate and robustness to gradient errors in the context of first-order methods. Our focus is on generalized momentum methods (GMMs)--a broad class that includes Nesterov's accelerated gradient, heavy-ball, and gradient descent methods--for minimizing smooth strongly convex objectives. We allow stochastic gradient errors that may be adversarial and biased, and quantify robustness of these methods to gradient errors via the risk-sensitive index (RSI) from robust control theory. For quadratic objectives with i.i.d. Gaussian noise, we give closed form expressions for RSI in terms of solutions to 2x2 matrix Riccati equations, revealing a Pareto frontier between RSI and convergence rate over the choice of step-size and momentum parameters. We then prove a large-deviation principle for time-averaged suboptimality in the large iteration limit and show that the rate function is, up to a scaling, the convex conjugate of the RSI function. We further show that the rate function and RSI are linked to the $H_\infty$-norm--a measure of robustness to the worst-case deterministic gradient errors--so that stronger worst-case robustness (smaller $H_\infty$-norm) leads to sharper decay of the tail probabilities for the average suboptimality. Beyond quadratics, under potentially biased sub-Gaussian gradient errors, we derive non-asymptotic bounds on a finite-time analogue of the RSI, yielding finite-time high-probability guarantees and non-asymptotic large-deviation bounds for the averaged iterates. In the case of smooth strongly convex functions, we also observe an analogous trade-off between RSI and convergence-rate bounds. To our knowledge, these are the first non-asymptotic guarantees for GMMs with biased gradients and the first risk-sensitive analysis of GMMs. Finally, we provide numerical experiments on a robust regression problem to illustrate our results.

preprint2020arXiv

On the Theoretical Guarantees for Parameter Estimation of Gaussian Random Field Models: A Sparse Precision Matrix Approach

Iterative methods for fitting a Gaussian Random Field (GRF) model via maximum likelihood (ML) estimation requires solving a nonconvex optimization problem. The problem is aggravated for anisotropic GRFs where the number of covariance function parameters increases with the dimension. Even evaluation of the likelihood function requires $O(n^3)$ floating point operations, where $n$ denotes the number of data locations. In this paper, we propose a new two-stage procedure to estimate the parameters of second-order stationary GRFs. First, a convex likelihood problem regularized with a weighted $\ell_1$-norm, utilizing the available distance information between observation locations, is solved to fit a sparse precision (inverse covariance) matrix to the observed data. Second, the parameters of the covariance function are estimated by solving a least squares problem. Theoretical error bounds for the solutions of stage I and II problems are provided, and their tightness are investigated.

preprint2017arXiv

Distributed Linearized Alternating Direction Method of Multipliers for Composite Convex Consensus Optimization

Given an undirected graph $\mathcal{G}=(\mathcal{N},\mathcal{E})$ of agents $\mathcal{N}=\{1,\ldots,N\}$ connected with edges in $\mathcal{E}$, we study how to compute an optimal decision on which there is consensus among agents and that minimizes the sum of agent-specific private convex composite functions $\{Φ_i\}_{i\in\mathcal{N}}$ while respecting privacy requirements, where $Φ_i\triangleq ξ_i + f_i$ belongs to agent-$i$. Assuming only agents connected by an edge can communicate, we propose a distributed proximal gradient method DPGA for consensus optimization over both unweighted and weighted static (undirected) communication networks. In one iteration, each agent-$i$ computes the prox map of $ξ_i$ and gradient of $f_i$, and this is followed by local communication with neighboring agents. We also study its stochastic gradient variant, SDPGA, which can only access to noisy estimates of $\nabla f_i$ at each agent-$i$. This computational model abstracts a number of applications in distributed sensing, machine learning and statistical inference. We show ergodic convergence in both sub-optimality error and consensus violation for DPGA and SDPGA with rates $\mathcal{O}(1/t)$ and $\mathcal{O}(1/\sqrt{t})$, respectively.

preprint2017arXiv

Generalized Sparse Precision Matrix Selection for Fitting Multivariate Gaussian Random Fields to Large Data Sets

We present a new method for estimating multivariate, second-order stationary Gaussian Random Field (GRF) models based on the Sparse Precision matrix Selection (SPS) algorithm, proposed by Davanloo et al. (2015) for estimating scalar GRF models. Theoretical convergence rates for the estimated between-response covariance matrix and for the estimated parameters of the underlying spatial correlation function are established. Numerical tests using simulated and real datasets validate our theoretical findings. Data segmentation is used to handle large data sets.

preprint2016arXiv

A Parallelizable Dual Smoothing Method for Large Scale Convex Regression Problems

Convex regression (CR) is an approach for fitting a convex function to a finite number of observations. It arises in various applications from diverse fields such as statistics, operations research, economics, and electrical engineering. The least squares (LS) estimator, which can be computed via solving a quadratic program (QP), is an intuitive method for convex regression with already established strong theoretical guarantees. On the other hand, since the number of constraints in the QP formulation increases quadratically in the number of observed data points, the QP quickly becomes impractical to solve using traditional interior point methods. To address this issue, we propose a first-order method based on dual smoothing that carefully manages the memory usage through parallelization in order to efficiently compute the LS estimator in practice for large-scale CR instances.

preprint2016arXiv

A primal-dual method for conic constrained distributed optimization problems

We consider cooperative multi-agent consensus optimization problems over an undirected network of agents, where only those agents connected by an edge can directly communicate. The objective is to minimize the sum of agent-specific composite convex functions over agent-specific private conic constraint sets; hence, the optimal consensus decision should lie in the intersection of these private sets. We provide convergence rates both in sub-optimality, infeasibility and consensus violation; examine the effect of underlying network topology on the convergence rates of the proposed decentralized algorithms; and show how to extend these methods to handle time-varying communications networks and to solve problems with resource sharing constraints.

preprint2015arXiv

An ADMM Algorithm for Clustering Partially Observed Networks

Community detection has attracted increasing attention during the past decade, and many algorithms have been proposed to find the underlying community structure in a given network. Many of these algorithms are based on modularity maximization, and these methods suffer from the resolution limit. In order to detect the underlying cluster structure, we propose a new convex formulation to decompose a partially observed adjacency matrix of a network into low-rank and sparse components. In such decomposition, the low-rank component encodes the cluster structure under certain assumptions. We also devise an alternating direction method of multipliers with increasing penalty sequence to solve this problem; and compare it with Louvain method, which maximizes the modularity, on some synthetic randomly generated networks. Numerical results show that our method outperforms Louvain method on the randomly generated networks when variance among cluster sizes increases. Moreover, empirical results also demonstrate that our formulation is indeed tighter than the robust PCA formulation, and is able to find the true clustering when the robust PCA formulation fails.

preprint2015arXiv

An Alternating Direction Method with Increasing Penalty for Stable Principal Component Pursuit

The stable principal component pursuit (SPCP) is a non-smooth convex optimization problem, the solution of which enables one to reliably recover the low rank and sparse components of a data matrix which is corrupted by a dense noise matrix, even when only a fraction of data entries are observable. In this paper, we propose a new algorithm for solving SPCP. The proposed algorithm is a modification of the alternating direction method of multipliers (ADMM) where we use an increasing sequence of penalty parameters instead of a fixed penalty. The algorithm is based on partial variable splitting and works directly with the non-smooth objective function. We show that both primal and dual iterate sequences converge under mild conditions on the sequence of penalty parameters. To the best of our knowledge, this is the first convergence result for a variable penalty ADMM when penalties are not bounded, the objective function is non-smooth and its sub-differential is not uniformly bounded. Using partial variable splitting and adopting an increasing sequence of penalty multipliers, together, significantly reduce the number of iterations required to achieve feasibility in practice. Our preliminary computational tests show that the proposed algorithm works very well in practice, and outperforms ASALM, a state of the art ADMM algorithm for the SPCP problem with a constant penalty parameter.

preprint2015arXiv

An Asynchronous Distributed Proximal Gradient Method for Composite Convex Optimization

We propose a distributed first-order augmented Lagrangian (DFAL) algorithm to minimize the sum of composite convex functions, where each term in the sum is a private cost function belonging to a node, and only nodes connected by an edge can directly communicate with each other. This optimization model abstracts a number of applications in distributed sensing and machine learning. We show that any limit point of DFAL iterates is optimal; and for any $ε>0$, an $ε$-optimal and $ε$-feasible solution can be computed within $\mathcal{O}(\log(ε^{-1}))$ DFAL iterations, which require $\mathcal{O}(\frac{ψ_{\max}^{1.5}}{d_{\min}} ε^{-1})$ proximal gradient computations and communications per node in total, where $ψ_{\max}$ denotes the largest eigenvalue of the graph Laplacian, and $d_{\min}$ is the minimum degree of the graph. We also propose an asynchronous version of DFAL by incorporating randomized block coordinate descent methods; and demonstrate the efficiency of DFAL on large scale sparse-group LASSO problems.

preprint2015arXiv

Semidefinite Programming For Chance Constrained Optimization Over Semialgebraic Sets

In this paper, "chance optimization" problems are introduced, where one aims at maximizing the probability of a set defined by polynomial inequalities. These problems are, in general, nonconvex and computationally hard. With the objective of developing systematic numerical procedures to solve such problems, a sequence of convex relaxations based on the theory of measures and moments is provided, whose sequence of optimal values is shown to converge to the optimal value of the original problem. Indeed, we provide a sequence of semidefinite programs of increasing dimension which can arbitrarily approximate the solution of the original problem. To be able to efficiently solve the resulting large-scale semidefinite relaxations, a first-order augmented Lagrangian algorithm is implemented. Numerical examples are presented to illustrate the computational performance of the proposed approach.

preprint2014arXiv

A Parallel Method for Large Scale Convex Regression Problems

Convex regression (CR) problem deals with fitting a convex function to a finite number of observations. It has many applications in various disciplines, such as statistics, economics, operations research, and electrical engineering. Computing the least squares (LS) estimator via solving a quadratic program (QP) is the most common technique to fit a piecewise-linear convex function to the observed data. Since the number of constraints in the QP formulation increases quadratically in N, the number of observed data points, computing the LS estimator is not practical using interior point methods when N is very large. The first-order method proposed in this paper carefully manages the memory usage through parallelization, and efficiently solves large-scale instances of CR.

preprint2013arXiv

An Augmented Lagrangian Method for Conic Convex Programming

We propose a new first-order augmented Lagrangian algorithm ALCC for solving convex conic programs of the form min{rho(x)+gamma(x): Ax-b in K, x in chi}, where rho and gamma are closed convex functions, and gamma has a Lipschitz continuous gradient, A is mxn real matrix, K is a closed convex cone, and chi is a "simple" convex compact set such that optimization problems of the form min{rho(x)+|x-x0|_2^2: x in chi} can be efficiently solved for any given x0. We show that any limit point of the primal ALCC iterates is an optimal solution of the conic convex problem, and the dual ALCC iterates have a unique limit point that is a Karush-Kuhn-Tucker (KKT) point of the conic program. We also show that for any epsilon>0, the primal ALCC iterates are epsilon-feasible and epsilon optimal after O(log(1/epsilon)) iterations which require solving O(1/epsilon log(1/epsilon)) problems of the form min{rho(x)+|x-x0|_2^2: x in chi}.

preprint2013arXiv

Efficient Algorithms for Robust and Stable Principal Component Pursuit Problems

The problem of recovering a low-rank matrix from a set of observations corrupted with gross sparse error is known as the robust principal component analysis (RPCA) and has many applications in computer vision, image processing and web data ranking. It has been shown that under certain conditions, the solution to the NP-hard RPCA problem can be obtained by solving a convex optimization problem, namely the robust principal component pursuit (RPCP). Moreover, if the observed data matrix has also been corrupted by a dense noise matrix in addition to gross sparse error, then the stable principal component pursuit (SPCP) problem is solved to recover the low-rank matrix. In this paper, we develop efficient algorithms with provable iteration complexity bounds for solving RPCP and SPCP. Numerical results on problems with millions of variables and constraints such as foreground extraction from surveillance video, shadow and specularity removal from face images and video denoising from heavily corrupted data show that our algorithms are competitive to current state-of-the-art solvers for RPCP and SPCP in terms of accuracy and speed.

preprint2012arXiv

A Unified Approach for Minimizing Composite Norms

We propose a first-order augmented Lagrangian algorithm (FALC) to solve the composite norm minimization problem min |sigma(F(X)-G)|_alpha + |C(X)- d|_beta subject to A(X)-b in Q; where sigma(X) denotes the vector of singular values of X, the matrix norm |sigma(X)|_alpha denotes either the Frobenius, the nuclear, or the L2-operator norm of X, the vector norm |.|_beta denotes either the L1-norm, L2-norm or the L infty-norm; Q is a closed convex set and A(.), C(.), F(.) are linear operators from matrices to vector spaces of appropriate dimensions. Basis pursuit, matrix completion, robust principal component pursuit (PCP), and stable PCP problems are all special cases of the composite norm minimization problem. Thus, FALC is able to solve all these problems in a unified manner. We show that any limit point of FALC iterate sequence is an optimal solution of the composite norm minimization problem. We also show that for all epsilon > 0, the FALC iterates are epsilon-feasible and epsilon-optimal after O(log(1/epsilon)) iterations, which require O(1/epsilon) constrained shrinkage operations and Euclidean projection onto the set Q. Surprisingly, on the problem sets we tested, FALC required only O(log(1/epsilon)) constrained shrinkage, instead of the O(1/epsilon) worst case bound, to compute an epsilon-feasible and epsilon-optimal solution. To best of our knowledge, FALC is the first algorithm with a known complexity bound that solves the stable PCP problem.

preprint2011arXiv

A First-order Augmented Lagrangian Method for Compressed Sensing

We propose a first-order augmented Lagrangian algorithm (FAL) for solving the basis pursuit problem. FAL computes a solution to this problem by inexactly solving a sequence of L1-regularized least squares sub-problems. These sub-problems are solved using an infinite memory proximal gradient algorithm wherein each update reduces to "shrinkage" or constrained "shrinkage". We show that FAL converges to an optimal solution of the basis pursuit problem whenever the solution is unique, which is the case with very high probability for compressed sensing problems. We construct a parameter sequence such that the corresponding FAL iterates are eps-feasible and eps-optimal for all eps>0 within O(log(1/eps)) FAL iterations. Moreover, FAL requires at most O(1/eps) matrix-vector multiplications of the form Ax or A^Ty to compute an eps-feasible, eps-optimal solution. We show that FAL can be easily extended to solve the basis pursuit denoising problem when there is a non-trivial level of noise on the measurements. We report the results of numerical experiments comparing FAL with the state-of-the-art algorithms for both noisy and noiseless compressed sensing problems. A striking property of FAL that we observed in the numerical experiments with randomly generated instances when there is no measurement noise was that FAL always correctly identifies the support of the target signal without any thresholding or post-processing, for moderately small error tolerance values.

preprint2011arXiv

Fast First-Order Methods for Stable Principal Component Pursuit

The stable principal component pursuit (SPCP) problem is a non-smooth convex optimization problem, the solution of which has been shown both in theory and in practice to enable one to recover the low rank and sparse components of a matrix whose elements have been corrupted by Gaussian noise. In this paper, we show how several fast first-order methods can be applied to this problem very efficiently. Specifically, we show that the subproblems that arise when applying optimal gradient methods of Nesterov, alternating linearization methods and alternating direction augmented Lagrangian methods to the SPCP problem either have closed-form solutions or have solutions that can be obtained with very modest effort. All but one of the methods analyzed require at least one of the non-smooth terms in the objective function to be smoothed and obtain an eps-optimal solution to the SPCP problem in O(1/eps) iterations. The method that works directly with the fully non-smooth objective function, is proved to be convergent under mild conditions on the sequence of parameters it uses. Our preliminary computational tests show that the latter method, although its complexity is not known, is fastest and substantially outperforms existing methods for the SPCP problem. To best of our knowledge, an algorithm for the SPCP problem that has O(1/eps) iteration complexity and has a per iteration complexity equal to that of a singular value decomposition is given for the first time.

Necdet Serhat Aybat

What is connected

Connect this record

See the researcher in context

Building this map preview

16 published item(s)

Accelerated Gradient Methods with Biased Gradient Estimates: Risk Sensitivity, High-Probability Guarantees, and Large Deviation Bounds

On the Theoretical Guarantees for Parameter Estimation of Gaussian Random Field Models: A Sparse Precision Matrix Approach

Distributed Linearized Alternating Direction Method of Multipliers for Composite Convex Consensus Optimization

Generalized Sparse Precision Matrix Selection for Fitting Multivariate Gaussian Random Fields to Large Data Sets

A Parallelizable Dual Smoothing Method for Large Scale Convex Regression Problems

A primal-dual method for conic constrained distributed optimization problems

An ADMM Algorithm for Clustering Partially Observed Networks

An Alternating Direction Method with Increasing Penalty for Stable Principal Component Pursuit

An Asynchronous Distributed Proximal Gradient Method for Composite Convex Optimization

Semidefinite Programming For Chance Constrained Optimization Over Semialgebraic Sets

A Parallel Method for Large Scale Convex Regression Problems

An Augmented Lagrangian Method for Conic Convex Programming

Efficient Algorithms for Robust and Stable Principal Component Pursuit Problems

A Unified Approach for Minimizing Composite Norms

A First-order Augmented Lagrangian Method for Compressed Sensing

Fast First-Order Methods for Stable Principal Component Pursuit