Source author record

Nicolas Loizou

Nicolas Loizou appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

math.OC Machine Learning Computer Science and Game Theory Distributed, Parallel, and Cluster Computing Artificial Intelligence Information Theory math.IT Multiagent Systems

Catalog footprint

What is connected

12works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

AI-SARAH: Adaptive and Implicit Stochastic Recursive Gradient Methods

We present AI-SARAH, a practical variant of SARAH. As a variant of SARAH, this algorithm employs the stochastic recursive gradient yet adjusts step-size based on local geometry. AI-SARAH implicitly computes step-size and efficiently estimates local Lipschitz smoothness of stochastic functions. It is fully adaptive, tune-free, straightforward to implement, and computationally efficient. We provide technical insight and intuitive illustrations on its design and convergence. We conduct extensive empirical analysis and demonstrate its strong performance compared with its classical counterparts and other state-of-the-art first-order methods in solving convex machine learning problems.

preprint2022arXiv

Extragradient Method: $O(1/K)$ Last-Iterate Convergence for Monotone Variational Inequalities and Connections With Cocoercivity

Extragradient method (EG) (Korpelevich, 1976) is one of the most popular methods for solving saddle point and variational inequalities problems (VIP). Despite its long history and significant attention in the optimization community, there remain important open questions about convergence of EG. In this paper, we resolve one of such questions and derive the first last-iterate $O(1/K)$ convergence rate for EG for monotone and Lipschitz VIP without any additional assumptions on the operator unlike the only known result of this type (Golowich et al., 2020) that relies on the Lipschitzness of the Jacobian of the operator. The rate is given in terms of reducing the squared norm of the operator. Moreover, we establish several results on the (non-)cocoercivity of the update operators of EG, Optimistic Gradient Method, and Hamiltonian Gradient Method, when the original operator is monotone and Lipschitz.

preprint2022arXiv

On the Convergence of Stochastic Extragradient for Bilinear Games using Restarted Iteration Averaging

We study the stochastic bilinear minimax optimization problem, presenting an analysis of the same-sample Stochastic ExtraGradient (SEG) method with constant step size, and presenting variations of the method that yield favorable convergence. In sharp contrasts with the basic SEG method whose last iterate only contracts to a fixed neighborhood of the Nash equilibrium, SEG augmented with iteration averaging provably converges to the Nash equilibrium under the same standard settings, and such a rate is further improved by incorporating a scheduled restarting procedure. In the interpolation setting where noise vanishes at the Nash equilibrium, we achieve an optimal convergence rate up to tight constants. We present numerical experiments that validate our theoretical findings and demonstrate the effectiveness of the SEG method when equipped with iteration averaging and restarting.

preprint2022arXiv

Stochastic Extragradient: General Analysis and Improved Rates

The Stochastic Extragradient (SEG) method is one of the most popular algorithms for solving min-max optimization and variational inequalities problems (VIP) appearing in various machine learning tasks. However, several important questions regarding the convergence properties of SEG are still open, including the sampling of stochastic gradients, mini-batching, convergence guarantees for the monotone finite-sum variational inequalities with possibly non-monotone terms, and others. To address these questions, in this paper, we develop a novel theoretical framework that allows us to analyze several variants of SEG in a unified manner. Besides standard setups, like Same-Sample SEG under Lipschitzness and monotonicity or Independent-Samples SEG under uniformly bounded variance, our approach allows us to analyze variants of SEG that were never explicitly considered in the literature before. Notably, we analyze SEG with arbitrary sampling which includes importance sampling and various mini-batching strategies as special cases. Our rates for the new variants of SEG outperform the current state-of-the-art convergence guarantees and rely on less restrictive assumptions.

preprint2021arXiv

A Unified Theory of Decentralized SGD with Changing Topology and Local Updates

Decentralized stochastic optimization methods have gained a lot of attention recently, mainly because of their cheap per iteration cost, data locality, and their communication-efficiency. In this paper we introduce a unified convergence analysis that covers a large variety of decentralized SGD methods which so far have required different intuitions, have different applications, and which have been developed separately in various communities. Our algorithmic framework covers local SGD updates and synchronous and pairwise gossip updates on adaptive network topology. We derive universal convergence rates for smooth (convex and non-convex) problems and the rates interpolate between the heterogeneous (non-identically distributed data) and iid-data settings, recovering linear convergence rates in many special cases, for instance for over-parametrized models. Our proofs rely on weak assumptions (typically improving over prior work in several aspects) and recover (and improve) the best known complexity results for a host of important scenarios, such as for instance coorperative SGD and federated averaging (local SGD).

preprint2020arXiv

Stochastic Hamiltonian Gradient Methods for Smooth Games

The success of adversarial formulations in machine learning has brought renewed motivation for smooth games. In this work, we focus on the class of stochastic Hamiltonian methods and provide the first convergence guarantees for certain classes of stochastic smooth games. We propose a novel unbiased estimator for the stochastic Hamiltonian gradient descent (SHGD) and highlight its benefits. Using tools from the optimization literature we show that SHGD converges linearly to the neighbourhood of a stationary point. To guarantee convergence to the exact solution, we analyze SHGD with a decreasing step-size and we also present the first stochastic variance reduced Hamiltonian method. Our results provide the first global non-asymptotic last-iterate convergence guarantees for the class of stochastic unconstrained bilinear games and for the more general class of stochastic games that satisfy a "sufficiently bilinear" condition, notably including some non-convex non-concave problems. We supplement our analysis with experiments on stochastic bilinear and sufficiently bilinear games, where our theory is shown to be tight, and on simple adversarial machine learning formulations.

preprint2020arXiv

Unified Analysis of Stochastic Gradient Methods for Composite Convex and Smooth Optimization

We present a unified theorem for the convergence analysis of stochastic gradient algorithms for minimizing a smooth and convex loss plus a convex regularizer. We do this by extending the unified analysis of Gorbunov, Hanzely \& Richtárik (2020) and dropping the requirement that the loss function be strongly convex. Instead, we only rely on convexity of the loss function. Our unified analysis applies to a host of existing algorithms such as proximal SGD, variance reduced methods, quantization and some coordinate descent type methods. For the variance reduced methods, we recover the best known convergence rates as special cases. For proximal SGD, the quantization and coordinate type methods, we uncover new state-of-the-art convergence rates. Our analysis also includes any form of sampling and minibatching. As such, we are able to determine the minibatch size that optimizes the total complexity of variance reduced methods. We showcase this by obtaining a simple formula for the optimal minibatch size of two variance reduced methods (\textit{L-SVRG} and \textit{SAGA}). This optimal minibatch size not only improves the theoretical total complexity of the methods but also improves their convergence in practice, as we show in several experiments.

preprint2019arXiv

SGD: General Analysis and Improved Rates

We propose a general yet simple theorem describing the convergence of SGD under the arbitrary sampling paradigm. Our theorem describes the convergence of an infinite array of variants of SGD, each of which is associated with a specific probability law governing the data selection rule used to form mini-batches. This is the first time such an analysis is performed, and most of our variants of SGD were never explicitly considered in the literature before. Our analysis relies on the recently introduced notion of expected smoothness and does not rely on a uniform bound on the variance of the stochastic gradients. By specializing our theorem to different mini-batching strategies, such as sampling with replacement and independent sampling, we derive exact expressions for the stepsize as a function of the mini-batch size. With this we can also determine the mini-batch size that optimizes the total complexity, and show explicitly that as the variance of the stochastic gradient evaluated at the minimum grows, so does the optimal mini-batch size. For zero variance, the optimal mini-batch size is one. Moreover, we prove insightful stepsize-switching rules which describe when one should switch from a constant to a decreasing stepsize regime.

preprint2019arXiv

Stochastic Gradient Push for Distributed Deep Learning

Distributed data-parallel algorithms aim to accelerate the training of deep neural networks by parallelizing the computation of large mini-batch gradient updates across multiple nodes. Approaches that synchronize nodes using exact distributed averaging (e.g., via AllReduce) are sensitive to stragglers and communication delays. The PushSum gossip algorithm is robust to these issues, but only performs approximate distributed averaging. This paper studies Stochastic Gradient Push (SGP), which combines PushSum with stochastic gradient updates. We prove that SGP converges to a stationary point of smooth, non-convex objectives at the same sub-linear rate as SGD, and that all nodes achieve consensus. We empirically validate the performance of SGP on image classification (ResNet-50, ImageNet) and machine translation (Transformer, WMT'16 En-De) workloads. Our code will be made publicly available.

preprint2016arXiv

A New Perspective on Randomized Gossip Algorithms

In this short note we propose a new approach for the design and analysis of randomized gossip algorithms which can be used to solve the average consensus problem. We show how that Randomized Block Kaczmarz (RBK) method - a method for solving linear systems - works as gossip algorithm when applied to a special system encoding the underlying network. The famous pairwise gossip algorithm arises as a special case. Subsequently, we reveal a hidden duality of randomized gossip algorithms, with the dual iterative process maintaining a set of numbers attached to the edges as opposed to nodes of the network. We prove that RBK obtains a superlinear speedup in the size of the block, and demonstrate this effect through experiments.

preprint2016arXiv

Distributionally Robust Games with Risk-averse Players

We present a new model of incomplete information games without private information in which the players use a distributionally robust optimization approach to cope with the payoff uncertainty. With some specific restrictions, we show that our "Distributionally Robust Game" constitutes a true generalization of three popular finite games. These are the Complete Information Games, Bayesian Games and Robust Games. Subsequently, we prove that the set of equilibria of an arbitrary distributionally robust game with specified ambiguity set can be computed as the component-wise projection of the solution set of a multi-linear system of equations and inequalities. For special cases of such games we show equivalence to complete information finite games (Nash Games) with the same number of players and same action spaces. Thus, when our game falls within these special cases one can simply solve the corresponding Nash Game. Finally, we demonstrate the applicability of our new model of games and highlight its importance.

preprint2015arXiv

Distributionally Robust Game Theory

The classical, complete-information two-player games assume that the problem data (in particular the payoff matrix) is known exactly by both players. In a now famous result, Nash has shown that any such game has an equilibrium in mixed strategies. This result was later extended to a class of incomplete-information two-player games by Harsanyi, who assumed that the payoff matrix is not known exactly but rather represents a random variable that is governed by a probability distribution known to both players. In 2006, Bertsimas and Aghassi proposed a new class of distribution-free two-player games where the payoff matrix is only known to belong to a given uncertainty set. This model relaxes the distributional assumptions of Harsanyi's Bayesian games, and it gives rise to an alternative distribution-free equilibrium concept. In this thesis we present a new model of incomplete information games without private information in which the players use a distributionally robust optimization approach to cope with the payoff uncertainty. With some specific restrictions, we show that our "Distributionally Robust Game" constitutes a true generalization of the three aforementioned finite games (Nash games, Bayesian Games and Robust Games). Subsequently, we prove that the set of equilibria of an arbitrary distributionally robust game with specified ambiguity set can be computed as the component-wise projection of the solution set of a multi-linear system of equations and inequalities. Finally, we demonstrate the applicability of our new model of games and highlight its importance.

Nicolas Loizou

What is connected

Connect this record

See the researcher in context

Building this map preview

12 published item(s)

AI-SARAH: Adaptive and Implicit Stochastic Recursive Gradient Methods

Extragradient Method: $O(1/K)$ Last-Iterate Convergence for Monotone Variational Inequalities and Connections With Cocoercivity

On the Convergence of Stochastic Extragradient for Bilinear Games using Restarted Iteration Averaging

Stochastic Extragradient: General Analysis and Improved Rates

A Unified Theory of Decentralized SGD with Changing Topology and Local Updates

Stochastic Hamiltonian Gradient Methods for Smooth Games

Unified Analysis of Stochastic Gradient Methods for Composite Convex and Smooth Optimization

SGD: General Analysis and Improved Rates

Stochastic Gradient Push for Distributed Deep Learning

A New Perspective on Randomized Gossip Algorithms

Distributionally Robust Games with Risk-averse Players

Distributionally Robust Game Theory