Source author record

Alexander Gasnikov

Alexander Gasnikov appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

math.OC; Machine Learning; Distributed, Parallel, and Cluster Computing; Computational Complexity; Data Structures and Algorithms; Multiagent Systems; Systems and Control; Artificial Intelligence; Information Retrieval; math-ph; math.MP; math.PR; Numerical Analysis

Catalog footprint

What is connected

78 works

13 topics

4 close collaborators


preprint, 2026, arXiv

Gradient-Free Approaches is a Key to an Efficient Interaction with Markovian Stochasticity

This paper deals with stochastic optimization problems involving Markovian noise with a zero-order oracle. We present and analyze a novel derivative-free method for solving such problems in strongly convex smooth and non-smooth settings with both one-point and two-point feedback oracles. Using a randomized batching scheme, we show that when the mixing time $τ$ of the underlying noise sequence is less than the dimension of the problem $d$, the convergence estimates of our method do not depend on $τ$. This observation provides an efficient way to interact with Markovian stochasticity: instead of invoking the expensive first-order oracle, one should use the zero-order oracle. Finally, we complement our upper bounds with the corresponding lower bounds, which confirms the optimality of our results.
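To make the notion of a two-point zero-order oracle concrete, here is a minimal sketch of a standard two-point gradient estimator on the unit sphere, plugged into plain descent. It is not the method of the paper: the objective is a noiseless quadratic (no Markovian noise, no randomized batching), and the names `two_point_gradient`, `zeroth_order_descent`, the step size and the smoothing parameter `t` are all illustrative assumptions.

```python
import math
import random


def sample_unit_sphere(d, rng):
    """Draw a direction uniformly from the unit sphere in R^d."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def two_point_gradient(f, x, t, rng):
    """Two-point estimate d * (f(x + t e) - f(x - t e)) / (2 t) * e."""
    d = len(x)
    e = sample_unit_sphere(d, rng)
    f_plus = f([xi + t * ei for xi, ei in zip(x, e)])
    f_minus = f([xi - t * ei for xi, ei in zip(x, e)])
    scale = d * (f_plus - f_minus) / (2.0 * t)
    return [scale * ei for ei in e]


def zeroth_order_descent(f, x0, step, t, iters, seed=0):
    """Descent that only queries function values, never gradients."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        g = two_point_gradient(f, x, t, rng)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def quadratic(x):
    # Illustrative strongly convex objective (an assumption, not from the paper).
    return 0.5 * sum(c * c for c in x)


x_final = zeroth_order_descent(quadratic, [1.0, -2.0, 3.0], step=0.1, t=1e-3, iters=500)
```

Each iteration costs two function evaluations regardless of the dimension, which is the trade-off against a first-order oracle that the abstract refers to.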

preprint, 2026, arXiv

SDG-MoE: Signed Debate Graph Mixture-of-Experts

Sparse MoE models achieve a good balance between capacity and compute by routing each token to a small subset of experts. However, in most MoE architectures, once a token is routed, the selected experts process it independently and their outputs are combined via a weighted sum. This leaves open whether enabling communication among them could improve performance. While prior work has raised this question, direct interaction among the active routed experts remains underexplored. In this paper, we propose SDG-MoE (Signed Debate Graph Mixture-of-Experts), a novel architecture that adds a lightweight, iterative deliberation step before final aggregation. SDG-MoE introduces three components: (i) two learned interaction matrices over the active experts, a support graph $A^+$ and a critique graph $A^-$, capturing reinforcing and corrective influences; (ii) a signed message-passing step that updates expert representations before aggregation; and (iii) a disagreement-gated Friedkin-Johnsen-style anchoring that controls deliberation strength while preventing expert drift. Together, these enable a structured deliberation process where interaction strength scales with disagreement and specialization is preserved. We also provide a theoretical analysis establishing stability conditions on expert states and showing that deliberation adds only low-order overhead over the active set. In controlled three-seed pretraining experiments, SDG-MoE improves validation perplexity over both an unsigned graph communication baseline and vanilla MoE, outperforming the strongest baseline by 19.8%, and gives the best external perplexity on WikiText-103, C4, and Paloma among the compared systems.

preprint, 2026, arXiv

UCB-type Algorithm for Budget-Constrained Expert Learning

In many modern applications, a system must dynamically choose between several adaptive learning algorithms that are trained online. Examples include model selection in streaming environments, switching between trading strategies in finance, and orchestrating multiple contextual bandit or reinforcement learning agents. At each round, a learner must select one predictor among $K$ adaptive experts to make a prediction, while being able to update at most $M \le K$ of them under a fixed training budget. We address this problem in the stochastic setting and introduce M-LCB, a computationally efficient UCB-style meta-algorithm that provides anytime regret guarantees. Its confidence intervals are built directly from realized losses, require no additional optimization, and seamlessly reflect the convergence properties of the underlying experts. If each expert achieves internal regret $\tilde O(T^α)$, then M-LCB ensures overall regret bounded by $\tilde O\!\Bigl(\sqrt{\tfrac{KT}{M}} \;+\; (K/M)^{1-α}\,T^α\Bigr)$. To our knowledge, this is the first result establishing regret guarantees when multiple adaptive experts are trained simultaneously under per-round budget constraints. We illustrate the framework with two representative cases: (i) parametric models trained online with stochastic losses, and (ii) experts that are themselves multi-armed bandit algorithms. These examples highlight how M-LCB extends the classical bandit paradigm to the more realistic scenario of coordinating stateful, self-learning experts under limited resources.
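As a quick worked instance of the regret bound (plain arithmetic on the stated formula, not a result quoted from the paper), take experts with internal regret exponent $α = 1/2$. The two terms then have the same order:

```latex
\alpha = \tfrac12:\quad
\Bigl(\tfrac{K}{M}\Bigr)^{1-\alpha} T^{\alpha}
= \sqrt{\tfrac{K}{M}}\,\sqrt{T}
= \sqrt{\tfrac{KT}{M}},
\qquad\text{so}\qquad
\tilde O\!\Bigl(\sqrt{\tfrac{KT}{M}} + \bigl(\tfrac{K}{M}\bigr)^{1-\alpha} T^{\alpha}\Bigr)
= \tilde O\!\Bigl(\sqrt{\tfrac{KT}{M}}\Bigr).
```

For $α > 1/2$ and large $T$, the second term $(K/M)^{1-α}T^{α}$ dominates instead.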

preprint, 2023, arXiv

Accelerated gradient methods with absolute and relative noise in the gradient

In this paper, we investigate accelerated first-order methods for smooth convex optimization problems under inexact information on the gradient of the objective. The noise in the gradient is considered to be additive with two possibilities: absolute noise bounded by a constant, and relative noise proportional to the norm of the gradient. We investigate the accumulation of the errors in the convex and strongly convex settings with the main difference with most of the previous works being that the feasible set can be unbounded. The key to the latter is to prove a bound on the trajectory of the algorithm. We also give a stopping criterion for the algorithm and consider extensions to the cases of stochastic optimization and composite nonsmooth problems.

preprint, 2023, arXiv

Decentralized Strongly-Convex Optimization with Affine Constraints: Primal and Dual Approaches

Decentralized optimization is a common paradigm used in distributed signal processing and sensing as well as privacy-preserving and large-scale machine learning. It is assumed that several computational entities locally hold objective functions and are connected by a network. The agents aim to jointly minimize the sum of the local objectives by making gradient updates and exchanging information with their immediate neighbors. The theory of decentralized optimization is well developed in the literature. In particular, it includes lower bounds and optimal algorithms. In this paper, we assume that along with an objective, each node also holds affine constraints. We discuss several primal and dual approaches to decentralized optimization problems with affine constraints.
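The basic loop described above (local gradient updates plus exchange with immediate neighbors) can be sketched with textbook decentralized gradient descent. This is an unconstrained toy illustration, not one of the primal or dual methods of the paper: the quadratic local objectives, the ring topology, the mixing matrix `ring` and the function name are all assumptions.

```python
def decentralized_gradient_descent(targets, mixing, step, iters):
    """Node i holds f_i(x) = 0.5 * (x - targets[i])**2 and a local copy x[i]."""
    n = len(targets)
    x = [0.0] * n
    for _ in range(iters):
        # Communication: weighted average with immediate neighbours only.
        mixed = [sum(mixing[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Computation: local gradient step, using f_i'(x) = x - targets[i].
        x = [mixed[i] - step * (x[i] - targets[i]) for i in range(n)]
    return x


# Ring of four nodes; the doubly stochastic matrix is zero for non-neighbours.
ring = [
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
]
targets = [1.0, 2.0, 3.0, 4.0]
local_copies = decentralized_gradient_descent(targets, ring, step=0.05, iters=500)
network_average = sum(local_copies) / len(local_copies)
```

The sum of the local objectives is minimized at the mean of `targets`. With a constant step size the network average reaches that point in this example, while the individual local copies only settle in a neighbourhood of it.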

preprint, 2023, arXiv

The Mirror-Prox Sliding Method for Non-smooth decentralized saddle-point problems

Saddle-point optimization problems have many practical applications. This paper focuses on non-smooth saddle-point problems in the decentralized setting. It generalizes a recently proposed sliding technique for the centralized problem. Combining this sliding with a specific penalization method, we obtain an algorithm for non-smooth decentralized saddle-point problems. The proposed method approaches the lower bounds for both the number of communication rounds and the number of (sub-)gradient calls per node.

preprint, 2022, arXiv

Accelerated Primal-Dual Gradient Method for Smooth and Convex-Concave Saddle-Point Problems with Bilinear Coupling

In this paper we study the convex-concave saddle-point problem $\min_x \max_y f(x) + y^T \mathbf{A} x - g(y)$, where $f(x)$ and $g(y)$ are smooth and convex functions. We propose an Accelerated Primal-Dual Gradient Method (APDG) for solving this problem, achieving (i) an optimal linear convergence rate in the strongly-convex-strongly-concave regime, matching the lower complexity bound (Zhang et al., 2021), and (ii) an accelerated linear convergence rate in the case when only one of the functions $f(x)$ and $g(y)$ is strongly convex or even none of them are. Finally, we obtain a linearly convergent algorithm for the general smooth and convex-concave saddle point problem $\min_x \max_y F(x,y)$ without the requirement of strong convexity or strong concavity.
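For orientation, the problem class $\min_x \max_y f(x) + y^T \mathbf{A} x - g(y)$ can be tried out on a scalar instance. The sketch below uses plain simultaneous gradient descent-ascent, which is a deliberately simpler stand-in and not APDG. The quadratic choices $f(x) = \tfrac12 μ_x x^2$, $g(y) = \tfrac12 μ_y y^2$, the scalar coupling `a`, and the step size are illustrative assumptions.

```python
def gradient_descent_ascent(mu_x, mu_y, a, x0, y0, step, iters):
    """Simultaneous GDA on L(x, y) = 0.5*mu_x*x**2 + a*x*y - 0.5*mu_y*y**2."""
    x, y = x0, y0
    for _ in range(iters):
        grad_x = mu_x * x + a * y  # partial derivative of L in x
        grad_y = a * x - mu_y * y  # partial derivative of L in y
        # Descend in x, ascend in y, both from the same iterate.
        x, y = x - step * grad_x, y + step * grad_y
    return x, y


x_last, y_last = gradient_descent_ascent(
    mu_x=1.0, mu_y=1.0, a=2.0, x0=1.0, y0=-1.0, step=0.1, iters=500
)
```

The unique saddle point of this instance is $(0, 0)$, and the iterates spiral into it linearly for this step size. Accelerated methods such as the one in the abstract aim to improve how that linear rate depends on the conditioning.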

preprint, 2022, arXiv

Acceleration in Distributed Optimization under Similarity

We study distributed (strongly convex) optimization problems over a network of agents, with no centralized nodes. The loss functions of the agents are assumed to be similar, due to statistical data similarity or otherwise. To reduce the number of communications needed to reach a given solution accuracy, we propose a preconditioned, accelerated distributed method. An $\varepsilon$-solution is achieved in $\tilde{\mathcal{O}}\big(\sqrt{\frac{β/μ}{1-ρ}}\log(1/\varepsilon)\big)$ communication steps, where $β/μ$ is the relative condition number between the global and local loss functions, and $ρ$ characterizes the connectivity of the network. This rate matches (up to poly-log factors) lower communication complexity bounds of distributed gossip algorithms applied to the class of problems of interest. Numerical results show significant communication savings with respect to existing accelerated distributed schemes, especially when solving ill-conditioned problems.

preprint, 2022, arXiv

An Approach for Non-Convex Uniformly Concave Structured Saddle Point Problem

Recently, saddle point problems have received much attention due to their powerful modeling capability for a lot of problems from diverse domains. Applications of these problems occur in many applied areas, such as robust optimization, distributed optimization, game theory, and many applications in machine learning such as empirical risk minimization and generative adversarial networks training. Therefore, many researchers have actively worked on developing numerical methods for solving saddle point problems in many different settings. This paper is devoted to developing a numerical method for solving saddle point problems in the non-convex uniformly-concave setting. We study a general class of saddle point problems with composite structure and Hölder-continuous higher-order derivatives. To solve the problem under consideration, we propose an approach in which we reduce the problem to a combination of two auxiliary optimization problems, one for each group of variables: an outer minimization problem w.r.t. the primal variables and an inner maximization problem w.r.t. the dual variables. For solving the outer minimization problem, we use the Adaptive Gradient Method, which is applicable to non-convex problems and also works with an inexact oracle that is generated by approximately solving the inner problem. For solving the inner maximization problem, we use the Restarted Unified Acceleration Framework, which is a framework that unifies the high-order acceleration methods for minimizing a convex function that has Hölder-continuous higher-order derivatives. Separate complexity bounds are provided for the number of calls to the first-order oracles for the outer minimization problem and higher-order oracles for the inner maximization problem. Moreover, the complexity of the whole proposed approach is then estimated.

preprint, 2022, arXiv

Decentralized convex optimization under affine constraints for power systems control

Modern power systems are undergoing continuous, massive change. Increased penetration of distributed generation and the use of energy storage and controllable demand require a new control paradigm that does not rely on the massive information exchange required by centralized approaches. Distributed algorithms can rely only on limited information from neighbours to obtain an optimal solution for various optimization problems, such as optimal power flow, unit commitment, etc. As a generalization of these problems we consider the problem of decentralized minimization of the smooth and convex partially separable function $f = \sum_{k=1}^l f^k(x^k,\tilde x)$ under the coupled $\sum_{k=1}^l (A^k x^k - b^k) \leq 0$ and the shared $\tilde{A} \tilde{x} - \tilde{b} \leq 0$ affine constraints, where the information about $A^k$ and $b^k$ is only available for the $k$-th node of the computational network. One way to handle the coupled constraints in a distributed manner is to rewrite them in a distributed-friendly form using the Laplacian matrix of the communication graph and auxiliary variables (Khamisov, CDC, 2017). Instead of using this method, we reformulate the constrained optimization problem as a saddle point problem (SPP) and utilize the consensus constraint technique to make it distributed-friendly. Then we provide a complexity analysis for state-of-the-art SPP solving algorithms applied to this SPP.

preprint, 2022, arXiv

Distributed Saddle-Point Problems Under Similarity

We study solution methods for (strongly-)convex-(strongly)-concave Saddle-Point Problems (SPPs) over networks of two types: master/workers (thus centralized) architectures and meshed (thus decentralized) networks. The local functions at each node are assumed to be similar, due to statistical data similarity or otherwise. We establish lower complexity bounds for a fairly general class of algorithms solving the SPP. We show that a given suboptimality $\varepsilon>0$ is achieved over master/workers networks in $Ω\big(Δ\cdot δ/μ\cdot \log (1/\varepsilon)\big)$ rounds of communications, where $δ>0$ measures the degree of similarity of the local functions, $μ$ is their strong convexity constant, and $Δ$ is the diameter of the network. The lower communication complexity bound over meshed networks reads $Ω\big(1/{\sqrtρ} \cdot δ/μ\cdot\log (1/\varepsilon)\big)$, where $ρ$ is the (normalized) eigengap of the gossip matrix used for the communication between neighbouring nodes. We then propose algorithms matching the lower bounds over either type of network (up to log-factors). We assess the effectiveness of the proposed algorithms on a robust logistic regression problem.

preprint, 2022, arXiv

FLECS: A Federated Learning Second-Order Framework via Compression and Sketching

Inspired by the recent work FedNL (Safaryan et al., FedNL: Making Newton-Type Methods Applicable to Federated Learning), we propose a new communication-efficient second-order framework for federated learning, namely FLECS. The proposed method reduces the high memory requirements of FedNL by using an L-SR1 type update for the Hessian approximation, which is stored on the central server. A low-dimensional "sketch" of the Hessian is all that is needed by each device to generate an update, so that memory costs as well as the number of Hessian-vector products for the agent are low. Biased and unbiased compressions are utilized to keep communication costs low as well. Convergence guarantees for FLECS are provided in both the strongly convex and nonconvex cases, and local linear convergence is also established under strong convexity. Numerical experiments confirm the practical benefits of this new FLECS algorithm.

preprint, 2022, arXiv

Generalized Mirror Prox for Monotone Variational Inequalities: Universality and Inexact Oracle

We introduce an inexact oracle model for variational inequalities (VI) with monotone operator, propose a numerical method which solves such VI's and analyze its convergence rate. As a particular case, we consider VI's with Hölder-continuous operator and show that our algorithm is universal. This means that without knowing the Hölder parameter $ν$ and Hölder constant $L_ν$ it has the best possible complexity for this class of VI's, namely our algorithm has complexity $O\left( \inf_{ν\in[0,1]}\left(\frac{L_ν}{\varepsilon} \right)^{\frac{2}{1+ν}}R^2 \right)$, where $R$ is the size of the feasible set and $\varepsilon$ is the desired accuracy of the solution. We also consider the case of VI's with strongly monotone operator and generalize both our method for VI's with inexact oracle and our universal method to this class of problems. Finally, we show how our method can be applied to convex-concave saddle point problems with Hölder-continuous partial subgradients.

preprint, 2022, arXiv

Gradient-Free Methods for Saddle-Point Problem

In this paper, we generalize the approach of Gasnikov et al. (2017), which solves (stochastic) convex optimization problems with an inexact gradient-free oracle, to the convex-concave saddle-point problem. The proposed approach performs at least as well as the best existing approaches. Moreover, for a special set-up (simplex-type constraints and closeness of Lipschitz constants in the 1- and 2-norms) our approach reduces the required number of oracle calls (function calculations) by a factor of $\frac{n}{\log n}$. Our method uses a stochastic approximation of the gradient via finite differences. In this case, the function must be specified not only on the optimization set itself, but also in a certain neighbourhood of it. In the second part of the paper, we analyze the case when such an assumption cannot be made: we propose a general approach to modifying the method to handle this case, and we apply this approach to particular cases of some classical sets.

preprint, 2022, arXiv

On the relations of stochastic convex optimization problems with empirical risk minimization problems on $p$-norm balls

In this paper, we consider convex stochastic optimization problems arising in machine learning applications (e.g., risk minimization) and mathematical statistics (e.g., maximum likelihood estimation). There are two main approaches to solving such problems, namely the Stochastic Approximation approach (online approach) and the Sample Average Approximation approach, also known as the Monte Carlo approach (offline approach). In the offline approach, the problem is replaced by its empirical counterpart (the empirical risk minimization problem). The natural question is how to choose the sample size, i.e., how many realizations should be sampled so that a sufficiently accurate solution of the empirical problem is a solution of the original problem with the desired precision. This is one of the main issues in modern machine learning and optimization. In the last decade, many significant advances were made in these areas for solving convex stochastic optimization problems on Euclidean balls (or the whole space). In this work, we build on these advances and study the case of arbitrary balls in the $\ell_p$-norms. We also explore the question of how the parameter $p$ affects the estimates of the required number of terms in the empirical risk.

preprint, 2022, arXiv

Optimal Gradient Sliding and its Application to Distributed Optimization Under Similarity

We study structured convex optimization problems, with additive objective $r:=p + q$, where $r$ is ($μ$-strongly) convex, $q$ is $L_q$-smooth and convex, and $p$ is $L_p$-smooth, possibly nonconvex. For such a class of problems, we propose an inexact accelerated gradient sliding method that can skip the gradient computation for one of these components while still achieving optimal complexity of gradient calls of $p$ and $q$, that is, $\mathcal{O}(\sqrt{L_p/μ})$ and $\mathcal{O}(\sqrt{L_q/μ})$, respectively. This result is much sharper than the classic black-box complexity $\mathcal{O}(\sqrt{(L_p+L_q)/μ})$, especially when the difference between $L_p$ and $L_q$ is large. We then apply the proposed method to solve distributed optimization problems over master-worker architectures, under agents' function similarity, due to statistical data similarity or otherwise. The distributed algorithm achieves for the first time lower complexity bounds on both communication and local gradient calls, with the former having been a long-standing open problem. Finally, the method is extended to distributed saddle-point problems (under function similarity) by means of solving a class of variational inequalities, achieving lower communication and computation complexity bounds.

preprint, 2022, arXiv

Oracle Complexity Separation in Convex Optimization

Many convex optimization problems have a structured objective function written as a sum of functions with different types of oracles (full gradient, coordinate derivative, stochastic gradient) and different evaluation complexity of these oracles. In the strongly convex case these functions also have different condition numbers, which eventually define the iteration complexity of first-order methods and the number of oracle calls required to achieve a given accuracy. Motivated by the desire to call the more expensive oracle fewer times, in this paper we consider minimization of a sum of two functions and propose a generic algorithmic framework to separate oracle complexities for each component in the sum. As a specific example, for the $μ$-strongly convex problem $\min_{x\in \mathbb{R}^n} h(x) + g(x)$ with $L_h$-smooth function $h$ and $L_g$-smooth function $g$, a special case of our algorithm requires, up to a logarithmic factor, $O(\sqrt{L_h/μ})$ first-order oracle calls for $h$ and $O(\sqrt{L_g/μ})$ first-order oracle calls for $g$. Our general framework also covers the setting of strongly convex objectives, the setting when $g$ is given by a coordinate derivative oracle, and the setting when $g$ has a finite-sum structure and is available through a stochastic gradient oracle. In the latter two cases we obtain respectively accelerated random coordinate descent and accelerated variance reduction methods with oracle complexity separation.
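A small arithmetic sketch shows what separating the oracle complexities buys. It only evaluates the leading terms $\sqrt{L_h/μ}$ and $\sqrt{L_g/μ}$, with constants and logarithmic factors dropped, and compares them with $\sqrt{(L_h+L_g)/μ}$, the standard leading term when an accelerated method treats $h+g$ as a single $(L_h+L_g)$-smooth black box. The numeric values and function names are illustrative assumptions.

```python
import math


def separated_calls(L_h, L_g, mu):
    """Leading terms sqrt(L_h/mu) and sqrt(L_g/mu); constants and logs dropped."""
    return math.sqrt(L_h / mu), math.sqrt(L_g / mu)


def joint_calls(L_h, L_g, mu):
    """Leading term sqrt((L_h + L_g)/mu) when h + g is one black box."""
    return math.sqrt((L_h + L_g) / mu)


# Hypothetical constants: h is well conditioned, g is not.
calls_h, calls_g = separated_calls(L_h=100.0, L_g=10000.0, mu=1.0)
calls_joint = joint_calls(L_h=100.0, L_g=10000.0, mu=1.0)
```

With these values the gradient of $h$ is queried on the order of 10 times instead of roughly 100, which matters when the oracle for $h$ is the expensive one.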

preprint, 2022, arXiv

Primal-Dual Stochastic Mirror Descent for MDPs

We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs). For this purpose, some variant of Stochastic Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals. An important detail is the ability to use inexact values of functional constraints and compute the value of dual variables. We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method. Using this algorithm, we get the first parallel algorithm for mixing average-reward MDPs with a generative model without reduction to discounted MDP. One of the main features of the presented method is low communication costs in a distributed centralized setting, even with very large networks.

preprint, 2022, arXiv

The First Optimal Acceleration of High-Order Methods in Smooth Convex Optimization

In this paper, we study the fundamental open question of finding the optimal high-order algorithm for solving smooth convex minimization problems. Arjevani et al. (2019) established the lower bound $Ω\left(ε^{-2/(3p+1)}\right)$ on the number of the $p$-th order oracle calls required by an algorithm to find an $ε$-accurate solution to the problem, where the $p$-th order oracle stands for the computation of the objective function value and the derivatives up to the order $p$. However, the existing state-of-the-art high-order methods of Gasnikov et al. (2019b); Bubeck et al. (2019); Jiang et al. (2019) achieve the oracle complexity $\mathcal{O}\left(ε^{-2/(3p+1)} \log (1/ε)\right)$, which does not match the lower bound. The reason for this is that these algorithms require performing a complex binary search procedure, which makes them neither optimal nor practical. We fix this fundamental issue by providing the first algorithm with $\mathcal{O}\left(ε^{-2/(3p+1)}\right)$ $p$-th order oracle complexity.
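To get a feel for the exponent $-2/(3p+1)$ in these bounds, it helps to plug in small values of $p$ (plain arithmetic on the stated formula):

```latex
p = 1:\ \varepsilon^{-2/(3\cdot 1 + 1)} = \varepsilon^{-1/2}, \qquad
p = 2:\ \varepsilon^{-2/(3\cdot 2 + 1)} = \varepsilon^{-2/7}, \qquad
p = 3:\ \varepsilon^{-2/(3\cdot 3 + 1)} = \varepsilon^{-1/5}.
```

The case $p = 1$ recovers the familiar $\varepsilon^{-1/2}$ rate of accelerated gradient methods, and the exponent shrinks in magnitude as higher-order derivatives become available.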

preprint, 2022, arXiv

The First Optimal Algorithm for Smooth and Strongly-Convex-Strongly-Concave Minimax Optimization

In this paper, we revisit the smooth and strongly-convex-strongly-concave minimax optimization problem. Zhang et al. (2021) and Ibrahim et al. (2020) established the lower bound $Ω\left(\sqrt{κ_xκ_y} \log \frac{1}ε\right)$ on the number of gradient evaluations required to find an $ε$-accurate solution, where $κ_x$ and $κ_y$ are condition numbers for the strong convexity and strong concavity assumptions. However, the existing state-of-the-art methods do not match this lower bound: algorithms of Lin et al. (2020) and Wang and Li (2020) have gradient evaluation complexity $\mathcal{O}\left( \sqrt{κ_xκ_y}\log^3\frac{1}ε\right)$ and $\mathcal{O}\left( \sqrt{κ_xκ_y}\log^3 (κ_xκ_y)\log\frac{1}ε\right)$, respectively. We fix this fundamental issue by providing the first algorithm with $\mathcal{O}\left(\sqrt{κ_xκ_y}\log\frac{1}ε\right)$ gradient evaluation complexity. We design our algorithm in three steps: (i) we reformulate the original problem as a minimization problem via the pointwise conjugate function; (ii) we apply a specific variant of the proximal point algorithm to the reformulated problem; (iii) we compute the proximal operator inexactly using the optimal algorithm for operator norm reduction in monotone inclusions.

preprint, 2022, arXiv

Vaidya's method for convex stochastic optimization in small dimension

This paper considers a general problem of convex stochastic optimization in a relatively low-dimensional space (e.g., 100 variables). It is known that for deterministic convex optimization problems of small dimensions, the fastest convergence is achieved by the center of gravity type methods (e.g., Vaidya's cutting plane method). For stochastic optimization problems, the question of whether Vaidya's method can be used comes down to the question of how it accumulates inaccuracy in the subgradient. The recent result of the authors states that the errors do not accumulate on iterations of Vaidya's method, which allows proposing its analog for stochastic optimization problems. The primary technique is to replace the subgradient in Vaidya's method with its probabilistic counterpart (the arithmetic mean of the stochastic subgradients). The present paper implements the described plan, which ultimately leads to an effective (if parallel computations for batching are possible) method for solving convex stochastic optimization problems in relatively low-dimensional spaces.
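The batching step mentioned above, replacing a subgradient by the arithmetic mean of stochastic subgradients, can be sketched in isolation. This does not implement Vaidya's method itself. The toy objective $f(x) = |x|$, the additive Gaussian noise model and all function names are assumptions made for illustration.

```python
import random


def stochastic_subgradient(x, rng):
    """Noisy subgradient of f(x) = |x|: true subgradient plus Gaussian noise."""
    true_subgradient = 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)
    return true_subgradient + rng.gauss(0.0, 1.0)


def batched_subgradient(x, batch_size, rng):
    """Arithmetic mean of batch_size independent stochastic subgradients."""
    samples = [stochastic_subgradient(x, rng) for _ in range(batch_size)]
    return sum(samples) / batch_size


def mean_squared_error(batch_size, trials, seed=0):
    """Empirical squared error of the batched estimate at x = 2 (true value 1)."""
    rng = random.Random(seed)
    errors = [
        (batched_subgradient(2.0, batch_size, rng) - 1.0) ** 2
        for _ in range(trials)
    ]
    return sum(errors) / trials


mse_single = mean_squared_error(batch_size=1, trials=200)
mse_batched = mean_squared_error(batch_size=100, trials=200)
```

For independent samples the variance of the mean scales like one over the batch size, and the samples within a batch can be drawn in parallel, which is where the parallel-computation caveat in the abstract comes from.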

preprint, 2021, arXiv

Adaptive Catalyst for Smooth Convex Optimization

In this paper, we present a generic framework that allows accelerating almost arbitrary non-accelerated deterministic and randomized algorithms for smooth convex optimization problems. The main approach of our envelope is the same as in Catalyst (Lin et al., 2015): an accelerated proximal outer gradient method, which is used as an envelope for a non-accelerated inner method for the $\ell_2$ regularized auxiliary problem. Our algorithm has two key differences: 1) easily verifiable stopping criteria for the inner algorithm; 2) the regularization parameter can be tuned along the way. As a result, the main contribution of our work is a new framework that applies to adaptive inner algorithms: Steepest Descent, Adaptive Coordinate Descent, Alternating Minimization. Moreover, in the non-adaptive case, our approach allows obtaining Catalyst without a logarithmic factor, which appears in the standard Catalyst (Lin et al., 2015, 2018).

preprint, 2021, arXiv

ADOM: Accelerated Decentralized Optimization Method for Time-Varying Networks

We propose ADOM, an accelerated method for smooth and strongly convex decentralized optimization over time-varying networks. ADOM uses a dual oracle, i.e., we assume access to the gradient of the Fenchel conjugate of the individual loss functions. Up to a constant factor, which depends on the network structure only, its communication complexity is the same as that of the accelerated Nesterov gradient method (Nesterov, 2003). To the best of our knowledge, only the algorithm of Rogozin et al. (2019) has a convergence rate with similar properties. However, their algorithm converges under the very restrictive assumption that the number of network changes cannot be greater than a tiny percentage of the number of iterations. This assumption is hard to satisfy in practice, as changes in the network topology usually cannot be controlled. In contrast, ADOM merely requires the network to stay connected throughout time.

preprint, 2021, arXiv

Decentralized and Parallel Primal and Dual Accelerated Methods for Stochastic Convex Programming Problems

We introduce primal and dual stochastic gradient oracle methods for decentralized convex optimization problems. Both for primal and dual oracles, the proposed methods are optimal in terms of the number of communication steps. However, for all classes of the objective, the optimality in terms of the number of oracle calls per node takes place only up to a logarithmic factor and the notion of smoothness. By using mini-batching technique, we show that the proposed methods with stochastic oracle can be additionally parallelized at each node. The considered algorithms can be applied to many data science problems and inverse problems.

preprint, 2021, arXiv

Lecture Notes on Stochastic Processes

These are lecture notes for the course "Stochastic Processes". In this format, the course was taught in the spring semesters of 2017 and 2018 for third-year bachelor students of the Department of Control and Applied Mathematics, School of Applied Mathematics and Informatics at Moscow Institute of Physics and Technology. The basis of this course was formed and taught for decades by professors from the Department of Mathematical Foundations of Control A.A. Natan, S.A. Guz, and O.G. Gorbachev. Besides standard chapters of stochastic processes theory (correlation theory, Markov processes), this book (and the lectures) includes the following chapters: the von Neumann-Birkhoff-Khinchin ergodic theorem, the macrosystem equilibrium concept, Markov Chain Monte Carlo, Markov decision processes, and the secretary problem.

preprint, 2021, arXiv

Linearly Convergent Gradient-Free Methods for Minimization of Parabolic Approximation

Finding the global minimum of non-convex functions is one of the main and most difficult problems in modern optimization. In the first part of the paper, we consider a certain class of "good" non-convex functions that can be bounded above and below by a parabolic function. We show that using only the zeroth-order oracle, one can achieve the linear rate $\log \left(\frac{1}{\varepsilon}\right)$ of finding the global minimum on a cube. The second part of the paper looks at the nonconvex problem in a slightly different way. We assume that we are minimizing a quadratic function, but that we only have access to a noisy zeroth-order oracle whose noise is proportional to the distance to the solution. Dealing with such noise assumptions for gradient-free methods is new in the literature. We show that here it is also possible to achieve a linear rate of convergence.

preprint, 2021, arXiv

Mirror Descent for Constrained Optimization Problems with Large Subgradient Values

Based on the ideas of arXiv:1710.06612, we consider the problem of minimizing a Hölder-continuous non-smooth functional $f$ subject to a non-positive convex (generally non-smooth) Lipschitz-continuous functional constraint. We propose some novel step-size strategies and adaptive stopping rules in Mirror Descent algorithms for the considered class of problems. It is shown that the methods are applicable to objective functionals of various levels of smoothness. By applying the restart technique to the Mirror Descent algorithm, we propose an optimal method for solving optimization problems with strongly convex objective functionals. Estimates of the rate of convergence of the considered algorithms are obtained depending on the level of smoothness of the objective functional. These estimates indicate the optimality of the considered methods from the point of view of the theory of lower oracle bounds. In addition, the case of a quasi-convex objective functional and constraint is considered.

preprint, 2021, arXiv

Numerical methods for the resource allocation problem in networks

In this paper, we consider the resource allocation problem in a network with a large number of connections that are used by a huge number of users. The resource allocation problem under discussion is a maximization problem with linear inequality constraints. To solve this problem, we construct the dual problem and propose the following numerical optimization methods for the dual: a fast gradient method, a stochastic projected subgradient method, an ellipsoid method, and a random gradient extrapolation method. We pay special attention to the primal-dual analysis of these methods. For each method, we estimate the convergence rate. We also provide modifications of these methods for distributed computation, taking into account their application to networks.
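
The dual approach can be sketched with a plain projected gradient method on the dual, which is simpler than any of the methods listed in the abstract. Everything below is an illustrative assumption: logarithmic utilities `w_k * log(x_k)`, the function name `dual_gradient_allocation`, the step size, the price floor, and the tiny two-link network. Each user is assumed to use at least one link.

```python
def dual_gradient_allocation(routes, capacity, weights,
                             step=0.5, iters=500, price_floor=1e-6):
    """Projected gradient descent on the dual of
        max sum_k w_k * log(x_k)  s.t.  load on each link <= capacity.

    routes[j] lists the users whose traffic passes through link j.
    Returns (rates, prices): user rates and link prices (dual variables).
    """
    n_users = len(weights)
    prices = [1.0] * len(capacity)
    rates = [0.0] * n_users
    for _ in range(iters):
        # Price seen by each user: sum of the prices of the links it uses.
        path_price = [0.0] * n_users
        for j, users in enumerate(routes):
            for k in users:
                path_price[k] += prices[j]
        # Best response: argmax_x w*log(x) - p*x is x = w / p.
        rates = [w / p for w, p in zip(weights, path_price)]
        # Dual gradient for link j is capacity[j] - load[j];
        # an overloaded link therefore raises its price.
        for j, users in enumerate(routes):
            load = sum(rates[k] for k in users)
            prices[j] = max(price_floor,
                            prices[j] - step * (capacity[j] - load))
    return rates, prices

# Hypothetical network: user 0 uses link 0, user 1 uses link 1,
# user 2 uses both links; both links have unit capacity.
rates, prices = dual_gradient_allocation(
    routes=[[0, 2], [1, 2]], capacity=[1.0, 1.0], weights=[1.0, 1.0, 1.0])
```

For this toy network the optimality conditions give rates (2/3, 2/3, 1/3) and a price of 1.5 on each link, which the iteration approaches. Only link loads and local prices are exchanged, which is what makes dual schemes of this kind natural in a distributed setting.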

preprint, 2021, arXiv

On solving convex min-min problems with smoothness and strong convexity in one variable group and small dimension of the other

This paper is devoted to approaches for convex min-min problems with smoothness and strong convexity in only one of the two variable groups. It is shown that the proposed approaches, based on Vaidya's cutting plane method and Nesterov's fast gradient method, achieve linear convergence. The outer minimization problem is solved using Vaidya's cutting plane method, and the inner problem (smooth and strongly convex) is solved using the fast gradient method. Due to the importance of machine learning applications, we also consider the case when the objective function is a sum of a large number of functions. In this case, a variance-reduced accelerated gradient algorithm is used instead of Nesterov's fast gradient method. Numerical experiments illustrate the advantages of the proposed procedures for logistic regression with a prior on one of the parameter groups.

preprint, 2021, arXiv

One-Point Gradient-Free Methods for Smooth and Non-Smooth Saddle-Point Problems

In this paper, we analyze gradient-free methods with one-point feedback for stochastic saddle point problems $\min_{x}\max_{y} \varphi(x, y)$. For non-smooth and smooth cases, we present the analysis in a general geometric setup with an arbitrary Bregman divergence. For problems with higher-order smoothness, the analysis is carried out only in the Euclidean case. The estimates we obtain match the best currently known estimates of gradient-free methods with one-point feedback for problems of minimizing a convex or strongly convex function. The paper uses three main approaches to recovering the gradient through finite differences: the standard one with a random direction, and its modifications with kernels and with residual feedback. We also provide experiments comparing these approaches on a matrix game.
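
As an illustration of the standard random-direction approach, the sketch below builds a one-point gradient estimate from a single function value per call. The names `one_point_gradient` and `phi`, the smoothing radius `tau`, and the noiseless quadratic test function are assumptions made for this example; the paper's setting is stochastic and more general.

```python
import math
import random

def random_unit_vector(dim, rng):
    """Draw a vector uniformly from the unit Euclidean sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def one_point_gradient(oracle, z, tau, rng):
    """One-point estimate (d / tau) * oracle(z + tau * e) * e.

    Uses exactly one oracle call. Its expectation is the gradient of a
    smoothed version of the function, at the cost of high variance.
    """
    d = len(z)
    e = random_unit_vector(d, rng)
    value = oracle([zi + tau * ei for zi, ei in zip(z, e)])
    return [d / tau * value * ei for ei in e]

# Hypothetical saddle function phi(x, y) = x^2/2 + x*y - y^2/2, z = (x, y).
def phi(z):
    x, y = z
    return 0.5 * x * x + x * y - 0.5 * y * y

rng = random.Random(0)
z = [1.0, 0.5]
n_samples = 200_000
avg = [0.0, 0.0]
for _ in range(n_samples):
    grad_est = one_point_gradient(phi, z, tau=0.5, rng=rng)
    avg = [a + gi / n_samples for a, gi in zip(avg, grad_est)]
# For a quadratic the estimate is unbiased, so avg should be close to
# the true gradient (x + y, x - y) = (1.5, 0.5).
```

In a saddle-point method one would step against the `x` component of such an estimate and along the `y` component. The large number of samples needed for the average to settle reflects the high variance that makes one-point feedback harder than two-point feedback.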

preprint, 2021, arXiv

Recent theoretical advances in decentralized distributed convex optimization

In the last few years, the theory of decentralized distributed convex optimization has made significant progress. Lower bounds on communication rounds and oracle calls have appeared, as well as methods that reach both of these bounds. In this paper, we focus on how these results can be explained based on optimal algorithms for the non-distributed setup. In particular, we present our recent results that have not been published yet and that can be found in detail only in arXiv preprints.

preprint, 2021, arXiv

Zeroth-order methods for noisy Hölder-gradient functions

In this paper, we prove new complexity bounds for zeroth-order methods in non-convex optimization with inexact observations of the objective function values. We use the Gaussian smoothing approach of Nesterov and Spokoiny [2015] and extend their results, obtained for smooth zeroth-order non-convex problems, to the minimization of functions with Hölder-continuous gradient using a noisy zeroth-order oracle, obtaining upper bounds on the admissible noise as well. We consider a finite-difference gradient approximation based on random Gaussian vectors and prove that a gradient descent scheme based on this approximation converges to a stationary point of the smoothed function. We also consider convergence to a stationary point of the original (not smoothed) function and obtain bounds on the number of steps the algorithm needs to make the norm of its gradient small. Additionally, we provide bounds on the level of noise in the zeroth-order oracle for which the above bounds are still guaranteed to hold. We also consider separately the case $\nu = 1$ and show that in this case the dependence of the obtained bounds on the dimension can be improved.
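
The finite-difference approximation with Gaussian directions can be sketched as follows. This is only a toy instance of the general idea: the function name `gaussian_smoothing_descent`, the convex test function, the noise level, the smoothing parameter `mu`, and the step size are all assumptions, and the test function is smooth, so it corresponds to the $\nu = 1$ case rather than the general Hölder setting.

```python
import random

def gaussian_smoothing_descent(oracle, x0, mu, step, iters, seed=0):
    """Gradient descent with the forward-difference Gaussian estimator
        g = (oracle(x + mu * u) - oracle(x)) / mu * u,   u ~ N(0, I).

    Only (possibly noisy) function values are used, two per iteration.
    """
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        u = [rng.gauss(0.0, 1.0) for _ in x]
        shifted = [xi + mu * ui for xi, ui in zip(x, u)]
        slope = (oracle(shifted) - oracle(x)) / mu
        x = [xi - step * slope * ui for xi, ui in zip(x, u)]
    return x

# Hypothetical test function with gradient Lipschitz constant L = 2.
def f(x):
    return sum(xi * xi for xi in x)

# Inexact oracle: bounded additive noise of magnitude at most 1e-8.
noise_rng = random.Random(1)

def noisy_f(x):
    return f(x) + 1e-8 * noise_rng.uniform(-1.0, 1.0)

n = 5
x0 = [1.0] * n
# Conservative step of order 1 / (n * L); here 1 / (4 * (n + 4) * L).
x_final = gaussian_smoothing_descent(
    noisy_f, x0, mu=1e-3, step=1.0 / (4 * (n + 4) * 2.0), iters=2000)
```

The noise enters the estimator divided by `mu`, so `mu` cannot be taken arbitrarily small for a fixed noise level. This trade-off is the reason bounds on the admissible oracle noise matter.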

78 published item(s)

Gradient-Free Approaches is a Key to an Efficient Interaction with Markovian Stochasticity

SDG-MoE: Signed Debate Graph Mixture-of-Experts

UCB-type Algorithm for Budget-Constrained Expert Learning

Accelerated gradient methods with absolute and relative noise in the gradient

Decentralized Strongly-Convex Optimization with Affine Constraints: Primal and Dual Approaches

The Mirror-Prox Sliding Method for Non-smooth decentralized saddle-point problems

Accelerated Primal-Dual Gradient Method for Smooth and Convex-Concave Saddle-Point Problems with Bilinear Coupling

Acceleration in Distributed Optimization under Similarity

An Approach for Non-Convex Uniformly Concave Structured Saddle Point Problem

Decentralized convex optimization under affine constraints for power systems control

Distributed Saddle-Point Problems Under Similarity

FLECS: A Federated Learning Second-Order Framework via Compression and Sketching

Generalized Mirror Prox for Monotone Variational Inequalities: Universality and Inexact Oracle

Gradient-Free Methods for Saddle-Point Problem

On the relations of stochastic convex optimization problems with empirical risk minimization problems on $p$-norm balls

Optimal Gradient Sliding and its Application to Distributed Optimization Under Similarity

Oracle Complexity Separation in Convex Optimization

Primal-Dual Stochastic Mirror Descent for MDPs

The First Optimal Acceleration of High-Order Methods in Smooth Convex Optimization

The First Optimal Algorithm for Smooth and Strongly-Convex-Strongly-Concave Minimax Optimization

Vaidya's method for convex stochastic optimization in small dimension

Adaptive Catalyst for Smooth Convex Optimization

ADOM: Accelerated Decentralized Optimization Method for Time-Varying Networks

Decentralized and Parallel Primal and Dual Accelerated Methods for Stochastic Convex Programming Problems

Lecture Notes on Stochastic Processes

Linearly Convergent Gradient-Free Methods for Minimization of Parabolic Approximation

Mirror Descent for Constrained Optimization Problems with Large Subgradient Values

Numerical methods for the resource allocation problem in networks

On solving convex min-min problems with smoothness and strong convexity in one variable group and small dimension of the other

One-Point Gradient-Free Methods for Smooth and Non-Smooth Saddle-Point Problems

Recent theoretical advances in decentralized distributed convex optimization

Zeroth-order methods for noisy Hölder-gradient functions

A Dual Approach for Optimal Algorithms in Distributed Optimization over Networks

Accelerated and nonaccelerated stochastic gradient descent with inexact model

Accelerated and nonaccelerated stochastic gradient descent with model conception

Accelerated gradient sliding and variance reduction

Accelerated methods for composite non-bilinear saddle point problem

Adaptive Gradient Descent for Convex and Non-Convex Stochastic Optimization

ADMM-based Distributed State Estimation for Power Systems: Evaluation of Performance

Alternating Minimization Methods for Strongly Convex Optimization

An Accelerated Directional Derivative Method for Smooth Stochastic Convex Optimization

An Accelerated Method for Derivative-Free Smooth Stochastic Convex Optimization

Decentralize and Randomize: Faster Algorithm for Wasserstein Barycenters

Evolutionary interpretations of entropy model for correspondence matrix calculation

Finding equilibrium in two-stage traffic assignment model

Inexact Model: A Framework for Optimization and Variational Inequalities

Multimarginal Optimal Transport by Accelerated Alternating Minimization

Near-Optimal Hyperfast Second-Order Method for convex optimization and its Sliding

On the Complexity of Approximating Wasserstein Barycenter

On the Optimal Combination of Tensor Optimization Methods

Projected Gradient Method for Decentralized Optimization over Time-Varying Networks

Traffic assignment models. Numerical aspects

Universal gradient descent

Accelerated Directional Search with non-Euclidean prox-structure

On the upper bound for the mathematical expectation of the norm of a vector uniformly distributed on the sphere and the phenomenon of concentration of uniform measure on the sphere

The global rate of convergence for optimal tensor methods in smooth convex optimization

Walrasian Equilibrium and Centralized Distributed Optimization from the point of view of Modern Convex Optimization Methods on the Example of Resource Allocation Problem

Comparision of the definitions of generalized solution of the Cauchy problem for quasi-linear equation

About reduction of searching competetive equillibrium to the minimax problem in application to different network problems

Efficient calculation of stochastic equilibriums in the Beckmann's and stable dynamic models

Efficient numerical algorithms for regularized regression problem with applications to traffic matrix estimations

Efficient randomized algorithms for PageRank problem

Entropy linear programming

Fast Primal-Dual Gradient Method for Strongly Convex Minimization Problems with Linear Constraints

Gradient and gradient-free methods for stochastic convex optimization with inexact oracle

Gradient-free prox-methods with inexact oracle for stochastic convex optimization problems on a simplex

Learning Supervised PageRank with Gradient-Based and Gradient-Free Optimization Methods

Non accelerated efficient numerical methods for sparse quadratic optimization problems and its generalizations

On the relationship between imitative logit dynamics in the population game theory and mirror descent method in the online optimization using the example of the Shortest Path Problem

On the three-stage version of stable dynamic model

Primal-Dual Method for Searching Equilibrium in Hierarchical Congestion Population Games

Searching equillibriums in Beckmann's and Nesterov--de Palma's models

Searching of equilibriums in hierarchical congestion population games

Universal composite prox-method for strictly convex optimization problems