Researcher profile

Pavel Dvurechensky

Pavel Dvurechensky contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
20 works
0 followers
6 topics
4 close collaborators

Published work

20 published item(s)

preprint · 2023 · arXiv

Accelerated gradient methods with absolute and relative noise in the gradient

In this paper, we investigate accelerated first-order methods for smooth convex optimization problems under inexact information on the gradient of the objective. The noise in the gradient is considered to be additive with two possibilities: absolute noise bounded by a constant, and relative noise proportional to the norm of the gradient. We investigate the accumulation of the errors in the convex and strongly convex settings with the main difference with most of the previous works being that the feasible set can be unbounded. The key to the latter is to prove a bound on the trajectory of the algorithm. We also give a stopping criterion for the algorithm and consider extensions to the cases of stochastic optimization and composite nonsmooth problems.

preprint · 2022 · arXiv

An Approach for Non-Convex Uniformly Concave Structured Saddle Point Problem

Recently, saddle point problems have received much attention due to their powerful modeling capability for many problems from diverse domains. Applications of these problems occur in many applied areas, such as robust optimization, distributed optimization, game theory, and many applications in machine learning such as empirical risk minimization and generative adversarial networks training. Therefore, many researchers have actively worked on developing numerical methods for solving saddle point problems in many different settings. This paper is devoted to developing a numerical method for solving saddle point problems in the non-convex uniformly-concave setting. We study a general class of saddle point problems with composite structure and Hölder-continuous higher-order derivatives. To solve the problem under consideration, we propose an approach in which we reduce the problem to a combination of two auxiliary optimization problems, one for each group of variables: an outer minimization problem w.r.t. the primal variables and an inner maximization problem w.r.t. the dual variables. For solving the outer minimization problem, we use the Adaptive Gradient Method, which is applicable to non-convex problems and also works with an inexact oracle that is generated by approximately solving the inner problem. For solving the inner maximization problem, we use the Restarted Unified Acceleration Framework, which is a framework that unifies the high-order acceleration methods for minimizing a convex function that has Hölder-continuous higher-order derivatives. Separate complexity bounds are provided for the number of calls to the first-order oracles for the outer minimization problem and higher-order oracles for the inner maximization problem. Moreover, the complexity of the whole proposed approach is then estimated.

preprint · 2022 · arXiv

Decentralized convex optimization under affine constraints for power systems control

Modern power systems are undergoing continuous, massive change. Increased penetration of distributed generation, usage of energy storage and controllable demand require a new control paradigm that does not rely on the massive information exchange required by centralized approaches. Distributed algorithms can rely only on limited information from neighbours to obtain an optimal solution for various optimization problems, such as optimal power flow, unit commitment, etc. As a generalization of these problems we consider the problem of decentralized minimization of the smooth and convex partially separable function $f = \sum_{k=1}^l f^k(x^k,\tilde x)$ under the coupled $\sum_{k=1}^l (A^k x^k - b^k) \leq 0$ and the shared $\tilde{A} \tilde{x} - \tilde{b} \leq 0$ affine constraints, where the information about $A^k$ and $b^k$ is only available for the $k$-th node of the computational network. One way to handle the coupled constraints in a distributed manner is to rewrite them in a distributed-friendly form using the Laplacian matrix of the communication graph and auxiliary variables (Khamisov, CDC, 2017). Instead of using this method we reformulate the constrained optimization problem as a saddle point problem (SPP) and utilize the consensus constraint technique to make it distributed-friendly. Then we provide a complexity analysis for state-of-the-art SPP solving algorithms applied to this SPP.

preprint · 2022 · arXiv

Generalized Mirror Prox for Monotone Variational Inequalities: Universality and Inexact Oracle

We introduce an inexact oracle model for variational inequalities (VI) with monotone operator, propose a numerical method which solves such VI's and analyze its convergence rate. As a particular case, we consider VI's with Hölder-continuous operator and show that our algorithm is universal. This means that without knowing the Hölder parameter $ν$ and Hölder constant $L_ν$ it has the best possible complexity for this class of VI's, namely our algorithm has complexity $O\left( \inf_{ν\in[0,1]}\left(\frac{L_ν}{\varepsilon} \right)^{\frac{2}{1+ν}}R^2 \right)$, where $R$ is the size of the feasible set and $\varepsilon$ is the desired accuracy of the solution. We also consider the case of VI's with strongly monotone operator and generalize our method for VI's with inexact oracle and our universal method for this class of problems. Finally, we show how our method can be applied to convex-concave saddle point problems with Hölder-continuous partial subgradients.
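
As a rough illustration of the classical scheme that this abstract generalizes, the sketch below runs Euclidean Mirror Prox (the extragradient method) on the monotone operator F(x, y) = (y, -x) of the bilinear saddle point problem min_x max_y xy. The toy problem, the fixed step size 0.5, and all function names are assumptions made for illustration; the universal, inexact-oracle method of the paper is not reproduced here.

```python
import math


def bilinear_operator(z):
    """Monotone operator F(x, y) = (y, -x) of the toy problem min_x max_y x*y."""
    x, y = z
    return (y, -x)


def mirror_prox_euclidean(operator, z0, step, iterations):
    """Extragradient iteration: extrapolate, then update from the original
    point using the operator evaluated at the extrapolated point."""
    z = tuple(z0)
    for _ in range(iterations):
        f_z = operator(z)
        z_half = tuple(zi - step * fi for zi, fi in zip(z, f_z))
        f_half = operator(z_half)
        z = tuple(zi - step * fi for zi, fi in zip(z, f_half))
    return z


def norm(z):
    """Euclidean norm of a tuple of floats."""
    return math.sqrt(sum(zi * zi for zi in z))


# Hypothetical starting point and step size; the unique solution is (0, 0).
solution = mirror_prox_euclidean(bilinear_operator, (1.0, 1.0), step=0.5, iterations=100)
```

For this particular operator, each iteration multiplies the squared distance to the solution by 1 - h^2 + h^4, where h is the step size, so the iterates contract toward (0, 0). Plain simultaneous gradient descent-ascent with the same step would instead multiply it by 1 + h^2 and move away.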

preprint · 2022 · arXiv

Oracle Complexity Separation in Convex Optimization

Many convex optimization problems have a structured objective function written as a sum of functions with different types of oracles (full gradient, coordinate derivative, stochastic gradient) and different evaluation complexity of these oracles. In the strongly convex case these functions also have different condition numbers, which eventually define the iteration complexity of first-order methods and the number of oracle calls required to achieve a given accuracy. Motivated by the desire to call the more expensive oracle fewer times, in this paper we consider minimization of a sum of two functions and propose a generic algorithmic framework to separate oracle complexities for each component in the sum. As a specific example, for the $μ$-strongly convex problem $\min_{x\in \mathbb{R}^n} h(x) + g(x)$ with $L_h$-smooth function $h$ and $L_g$-smooth function $g$, a special case of our algorithm requires, up to a logarithmic factor, $O(\sqrt{L_h/μ})$ first-order oracle calls for $h$ and $O(\sqrt{L_g/μ})$ first-order oracle calls for $g$. Our general framework also covers the setting of strongly convex objectives, the setting when $g$ is given by a coordinate derivative oracle, and the setting when $g$ has a finite-sum structure and is available through a stochastic gradient oracle. In the latter two cases we obtain, respectively, accelerated random coordinate descent and accelerated variance reduction methods with oracle complexity separation.

preprint · 2021 · arXiv

First-Order Methods for Convex Optimization

First-order methods for solving convex optimization problems have been at the forefront of mathematical optimization in the last 20 years. The rapid development of this important class of algorithms is motivated by the success stories reported in various applications, including most importantly machine learning, signal processing, imaging and control theory. First-order methods have the potential to provide low accuracy solutions at low computational complexity which makes them an attractive set of tools in large-scale optimization problems. In this survey we cover a number of key developments in gradient-based optimization methods. This includes non-Euclidean extensions of the classical proximal gradient method, and its accelerated versions. Additionally we survey recent developments within the class of projection-free methods, and proximal versions of primal-dual schemes. We give complete proofs for various key results, and highlight the unifying aspects of several optimization algorithms.

preprint · 2021 · arXiv

Numerical methods for the resource allocation problem in networks

In this paper, we consider the resource allocation problem in a network with a large number of connections which are used by a huge number of users. The resource allocation problem under discussion is a maximization problem with linear inequality constraints. To solve this problem we construct the dual problem and propose to use the following numerical optimization methods for the dual: a fast gradient method, a stochastic projected subgradient method, an ellipsoid method, and a random gradient extrapolation method. A special focus is made on the primal-dual analysis of these methods. For each method we estimate the convergence rate. We also provide some modifications of these methods in the setup of distributed computations, taking into account their application to networks.

preprint · 2021 · arXiv

Zeroth-order methods for noisy Hölder-gradient functions

In this paper, we prove new complexity bounds for zeroth-order methods in non-convex optimization with inexact observations of the objective function values. We use the Gaussian smoothing approach of Nesterov and Spokoiny [2015] and extend their results, obtained for optimization methods for smooth zeroth-order non-convex problems, to the setting of minimization of functions with Hölder-continuous gradient with a noisy zeroth-order oracle, obtaining upper bounds on the noise as well. We consider a finite-difference gradient approximation based on random Gaussian vectors and prove that a gradient descent scheme based on this approximation converges to the stationary point of the smoothed function. We also consider convergence to the stationary point of the original (not smoothed) function and obtain bounds on the number of steps of the algorithm for making the norm of its gradient small. Additionally, we provide bounds for the level of noise in the zeroth-order oracle for which it is still possible to guarantee that the above bounds hold. We also consider separately the case of $ν= 1$ and show that in this case the dependence of the obtained bounds on the dimension can be improved.
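
The finite-difference approximation mentioned in this abstract can be sketched as follows: sample a Gaussian direction u, difference the function along it, and scale u by the result. This is a minimal illustration under simplifying assumptions that are not taken from the paper: the objective is a smooth convex quadratic rather than a non-convex function with Hölder-continuous gradient, the oracle is noiseless, and the step size 1/(4(n+4)L), the smoothing parameter, and all names are illustrative choices.

```python
import random


def gaussian_smoothing_gradient(f, x, mu, rng):
    """Forward finite-difference estimate (f(x + mu*u) - f(x)) / mu * u
    with a standard Gaussian direction u."""
    u = [rng.gauss(0.0, 1.0) for _ in x]
    shifted = [xi + mu * ui for xi, ui in zip(x, u)]
    scale = (f(shifted) - f(x)) / mu
    return [scale * ui for ui in u]


def zeroth_order_descent(f, x0, step, mu, iterations, seed=0):
    """Gradient descent that only ever evaluates f, never its gradient."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iterations):
        g = gaussian_smoothing_gradient(f, x, mu, rng)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def sphere(x):
    """Toy objective (an assumption): f(x) = ||x||^2, gradient Lipschitz constant 2."""
    return sum(xi * xi for xi in x)


n = 5
lipschitz = 2.0
start = [1.0] * n
result = zeroth_order_descent(
    sphere,
    start,
    step=1.0 / (4.0 * (n + 4) * lipschitz),  # illustrative choice of order 1/(n*L)
    mu=1e-4,
    iterations=500,
)
```

The step size shrinks with the dimension n because the variance of the random estimate grows with n; this is one source of the dimension dependence that the abstract refers to.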

preprint · 2020 · arXiv

Adaptive Gradient Descent for Convex and Non-Convex Stochastic Optimization

In this paper we propose several adaptive gradient methods for stochastic optimization. Unlike AdaGrad-type methods, our algorithms are based on Armijo-type line search and they simultaneously adapt to the unknown Lipschitz constant of the gradient and variance of the stochastic approximation for the gradient. We consider an accelerated and non-accelerated gradient descent for convex problems and gradient descent for non-convex problems. In the experiments we demonstrate superiority of our methods over existing adaptive methods, e.g. AdaGrad and Adam.
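
To make the idea of adapting to an unknown Lipschitz constant concrete, here is a minimal deterministic sketch of gradient descent with an Armijo-type backtracking test. It is not the stochastic algorithm of the paper: gradients are exact, there is no variance adaptation, and the halving and doubling rule, the toy quadratic, and all names are assumptions chosen for illustration.

```python
def adaptive_gradient_descent(f, grad, x0, lipschitz_guess=1.0, iterations=200):
    """Gradient descent with step 1/L, where the local estimate L is halved
    at the start of each iteration and doubled until a sufficient-decrease
    (Armijo-type) test passes."""
    x = list(x0)
    L = lipschitz_guess
    for _ in range(iterations):
        g = grad(x)
        g_sq = sum(gi * gi for gi in g)
        if g_sq == 0.0:
            break  # exact stationary point, nothing left to do
        L /= 2.0  # optimistic guess: try a longer step first
        while True:
            y = [xi - gi / L for xi, gi in zip(x, g)]
            # Accept once f(y) <= f(x) - ||g||^2 / (2L).
            if f(y) <= f(x) - g_sq / (2.0 * L):
                break
            L *= 2.0
        x = y
    return x, L


def quadratic(x):
    """Toy objective (an assumption): 0.5 * (x1^2 + 10 * x2^2)."""
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def quadratic_grad(x):
    return [x[0], 10.0 * x[1]]


x_final, L_final = adaptive_gradient_descent(quadratic, quadratic_grad, [1.0, 1.0])
```

The caller never supplies the true Lipschitz constant (10 for this quadratic); the backtracking test discovers a workable local value on its own.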

preprint · 2020 · arXiv

Alternating Minimization Methods for Strongly Convex Optimization

We consider alternating minimization procedures for convex optimization problems with variables divided into many blocks, each block being amenable to minimization with respect to its variables while the other blocks are kept fixed. In the case of two blocks, we prove a linear convergence rate for the alternating minimization procedure under the Polyak-Lojasiewicz condition, which can be seen as a relaxation of the strong convexity assumption. Under the strong convexity assumption in the many-blocks setting we provide an accelerated alternating minimization procedure with a linear rate depending on the square root of the condition number, as opposed to the condition number for the non-accelerated method. We also discuss approximating a non-negative solution to a linear system of equations $Ax=y$ by alternating minimization of the Kullback-Leibler (KL) divergence between $Ax$ and $y$.
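
A minimal sketch of the basic (non-accelerated) two-block procedure may help here. The strongly convex quadratic f(x, y) = x^2 + y^2 + xy - 3x - 3y, its closed-form block minimizers, and the function names are assumptions chosen so that each block minimization is exact; the minimizer of this toy function is (1, 1).

```python
def alternating_minimization(argmin_x, argmin_y, x0, y0, sweeps):
    """Two-block alternating minimization: minimize exactly over x with y
    fixed, then over y with x fixed, and repeat."""
    x, y = x0, y0
    for _ in range(sweeps):
        x = argmin_x(y)
        y = argmin_y(x)
    return x, y


def argmin_over_x(y):
    """Setting df/dx = 2x + y - 3 to zero gives x = (3 - y) / 2."""
    return (3.0 - y) / 2.0


def argmin_over_y(x):
    """Setting df/dy = 2y + x - 3 to zero gives y = (3 - x) / 2."""
    return (3.0 - x) / 2.0


x_am, y_am = alternating_minimization(argmin_over_x, argmin_over_y, 0.0, 0.0, sweeps=30)
```

On this example the error in each block shrinks by a factor of 4 per sweep, which is the kind of linear rate the abstract refers to.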

preprint · 2020 · arXiv

An Accelerated Directional Derivative Method for Smooth Stochastic Convex Optimization

We consider smooth stochastic convex optimization problems in the context of algorithms which are based on directional derivatives of the objective function. This context can be considered as an intermediate one between derivative-free optimization and gradient-based optimization. We assume that at any given point and for any given direction, a stochastic approximation for the directional derivative of the objective function at this point and in this direction is available with some additive noise. The noise is assumed to be of an unknown nature, but bounded in the absolute value. We underline that we consider directional derivatives in any direction, as opposed to coordinate descent methods which use only derivatives in coordinate directions. For this setting, we propose a non-accelerated and an accelerated directional derivative method and provide their complexity bounds. Our non-accelerated algorithm has a complexity bound which is similar to the gradient-based algorithm, that is, without any dimension-dependent factor. Our accelerated algorithm has a complexity bound which coincides with the complexity bound of the accelerated gradient-based algorithm up to a factor of square root of the problem dimension. We extend these results to strongly convex problems.

preprint · 2020 · arXiv

An Accelerated Method for Derivative-Free Smooth Stochastic Convex Optimization

We consider an unconstrained problem of minimizing a smooth convex function which is only available through noisy observations of its values, the noise consisting of two parts. Similar to stochastic optimization problems, the first part is of stochastic nature. The second part is additive noise of unknown nature, but bounded in absolute value. In the two-point feedback setting, i.e. when pairs of function values are available, we propose an accelerated derivative-free algorithm together with its complexity analysis. The complexity bound of our derivative-free algorithm is larger only by a factor of $\sqrt{n}$ than the bound for accelerated gradient-based algorithms, where $n$ is the dimension of the decision variable. We also propose a non-accelerated derivative-free algorithm with a complexity bound similar to the stochastic-gradient-based algorithm, that is, our bound does not have any dimension-dependent factor except logarithmic. Notably, if the difference between the starting point and the solution is a sparse vector, for both our algorithms, we obtain a better complexity bound if the algorithm uses a $1$-norm proximal setup, rather than the Euclidean proximal setup, which is a standard choice for unconstrained problems.

preprint · 2020 · arXiv

Decentralize and Randomize: Faster Algorithm for Wasserstein Barycenters

We study the decentralized distributed computation of discrete approximations for the regularized Wasserstein barycenter of a finite set of continuous probability measures distributedly stored over a network. We assume there is a network of agents/machines/computers, and each agent holds a private continuous probability measure and seeks to compute the barycenter of all the measures in the network by getting samples from its local measure and exchanging information with its neighbors. Motivated by this problem, we develop, and analyze, a novel accelerated primal-dual stochastic gradient method for general stochastic convex optimization problems with linear equality constraints. Then, we apply this method to the decentralized distributed optimization setting to obtain a new algorithm for the distributed semi-discrete regularized Wasserstein barycenter problem. Moreover, we show explicit non-asymptotic complexity for the proposed algorithm.

preprint · 2020 · arXiv

Inexact Model: A Framework for Optimization and Variational Inequalities

In this paper we propose a general algorithmic framework for first-order methods in optimization in a broad sense, including minimization problems, saddle-point problems and variational inequalities. This framework allows one to obtain many known methods as special cases, including the accelerated gradient method, composite optimization methods, level-set methods, and proximal methods. The idea of the framework is based on constructing an inexact model of the main problem component, i.e. the objective function in optimization or the operator in variational inequalities. Besides reproducing known results, our framework allows one to construct new methods, which we illustrate by constructing a universal method for variational inequalities with composite structure. This method works for smooth and non-smooth problems with optimal complexity without a priori knowledge of the problem smoothness. We also generalize our framework to strongly convex objectives and strongly monotone variational inequalities.

preprint · 2020 · arXiv

Multimarginal Optimal Transport by Accelerated Alternating Minimization

We consider the multimarginal optimal transport problem, which includes as a particular case the Wasserstein barycenter problem. In this problem one has to find an optimal coupling between $m$ probability measures, which amounts to finding a tensor of order $m$. We propose an accelerated method based on accelerated alternating minimization and estimate its complexity to find the approximate solution to the problem. We use entropic regularization with a sufficiently small regularization parameter and apply accelerated alternating minimization to the dual problem. A novel primal-dual analysis is used to reconstruct the approximately optimal coupling tensor. Our algorithm exhibits a better computational complexity than the state-of-the-art methods for some regimes of the problem parameters.

preprint · 2020 · arXiv

On the Complexity of Approximating Wasserstein Barycenter

We study the complexity of approximating the Wasserstein barycenter of $m$ discrete measures, or histograms of size $n$, by contrasting two alternative approaches, both using entropic regularization. The first approach is based on the Iterative Bregman Projections (IBP) algorithm for which our novel analysis gives a complexity bound proportional to $\frac{mn^2}{\varepsilon^2}$ to approximate the original non-regularized barycenter. Using an alternative accelerated-gradient-descent-based approach, we obtain a complexity proportional to $\frac{mn^{2.5}}{\varepsilon}$. As a byproduct, we show that the regularization parameter in both approaches has to be proportional to $\varepsilon$, which causes instability of both algorithms when the desired accuracy is high. To overcome this issue, we propose a novel proximal-IBP algorithm, which can be seen as a proximal gradient method, which uses IBP on each iteration to make a proximal step. We also consider the question of scalability of these algorithms using approaches from distributed optimization and show that the first algorithm can be implemented in a centralized distributed setting (master/slave), while the second one is amenable to a more general decentralized distributed setting with an arbitrary network topology.

preprint · 2020 · arXiv

On the Optimal Combination of Tensor Optimization Methods

We consider the minimization problem of a sum of a number of functions having Lipschitz $p$-th order derivatives with different Lipschitz constants. In this case, to accelerate optimization, we propose a general framework that allows one to obtain near-optimal oracle complexity for each function in the sum separately, meaning, in particular, that the oracle for a function with a lower Lipschitz constant is called a smaller number of times. As a building block, we extend the current theory of tensor methods and show how to generalize near-optimal tensor methods to work with an inexact tensor step. Further, we investigate the situation when the functions in the sum have Lipschitz derivatives of a different order. For this situation, we propose a generic way to separate the oracle complexity between the parts of the sum. Our method is not optimal, which leads to an open problem of the optimal combination of oracles of a different order.

preprint · 2020 · arXiv

Self-Concordant Analysis of Frank-Wolfe Algorithms

Projection-free optimization via different variants of the Frank-Wolfe (FW), a.k.a. Conditional Gradient method has become one of the cornerstones in optimization for machine learning since in many cases the linear minimization oracle is much cheaper to implement than projections and some sparsity needs to be preserved. In a number of applications, e.g. Poisson inverse problems or quantum state tomography, the loss is given by a self-concordant (SC) function having unbounded curvature, implying absence of theoretical guarantees for the existing FW methods. We use the theory of SC functions to provide a new adaptive step size for FW methods and prove global convergence rate O(1/k) after k iterations. If the problem admits a stronger local linear minimization oracle, we construct a novel FW method with linear convergence rate for SC functions.
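
The role of the linear minimization oracle can be seen in a minimal sketch of the classical Frank-Wolfe method on the probability simplex, where the oracle simply returns the vertex with the smallest gradient coordinate. The sketch uses the standard open-loop step size 2/(k+2) and a smooth quadratic objective; both are assumptions made for illustration and are not the adaptive step size or the self-concordant setting analysed in the paper.

```python
def simplex_lmo(gradient):
    """Linear minimization oracle for the probability simplex: the vertex
    e_i with the smallest gradient coordinate minimizes <gradient, s>."""
    best = min(range(len(gradient)), key=lambda i: gradient[i])
    return [1.0 if i == best else 0.0 for i in range(len(gradient))]


def frank_wolfe(grad, x0, iterations):
    """Classical Frank-Wolfe with step 2/(k+2); every iterate is a convex
    combination of simplex vertices, so no projection is ever needed."""
    x = list(x0)
    for k in range(iterations):
        s = simplex_lmo(grad(x))
        gamma = 2.0 / (k + 2.0)
        x = [(1.0 - gamma) * xi + gamma * si for xi, si in zip(x, s)]
    return x


# Hypothetical target inside the simplex; the objective is ||x - target||^2.
target = [0.2, 0.3, 0.5]


def objective(x):
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))


def objective_grad(x):
    return [2.0 * (xi - ti) for xi, ti in zip(x, target)]


x_fw = frank_wolfe(objective_grad, [1.0, 0.0, 0.0], iterations=200)
```

Each step costs one pass over the gradient to find its smallest entry, which is the sense in which the oracle is cheaper than a projection onto the feasible set.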

preprint · 2019 · arXiv

The global rate of convergence for optimal tensor methods in smooth convex optimization

We consider convex optimization problems with the objective function having Lipschitz-continuous $p$-th order derivative, where $p\geq 1$. We propose a new tensor method, which closes the gap between the lower $O\left(\varepsilon^{-\frac{2}{3p+1}} \right)$ and upper $O\left(\varepsilon^{-\frac{1}{p+1}} \right)$ iteration complexity bounds for this class of optimization problems. We also consider uniformly convex functions, and show how the proposed method can be accelerated under this additional assumption. Moreover, we introduce a $p$-th order condition number which naturally arises in the complexity analysis of tensor methods under this assumption. Finally, we make a numerical study of the proposed optimal method and show that in practice it is faster than the best known accelerated tensor method. We also compare the performance of tensor methods for $p=2$ and $p=3$ and show that the 3rd-order method is superior to the 2nd-order method in practice.