Source author record

Alexander Tyurin

Alexander Tyurin appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: math.OC, Machine Learning

Catalog footprint

11 works, 2 topics, 4 close collaborators


Preprint, 2024, arXiv

A Computation and Communication Efficient Method for Distributed Nonconvex Problems in the Partial Participation Setting

We present a new method that combines three key components of distributed optimization and federated learning: variance reduction of stochastic gradients, partial participation, and compressed communication. We prove that the new method has optimal oracle complexity and state-of-the-art communication complexity in the partial participation setting. Even setting communication compression aside, our method successfully combines variance reduction and partial participation: it attains the optimal oracle complexity, never requires the participation of all nodes, and does not need the bounded gradients (dissimilarity) assumption.
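To make the partial-participation idea concrete, here is a minimal sketch, assuming a server that samples `s` of `n` clients uniformly without replacement and averages only their vectors. It is a generic illustration, not the method from this paper, and the function names and client vectors are hypothetical. Enumerating every possible client subset shows that the estimate is unbiased for the full mean, which is why a round does not need all nodes to participate.

```python
import itertools
import random
from typing import List, Sequence

Vector = List[float]


def partial_participation_mean(client_vectors: Sequence[Vector], s: int,
                               rng: random.Random) -> Vector:
    """Average the vectors of s clients sampled uniformly without replacement."""
    chosen = rng.sample(range(len(client_vectors)), s)
    dim = len(client_vectors[0])
    return [sum(client_vectors[i][j] for i in chosen) / s for j in range(dim)]


def expected_estimate(client_vectors: Sequence[Vector], s: int) -> Vector:
    """Exact expectation of the estimator, averaging over every size-s client subset."""
    dim = len(client_vectors[0])
    subsets = list(itertools.combinations(range(len(client_vectors)), s))
    total = [0.0] * dim
    for subset in subsets:
        for j in range(dim):
            total[j] += sum(client_vectors[i][j] for i in subset) / s
    return [t / len(subsets) for t in total]


vectors = [[1.0, 0.0], [2.0, 4.0], [3.0, -1.0], [6.0, 1.0]]  # hypothetical client vectors
one_round = partial_participation_mean(vectors, s=2, rng=random.Random(0))
print(one_round)                      # a noisy estimate built from 2 of the 4 clients
print(expected_estimate(vectors, 2))  # [3.0, 1.0], the mean over all 4 clients
```

A single round is noisy, but no client subset is favored, so the server's estimate is correct on average without any round requiring all nodes.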

Preprint, 2022, arXiv

DASHA: Distributed Nonconvex Optimization with Communication Compression, Optimal Oracle Complexity, and No Client Synchronization

We develop and analyze DASHA: a new family of methods for nonconvex distributed optimization problems. When the local functions at the nodes have a finite-sum or an expectation form, our new methods, DASHA-PAGE and DASHA-SYNC-MVR, improve the theoretical oracle and communication complexity of the previous state-of-the-art method MARINA by Gorbunov et al. (2020). In particular, to find an $\varepsilon$-stationary point, and considering the random sparsifier RandK as an example, our methods compute the optimal number of gradients $\mathcal{O}\left(\frac{\sqrt{m}}{\varepsilon\sqrt{n}}\right)$ and $\mathcal{O}\left(\frac{\sigma}{\varepsilon^{3/2}n}\right)$ in the finite-sum and expectation cases, respectively, while maintaining the SOTA communication complexity $\mathcal{O}\left(\frac{d}{\varepsilon \sqrt{n}}\right)$. Furthermore, unlike MARINA, the new methods DASHA, DASHA-PAGE and DASHA-MVR send only compressed vectors and never synchronize the nodes, which makes them more practical for federated learning. We extend our results to the case where the functions satisfy the Polyak-Łojasiewicz condition. Finally, our theory is corroborated in practice: we observe a significant improvement in experiments with nonconvex classification and the training of deep learning models.
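The abstract uses the random sparsifier RandK as its running example. A minimal sketch of the standard RandK compressor follows; it is not the DASHA algorithm itself, and the helper names are hypothetical. A node using it sends only `k` of the `d` coordinates (plus their indices), and scaling the kept entries by `d/k` makes the compressed vector an unbiased estimate of the original, which the exhaustive enumeration in `rand_k_mean` confirms.

```python
import itertools
import random
from typing import List, Sequence


def rand_k(x: Sequence[float], k: int, rng: random.Random) -> List[float]:
    """RandK: keep k coordinates chosen uniformly without replacement, scaled by d/k."""
    d = len(x)
    out = [0.0] * d
    for i in rng.sample(range(d), k):
        out[i] = x[i] * d / k
    return out


def rand_k_mean(x: Sequence[float], k: int) -> List[float]:
    """Exact E[rand_k(x)], averaging over all size-k coordinate subsets."""
    d = len(x)
    subsets = list(itertools.combinations(range(d), k))
    acc = [0.0] * d
    for subset in subsets:
        for i in subset:
            acc[i] += x[i] * d / k
    return [a / len(subsets) for a in acc]


x = [1.0, -2.0, 3.0, 0.5]
compressed = rand_k(x, k=2, rng=random.Random(0))
print(compressed)         # two nonzero entries, each scaled by d/k = 2
print(rand_k_mean(x, 2))  # [1.0, -2.0, 3.0, 0.5]: the compressor is unbiased
```

The price of unbiasedness is variance: for RandK the expected squared error is (d/k - 1)·||x||², so a smaller `k` saves communication at the cost of noisier messages.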

Preprint, 2022, arXiv

Oracle Complexity Separation in Convex Optimization

Many convex optimization problems have a structured objective function written as a sum of functions with different types of oracles (full gradient, coordinate derivative, stochastic gradient) and different evaluation costs for these oracles. In the strongly convex case, these functions also have different condition numbers, which ultimately determine the iteration complexity of first-order methods and the number of oracle calls required to reach a given accuracy. Motivated by the desire to call the more expensive oracle fewer times, in this paper we consider the minimization of a sum of two functions and propose a generic algorithmic framework that separates the oracle complexities of the two components. As a specific example, for the $\mu$-strongly convex problem $\min_{x\in \mathbb{R}^n} h(x) + g(x)$ with an $L_h$-smooth function $h$ and an $L_g$-smooth function $g$, a special case of our algorithm requires, up to a logarithmic factor, $O(\sqrt{L_h/\mu})$ first-order oracle calls for $h$ and $O(\sqrt{L_g/\mu})$ first-order oracle calls for $g$. Our general framework also covers the setting of strongly convex objectives, the setting where $g$ is given by a coordinate derivative oracle, and the setting where $g$ has a finite-sum structure and is available through a stochastic gradient oracle. In the latter two cases we obtain, respectively, accelerated random coordinate descent and accelerated variance reduction methods with oracle complexity separation.
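A back-of-the-envelope comparison shows why separating the two oracle complexities matters. The sketch below assumes, as a baseline, a standard accelerated gradient method applied to h + g as a whole: since h + g is (L_h + L_g)-smooth, it needs on the order of sqrt((L_h + L_g)/μ) iterations and calls both oracles at each one. Constants and logarithmic factors are dropped, and the numbers are hypothetical.

```python
import math


def oracle_calls(L_h: float, L_g: float, mu: float) -> dict:
    """Leading-order oracle-call counts, ignoring constants and logarithmic factors."""
    return {
        # Separated complexities quoted in the abstract.
        "h_separated": math.sqrt(L_h / mu),
        "g_separated": math.sqrt(L_g / mu),
        # Baseline assumption: plain acceleration on h + g, which is (L_h + L_g)-smooth,
        # calls both oracles once per iteration.
        "each_joint": math.sqrt((L_h + L_g) / mu),
    }


# Hypothetical numbers: h is expensive but well conditioned, g is cheap but ill conditioned.
calls = oracle_calls(L_h=1.0, L_g=1e4, mu=1e-2)
for name, value in calls.items():
    print(f"{name}: {value:.1f}")
```

With these numbers the expensive oracle for h is called about 10 times under separation, versus roughly 1000 times in the joint baseline, while the cheap oracle for g is called about 1000 times either way.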

Preprint, 2022, arXiv

Sharper Rates and Flexible Framework for Nonconvex SGD with Client and Data Sampling

We revisit the classical problem of finding an approximately stationary point of the average of $n$ smooth and possibly nonconvex functions. The optimal complexity of stochastic first-order methods in terms of the number of gradient evaluations of individual functions is $\mathcal{O}\left(n + n^{1/2}\varepsilon^{-1}\right)$, attained, for example, by the optimal SGD methods SPIDER (arXiv:1807.01695) and PAGE (arXiv:2008.10898), where $\varepsilon$ is the error tolerance. However, i) the big-$\mathcal{O}$ notation hides crucial dependencies on the smoothness constants associated with the functions, and ii) the rates and theory of these methods assume simplistic sampling mechanisms that do not offer any flexibility. In this work we remedy the situation. First, we generalize the PAGE algorithm so that it provably works with virtually any (unbiased) sampling mechanism. This is particularly useful in federated learning, as it allows us to construct, and better understand the impact of, various combinations of client and data sampling strategies. Second, our analysis is sharper, as we make explicit use of certain novel inequalities that capture the intricate interplay between the smoothness constants and the sampling procedure. Indeed, our analysis is better even for the simple sampling procedure analyzed in the PAGE paper. Moreover, this already improved bound can be sharpened further by a different sampling scheme that we propose. In summary, we provide the most general and most accurate analysis of optimal SGD in the smooth nonconvex regime. Finally, our theoretical findings are supported by carefully designed experiments.
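For reference, the sketch below implements the basic PAGE gradient estimator with uniform minibatch sampling on a toy one-dimensional finite sum; it is not the generalized arbitrary-sampling method of this paper. With probability `p` the estimator is reset to a full gradient; otherwise it is updated with a minibatch of gradient differences. The quadratic test problem, the step size `gamma = 0.2`, and the probability `p = b / (n + b)` are assumptions made for illustration.

```python
import random
from typing import Callable, List


def page(grads: List[Callable[[float], float]], x0: float, gamma: float,
         p: float, b: int, iters: int, seed: int = 0) -> float:
    """Minimal PAGE loop for f(x) = (1/n) * sum_i f_i(x) on a scalar x.

    grads[i](x) returns the derivative of f_i at x; minibatches are uniform.
    """
    rng = random.Random(seed)
    n = len(grads)

    def full_grad(x: float) -> float:
        return sum(gi(x) for gi in grads) / n

    x = x0
    g = full_grad(x)  # start the estimator from a full gradient
    for _ in range(iters):
        x_new = x - gamma * g
        if rng.random() < p:
            g = full_grad(x_new)  # with probability p: recompute the full gradient
        else:
            # with probability 1 - p: cheap correction from minibatch gradient differences
            batch = rng.sample(range(n), b)
            g = g + sum(grads[i](x_new) - grads[i](x) for i in batch) / b
        x = x_new
    return x


# Hypothetical test problem: f_i(x) = a_i / 2 * (x - c_i)**2, minimized at x_star.
rng = random.Random(1)
A = [rng.uniform(1.0, 1.5) for _ in range(10)]
C = [rng.uniform(-5.0, 5.0) for _ in range(10)]
grads = [lambda x, a=a, c=c: a * (x - c) for a, c in zip(A, C)]
x_star = sum(a * c for a, c in zip(A, C)) / sum(A)

b = 2
x_final = page(grads, x0=10.0, gamma=0.2, p=b / (len(grads) + b), b=b, iters=1000)
print(x_final, x_star)
```

Most iterations touch only `b` of the `n` component gradients; the occasional full-gradient reset keeps the accumulated estimation error from drifting.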

Preprint, 2020, arXiv

Accelerated and nonaccelerated stochastic gradient descent with inexact model

In this paper, we propose a new way to obtain optimal convergence rates for smooth stochastic (strongly) convex optimization problems. Our approach builds on results for optimization problems in which gradients are corrupted by nonrandom noise. In contrast to previously known results, we extend this idea to the concept of an inexact model.
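To make "gradients with nonrandom noise" concrete, the toy sketch below runs plain gradient descent with an oracle that returns the true gradient plus a fixed, deterministic offset. It only illustrates the noise model the abstract refers to, not the paper's method; the function name, the quadratic objective, and all constants are hypothetical.

```python
from typing import Callable


def gd_with_deterministic_noise(grad: Callable[[float], float], x0: float,
                                step: float, iters: int, noise: float) -> float:
    """Gradient descent whose oracle returns grad(x) + noise for a fixed offset `noise`."""
    x = x0
    for _ in range(iters):
        x = x - step * (grad(x) + noise)
    return x


mu = 2.0  # f(x) = mu / 2 * x**2, minimized at x = 0
x_out = gd_with_deterministic_noise(lambda x: mu * x, x0=5.0, step=0.25,
                                    iters=200, noise=0.1)
print(x_out)  # close to -noise / mu = -0.05
```

Here each step maps x to 0.5·x - 0.025, so the iterates settle at -noise/mu = -0.05 rather than at the true minimizer 0: a bounded nonrandom error does not average out, it shifts the limit by an amount proportional to noise/mu.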

Preprint, 2020, arXiv

Accelerated and nonaccelerated stochastic gradient descent with model conception

In this paper, we describe a new way to obtain convergence rates for optimal methods in smooth (strongly) convex optimization. Our approach builds on results for problems in which gradients contain small nonrandom noise. Unlike previous results, we obtain convergence rates within the model concept.

Preprint, 2020, arXiv

Accelerated gradient sliding and variance reduction

We consider an optimization problem whose objective is a strongly convex sum-type term (the first term) plus a smooth convex composite term that is not proximal-friendly (the second term). We show that the complexity of this problem can be split into the optimal number of incremental oracle calls for the first (sum-type) term and the optimal number of oracle calls for the second (composite) term. Here, by "optimal number" we mean the estimate that matches the well-known lower bound for each term in the absence of the other.

Preprint, 2020, arXiv

Adaptive Gradient Descent for Convex and Non-Convex Stochastic Optimization

In this paper we propose several adaptive gradient methods for stochastic optimization. Unlike AdaGrad-type methods, our algorithms are based on an Armijo-type line search, and they simultaneously adapt to the unknown Lipschitz constant of the gradient and to the variance of the stochastic gradient approximation. We consider accelerated and non-accelerated gradient descent for convex problems and gradient descent for non-convex problems. In experiments, we demonstrate the superiority of our methods over existing adaptive methods, e.g. AdaGrad and Adam.
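The sketch below shows the basic Armijo-type mechanism for adapting to an unknown Lipschitz constant in the deterministic case: each iteration first halves the current estimate L, then doubles it until a sufficient-decrease test passes. It does not model the adaptation to stochastic-gradient variance that the paper's methods also perform, and the function names and test problem are hypothetical.

```python
from typing import Callable, List, Tuple

Vec = List[float]


def adaptive_gd(f: Callable[[Vec], float], grad: Callable[[Vec], Vec], x0: Vec,
                L0: float = 1.0, iters: int = 200) -> Tuple[Vec, float]:
    """Gradient descent with step 1/L, where L is adapted by backtracking."""
    x, L = list(x0), L0
    for _ in range(iters):
        g = grad(x)
        g_sq = sum(gi * gi for gi in g)
        if g_sq == 0.0:
            break  # exact stationary point
        L /= 2.0  # optimistic guess: try a longer step first
        while True:
            y = [xi - gi / L for xi, gi in zip(x, g)]
            # Armijo-type sufficient decrease; guaranteed once L >= the true Lipschitz constant
            if f(y) <= f(x) - g_sq / (2.0 * L):
                break
            L *= 2.0
        x = y
    return x, L


# Hypothetical test problem: the gradient is 4-Lipschitz, but the method is not told this.
def f(v: Vec) -> float:
    return 0.5 * (4.0 * v[0] ** 2 + v[1] ** 2)


def grad(v: Vec) -> Vec:
    return [4.0 * v[0], v[1]]


x_final, L_final = adaptive_gd(f, grad, x0=[1.0, 1.0])
print(f(x_final), L_final)
```

Because the test passes for any L at or above the true constant, the estimate never needs to exceed twice that constant, and the optimistic halving lets the method take longer steps whenever the local curvature allows.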

Preprint, 2020, arXiv

Development of a method for solving structural optimization problems

In practice, optimization problems often have structure that makes it possible to design problem-specific algorithms with faster convergence. Exploiting this structure, one can obtain better convergence rates for the following problems: functions with Hölder-continuous gradients, superpositions of functions (min-max problems), transportation problems, and clustering with an electoral model. In this work, we unify gradient-type methods into a single method based on a special concept of an inexact model, and we develop a series of methods that solve generalized problem statements and exploit their structure through this concept. We construct a gradient method for problems with relative smoothness, primal-dual adaptive gradient and fast gradient methods, and stochastic nonadaptive gradient methods that support an inexact model of the function. The concept of an inexact model is illustrated with various examples of optimization problems.

Preprint, 2020, arXiv

Inexact Model: A Framework for Optimization and Variational Inequalities

In this paper we propose a general algorithmic framework for first-order methods in optimization in a broad sense, including minimization problems, saddle-point problems and variational inequalities. This framework recovers many known methods as special cases, including the accelerated gradient method, composite optimization methods, level-set methods, and proximal methods. The framework is based on constructing an inexact model of the main problem component, i.e. the objective function in optimization or the operator in variational inequalities. Besides reproducing known results, our framework allows one to construct new methods, which we illustrate by constructing a universal method for variational inequalities with composite structure. This method works for smooth and non-smooth problems with optimal complexity and without a priori knowledge of the problem's smoothness. We also generalize our framework to strongly convex objectives and strongly monotone variational inequalities.
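For orientation, here is the (δ, L)-model condition as it is commonly stated in this line of work, written in the Euclidean setting; the paper's exact conditions, in particular for variational inequalities, may differ. Here f_δ(x) is an inexact value of f at x, and ψ_δ(·, x) is convex in its first argument with ψ_δ(x, x) = 0. The model is required to satisfy the two-sided bound below for all y, and the model-based gradient method minimizes the model plus a quadratic term.

```latex
\begin{aligned}
0 \le f(y) - f_\delta(x) - \psi_\delta(y, x) &\le \frac{L}{2}\|y - x\|_2^2 + \delta
  \quad \text{for all } y,\\
x_{k+1} &= \arg\min_{y} \Big\{ \psi_\delta(y, x_k) + \frac{L}{2}\|y - x_k\|_2^2 \Big\}.
\end{aligned}
```

For a convex L-smooth f, choosing ψ_δ(y, x) = ⟨∇f(x), y - x⟩ with f_δ = f and δ = 0 turns the step into ordinary gradient descent, x_{k+1} = x_k - ∇f(x_k)/L, while adding h(y) - h(x) to the model gives a proximal (composite) gradient step.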

Preprint, 2019, arXiv

Heuristic adaptive fast gradient method in stochastic optimization tasks

In this paper, we present a heuristic adaptive fast gradient method. We show that in practice our method converges faster than currently popular optimization methods. We also justify our method and point out some issues that prevent us from obtaining theoretical estimates.
