Source author record

Yurii Nesterov

Yurii Nesterov appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


math.OC, Machine Learning, econ.TH, physics.soc-ph, Populations and Evolution

Catalog footprint

14 works, 5 topics, 4 close collaborators


preprint, 2022, arXiv

Adaptive Third-Order Methods for Composite Convex Optimization

In this paper we propose third-order methods for composite convex optimization problems in which the smooth part is a three-times continuously differentiable function with Lipschitz continuous third-order derivatives. The methods are adaptive in the sense that they do not require knowledge of the Lipschitz constant. Trial points are computed by the inexact minimization of models that consist of the nonsmooth part of the objective plus a quartic regularization of the third-order Taylor polynomial of the smooth part. Specifically, approximate solutions of the auxiliary problems are obtained by using a Bregman gradient method as inner solver. Unlike existing adaptive approaches for high-order methods, in our new schemes the regularization parameters are adjusted taking into account the progress of the inner solver. With this technique, we show that the basic method finds an $ε$-approximate minimizer of the objective function performing at most $\mathcal{O}\left(|\log(ε)|ε^{-\frac{1}{3}}\right)$ iterations of the inner solver. An accelerated adaptive third-order method is also presented with total inner iteration complexity of $\mathcal{O}\left(|\log(ε)|ε^{-\frac{1}{4}}\right)$.

preprint, 2022, arXiv

Quartic Regularity

In this paper, we propose new linearly convergent second-order methods for minimizing convex quartic polynomials. This framework is applied to design optimization schemes that can solve general convex problems satisfying a new condition of quartic regularity. It assumes positive definiteness and boundedness of the fourth derivative of the objective function. For such problems, an appropriate quartic regularization of the Damped Newton Method has a global linear rate of convergence. We discuss several important consequences of this result. In particular, it can be used for constructing new second-order methods in the framework of high-order proximal-point schemes. These methods have a convergence rate of $\tilde O(k^{-p})$, where $k$ is the iteration counter, $p$ is equal to 3, 4, or 5, and the tilde indicates the presence of logarithmic factors in the complexity bounds for the auxiliary problems, which are solved at each iteration of the schemes.

preprint, 2022, arXiv

Super-Universal Regularized Newton Method

We analyze the performance of a variant of the Newton method with quadratic regularization for solving composite convex minimization problems. At each step of our method, we choose the regularization parameter proportional to a certain power of the gradient norm at the current point. We introduce a family of problem classes characterized by Hölder continuity of either the second or third derivative. Then we present the method with a simple adaptive search procedure allowing an automatic adjustment to the problem class with the best global complexity bounds, without knowing specific parameters of the problem. In particular, for the class of functions with Lipschitz continuous third derivative, we get the global $O(1/k^3)$ rate, which was previously attributed to third-order tensor methods. When the objective function is uniformly convex, we justify an automatic acceleration of our scheme, resulting in a faster global rate and local superlinear convergence. The switching between the different rates (sublinear, linear, and superlinear) is automatic. Again, no a priori knowledge of parameters is needed for that.
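As a rough illustration of the regularization rule described in this abstract, the sketch below applies a Newton step whose quadratic regularization is proportional to a power of the gradient norm, on a one-dimensional convex function. The test function `log(cosh(x))`, the constant `c`, and the exponent `alpha` are assumptions chosen for illustration only; the paper's adaptive search procedure and composite setting are not reproduced here.

```python
import math

def grad(x):
    # Assumed test function f(x) = log(cosh(x)), so f'(x) = tanh(x).
    return math.tanh(x)

def hess(x):
    # f''(x) = 1 - tanh(x)^2, which vanishes quickly for large |x|.
    return 1.0 - math.tanh(x) ** 2

def regularized_newton(x0, c=1.0, alpha=0.5, iters=50):
    """Newton iteration with regularization lam = c * |f'(x)|**alpha.

    c and alpha are illustrative constants, not values taken from the paper.
    """
    x = x0
    for _ in range(iters):
        g = grad(x)
        if g == 0.0:
            break
        lam = c * abs(g) ** alpha      # regularization tied to the gradient norm
        x -= g / (hess(x) + lam)       # regularized Newton step
    return x

x_final = regularized_newton(2.0)

# For comparison: one unregularized Newton step from the same starting point.
pure_newton_first_step = 2.0 - grad(2.0) / hess(2.0)
```

On this test function the unregularized Newton step from `x = 2.0` moves away from the minimizer at 0, while the regularized iteration shrinks `|x|` at every step.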

preprint, 2021, arXiv

Dynamic pricing under nested logit demand

There has recently been growing interest in, and need for, dynamic pricing algorithms, especially in the field of online marketplaces, where they offer smart pricing options for big online stores. We present an approach to adjust prices based on the observed online market data. The key idea is to characterize optimal prices as minimizers of a total expected revenue function, which turns out to be convex. We assume that consumers face information processing costs and hence follow a discrete choice demand model, and that suppliers are equipped with quantity adjustment costs. We prove the strong smoothness of the total expected revenue function by deriving the strong convexity modulus of its dual. Our gradient-based pricing schemes outbalance supply and demand at the convergence rates of $\mathcal{O}(\frac{1}{t})$ and $\mathcal{O}(\frac{1}{t^2})$, respectively. This suggests that the imperfect behavior of consumers and suppliers helps to stabilize the market.

preprint, 2020, arXiv

Affine-invariant contracting-point methods for Convex Optimization

In this paper, we develop new affine-invariant algorithms for solving composite convex minimization problems with bounded domain. We present a general framework of Contracting-Point methods, which solve at each iteration an auxiliary subproblem restricting the smooth part of the objective function onto a contraction of the initial domain. This framework provides us with a systematic way for developing optimization methods of different order, endowed with global complexity bounds. We show that using an appropriate affine-invariant smoothness condition, it is possible to implement one iteration of the Contracting-Point method by one step of the pure tensor method of degree $p \geq 1$. The resulting global rate of convergence in functional residual is then $\mathcal{O}(1 / k^p)$, where $k$ is the iteration counter. It is important that all constants in our bounds are affine-invariant. For $p = 1$, our scheme recovers the well-known Frank-Wolfe algorithm, providing it with a new interpretation from the general perspective of tensor methods. Finally, within our framework, we present an efficient implementation and total complexity analysis of the inexact second-order scheme $(p = 2)$, called the Contracting Newton method. It can be seen as a proper implementation of the trust-region idea. Preliminary numerical results confirm its good practical performance both in the number of iterations and in computational time.
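The abstract notes that the $p = 1$ case recovers the Frank-Wolfe algorithm. The sketch below shows the classical Frank-Wolfe iteration on a small example, not the paper's Contracting-Point framework itself. The quadratic objective, the probability simplex as the bounded domain, and the open-loop step size $2/(k+2)$ are assumptions chosen for illustration.

```python
def objective(x, b):
    # Assumed test objective f(x) = 0.5 * ||x - b||^2.
    return 0.5 * sum((xi - bi) ** 2 for xi, bi in zip(x, b))

def frank_wolfe_simplex(b, iters=200):
    """Classical Frank-Wolfe for f(x) = 0.5 * ||x - b||^2 over the probability simplex."""
    n = len(b)
    x = [1.0 / n] * n                              # start at the barycenter
    for k in range(iters):
        g = [xi - bi for xi, bi in zip(x, b)]      # gradient of f at x
        i = min(range(n), key=lambda j: g[j])      # linear minimization picks vertex e_i
        gamma = 2.0 / (k + 2)                      # classical open-loop step size
        x = [(1.0 - gamma) * xi for xi in x]       # move towards the chosen vertex
        x[i] += gamma
    return x

target = [0.5, 0.3, 0.2]                           # lies in the simplex, so the optimal value is 0
x_fw = frank_wolfe_simplex(target)
gap = objective(x_fw, target)
```

Each iterate is a convex combination of simplex vertices, so feasibility is maintained without any projection step.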

preprint, 2020, arXiv

On the Quality of First-Order Approximation of Functions with Hölder Continuous Gradient

We show that Hölder continuity of the gradient is not only a sufficient condition, but also a necessary condition for the existence of a global upper bound on the error of the first-order Taylor approximation. We also relate this global upper bound to the Hölder constant of the gradient. This relation is expressed as an interval, depending on the Hölder constant, in which the error of the first-order Taylor approximation is guaranteed to be. We show that, for the Lipschitz continuous case, the interval cannot be reduced. An application to the norms of quadratic forms is proposed, which allows us to derive a novel characterization of Euclidean norms.

preprint, 2020, arXiv

Online analysis of epidemics with variable infection rate

In this paper, we continue the development of the new epidemiological model HIT, which is suitable for analyzing and predicting the propagation of COVID-19 epidemics. This is a discrete-time model allowing a reconstruction of the dynamics of asymptomatic virus holders using the available daily statistics on the number of new cases. We suggest using a new indicator, the total infection rate, to distinguish the propagation and recession modes of the epidemic. We check our indicator on the available data for eleven different countries and for the whole world. Our reconstructions are very precise. In several cases, we are able to detect the exact dates of the disastrous political decisions that ensured the second wave of the epidemic. It appears that for all our examples the decisions made on the basis of the current number of new cases are wrong. In this paper, we suggest a reasonable alternative. Our analysis shows that all tested countries are in a dangerous zone except Sweden.

preprint, 2020, arXiv

Stochastic Subspace Cubic Newton Method

In this paper, we propose a new randomized second-order optimization algorithm, Stochastic Subspace Cubic Newton (SSCN), for minimizing a high dimensional convex function $f$. Our method can be seen both as a stochastic extension of the cubically-regularized Newton method of Nesterov and Polyak (2006), and a second-order enhancement of stochastic subspace descent of Kozak et al. (2019). We prove that as we vary the minibatch size, the global convergence rate of SSCN interpolates between the rate of stochastic coordinate descent (CD) and the rate of cubic regularized Newton, thus giving new insights into the connection between first and second-order methods. Remarkably, the local convergence rate of SSCN matches the rate of stochastic subspace descent applied to the problem of minimizing the quadratic function $\frac12 (x-x^*)^\top \nabla^2f(x^*)(x-x^*)$, where $x^*$ is the minimizer of $f$, and hence depends on the properties of $f$ at the optimum only. Our numerical experiments show that SSCN outperforms non-accelerated first-order CD algorithms while being competitive with their accelerated variants.
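To make the idea of a cubic Newton step on a random subspace concrete, here is a minimal sketch for the simplest case of one-coordinate subspaces, where the cubic model has a closed-form minimizer. It is not the SSCN algorithm as specified in the paper. The test function (log-sum-exp plus a quadratic), the constant `M`, and all function names are assumptions made for illustration.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def f(x, b, mu):
    # Assumed convex test function: log-sum-exp of (x - b) plus (mu/2) * ||x||^2.
    z = [xi - bi for xi, bi in zip(x, b)]
    m = max(z)
    return m + math.log(sum(math.exp(v - m) for v in z)) + 0.5 * mu * sum(xi * xi for xi in x)

def partials(x, b, mu, i):
    """First and second partial derivatives of f along coordinate i."""
    p = softmax([xi - bi for xi, bi in zip(x, b)])
    return p[i] + mu * x[i], p[i] * (1.0 - p[i]) + mu

def cubic_step(g, h, M):
    """Minimizer of the scalar model g*t + 0.5*h*t**2 + (M/6)*|t|**3 for h >= 0, M > 0."""
    r = (-h + math.sqrt(h * h + 2.0 * M * abs(g))) / M   # step length |t|
    return -math.copysign(r, g)                          # move against the sign of g

def coordinate_cubic_newton(b, mu=1.0, M=1.0, iters=400, seed=0):
    # M = 1 is a loose upper bound on the third partial derivatives of this test function.
    rng = random.Random(seed)
    x = [0.0] * len(b)
    values = [f(x, b, mu)]
    for _ in range(iters):
        i = rng.randrange(len(b))                        # sample a one-dimensional subspace
        g, h = partials(x, b, mu, i)
        x[i] += cubic_step(g, h, M)
        values.append(f(x, b, mu))
    return x, values

b_demo = [0.3, -1.0, 0.5, 2.0]
x_c, vals = coordinate_cubic_newton(b_demo)
```

Because `M` overestimates the curvature variation of this function, each scalar model is an upper bound on the objective along the sampled coordinate, so the recorded values should not increase beyond rounding error.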

preprint, 2016, arXiv

A Subgradient Method for Free Material Design

A small improvement in the structure of the material could save the manufacturer a lot of money. Free material design can be formulated as an optimization problem. However, due to its large scale, second-order methods cannot solve free material design problems of reasonable size. We formulate the free material optimization (FMO) problem in a saddle-point form in which the inverse of the stiffness matrix A(E) in the constraint is eliminated. The size of A(E) is generally large, denoted as N by N. This is the first formulation of FMO without A(E). We apply the primal-dual subgradient method [17] to solve the restricted saddle-point formulation. This is the first gradient-type method for FMO. Each iteration of our algorithm takes a total of $O(N^2)$ floating-point operations and an auxiliary vector storage of size $O(N)$, compared with formulations having the inverse of A(E), which require $O(N^3)$ arithmetic operations and an auxiliary vector storage of size $O(N^2)$. To solve the problem, we developed a closed-form solution to a semidefinite least squares problem and an efficient parameter update scheme for the gradient method, which are included in the appendix. We also approximate a solution to the bounded Lagrangian dual problem. The problem is decomposed into small problems, each having only a k by k matrix unknown (k = 3 or 6), which can be solved in parallel. The iteration bound of our algorithm is optimal for general subgradient schemes. Finally, we present promising numerical results.

preprint, 2016, arXiv

Entropy linear programming

We propose an efficient dual algorithm for entropy linear programming (ELP) based on the Fast Gradient Method. The basic idea is to solve a properly regularized dual problem.

preprint, 2016, arXiv

Learning Supervised PageRank with Gradient-Based and Gradient-Free Optimization Methods

In this paper, we consider a non-convex loss-minimization problem of learning Supervised PageRank models, which can account for some properties not considered by classical approaches such as the classical PageRank model. We propose gradient-based and random gradient-free methods to solve this problem. Our algorithms are based on the concept of an inexact oracle and, unlike the state-of-the-art gradient-based method, we manage to provide theoretical convergence rate guarantees for both of them. In particular, under the assumption of local convexity of the loss function, our random gradient-free algorithm guarantees a decrease of the expected loss function value. At the same time, we theoretically justify that, without a convexity assumption on the loss function, our gradient-based algorithm finds a point where the stationarity condition is fulfilled with a given accuracy. For both proposed optimization algorithms, we find the settings of hyperparameters which give the lowest complexity (i.e., the number of arithmetic operations needed to achieve the given accuracy of the solution of the loss-minimization problem). The resulting estimates of the complexity are also provided. Finally, we apply the proposed optimization algorithms to the web page ranking problem and compare the proposed and state-of-the-art algorithms in terms of the considered loss function.

preprint, 2016, arXiv

On the three-stage version of stable dynamic model

We attempt to merge three models into a single model that reduces to solving a non-smooth convex optimization problem: the OD-matrix calculation model (entropy model), the mode split model, and the model of the equilibrium distribution of flows (Stable Dynamic model, Nesterov - de Palma, 2003). To the best of our knowledge, this is the first attempt to combine these three models. Previously, such attempts were made for other types of equilibrium models, mainly the BMW model (1955), whose calibration is significantly more difficult. We also remark that our model is much better than traditional ones from a computational point of view.

preprint, 2016, arXiv

Universal method with inexact oracle and its applications for searching equillibriums in multistage transport problems

In this paper we propose a new efficient approach for the numerical calculation of equilibria in multistage transport problems. At the very core of our approach lies a proper combination of the Universal Gradient Method proposed by Yu. Nesterov (2013) and the concept of an inexact oracle (Devolder-Glineur-Nesterov, 2011). In particular, our technique allows us to compute Wasserstein barycenters quickly (this result generalizes that of M. Cuturi et al. (2014)).

preprint, 2015, arXiv

Learning Supervised PageRank with Gradient-Free Optimization Methods

In this paper, we consider the problem of learning supervised PageRank models, which can account for some properties not considered by classical approaches such as the classical PageRank algorithm. Due to the huge hidden dimension of the optimization problem, we use random gradient-free methods to solve it. We prove a convergence theorem and estimate the number of arithmetic operations needed to solve it with a given accuracy. We find the best settings of the gradient-free optimization method in terms of the number of arithmetic operations needed to achieve a given accuracy of the objective. In the paper, we apply our algorithm to the web page ranking problem. We consider a parametric graph model of users' behavior and evaluate web pages' relevance to queries by our algorithm. The experiments show that our optimization method outperforms the untuned gradient-free method in ranking quality.
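For readers unfamiliar with random gradient-free methods, the sketch below shows one common form of such a scheme: a forward-difference estimate of the directional derivative along a random Gaussian direction, used in place of the gradient. This is an illustrative assumption and may differ from the oracle used in the paper. The quadratic test objective, the smoothing parameter `mu`, and the step size are likewise hypothetical choices, not the PageRank loss or the tuned settings from the paper.

```python
import random

def objective(x):
    # Smooth convex test function (an assumption, not the PageRank loss).
    return 0.5 * sum(v * v for v in x)

def random_gradient_free(f, x0, step, mu=1e-6, iters=500, seed=0):
    """Minimize f using only function values along random Gaussian directions."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        u = [rng.gauss(0.0, 1.0) for _ in x]              # random search direction
        shifted = [xi + mu * ui for xi, ui in zip(x, u)]
        slope = (f(shifted) - f(x)) / mu                   # forward difference along u
        x = [xi - step * slope * ui for xi, ui in zip(x, u)]
    return x

n = 5
x_start = [1.0] * n
# Conservative step size of order 1 / (n * L), with L = 1 for this objective.
x_end = random_gradient_free(objective, x_start, step=1.0 / (4 * (n + 4)))
```

Each iteration costs two function evaluations and no gradient computation, which is the property that makes this family of methods attractive when derivatives are expensive or unavailable.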
