Source author record

Jérôme Bolte

Jérôme Bolte appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

math.OC Machine Learning Artificial Intelligence math.AP math.NA Numerical Analysis

Catalog footprint

What is connected

12works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Automatic differentiation of nonsmooth iterative algorithms

Differentiation along algorithms, i.e., piggyback propagation of derivatives, is now routinely used to differentiate iterative solvers in differentiable programming. Asymptotics is well understood for many smooth problems but the nondifferentiable case is hardly considered. Is there a limiting object for nonsmooth piggyback automatic differentiation (AD)? Does it have any variational meaning and can it be used effectively in machine learning? Is there a connection with classical derivative? All these questions are addressed under appropriate nonexpansivity conditions in the framework of conservative derivatives which has proved useful in understanding nonsmooth AD. For nonsmooth piggyback iterations, we characterize the attractor set of nonsmooth piggyback iterations as a set-valued fixed point which remains in the conservative framework. This has various consequences and in particular almost everywhere convergence of classical derivatives. Our results are illustrated on parametric convex optimization problems with forward-backward, Douglas-Rachford and Alternating Direction of Multiplier algorithms as well as the Heavy-Ball method.

preprint2022arXiv

Nonsmooth Implicit Differentiation for Machine Learning and Optimization

In view of training increasingly complex learning architectures, we establish a nonsmooth implicit function theorem with an operational calculus. Our result applies to most practical problems (i.e., definable problems) provided that a nonsmooth form of the classical invertibility condition is fulfilled. This approach allows for formal subdifferentiation: for instance, replacing derivatives by Clarke Jacobians in the usual differentiation formulas is fully justified for a wide class of nonsmooth problems. Moreover this calculus is entirely compatible with algorithmic differentiation (e.g., backpropagation). We provide several applications such as training deep equilibrium networks, training neural nets with conic optimization layers, or hyperparameter-tuning for nonsmooth Lasso-type models. To show the sharpness of our assumptions, we present numerical experiments showcasing the extremely pathological gradient dynamics one can encounter when applying implicit algorithmic differentiation without any hypothesis.

preprint2022arXiv

Swarm gradient dynamics for global optimization: the density case

Using jointly geometric and stochastic reformulations of nonconvex problems and exploiting a Monge-Kantorovich gradient system formulation with vanishing forces, we formally extend the simulated annealing method to a wide class of global optimization methods. Due to an inbuilt combination of a gradient-like strategy and particles interactions, we call them swarm gradient dynamics. As in the original paper of Holley-Kusuoka-Stroock, the key to the existence of a schedule ensuring convergence to a global minimizer is a functional inequality. One of our central theoretical contributions is the proof of such an inequality for one-dimensional compact manifolds. We conjecture the inequality to be true in a much wider setting. We also describe a general method allowing for global optimization and evidencing the crucial role of functional inequalities {à} la Łojasiewicz.

preprint2022arXiv

The Iterates of the Frank-Wolfe Algorithm May Not Converge

The Frank-Wolfe algorithm is a popular method for minimizing a smooth convex function $f$ over a compact convex set $\mathcal{C}$. While many convergence results have been derived in terms of function values, hardly nothing is known about the convergence behavior of the sequence of iterates $(x_t)_{t\in\mathbb{N}}$. Under the usual assumptions, we design several counterexamples to the convergence of $(x_t)_{t\in\mathbb{N}}$, where $f$ is $d$-time continuously differentiable, $d\geq2$, and $f(x_t)\to\min_\mathcal{C}f$. Our counterexamples cover the cases of open-loop, closed-loop, and line-search step-size strategies. We do not assume \emph{misspecification} of the linear minimization oracle and our results thus hold regardless of the points it returns, demonstrating the fundamental pathologies in the convergence behavior of $(x_t)_{t\in\mathbb{N}}$.

preprint2021arXiv

Optimal Complexity and Certification of Bregman First-Order Methods

We provide a lower bound showing that the $O(1/k)$ convergence rate of the NoLips method (a.k.a. Bregman Gradient) is optimal for the class of functions satisfying the $h$-smoothness assumption. This assumption, also known as relative smoothness, appeared in the recent developments around the Bregman Gradient method, where acceleration remained an open issue. On the way, we show how to constructively obtain the corresponding worst-case functions by extending the computer-assisted performance estimation framework of Drori and Teboulle (Mathematical Programming, 2014) to Bregman first-order methods, and to handle the classes of differentiable and strictly convex functions.

preprint2021arXiv

Quartic First-Order Methods for Low-Rank Minimization

We study a generalized nonconvex Burer-Monteiro formulation for low-rank minimization problems. We use recent results on non-Euclidean first order methods to provide efficient and scalable algorithms. Our approach uses geometries induced by quartic kernels on matrix spaces; for unconstrained cases we introduce a novel family of Gram kernels that considerably improves numerical performances. Numerical experiments for Euclidean distance matrix completion and symmetric nonnegative matrix factorization show that our algorithms scale well and reach state of the art performance when compared to specialized methods.

preprint2021arXiv

Second-order step-size tuning of SGD for non-convex optimization

In view of a direct and simple improvement of vanilla SGD, this paper presents a fine-tuning of its step-sizes in the mini-batch case. For doing so, one estimates curvature, based on a local quadratic model and using only noisy gradient approximations. One obtains a new stochastic first-order method (Step-Tuned SGD), enhanced by second-order information, which can be seen as a stochastic version of the classical Barzilai-Borwein method. Our theoretical results ensure almost sure convergence to the critical set and we provide convergence rates. Experiments on deep residual network training illustrate the favorable properties of our approach. For such networks we observe, during training, both a sudden drop of the loss and an improvement of test accuracy at medium stages, yielding better results than SGD, RMSprop, or ADAM.

preprint2020arXiv

A Hölderian backtracking method for min-max and min-min problems

We present a new algorithm to solve min-max or min-min problems out of the convex world. We use rigidity assumptions, ubiquitous in learning, making our method applicable to many optimization problems. Our approach takes advantage of hidden regularity properties and allows us to devise a simple algorithm of ridge type. An original feature of our method is to come with automatic step size adaptation which departs from the usual overly cautious backtracking methods. In a general framework, we provide convergence theoretical guarantees and rates. We apply our findings on simple GAN problems obtaining promising numerical results.

preprint2020arXiv

Conservative set valued fields, automatic differentiation, stochastic gradient method and deep learning

Modern problems in AI or in numerical analysis require nonsmooth approaches with a flexible calculus. We introduce generalized derivatives called conservative fields for which we develop a calculus and provide representation formulas. Functions having a conservative field are called path differentiable: convex, concave, Clarke regular and any semialgebraic Lipschitz continuous functions are path differentiable. Using Whitney stratification techniques for semialgebraic and definable sets, our model provides variational formulas for nonsmooth automatic differentiation oracles, as for instance the famous backpropagation algorithm in deep learning. Our differential model is applied to establish the convergence in values of nonsmooth stochastic gradient methods as they are implemented in practice.

preprint2016arXiv

From error bounds to the complexity of first-order descent methods for convex functions

This paper shows that error bounds can be used as effective tools for deriving complexity results for first-order descent methods in convex minimization. In a first stage, this objective led us to revisit the interplay between error bounds and the Kurdyka-Łojasiewicz (KL) inequality. One can show the equivalence between the two concepts for convex functions having a moderately flat profile near the set of minimizers (as those of functions with Hölderian growth). A counterexample shows that the equivalence is no longer true for extremely flat functions. This fact reveals the relevance of an approach based on KL inequality. In a second stage, we show how KL inequalities can in turn be employed to compute new complexity bounds for a wealth of descent methods for convex problems. Our approach is completely original and makes use of a one-dimensional worst-case proximal sequence in the spirit of the famous majorant method of Kantorovich. Our result applies to a very simple abstract scheme that covers a wide class of descent methods. As a byproduct of our study, we also provide new results for the globalization of KL inequalities in the convex framework. Our main results inaugurate a simple methodology: derive an error bound, compute the desingularizing function whenever possible, identify essential constants in the descent method and finally compute the complexity using the one-dimensional worst case proximal sequence. Our method is illustrated through projection methods for feasibility problems, and through the famous iterative shrinkage thresholding algorithm (ISTA), for which we show that the complexity bound is of the form $O(q^{k})$ where the constituents of the bound only depend on error bound constants obtained for an arbitrary least squares objective with $\ell^1$ regularization.

preprint2015arXiv

Majorization-minimization procedures and convergence of SQP methods for semi-algebraic and tame programs

In view of solving nonsmooth and nonconvex problems involving complex constraints (like standard NLP problems), we study general maximization-minimization procedures produced by families of strongly convex sub-problems. Using techniques from semi-algebraic geometry and variational analysis -in particular Lojasiewicz inequality- we establish the convergence of sequences generated by this type of schemes to critical points. The broad applicability of this process is illustrated in the context of NLP. In that case critical points coincide with KKT points. When the data are semi-algebraic or real analytic our method applies (for instance) to the study of various SQP methods: the moving balls method, Sl1QP, ESQP. Under standard qualification conditions, this provides -to the best of our knowledge- the first general convergence results for general nonlinear programming problems. We emphasize the fact that, unlike most works on this subject, no second-order assumption and/or convexity assumptions whatsoever are made. Rate of convergence are shown to be of the same form as those commonly encountered with first order methods.

preprint2013arXiv

Definable zero-sum stochastic games

Definable zero-sum stochastic games involve a finite number of states and action sets, reward and transition functions that are definable in an o-minimal structure. Prominent examples of such games are finite, semi-algebraic or globally subanalytic stochastic games. We prove that the Shapley operator of any definable stochastic game with separable transition and reward functions is definable in the same structure. Definability in the same structure does not hold systematically: we provide a counterexample of a stochastic game with semi-algebraic data yielding a non semi-algebraic but globally subanalytic Shapley operator. Our definability results on Shapley operators are used to prove that any separable definable game has a uniform value; in the case of polynomially bounded structures we also provide convergence rates. Using an approximation procedure, we actually establish that general zero-sum games with separable definable transition functions have a uniform value. These results highlight the key role played by the tame structure of transition functions. As particular cases of our main results, we obtain that stochastic games with polynomial transitions, definable games with finite actions on one side, definable games with perfect information or switching controls have a uniform value. Applications to nonlinear maps arising in risk sensitive control and Perron-Frobenius theory are also given

Jérôme Bolte

What is connected

Connect this record

See the researcher in context

Building this map preview

12 published item(s)

Automatic differentiation of nonsmooth iterative algorithms

Nonsmooth Implicit Differentiation for Machine Learning and Optimization

Swarm gradient dynamics for global optimization: the density case

The Iterates of the Frank-Wolfe Algorithm May Not Converge

Optimal Complexity and Certification of Bregman First-Order Methods

Quartic First-Order Methods for Low-Rank Minimization

Second-order step-size tuning of SGD for non-convex optimization

A Hölderian backtracking method for min-max and min-min problems

Conservative set valued fields, automatic differentiation, stochastic gradient method and deep learning

From error bounds to the complexity of first-order descent methods for convex functions

Majorization-minimization procedures and convergence of SQP methods for semi-algebraic and tame programs

Definable zero-sum stochastic games