Researcher profile

Jérôme Malick

Jérôme Malick contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
8works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

8 published item(s)

preprint2026arXiv

$\texttt{skwdro}$: a library for Wasserstein distributionally robust machine learning

We present skwdro, a Python library for training robust machine learning models. The library is based on distributionally robust optimization using Wasserstein distances, popular in optimal transport and machine learnings. The goal of the library is to make the training of robust models easier for a wide audience by proposing a wrapper for PyTorch modules, enabling model loss' robustification with minimal code changes. It comes along with scikit-learn compatible estimators for some popular objectives. The core of the implementation relies on an entropic smoothing of the original robust objective, in order to ensure maximal model flexibility. The library is available at https://github.com/iutzeler/skwdro and the documentation at https://skwdro.readthedocs.io.

preprint2022arXiv

Multi-Agent Online Optimization with Delays: Asynchronicity, Adaptivity, and Optimism

In this paper, we provide a general framework for studying multi-agent online learning problems in the presence of delays and asynchronicities. Specifically, we propose and analyze a class of adaptive dual averaging schemes in which agents only need to accumulate gradient feedback received from the whole system, without requiring any between-agent coordination. In the single-agent case, the adaptivity of the proposed method allows us to extend a range of existing results to problems with potentially unbounded delays between playing an action and receiving the corresponding feedback. In the multi-agent case, the situation is significantly more complicated because agents may not have access to a global clock to use as a reference point; to overcome this, we focus on the information that is available for producing each prediction rather than the actual delay associated with each feedback. This allows us to derive adaptive learning strategies with optimal regret bounds, even in a fully decentralized, asynchronous environment. Finally, we also analyze an "optimistic" variant of the proposed algorithm which is capable of exploiting the predictability of problems with a slower variation and leads to improved regret bounds.

preprint2022arXiv

Superquantile-based learning: a direct approach using gradient-based optimization

We consider a formulation of supervised learning that endows models with robustness to distributional shifts from training to testing. The formulation hinges upon the superquantile risk measure, also known as the conditional value-at-risk, which has shown promise in recent applications of machine learning and signal processing. We show that, thanks to a direct smoothing of the superquantile function, a superquantile-based learning objective is amenable to gradient-based optimization, using batch optimization algorithms such as gradient descent or quasi-Newton algorithms, or using stochastic optimization algorithms such as stochastic gradient algorithms. A companion software SPQR implements in Python the algorithms described and allows practitioners to experiment with superquantile-based supervised learning.

preprint2022arXiv

Superquantiles at Work: Machine Learning Applications and Efficient Subgradient Computation

R. Tyrell Rockafellar and collaborators introduced, in a series of works, new regression modeling methods based on the notion of superquantile (or conditional value-at-risk). These methods have been influential in economics, finance, management science, and operations research in general. Recently, they have been the subject of a renewed interest in machine learning, to address issues of distributional robustness and fair allocation. In this paper, we review some of these new applications of the superquantile, with references to recent developments. These applications involve nonsmooth superquantile-based objective functions that admit explicit subgradient calculations. To make these superquantile-based functions amenable to the gradient-based algorithms popular in machine learning, we show how to smooth them by infimal convolution and describe numerical procedures to compute the gradients of the smooth approximations. We put the approach into perspective by comparing it to other smoothing techniques and by illustrating it on toy examples.

preprint2021arXiv

On the Convexity of Level-sets of Probability Functions

In decision-making problems under uncertainty, probabilistic constraints are a valuable tool to express safety of decisions. They result from taking the probability measure of a given set of random inequalities depending on the decision vector. Even if the original set of inequalities is convex, this favourable property is not immediately transferred to the probabilistically constrained feasible set and may in particular depend on the chosen safety level. In this paper, we provide results guaranteeing the convexity of feasible sets to probabilistic constraints when the safety level is greater than a computable threshold. Our results extend all the existing ones and also cover the case where decision vectors belong to Banach spaces. The key idea in our approach is to reveal the level of underlying convexity in the nominal problem data (e.g., concavity of the probability function) by auxiliary transforming functions. We provide several examples illustrating our theoretical developments.

preprint2020arXiv

Distributed Learning with Sparse Communications by Identification

In distributed optimization for large-scale learning, a major performance limitation comes from the communications between the different entities. When computations are performed by workers on local data while a coordinator machine coordinates their updates to minimize a global loss, we present an asynchronous optimization algorithm that efficiently reduces the communications between the coordinator and workers. This reduction comes from a random sparsification of the local updates. We show that this algorithm converges linearly in the strongly convex case and also identifies optimal strongly sparse solutions. We further exploit this identification to propose an automatic dimension reduction, aptly sparsifying all exchanges between coordinator and workers.

preprint2020arXiv

On the convergence of single-call stochastic extra-gradient methods

Variational inequalities have recently attracted considerable interest in machine learning as a flexible paradigm for models that go beyond ordinary loss function minimization (such as generative adversarial networks and related deep learning systems). In this setting, the optimal $\mathcal{O}(1/t)$ convergence rate for solving smooth monotone variational inequalities is achieved by the Extra-Gradient (EG) algorithm and its variants. Aiming to alleviate the cost of an extra gradient step per iteration (which can become quite substantial in deep learning applications), several algorithms have been proposed as surrogates to Extra-Gradient with a \emph{single} oracle call per iteration. In this paper, we develop a synthetic view of such algorithms, and we complement the existing literature by showing that they retain a $\mathcal{O}(1/t)$ ergodic convergence rate in smooth, deterministic problems. Subsequently, beyond the monotone deterministic case, we also show that the last iterate of single-call, \emph{stochastic} extra-gradient methods still enjoys a $\mathcal{O}(1/t)$ local convergence rate to solutions of \emph{non-monotone} variational inequalities that satisfy a second-order sufficient condition.

preprint2020arXiv

Proximal Gradient methods with Adaptive Subspace Sampling

Many applications in machine learning or signal processing involve nonsmooth optimization problems. This nonsmoothness brings a low-dimensional structure to the optimal solutions. In this paper, we propose a randomized proximal gradient method harnessing this underlying structure. We introduce two key components: i) a random subspace proximal gradient algorithm; ii) an identification-based sampling of the subspaces. Their interplay brings a significant performance improvement on typical learning problems in terms of dimensions explored.