Source author record

Romuald Elie

Romuald Elie appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

math.PR, Machine Learning, math.OC, Computer Science and Game Theory, Multiagent Systems, Artificial Intelligence, physics.soc-ph, Populations and Evolution, q-fin.EC, econ.GN, econ.TH, eess.SY, math.AP, q-fin.CP, q-fin.PR, q-fin.RM, Systems and Control

Catalog footprint

What is connected

20 works

17 topics

4 close collaborators

preprint 2026 arXiv

MIND: Monge Inception Distance for Generative Models Evaluation

We propose the Monge Inception Distance (MIND), a metric for evaluating generative models that addresses key limitations of the widely adopted Fréchet Inception Distance (FID). The MIND metric leverages the sliced Wasserstein distance to compare distributions by averaging one-dimensional optimal transport distances, efficiently computed via sorting. This approach circumvents the estimation of high-dimensional means and covariance matrices, which underlie FID's poor sample complexity and vulnerability to adversarial attacks. We empirically demonstrate three primary advantages: (i) it is more sample-efficient by one order of magnitude, (ii) it is faster to compute by two orders of magnitude, (iii) it is more robust to adversarial attacks such as moment-matching. We show that MIND with 5k samples can replace the evaluation performance of FID with 50k samples, providing high correlation with this standard benchmark and superior discriminative performance. We further demonstrate that even smaller sample sizes (e.g., 1k or 2k) remain highly informative for rapid model iteration.
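The sorting-based computation mentioned in the abstract can be sketched as follows. This is a generic Monte Carlo estimate of the sliced 2-Wasserstein distance, not the paper's MIND implementation: the function name `sliced_wasserstein`, the number of projections, the choice of order 2, and the toy Gaussian data are all assumptions made for illustration, and no Inception feature extraction is included.

```python
import math
import random


def sliced_wasserstein(x, y, n_projections=64, seed=0):
    """Estimate the sliced 2-Wasserstein distance between two samples.

    Assumption of this sketch: x and y are lists of equal length whose
    rows are feature vectors of the same dimension.
    """
    if len(x) != len(y) or len(x[0]) != len(y[0]):
        raise ValueError("this sketch assumes equal sample counts and dimensions")
    rng = random.Random(seed)
    dim = len(x[0])
    total = 0.0
    for _ in range(n_projections):
        # A normalised Gaussian vector is a uniform direction on the sphere.
        theta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(t * t for t in theta))
        theta = [t / norm for t in theta]
        # Project both samples onto the direction and sort the projections.
        px = sorted(sum(t * v for t, v in zip(theta, row)) for row in x)
        py = sorted(sum(t * v for t, v in zip(theta, row)) for row in y)
        # In one dimension, optimal transport pairs the sorted values.
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    # Average the squared 1-D distances over directions, then take the root.
    return math.sqrt(total / n_projections)


# Toy data (hypothetical): a Gaussian sample and a shifted copy of it.
data_rng = random.Random(1)
real = [[data_rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]
fake = [[v + 3.0 for v in row] for row in real]

d_same = sliced_wasserstein(real, real)
d_shift = sliced_wasserstein(real, fake)
```

Each projection costs one sort per sample, and no mean vector or covariance matrix is ever estimated, which is the property the abstract contrasts with FID.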

preprint 2022 arXiv

Concave Utility Reinforcement Learning: the Mean-Field Game Viewpoint

Concave Utility Reinforcement Learning (CURL) extends RL from linear to concave utilities in the occupancy measure induced by the agent's policy. This encompasses not only RL but also imitation learning and exploration, among others. Yet, this more general paradigm invalidates the classical Bellman equations, and calls for new algorithms. Mean-field Games (MFGs) are a continuous approximation of many-agent RL. They consider the limit case of a continuous distribution of identical agents, anonymous with symmetric interests, and reduce the problem to the study of a single representative agent in interaction with the full population. Our core contribution consists in showing that CURL is a subclass of MFGs. We think this important to bridge together both communities. It also allows to shed light on aspects of both fields: we show the equivalence between concavity in CURL and monotonicity in the associated MFG, between optimality conditions in CURL and Nash equilibrium in MFG, or that Fictitious Play (FP) for this class of MFGs is simply Frank-Wolfe, bringing the first convergence rate for discrete-time FP for MFGs. We also experimentally demonstrate that, using algorithms recently introduced for solving MFGs, we can address the CURL problem more efficiently.

preprint 2022 arXiv

Learning Correlated Equilibria in Mean-Field Games

The designs of many large-scale systems today, from traffic routing environments to smart grids, rely on game-theoretic equilibrium concepts. However, as the size of an $N$-player game typically grows exponentially with $N$, standard game theoretic analysis becomes effectively infeasible beyond a low number of players. Recent approaches have gone around this limitation by instead considering Mean-Field games, an approximation of anonymous $N$-player games, where the number of players is infinite and the population's state distribution, instead of every individual player's state, is the object of interest. The practical computability of Mean-Field Nash equilibria, the most studied Mean-Field equilibrium to date, however, typically depends on beneficial non-generic structural properties such as monotonicity or contraction properties, which are required for known algorithms to converge. In this work, we provide an alternative route for studying Mean-Field games, by developing the concepts of Mean-Field correlated and coarse-correlated equilibria. We show that they can be efficiently learnt in \emph{all games}, without requiring any additional assumption on the structure of the game, using three classical algorithms. Furthermore, we establish correspondences between our notions and those already present in the literature, derive optimality bounds for the Mean-Field - $N$-player transition, and empirically demonstrate the convergence of these algorithms on simple games.

preprint 2022 arXiv

Learning Equilibria in Mean-Field Games: Introducing Mean-Field PSRO

Recent advances in multiagent learning have seen the introduction of a family of algorithms that revolve around the population-based training method PSRO, showing convergence to Nash, correlated and coarse correlated equilibria. Notably, when the number of agents increases, learning best-responses becomes exponentially more difficult, and as such hampers PSRO training methods. The paradigm of mean-field games provides an asymptotic solution to this problem when the considered games are anonymous-symmetric. Unfortunately, the mean-field approximation introduces non-linearities which prevent a straightforward adaptation of PSRO. Building upon optimization and adversarial regret minimization, this paper sidesteps this issue and introduces mean-field PSRO, an adaptation of PSRO which learns Nash, coarse correlated and correlated equilibria in mean-field games. The key is to replace the exact distribution computation step by newly-defined mean-field no-adversarial-regret learners, or by black-box optimization. We compare the asymptotic complexity of the approach to standard PSRO, greatly improve empirical bandit convergence speed by compressing temporal mixture weights, and ensure it is theoretically robust to payoff noise. Finally, we illustrate the speed and accuracy of mean-field PSRO on several mean-field games, demonstrating convergence to strong and weak equilibria.

preprint 2022 arXiv

Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of $10^{535}$ nodes, i.e., $10^{175}$ times larger than that of Go. It has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of $10^{164}$ nodes). Decisions in Stratego are made over a large number of discrete actions with no obvious link between action and outcome. Episodes are long, with often hundreds of moves before a player wins, and situations in Stratego can not easily be broken down into manageably-sized sub-problems as in poker. For these reasons, Stratego has been a grand challenge for the field of AI for decades, and existing AI methods barely reach an amateur level of play. DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego via self-play. The Regularised Nash Dynamics (R-NaD) algorithm, a key component of DeepNash, converges to an approximate Nash equilibrium, instead of 'cycling' around it, by directly modifying the underlying multi-agent learning dynamics. DeepNash beats existing state-of-the-art AI methods in Stratego and achieved a yearly (2022) and all-time top-3 rank on the Gravon games platform, competing with human expert players.

preprint 2021 arXiv

Scaling up Mean Field Games with Online Mirror Descent

We address scaling up equilibrium computation in Mean Field Games (MFGs) using Online Mirror Descent (OMD). We show that continuous-time OMD provably converges to a Nash equilibrium under a natural and well-motivated set of monotonicity assumptions. This theoretical result nicely extends to multi-population games and to settings involving common noise. A thorough experimental investigation on various single and multi-population MFGs shows that OMD outperforms traditional algorithms such as Fictitious Play (FP). We empirically show that OMD scales up and converges significantly faster than FP by solving, for the first time to our knowledge, examples of MFGs with hundreds of billions states. This study establishes the state-of-the-art for learning in large-scale multi-agent and multi-population games.

preprint 2020 arXiv

Contact rate epidemic control of COVID-19: an equilibrium view

We consider the control of the COVID-19 pandemic through a standard SIR compartmental model. This control is induced by the aggregation of individuals' decisions to limit their social interactions: when the epidemic is ongoing, an individual can diminish his/her contact rate in order to avoid getting infected, but this effort comes at a social cost. If each individual lowers his/her contact rate, the epidemic vanishes faster, but the effort cost may be high. A Mean Field Nash equilibrium at the population level is formed, resulting in a lower effective transmission rate of the virus. We prove theoretically that equilibrium exists and compute it numerically. However, this equilibrium selects a sub-optimal solution in comparison to the societal optimum (a centralized decision respected fully by all individuals), meaning that the cost of anarchy is strictly positive. We provide numerical examples and a sensitivity analysis, as well as an extension to a SEIR compartmental model to account for the relatively long latent phase of the COVID-19 disease. In all the scenarii considered, the divergence between the individual and societal strategies happens both before the peak of the epidemic, due to individuals' fears, and after, when a significant propagation is still underway.
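As a rough illustration of the mechanism described above, the sketch below integrates a standard SIR model with an explicit Euler scheme and compares a baseline transmission rate with one halved by reduced contacts. It does not compute the Mean Field Nash equilibrium or the societal optimum from the paper: the function name `simulate_sir`, the constant contact reduction, and all parameter values are hypothetical choices for illustration.

```python
def simulate_sir(beta, gamma, i0=1e-3, days=300, dt=0.1):
    """Euler integration of the SIR model on population fractions.

    beta is the effective transmission rate and gamma the recovery rate.
    Returns the final (s, i, r) fractions and the peak infected fraction.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    peak = i
    for _ in range(int(days / dt)):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s = s - new_infections
        i = i + new_infections - new_recoveries
        r = r + new_recoveries
        peak = max(peak, i)
    return s, i, r, peak


# Hypothetical parameters: baseline contacts versus contacts cut by half.
baseline = simulate_sir(beta=0.30, gamma=0.10)
reduced = simulate_sir(beta=0.15, gamma=0.10)

peak_baseline = baseline[3]
peak_reduced = reduced[3]
```

In the paper the contact rate is chosen by each individual and varies over time; holding it constant here only shows the direction of the effect on the epidemic peak and final size.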

preprint 2020 arXiv

COVID-19 pandemic control: balancing detection policy and lockdown intervention under ICU sustainability

We consider here an extended SIR model, including several features of the recent COVID-19 outbreak: in particular the infected and recovered individuals can either be detected (+) or undetected (-) and we also integrate an intensive care unit (ICU) capacity. Our model enables a tractable quantitative analysis of the optimal policy for the control of the epidemic dynamics using both lockdown and detection intervention levers. With parametric specification based on literature on COVID-19, we investigate the sensitivities of various quantities on the optimal strategies, taking into account the subtle trade-off between the sanitary and the socio-economic cost of the pandemic, together with the limited capacity level of ICU. We identify the optimal lockdown policy as an intervention structured in 4 successive phases: First a quick and strong lockdown intervention to stop the exponential growth of the contagion; second a short transition phase to reduce the prevalence of the virus; third a long period with full ICU capacity and stable virus prevalence; finally a return to normal social interactions with disappearance of the virus. The optimal scenario hereby avoids the second wave of infection, provided the lockdown is released sufficiently slowly. We also provide optimal intervention measures with increasing ICU capacity, as well as optimization over the effort on detection of infectious and immune individuals. Whenever massive resources are introduced to detect infected individuals, the pressure on social distancing can be released, whereas the impact of detection of immune individuals reveals to be more moderate.

preprint 2020 arXiv

Mean-field moral hazard for optimal energy demand response management

We study the problem of demand response contracts in electricity markets by quantifying the impact of considering a mean-field of consumers, whose consumption is impacted by a common noise. We formulate the problem as a Principal-Agent problem with moral hazard in which the Principal - she - is an electricity producer who observes continuously the consumption of a continuum of risk-averse consumers, and designs contracts in order to reduce her production costs. More precisely, the producer incentivises the consumers to reduce the average and the volatility of their consumption in different usages, without observing the efforts they make. We prove that the producer can benefit from considering the mean-field of consumers by indexing contracts on the consumption of one Agent and aggregate consumption statistics from the distribution of the entire population of consumers. In the case of linear energy valuation, we provide closed-form expression for this new type of optimal contracts that maximises the utility of the producer. In most cases, we show that this new type of contracts allows the Principal to choose the risks she wants to bear, and to reduce the problem at hand to an uncorrelated one.

preprint 2020 arXiv

On the Convergence of Model Free Learning in Mean Field Games

Learning by experience in Multi-Agent Systems (MAS) is a difficult and exciting task, due to the lack of stationarity of the environment, whose dynamics evolves as the population learns. In order to design scalable algorithms for systems with a large population of interacting agents (e.g. swarms), this paper focuses on Mean Field MAS, where the number of agents is asymptotically infinite. Recently, a very active burgeoning field studies the effects of diverse reinforcement learning algorithms for agents with no prior information on a stationary Mean Field Game (MFG) and learn their policy through repeated experience. We adopt a high perspective on this problem and analyze in full generality the convergence of a fictitious iterative scheme using any single agent learning algorithm at each step. We quantify the quality of the computed approximate Nash equilibrium, in terms of the accumulated errors arising at each learning iteration step. Notably, we show for the first time convergence of model free learning algorithms towards non-stationary MFG equilibria, relying only on classical assumptions on the MFG dynamics. We illustrate our theoretical results with a numerical experiment in a continuous action-space environment, where the approximate best response of the iterative fictitious play scheme is computed with a deep RL algorithm.

preprint 2020 arXiv

Reinforcement Learning in Economics and Finance

Reinforcement learning algorithms describe how an agent can learn an optimal action policy in a sequential decision process, through repeated experience. In a given environment, the agent policy provides him some running and terminal rewards. As in online learning, the agent learns sequentially. As in multi-armed bandit problems, when an agent picks an action, he can not infer ex-post the rewards induced by other action choices. In reinforcement learning, his actions have consequences: they influence not only rewards, but also future states of the world. The goal of reinforcement learning is to find an optimal policy -- a mapping from the states of the world to the set of actions, in order to maximize cumulative reward, which is a long term strategy. Exploring might be sub-optimal on a short-term horizon but could lead to optimal long-term ones. Many problems of optimal control, popular in economics for more than forty years, can be expressed in the reinforcement learning framework, and recent advances in computational science, provided in particular by deep learning algorithms, can be used by economists in order to solve complex behavioral problems. In this article, we propose a state-of-the-art of reinforcement learning techniques, and present applications in economics, game theory, operation research and finance.

preprint 2016 arXiv

BSDEs with mean reflection

In this paper, we study a new type of BSDE, where the distribution of the Y-component of the solution is required to satisfy an additional constraint, written in terms of the expectation of a loss function. This constraint is imposed at any deterministic time t and is typically weaker than the classical pointwise one associated to reflected BSDEs. Focusing on solutions (Y, Z, K) with deterministic K, we obtain the well-posedness of such equation, in the presence of a natural Skorokhod type condition. Such condition indeed ensures the minimality of the enhanced solution, under an additional structural condition on the driver. Our results extend to the more general framework where the constraint is written in terms of a static risk measure on Y. In particular, we provide an application to the super hedging of claims under running risk management constraint.

preprint 2016 arXiv

Contracting theory with competitive interacting agents

In a framework close to the one developed by Holmström and Milgrom [44], we study the optimal contracting scheme between a Principal and several Agents. Each hired Agent is in charge of one project, and can make efforts towards managing his own project, as well as impact (positively or negatively) the projects of the other Agents. Considering economic Agents in competition with relative performance concerns, we derive the optimal contracts in both first best and moral hazard settings. The enhanced resolution methodology relies heavily on the connection between Nash equilibria and multidimensional quadratic BSDEs. The optimal contracts are linear and each agent is paid a fixed proportion of the terminal value of all the projects of the firm. Besides, each Agent receives his reservation utility, and those with high competitive appetence are assigned less volatile projects, and shall even receive help from the other Agents. From the principal point of view, it is in the firm interest in our model to strongly diversify the competitive appetence of the Agents.

preprint 2014 arXiv

BSDEs with weak terminal condition

We introduce a new class of Backward Stochastic Differential Equations in which the $T$-terminal value $Y_{T}$ of the solution $(Y,Z)$ is not fixed as a random variable, but only satisfies a weak constraint of the form $E[Ψ(Y_{T})]\ge m$, for some (possibly random) non-decreasing map $Ψ$ and some threshold $m$. We name them \textit{BSDEs with weak terminal condition} and obtain a representation of the minimal time $t$-values $Y_{t}$ such that $(Y,Z)$ is a supersolution of the BSDE with weak terminal condition. It provides a non-Markovian BSDE formulation of the PDE characterization obtained for Markovian stochastic target problems under controlled loss in Bouchard, Elie and Touzi \cite{BoElTo09}. We then study the main properties of this minimal value. In particular, we analyze its continuity and convexity with respect to the $m$-parameter appearing in the weak terminal condition, and show how it can be related to a dual optimal control problem in Meyer form. These last properties generalize to a non Markovian framework previous results on quantile hedging and hedging under loss constraints obtained in Föllmer and Leukert \cite{FoLe99,FoLe00}, and in Bouchard, Elie and Touzi \cite{BoElTo09}.

preprint 2014 arXiv

Regularity of BSDEs with a convex constraint on the gains-process

We consider the minimal super-solution of a backward stochastic differential equation with constraint on the gains-process. The terminal condition is given by a function of the terminal value of a forward stochastic differential equation. Under boundedness assumptions on the coefficients, we show that the first component of the solution is Lipschitz in space and 1/2-Hölder in time with respect to the initial data of the forward process. Its path is continuous before the time horizon at which its left-limit is given by a face-lifted version of its natural boundary condition. This first component is actually equal to its own face-lift. We only use probabilistic arguments. In particular, our results can be extended to certain non-Markovian settings.

preprint 2013 arXiv

On the expectation of normalized Brownian functionals up to first hitting times

Let B be a Brownian motion and T its first hitting time of the level 1. For U a uniform random variable independent of B, we study in depth the distribution of T^{-1/2}B_{UT}, that is the rescaled Brownian motion sampled at uniform time. In particular, we show that this variable is centered.

preprint 2013 arXiv

When terminal facelift enforces Delta constraints

This paper deals with the super-replication of non path-dependent European claims under additional convex constraints on the number of shares held in the portfolio. The corresponding super-replication price of a given claim has been widely studied in the literature and its terminal value, which dominates the claim of interest, is the so-called facelift transform of the claim. We investigate under which conditions the super-replication price and strategy of a large class of claims coincide with the exact replication price and strategy of the facelift transform of this claim. In one dimension, we observe that this property is satisfied for any local volatility model. In any dimension, we exhibit an analytical necessary and sufficient condition for this property, which combines the dynamics of the stock together with the characteristics of the closed convex set of constraints. To obtain this condition, we introduce the notion of first order viability property for linear parabolic PDEs. We investigate in details several practical cases of interest: multidimensional Black Scholes model, non-tradable assets or short selling restrictions.

preprint 2012 arXiv

Discrete-time approximation of multidimensional BSDEs with oblique reflections

In this paper, we study the discrete-time approximation of multidimensional reflected BSDEs of the type of those presented by Hu and Tang [Probab. Theory Related Fields 147 (2010) 89-121] and generalized by Hamadène and Zhang [Stochastic Process. Appl. 120 (2010) 403-426]. In comparison to the penalizing approach followed by Hamadène and Jeanblanc [Math. Oper. Res. 32 (2007) 182-192] or Elie and Kharroubi [Statist. Probab. Lett. 80 (2010) 1388-1396], we study a more natural scheme based on oblique projections. We provide a control on the error of the algorithm by introducing and studying the notion of multidimensional discretely reflected BSDE. In the particular case where the driver does not depend on the variable $Z$, the error on the grid points is of order $1/2-\varepsilon$, $\varepsilon>0$.

preprint 2011 arXiv

Adding constraints to BSDEs with Jumps: an alternative to multidimensional reflections

This paper is dedicated to the analysis of backward stochastic differential equations (BSDEs) with jumps, subject to an additional global constraint involving all the components of the solution. We study the existence and uniqueness of a minimal solution for these so-called constrained BSDEs with jumps via a penalization procedure. This new type of BSDE offers a nice and practical unifying framework to the notions of constrained BSDEs presented in [19] and BSDEs with constrained jumps introduced in [14]. More remarkably, the solution of a multidimensional Brownian reflected BSDE studied in [11] and [13] can also be represented via a well chosen one-dimensional constrained BSDE with jumps. This last result is very promising from a numerical point of view for the resolution of high dimensional optimal switching problems and more generally for systems of coupled variational inequalities.

preprint 2011 arXiv

Probabilistic Representation and Approximation for Coupled Systems of Variational Inequalities

Our study is dedicated to the probabilistic representation and numerical approximation of solutions to coupled systems of variational inequalities. The dynamics of each component of the solution is driven by a different linear parabolic operator and suffers a non-linear dependence in all the components of the solution. This dynamics is combined with a global structural constraint between all the components of the solution including the practical example of optimal switching problems. In this paper, we interpret the unique viscosity solution to this type of coupled systems of variational inequalities as the solution to one-dimensional constrained BSDEs with jumps introduced recently in [6]. In the spirit of [3], this new representation allows for the introduction of a natural entirely probabilistic numerical scheme for the resolution of these systems.
