Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
15works
0followers
15topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

15 published item(s)

preprint2026arXiv

Population-Aware Imitation Learning in Mean-field Games with Common Noise

Mean Field Games (MFGs) provide a powerful framework for modeling the collective behavior of large populations of interacting agents. In this paper, we address the problem of Imitation Learning (IL) in MFGs subject to common noise, where the population distribution evolves stochastically. This stochasticity compels agents to adopt population-aware policies to respond to aggregate shocks. We formulate two distinct learning objectives: recovering a Nash equilibrium and maximizing performance against an expert population. We investigate two imitation proxies: Behavioral Cloning (BC) and Adversarial (ADV) divergence. We then establish finite-sample error bounds showing that minimizing these proxies effectively controls both the policy's exploitability and its performance gap relative to the expert. Furthermore, we propose a numerical framework using generalized Fictitious Play and Deep Learning to compute expert population-aware policies. Through experiments on three environments we demonstrate that standard population-unaware policies fail to capture the equilibrium dynamics. Our results highlight that learning population-aware policies is crucial to avoid being misled by the randomness inherent in common noise.

preprint2026arXiv

Scalable Method for Mean Field Control with Kernel Interactions via Random Fourier Features

We develop a scalable algorithm for mean field control problems with kernel interactions by combining particle system simulations with random Fourier feature approximations. The method replaces the quadratic-cost kernel evaluations by linear-time estimates, enabling efficient stochastic gradient descent for training feedback controls in large populations. We provide theoretical complexity bounds and demonstrate through crowd motion and flocking examples that the approach preserves control performance while substantially reducing computational cost. The results indicate that random feature approximations offer an effective and practical tool for high dimensional and large scale mean field control.

preprint2025arXiv

Discrete-Time Mean Field Type Games: Probabilistic Setup

We introduce a general probabilistic framework for discrete-time, infinite-horizon discounted Mean Field Type Games (MFTGs) with both global common noise and team-specific common noises. In our model, agents are allowed to use randomized actions, both at the individual level and at the team level. We formalize the concept of Mean Field Markov Games (MFMGs) and establish a connection between closed-loop policies in MFTGs and Markov policies in MFMGs through different layers of randomization. By leveraging recent results on infinite-horizon discounted games with infinite compact state-action spaces, we prove the existence of an optimal closed-loop policy for the original MFTG when the state spaces are at most countable and the action spaces are general Polish spaces. We also present an example satisfying our assumptions, called Mean Field Drift of Intentions, where the dynamics are strongly randomized, and we establish the existence of a Nash equilibrium using our theoretical results.

preprint2022arXiv

Concave Utility Reinforcement Learning: the Mean-Field Game Viewpoint

Concave Utility Reinforcement Learning (CURL) extends RL from linear to concave utilities in the occupancy measure induced by the agent's policy. This encompasses not only RL but also imitation learning and exploration, among others. Yet, this more general paradigm invalidates the classical Bellman equations, and calls for new algorithms. Mean-field Games (MFGs) are a continuous approximation of many-agent RL. They consider the limit case of a continuous distribution of identical agents, anonymous with symmetric interests, and reduce the problem to the study of a single representative agent in interaction with the full population. Our core contribution consists in showing that CURL is a subclass of MFGs. We think this important to bridge together both communities. It also allows to shed light on aspects of both fields: we show the equivalence between concavity in CURL and monotonicity in the associated MFG, between optimality conditions in CURL and Nash equilibrium in MFG, or that Fictitious Play (FP) for this class of MFGs is simply Frank-Wolfe, bringing the first convergence rate for discrete-time FP for MFGs. We also experimentally demonstrate that, using algorithms recently introduced for solving MFGs, we can address the CURL problem more efficiently.

preprint2022arXiv

DeepSets and their derivative networks for solving symmetric PDEs

Machine learning methods for solving nonlinear partial differential equations (PDEs) are hot topical issues, and different algorithms proposed in the literature show efficient numerical approximation in high dimension. In this paper, we introduce a class of PDEs that are invariant to permutations, and called symmetric PDEs. Such problems are widespread, ranging from cosmology to quantum mechanics, and option pricing/hedging in multi-asset market with exchangeable payoff. Our main application comes actually from the particles approximation of mean-field control problems. We design deep learning algorithms based on certain types of neural networks, named PointNet and DeepSet (and their associated derivative networks), for computing simultaneously an approximation of the solution and its gradient to symmetric PDEs. We illustrate the performance and accuracy of the PointNet/DeepSet networks compared to classical feedforward ones, and provide several numerical results of our algorithm for the examples of a mean-field systemic risk, mean-variance problem and a min/max linear quadratic McKean-Vlasov control problem.

preprint2022arXiv

Reinforcement Learning for Intra-and-Inter-Bank Borrowing and Lending Mean Field Control Game

We propose a mean field control game model for the intra-and-inter-bank borrowing and lending problem. This framework allows to study the competitive game arising between groups of collaborative banks. The solution is provided in terms of an asymptotic Nash equilibrium between the groups in the infinite horizon. A three-timescale reinforcement learning algorithm is applied to learn the optimal borrowing and lending strategy in a data driven way when the model is unknown. An empirical numerical analysis shows the importance of the three-timescale, the impact of the exploration strategy when the model is unknown, and the convergence of the algorithm.

preprint2022arXiv

Scalable Deep Reinforcement Learning Algorithms for Mean Field Games

Mean Field Games (MFGs) have been introduced to efficiently approximate games with very large populations of strategic agents. Recently, the question of learning equilibria in MFGs has gained momentum, particularly using model-free reinforcement learning (RL) methods. One limiting factor to further scale up using RL is that existing algorithms to solve MFGs require the mixing of approximated quantities such as strategies or $q$-values. This is far from being trivial in the case of non-linear function approximation that enjoy good generalization properties, e.g. neural networks. We propose two methods to address this shortcoming. The first one learns a mixed strategy from distillation of historical data into a neural network and is applied to the Fictitious Play algorithm. The second one is an online mixing method based on regularization that does not require memorizing historical data or previous estimates. It is used to extend Online Mirror Descent. We demonstrate numerically that these methods efficiently enable the use of Deep RL algorithms to solve various MFGs. In addition, we show that these methods outperform SotA baselines from the literature.

preprint2021arXiv

Learning a functional control for high-frequency finance

We use a deep neural network to generate controllers for optimal trading on high frequency data. For the first time, a neural network learns the mapping between the preferences of the trader, i.e. risk aversion parameters, and the optimal controls. An important challenge in learning this mapping is that in intraday trading, trader's actions influence price dynamics in closed loop via the market impact. The exploration--exploitation tradeoff generated by the efficient execution is addressed by tuning the trader's preferences to ensure long enough trajectories are produced during the learning phase. The issue of scarcity of financial data is solved by transfer learning: the neural network is first trained on trajectories generated thanks to a Monte-Carlo scheme, leading to a good initialization before training on historical trajectories. Moreover, to answer to genuine requests of financial regulators on the explainability of machine learning generated controls, we project the obtained "blackbox controls" on the space usually spanned by the closed-form solution of the stylized optimal trading problem, leading to a transparent structure. For more realistic loss functions that have no closed-form solution, we show that the average distance between the generated controls and their explainable version remains small. This opens the door to the acceptance of ML-generated controls by financial regulators.

preprint2021arXiv

Scaling up Mean Field Games with Online Mirror Descent

We address scaling up equilibrium computation in Mean Field Games (MFGs) using Online Mirror Descent (OMD). We show that continuous-time OMD provably converges to a Nash equilibrium under a natural and well-motivated set of monotonicity assumptions. This theoretical result nicely extends to multi-population games and to settings involving common noise. A thorough experimental investigation on various single and multi-population MFGs shows that OMD outperforms traditional algorithms such as Fictitious Play (FP). We empirically show that OMD scales up and converges significantly faster than FP by solving, for the first time to our knowledge, examples of MFGs with hundreds of billions states. This study establishes the state-of-the-art for learning in large-scale multi-agent and multi-population games.

preprint2020arXiv

COVID-19 pandemic control: balancing detection policy and lockdown intervention under ICU sustainability

We consider here an extended SIR model, including several features of the recent COVID-19 outbreak: in particular the infected and recovered individuals can either be detected (+) or undetected (-) and we also integrate an intensive care unit (ICU) capacity. Our model enables a tractable quantitative analysis of the optimal policy for the control of the epidemic dynamics using both lockdown and detection intervention levers. With parametric specification based on literature on COVID-19, we investigate the sensitivities of various quantities on the optimal strategies, taking into account the subtle trade-off between the sanitary and the socio-economic cost of the pandemic, together with the limited capacity level of ICU. We identify the optimal lockdown policy as an intervention structured in 4 successive phases: First a quick and strong lockdown intervention to stop the exponential growth of the contagion; second a short transition phase to reduce the prevalence of the virus; third a long period with full ICU capacity and stable virus prevalence; finally a return to normal social interactions with disappearance of the virus. The optimal scenario hereby avoids the second wave of infection, provided the lockdown is released sufficiently slowly. We also provide optimal intervention measures with increasing ICU capacity, as well as optimization over the effort on detection of infectious and immune individuals. Whenever massive resources are introduced to detect infected individuals, the pressure on social distancing can be released, whereas the impact of detection of immune individuals reveals to be more moderate.

preprint2020arXiv

Large Banking Systems with Default and Recovery: A Mean Field Game Model

We consider a mean-field model for large banking systems, which takes into account default and recovery of the institutions. Building on models used for groups of interacting neurons, we first study a McKean-Vlasov dynamics and its evolutionary Fokker-Planck equation in which the mean-field interactions occur through a mean-reverting term and through a hitting time corresponding to a default level. The latter feature reflects the impact of a financial institution's default on the global distribution of reserves in the banking system. The systemic risk problem of financial institutions is understood as a blow-up phenomenon of the Fokker-Planck equation. Then, we incorporate in the model an optimization component by letting the institutions control part of their dynamics in order to minimize their expected risk. Phrasing this optimization problem as a mean-field game, we provide an explicit solution in a special case and, in the general case, we report numerical experiments based on a finite difference scheme.

preprint2020arXiv

Linear-Quadratic Zero-Sum Mean-Field Type Games: Optimality Conditions and Policy Optimization

In this paper, zero-sum mean-field type games (ZSMFTG) with linear dynamics and quadratic cost are studied under infinite-horizon discounted utility function. ZSMFTG are a class of games in which two decision makers whose utilities sum to zero, compete to influence a large population of indistinguishable agents. In particular, the case in which the transition and utility functions depend on the state, the action of the controllers, and the mean of the state and the actions, is investigated. The optimality conditions of the game are analysed for both open-loop and closed-loop controls, and explicit expressions for the Nash equilibrium strategies are derived. Moreover, two policy optimization methods that rely on policy gradient are proposed for both model-based and sample-based frameworks. In the model-based case, the gradients are computed exactly using the model, whereas they are estimated using Monte-Carlo simulations in the sample-based case. Numerical experiments are conducted to show the convergence of the utility function as well as the two players' controls.

preprint2020arXiv

Mean Field Games and Applications: Numerical Aspects

The theory of mean field games aims at studying deterministic or stochastic differential games (Nash equilibria) as the number of agents tends to infinity. Since very few mean field games have explicit or semi-explicit solutions, numerical simulations play a crucial role in obtaining quantitative information from this class of models. They may lead to systems of evolutive partial differential equations coupling a backward Bellman equation and a forward Fokker-Planck equation. In the present survey, we focus on such systems. The forward-backward structure is an important feature of this system, which makes it necessary to design unusual strategies for mathematical analysis and numerical approximation. In this survey, several aspects of a finite difference method used to approximate the previously mentioned system of PDEs are discussed, including convergence, variational aspects and algorithms for solving the resulting systems of nonlinear equations. Finally, we discuss in details two applications of mean field games to the study of crowd motion and to macroeconomics, a comparison with mean field type control, and present numerical simulations.

preprint2020arXiv

On the Convergence of Model Free Learning in Mean Field Games

Learning by experience in Multi-Agent Systems (MAS) is a difficult and exciting task, due to the lack of stationarity of the environment, whose dynamics evolves as the population learns. In order to design scalable algorithms for systems with a large population of interacting agents (e.g. swarms), this paper focuses on Mean Field MAS, where the number of agents is asymptotically infinite. Recently, a very active burgeoning field studies the effects of diverse reinforcement learning algorithms for agents with no prior information on a stationary Mean Field Game (MFG) and learn their policy through repeated experience. We adopt a high perspective on this problem and analyze in full generality the convergence of a fictitious iterative scheme using any single agent learning algorithm at each step. We quantify the quality of the computed approximate Nash equilibrium, in terms of the accumulated errors arising at each learning iteration step. Notably, we show for the first time convergence of model free learning algorithms towards non-stationary MFG equilibria, relying only on classical assumptions on the MFG dynamics. We illustrate our theoretical results with a numerical experiment in a continuous action-space environment, where the approximate best response of the iterative fictitious play scheme is computed with a deep RL algorithm.

preprint2020arXiv

Policy Optimization for Linear-Quadratic Zero-Sum Mean-Field Type Games

In this paper, zero-sum mean-field type games (ZSMFTG) with linear dynamics and quadratic utility are studied under infinite-horizon discounted utility function. ZSMFTG are a class of games in which two decision makers whose utilities sum to zero, compete to influence a large population of agents. In particular, the case in which the transition and utility functions depend on the state, the action of the controllers, and the mean of the state and the actions, is investigated. The game is analyzed and explicit expressions for the Nash equilibrium strategies are derived. Moreover, two policy optimization methods that rely on policy gradient are proposed for both model-based and sample-based frameworks. In the first case, the gradients are computed exactly using the model whereas they are estimated using Monte-Carlo simulations in the second case. Numerical experiments show the convergence of the two players' controls as well as the utility function when the two algorithms are used in different scenarios.