Researcher profile

Amaury Gouverneur

Amaury Gouverneur contributes to research discovery and scholarly infrastructure.

Researcher. Affiliation not imported. Open to collaborate.

Trust snapshot

Quick read

Trust 15 - Unverified
Verification L1
Unclaimed author
3 works
0 followers
4 topics
4 close collaborators

Published work

3 published items

Preprint, 2022, arXiv

An Information-Theoretic Analysis of Bayesian Reinforcement Learning

Building on the framework introduced by Xu and Raginsky [1] for supervised learning problems, we study the best achievable performance for model-based Bayesian reinforcement learning problems. With this purpose, we define minimum Bayesian regret (MBR) as the difference between the maximum expected cumulative reward obtainable either by learning from the collected data or by knowing the environment and its dynamics. We specialize this definition to reinforcement learning problems modeled as Markov decision processes (MDPs) whose kernel parameters are unknown to the agent and whose uncertainty is expressed by a prior distribution. One method for deriving upper bounds on the MBR is presented and specific bounds based on the relative entropy and the Wasserstein distance are given. We then focus on two particular cases of MDPs, the multi-armed bandit problem (MAB) and the online optimization with partial feedback problem. For the latter problem, we show that our bounds can recover from below the current information-theoretic bounds by Russo and Van Roy [2].
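The verbal definition of the MBR above can be written as a formula. This is a sketch in notation assumed here, not taken from the paper: $\Theta$ is the unknown environment parameter drawn from the prior, $T$ is the horizon, $\psi$ ranges over policies that are given $\Theta$, $\varphi$ ranges over policies that act on the collected history only, and $R_t^{\pi}$ is the reward at step $t$ under policy $\pi$. Expectations are taken over the prior, the dynamics and the rewards.

```latex
\mathrm{MBR}_T
  = \sup_{\psi}\, \mathbb{E}\!\left[\sum_{t=1}^{T} R_t^{\psi}\right]
  - \sup_{\varphi}\, \mathbb{E}\!\left[\sum_{t=1}^{T} R_t^{\varphi}\right]
  \;\ge\; 0
```

The difference is nonnegative because any history-only policy $\varphi$ is also an admissible $\psi$ that ignores $\Theta$.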

Preprint, 2022, arXiv

Optimal Intermittent Particle Filter

The problem of the optimal allocation (in the expected mean square error sense) of a measurement budget for particle filtering is addressed. We propose three different optimal intermittent filters, whose optimality criteria depend on the information available at the time of decision making. For the first, the stochastic program filter, the measurement times are given by a policy that determines whether a measurement should be taken based on the measurements already acquired. The second, called the offline filter, determines all measurement times at once by solving a combinatorial optimization program before any measurement acquisition. For the third one, which we call the online filter, each time a new measurement is received, the next measurement time is recomputed to take all the information that is then available into account. We prove that in terms of expected mean square error, the stochastic program filter outperforms the online filter, which itself outperforms the offline filter. However, these filters are generally intractable. For this reason, the filter estimate is approximated by a particle filter. Moreover, the mean square error is approximated using a Monte Carlo approach, and different optimization algorithms are compared to approximately solve the combinatorial programs (a random trial algorithm, greedy forward and backward algorithms, a simulated annealing algorithm, and a genetic algorithm). Finally, the performance of the proposed methods is illustrated on two examples: a tumor motion model and a common benchmark for particle filtering.
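The offline program described above (choose all measurement times before acquiring any data) can be illustrated with a greedy forward search. The sketch below is not the paper's implementation. It swaps the particle filter and Monte Carlo cost for a scalar linear-Gaussian model, where the posterior variance follows the exact Kalman recursion. The model parameters `a`, `q`, `r`, `p0` and all function names are assumptions made for illustration.

```python
from itertools import combinations


def total_variance(schedule, horizon, a=0.95, q=1.0, r=0.5, p0=1.0):
    """Sum of posterior variances over the horizon.

    Assumed model: x_{k+1} = a*x_k + w_k, y_k = x_k + v_k,
    with Var(w_k) = q, Var(v_k) = r and prior variance p0.
    A measurement is taken only at the times listed in `schedule`.
    """
    measure = set(schedule)
    p = p0
    total = 0.0
    for k in range(horizon):
        if k > 0:
            p = a * a * p + q          # prediction step
        if k in measure:
            p = p * r / (p + r)        # Kalman update with one measurement
        total += p
    return total


def greedy_forward(horizon, budget, cost=total_variance):
    """Add one measurement time at a time, always the one that lowers the cost most."""
    chosen = []
    for _ in range(budget):
        candidates = [k for k in range(horizon) if k not in chosen]
        best = min(candidates, key=lambda k: cost(chosen + [k], horizon))
        chosen.append(best)
    return sorted(chosen)


def exhaustive(horizon, budget, cost=total_variance):
    """Reference solution: try every schedule (only feasible for small horizons)."""
    return list(min(combinations(range(horizon), budget),
                    key=lambda s: cost(s, horizon)))


if __name__ == "__main__":
    greedy = greedy_forward(horizon=12, budget=3)
    exact = exhaustive(horizon=12, budget=3)
    print("greedy:", greedy, total_variance(greedy, 12))
    print("exact: ", exact, total_variance(exact, 12))
```

In this linear-Gaussian stand-in the cost does not depend on the measured values, so it only illustrates the offline program. The distinction between the offline, online and stochastic program filters appears only in nonlinear or non-Gaussian models.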

Preprint, 2020, arXiv

Optimal measurement budget allocation for particle filtering

Particle filtering is a powerful tool for target tracking. When the budget for observations is restricted, the measurements must be reduced to a limited number of carefully selected samples. A discrete stochastic nonlinear dynamical system is studied over a finite time horizon. The problem of selecting the optimal measurement times for particle filtering is formalized as a combinatorial optimization problem. We propose an approximate solution based on the nesting of a genetic algorithm, a Monte Carlo algorithm and a particle filter. First, an example demonstrates that the genetic algorithm outperforms a random trial optimization. Then, the benefit of non-regular measurements over measurements performed at regular time intervals is illustrated and the efficiency of our proposed solution is quantified: better filtering performance is obtained in 87.5% of the cases and, on average, the relative improvement is 27.7%.
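The nesting described above (an outer search over measurement times, a Monte Carlo estimate of the error, and an inner particle filter) can be sketched as follows. This is a minimal illustration, not the authors' code. It uses the random trial search that the abstract names as the baseline, not the genetic algorithm. The dynamics are the widely used nonlinear benchmark for particle filtering, assumed here because the abstract does not specify the system. All function names, noise variances, and sample sizes are hypothetical.

```python
import math
import random


def step(x, k, rng):
    """Assumed benchmark dynamics with unit-variance process noise."""
    return (0.5 * x + 25.0 * x / (1.0 + x * x)
            + 8.0 * math.cos(1.2 * k) + rng.gauss(0.0, 1.0))


def observe(x, rng):
    """Assumed observation model y = x^2 / 20 + unit-variance noise."""
    return x * x / 20.0 + rng.gauss(0.0, 1.0)


def likelihood(y, x):
    """Unnormalized Gaussian likelihood of y given state x."""
    return math.exp(-0.5 * (y - x * x / 20.0) ** 2)


def filter_sse(schedule, horizon, n_particles, rng):
    """Simulate one trajectory and run a bootstrap particle filter that
    measures only at the times in `schedule`. Returns the sum of squared errors."""
    measure = set(schedule)
    x = rng.gauss(0.0, 1.0)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    sse = 0.0
    for k in range(horizon):
        if k > 0:
            x = step(x, k - 1, rng)
            particles = [step(p, k - 1, rng) for p in particles]
        if k in measure:
            y = observe(x, rng)
            weights = [likelihood(y, p) for p in particles]
            if sum(weights) > 0.0:  # skip resampling if all weights underflow
                particles = rng.choices(particles, weights=weights, k=n_particles)
        estimate = sum(particles) / n_particles
        sse += (estimate - x) ** 2
    return sse


def expected_mse(schedule, horizon, n_particles=100, n_runs=30, seed=0):
    """Monte Carlo estimate of the mean square error for one schedule."""
    rng = random.Random(seed)  # fixed seed so repeated calls are reproducible
    total = sum(filter_sse(schedule, horizon, n_particles, rng) for _ in range(n_runs))
    return total / (n_runs * horizon)


def random_trial(horizon, budget, n_trials=20, seed=1):
    """Draw random schedules and keep the one with the lowest estimated MSE."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(n_trials):
        candidate = sorted(rng.sample(range(horizon), budget))
        cost = expected_mse(candidate, horizon)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


if __name__ == "__main__":
    schedule, mse = random_trial(horizon=10, budget=3)
    regular = [0, 4, 8]  # evenly spaced comparison schedule
    print("random trial:", schedule, mse)
    print("regular:     ", regular, expected_mse(regular, 10))
```

A genetic algorithm would replace `random_trial` with a population of schedules that is mutated and recombined, while `expected_mse` would remain the fitness function.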