Source author record

Amaury Gouverneur

Amaury Gouverneur appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

eess.SY · Systems and Control · Machine Learning · math.OC

Catalog footprint

What is connected

3 works

4 topics

4 close collaborators


Preprint · 2022 · arXiv

An Information-Theoretic Analysis of Bayesian Reinforcement Learning

Building on the framework introduced by Xu and Raginsky [1] for supervised learning problems, we study the best achievable performance for model-based Bayesian reinforcement learning problems. To this end, we define the minimum Bayesian regret (MBR) as the difference between the maximum expected cumulative reward obtainable by knowing the environment and its dynamics and the maximum expected cumulative reward obtainable by learning from the collected data. We specialize this definition to reinforcement learning problems modeled as Markov decision processes (MDPs) whose kernel parameters are unknown to the agent and whose uncertainty is expressed by a prior distribution. We present a method for deriving upper bounds on the MBR and give specific bounds based on the relative entropy and the Wasserstein distance. We then focus on two particular cases of MDPs: the multi-armed bandit (MAB) problem and the online optimization with partial feedback problem. For the latter, we show that our bounds recover, from below, the current information-theoretic bounds of Russo and Van Roy [2].
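One way to write the MBR definition above is shown below. The notation is chosen here for illustration and is not taken from the paper: Θ is the unknown kernel parameter drawn from the prior, π ranges over policies that have access to Θ, φ ranges over policies that only use the observed history, R_t is the reward at step t, and T is the horizon.

```latex
\[
\mathrm{MBR}(T) \;=\;
\mathbb{E}_{\Theta}\!\left[\,\sup_{\pi}\;
  \mathbb{E}\!\left[\sum_{t=1}^{T} R_t^{\pi} \,\middle|\, \Theta\right]\right]
\;-\;
\sup_{\varphi}\;
  \mathbb{E}\!\left[\sum_{t=1}^{T} R_t^{\varphi}\right]
\]
```

The first term is the best expected return of an agent that knows the environment; the second is the best expected return of an agent that must learn from data. Since every data-driven policy φ is also available to the informed agent, the first term is at least as large as the second, so this quantity is non-negative.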

Preprint · 2022 · arXiv

Optimal Intermittent Particle Filter

We address the problem of optimally allocating (in the expected mean square error sense) a measurement budget for particle filtering. We propose three different optimal intermittent filters, whose optimality criteria depend on the information available at the time of decision making. For the first, the stochastic program filter, the measurement times are given by a policy that determines whether a measurement should be taken based on the measurements already acquired. The second, called the offline filter, determines all measurement times at once by solving a combinatorial optimization program before any measurement is acquired. For the third, which we call the online filter, each time a new measurement is received, the next measurement time is recomputed to take into account all the information then available. We prove that, in terms of expected mean square error, the stochastic program filter outperforms the online filter, which in turn outperforms the offline filter. However, these filters are generally intractable. For this reason, the filter estimate is approximated by a particle filter. Moreover, the mean square error is approximated using a Monte Carlo approach, and different optimization algorithms are compared to approximately solve the combinatorial programs (a random trial algorithm, greedy forward and backward algorithms, a simulated annealing algorithm, and a genetic algorithm). Finally, the performance of the proposed methods is illustrated on two examples: a tumor motion model and a common benchmark for particle filtering.
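The approximation chain described above (particle filter for the estimate, Monte Carlo for the mean square error, a greedy forward search for the offline schedule) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the scalar linear-Gaussian model and its parameters `A`, `Q`, `R` are hypothetical, the filter is a standard bootstrap particle filter with multinomial resampling, and `greedy_forward` is one common reading of greedy forward selection (start empty, add the single time step that most lowers the estimated MSE).

```python
import math
import random

# Hypothetical scalar model (not from the paper):
#   x_{t+1} = A * x_t + w_t,  w_t ~ N(0, Q);   y_t = x_t + v_t,  v_t ~ N(0, R)
A, Q, R = 0.9, 1.0, 0.25


def simulate(T, rng):
    """Draw one state trajectory and the measurements that could be taken."""
    xs, ys = [], []
    x = rng.gauss(0.0, 1.0)  # x_0 ~ N(0, 1)
    for _ in range(T):
        xs.append(x)
        ys.append(x + rng.gauss(0.0, math.sqrt(R)))
        x = A * x + rng.gauss(0.0, math.sqrt(Q))
    return xs, ys


def intermittent_pf(ys, measure, n_particles, rng):
    """Bootstrap particle filter that only uses y_t when measure[t] is True."""
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for t, y in enumerate(ys):
        if t > 0:  # propagate every particle through the dynamics
            particles = [A * p + rng.gauss(0.0, math.sqrt(Q)) for p in particles]
        if measure[t]:
            # Gaussian log-likelihoods, shifted by their max to avoid underflow.
            logw = [-0.5 * (y - p) ** 2 / R for p in particles]
            m = max(logw)
            weights = [math.exp(lw - m) for lw in logw]
            particles = rng.choices(particles, weights=weights, k=n_particles)
        estimates.append(sum(particles) / n_particles)  # posterior-mean estimate
    return estimates


def mc_mse(measure, runs=200, n_particles=200, seed=0):
    """Monte Carlo estimate of the time-averaged MSE for a measurement schedule."""
    rng = random.Random(seed)  # same seed per call: common random numbers
    T = len(measure)
    total = 0.0
    for _ in range(runs):
        xs, ys = simulate(T, rng)
        est = intermittent_pf(ys, measure, n_particles, rng)
        total += sum((e - x) ** 2 for e, x in zip(est, xs)) / T
    return total / runs


def greedy_forward(T, budget, **mc_kwargs):
    """Offline schedule: repeatedly add the time step that most lowers the MSE."""
    chosen = set()
    for _ in range(budget):
        best_t, best_cost = None, math.inf
        for t in range(T):
            if t in chosen:
                continue
            measure = [s in chosen or s == t for s in range(T)]
            cost = mc_mse(measure, **mc_kwargs)
            if cost < best_cost:
                best_t, best_cost = t, cost
        chosen.add(best_t)
    return sorted(chosen)


if __name__ == "__main__":
    T, budget = 12, 3
    schedule = greedy_forward(T, budget, runs=30, n_particles=100)
    print("greedy schedule:", schedule)
    print("MSE, no measurements:", mc_mse([False] * T, runs=30, n_particles=100))
    print("MSE, greedy schedule:",
          mc_mse([t in schedule for t in range(T)], runs=30, n_particles=100))
```

Fixing the random seed inside `mc_mse` means every candidate schedule is scored on the same simulated trajectories, which reduces the noise in the greedy comparisons; the sample sizes here are kept small for speed and would need to be larger for a reliable schedule.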

Preprint · 2020 · arXiv

Optimal measurement budget allocation for particle filtering

Particle filtering is a powerful tool for target tracking. When the observation budget is restricted, the measurements must be limited to a small number of carefully selected samples. A discrete-time stochastic nonlinear dynamical system is studied over a finite time horizon. The problem of selecting the optimal measurement times for particle filtering is formalized as a combinatorial optimization problem. We propose an approximate solution based on nesting a genetic algorithm, a Monte Carlo algorithm and a particle filter. First, an example demonstrates that the genetic algorithm outperforms a random trial optimization. Then, the benefit of irregularly timed measurements over measurements taken at regular time intervals is illustrated, and the efficiency of our proposed solution is quantified: better filtering performance is obtained in 87.5% of the cases, with an average relative improvement of 27.7%.
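A genetic algorithm over measurement schedules (sets of K time steps out of T) might look like the sketch below. It is an assumption-laden illustration, not the paper's method: the paper nests the genetic algorithm around a Monte Carlo estimate computed with a particle filter, whereas this sketch swaps that inner loop for the closed-form Kalman error variance of a hypothetical scalar linear-Gaussian model (parameters `A`, `Q`, `R`, `P0` are invented). In that linear-Gaussian setting the posterior variances do not depend on the measured values, so `expected_mse` is exact there; it is only a cheap surrogate for the nonlinear case. The operators (binary tournament selection, union-based crossover, swap mutation, elitism) are likewise illustrative choices.

```python
import random

A, Q, R, P0 = 0.9, 1.0, 0.25, 1.0  # hypothetical model parameters


def expected_mse(schedule, T):
    """Time-averaged Kalman posterior variance for a measurement schedule."""
    measured = set(schedule)
    p, total = P0, 0.0
    for t in range(T):
        if t > 0:
            p = A * A * p + Q      # prediction step
        if t in measured:
            p = p * R / (p + R)    # measurement update
        total += p
    return total / T


def crossover(a, b, k, rng):
    """Child: k distinct time steps drawn from the union of both parents."""
    return sorted(rng.sample(sorted(set(a) | set(b)), k))


def mutate(schedule, T, rng, rate=0.3):
    """With probability `rate`, move one measurement to an unused time step."""
    s = set(schedule)
    if rng.random() < rate:
        free = [t for t in range(T) if t not in s]
        if free:
            s.remove(rng.choice(sorted(s)))
            s.add(rng.choice(free))
    return sorted(s)


def genetic_schedule(T, k, pop_size=30, generations=60, seed=0):
    """Search for k measurement times in range(T) minimizing expected_mse."""
    rng = random.Random(seed)

    def cost(s):
        return expected_mse(s, T)

    pop = [sorted(rng.sample(range(T), k)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        next_pop = pop[:2]  # elitism: the two best schedules survive unchanged
        while len(next_pop) < pop_size:
            p1 = min(rng.sample(pop, 2), key=cost)  # binary tournament
            p2 = min(rng.sample(pop, 2), key=cost)
            next_pop.append(mutate(crossover(p1, p2, k, rng), T, rng))
        pop = next_pop
    return min(pop, key=cost)


if __name__ == "__main__":
    T, k = 20, 5
    regular = list(range(0, T, T // k))  # [0, 4, 8, 12, 16]
    best = genetic_schedule(T, k)
    print("regular schedule:", regular, expected_mse(regular, T))
    print("GA schedule:     ", best, expected_mse(best, T))
```

Because the two best schedules are carried over each generation, the best cost never increases from one generation to the next; the printed comparison with the regular schedule lets you check whether irregular timing helps for this particular toy model.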