Quantitative propagation of chaos for mean field Markov decision process with common noise
We investigate propagation of chaos for mean field Markov decision processes with common noise (CMKV-MDP) when the optimization is performed over randomized open-loop controls on an infinite horizon. We first establish a rate of convergence of order $M_N^\gamma$ for the value functions of the $N$-agent control problem with asymmetric open-loop controls towards the value function of the CMKV-MDP, where $M_N$ is the mean rate of convergence in Wasserstein distance of the empirical measure and $\gamma \in (0,1]$ is an explicit constant. Furthermore, we show how to explicitly construct $(\varepsilon + \mathcal{O}(M_N^\gamma))$-optimal policies for the $N$-agent model from $\varepsilon$-optimal policies for the CMKV-MDP. Our approach relies on a sharp comparison between the Bellman operators of the $N$-agent problem and of the CMKV-MDP, and on a fine coupling of empirical measures.
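The convergence rate above depends on $M_N$, the mean Wasserstein distance between an empirical measure and its underlying law. The following illustrative sketch is not taken from the paper. It uses Monte Carlo to estimate $\mathbb{E}[W_1(\mu_N, \mu)]$ for one hypothetical choice, $\mu = \mathrm{Uniform}[0,1]$ on the real line. In this setting $W_1$ reduces to the $L^1$ distance between distribution functions, so it can be computed exactly for each sample. The function names are hypothetical. This abstract does not give the precise definition of $M_N$ (which laws, which Wasserstein order, which state space), so the code only shows how such a rate can be observed numerically in a simple case.

```python
import numpy as np


def w1_to_uniform(samples):
    """Exact W_1 distance between the empirical measure of `samples` and Uniform[0, 1].

    In one dimension, W_1(mu, nu) = integral of |F_mu(x) - F_nu(x)| dx.
    Here F_nu(x) = x, and the empirical CDF equals k/N on [x_(k), x_(k+1)).
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    knots = np.concatenate(([0.0], x, [1.0]))
    a, b = knots[:-1], knots[1:]          # interval endpoints
    c = np.arange(n + 1) / n              # empirical CDF value on each interval
    # Exact integral of |c - x| over [a, b], split by where c sits.
    below = ((b - c) ** 2 - (a - c) ** 2) / 2.0   # case c <= a
    above = ((c - a) ** 2 - (c - b) ** 2) / 2.0   # case c >= b
    inside = ((c - a) ** 2 + (b - c) ** 2) / 2.0  # case a < c < b
    vals = np.where(c <= a, below, np.where(c >= b, above, inside))
    return float(vals.sum())


def estimate_mean_w1(n, n_trials=200, seed=0):
    """Monte Carlo estimate of E[W_1(mu_N, mu)] for N i.i.d. Uniform[0, 1] samples."""
    rng = np.random.default_rng(seed)
    return float(np.mean([w1_to_uniform(rng.uniform(size=n)) for _ in range(n_trials)]))


if __name__ == "__main__":
    for n in (100, 400, 1600):
        print(n, estimate_mean_w1(n))
```

In this one-dimensional example, the estimates should decrease roughly like $N^{-1/2}$, so quadrupling $N$ should approximately halve the estimate. In higher dimensions the empirical-measure rate is generally slower, which is one reason a bound stated in terms of $M_N$ is useful.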