Empirical Measure Large Deviations for Reinforced Chains on Finite Spaces
Let $A$ be a transition probability kernel on a finite state space $Δ^o =\{1, \ldots , d\}$ such that $A(x,y)>0$ for all $x,y \in Δ^o$. Consider a reinforced chain given as a sequence $\{X_n, \; n \in \mathbb{N}_0\}$ of $Δ^o$-valued random variables, defined recursively according to, $$L^n = \frac{1}{n}\sum_{i=0}^{n-1} δ_{X_i}, \;\; P(X_{n+1} \in \cdot \mid X_0, \ldots, X_n) = L^n A(\cdot).$$ We establish a large deviation principle for $\{L^n\}$. The rate function takes a strikingly different form than the Donsker-Varadhan rate function associated with the empirical measure of the Markov chain with transition kernel $A$ and is described in terms of a novel deterministic infinite horizon discounted cost control problem with an associated linear controlled dynamics and a nonlinear running cost involving the relative entropy function. Proofs are based on an analysis of time-reversal of controlled dynamics in representations for log-transforms of exponential moments, and on weak convergence methods.