Higher Order Generalization Error for First Order Discretization of Langevin Diffusion
We propose a novel approach to analyze generalization error for discretizations of Langevin diffusion, such as the stochastic gradient Langevin dynamics (SGLD). For an $ε$ tolerance of expected generalization error, it is known that a first order discretization can reach this target if we run $Ω(ε^{-1} \log (ε^{-1}) )$ iterations with $Ω(ε^{-1})$ samples. In this article, we show that with additional smoothness assumptions, even first order methods can achieve arbitrarily runtime complexity. More precisely, for each $N>0$, we provide a sufficient smoothness condition on the loss function such that a first order discretization can reach $ε$ expected generalization error given $Ω( ε^{-1/N} \log (ε^{-1}) )$ iterations with $Ω(ε^{-1})$ samples.