Source author record

Alexey Naumov

Alexey Naumov appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


math.PR, Machine Learning, math.SP, math.ST, Statistics Theory

Catalog footprint

What is connected

11 works

5 topics

4 close collaborators


preprint, 2026, arXiv

On Gaussian approximation for entropy-regularized Q-learning with function approximation

In this paper, we derive rates of convergence in the high-dimensional central limit theorem for Polyak--Ruppert averaged iterates generated by entropy-regularized asynchronous Q-learning with linear function approximation and a polynomial stepsize $k^{-ω}$, $ω\in (1/2,1)$. Assuming that the sequence of observed triples $(s_k,a_k,s_{k+1})_{k \geq 0}$ forms a uniformly geometrically ergodic Markov chain, and under suitable regularity conditions for the projected soft Bellman equation, we establish a Gaussian approximation bound in the convex distance with rate of order $n^{-1/4}$, up to polylogarithmic factors in $n$, where $n$ is the number of samples used by the algorithm. To obtain this result, we combine a linearization of the soft Bellman recursion with a Gaussian approximation for the leading martingale term. Finally, we derive high-order moment bounds for the algorithm's last iterate, which might be of independent interest.
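Polyak-Ruppert averaging with a polynomial stepsize $k^{-ω}$ can be illustrated on a much simpler problem than the one treated in the paper. The sketch below is not entropy-regularized Q-learning: it is a toy scalar stochastic approximation with i.i.d. Gaussian noise, and the function name `averaged_sa`, the target value, and the default $ω = 0.75$ are assumptions chosen for illustration only.

```python
import random


def averaged_sa(n, omega=0.75, target=2.0, seed=0):
    """Toy scalar stochastic approximation with Polyak-Ruppert averaging.

    Illustrative only: a noisy fixed-point iteration, not Q-learning.
    Returns (last iterate, average of the first n iterates).
    """
    if not 0.5 < omega < 1.0:
        raise ValueError("omega must lie in (1/2, 1)")
    rng = random.Random(seed)
    theta = 0.0    # current iterate
    average = 0.0  # running Polyak-Ruppert average of the iterates
    for k in range(1, n + 1):
        step = k ** (-omega)          # polynomial stepsize k^{-omega}
        noise = rng.gauss(0.0, 1.0)   # i.i.d. noise (the paper assumes Markovian data)
        theta -= step * (theta - target + noise)
        average += (theta - average) / k
    return theta, average


if __name__ == "__main__":
    last, avg = averaged_sa(20000)
    print(f"last iterate error: {abs(last - 2.0):.4f}")
    print(f"averaged error:     {abs(avg - 2.0):.4f}")
```

The running average is updated incrementally, so no history of iterates has to be stored.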

preprint, 2022, arXiv

From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses

We propose the Bayes-UCBVI algorithm for reinforcement learning in tabular, stage-dependent, episodic Markov decision processes: a natural extension of the Bayes-UCB algorithm by Kaufmann et al. (2012) for multi-armed bandits. Our method uses the quantile of a Q-value function posterior as an upper confidence bound on the optimal Q-value function. For Bayes-UCBVI, we prove a regret bound of order $\widetilde{O}(\sqrt{H^3SAT})$, where $H$ is the length of one episode, $S$ is the number of states, $A$ is the number of actions, and $T$ is the number of episodes, which matches the lower bound of $Ω(\sqrt{H^3SAT})$ up to poly-$\log$ terms in $H,S,A,T$ for a large enough $T$. To the best of our knowledge, this is the first algorithm that obtains an optimal dependence on the horizon $H$ (and $S$) without the need for an involved Bernstein-like bonus or noise. Crucial to our analysis is a new fine-grained anti-concentration bound for a weighted Dirichlet sum that may be of independent interest. We then explain how Bayes-UCBVI can be easily extended beyond the tabular setting, exhibiting a strong link between our algorithm and the Bayesian bootstrap (Rubin, 1981).

preprint, 2021, arXiv

On the Stability of Random Matrix Product with Markovian Noise: Application to Linear Stochastic Approximation and TD Learning

This paper studies the exponential stability of random matrix products driven by a general (possibly unbounded) state space Markov chain. Such stability is a cornerstone in the analysis of stochastic algorithms in machine learning (e.g. for parameter tracking in online learning or reinforcement learning). The existing results impose strong conditions such as uniform boundedness of the matrix-valued functions and uniform ergodicity of the Markov chains. Our main contribution is an exponential stability result for the $p$-th moment of the random matrix product, provided that (i) the underlying Markov chain satisfies a super-Lyapunov drift condition, and (ii) the growth of the matrix-valued functions is controlled by an appropriately defined function (related to the drift condition). Using this result, we give finite-time $p$-th moment bounds for constant and decreasing stepsize linear stochastic approximation schemes with Markovian noise on a general state space. We illustrate these findings for linear value-function estimation in reinforcement learning. We provide finite-time $p$-th moment bounds for various members of the temporal difference (TD) family of algorithms.

preprint, 2020, arXiv

Finite Time Analysis of Linear Two-timescale Stochastic Approximation with Markovian Noise

The linear two-timescale stochastic approximation (SA) scheme is an important class of algorithms which has become popular in reinforcement learning (RL), particularly for the policy evaluation problem. Recently, a number of works have been devoted to establishing the finite-time analysis of the scheme, especially under the Markovian (non-i.i.d.) noise settings that are ubiquitous in practice. In this paper, we provide a finite-time analysis for linear two-timescale SA. Our bounds show that there is no discrepancy in the convergence rate between Markovian and martingale noise; only the constants are affected by the mixing time of the Markov chain. With an appropriate step size schedule, the transient term in the expected error bound is $o(1/k^c)$ and the steady-state term is ${\cal O}(1/k)$, where $c>1$ and $k$ is the iteration number. Furthermore, we present an asymptotic expansion of the expected error with a matching lower bound of $Ω(1/k)$. A simple numerical experiment is presented to support our theory.

preprint, 2016, arXiv

Asymptotic analysis of symmetric functions

In this paper we consider asymptotic expansions for a class of sequences of symmetric functions of many variables. Applications to classical and free probability theory are discussed.

preprint, 2016, arXiv

Local semicircle law under moment conditions. Part I: The Stieltjes transform

We consider a random symmetric matrix ${\bf X} = [X_{jk}]_{j,k=1}^n$ in which the upper triangular entries are independent identically distributed random variables with mean zero and unit variance. We additionally suppose that $\mathbb E |X_{11}|^{4 + δ} =: μ_4 < \infty$ for some $δ> 0$. Under these conditions we show that the typical distance between the Stieltjes transform of the empirical spectral distribution (ESD) of the matrix $n^{-\frac{1}{2}} {\bf X}$ and Wigner's semicircle law is of order $(nv)^{-1}$, where $v$ is the distance in the complex plane to the real line. Furthermore we outline applications which are deferred to a subsequent paper, such as the rate of convergence in probability of the ESD to the distribution function of the semicircle law, rigidity of the eigenvalues and eigenvector delocalization.
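For reference, the two objects compared in this abstract can be written out in standard notation. The symbols $m_n$, $s$, $λ_j$ and $z = u + iv$ are assumptions introduced here (they do not appear in the abstract); $λ_1, \ldots, λ_n$ denote the eigenvalues of $n^{-\frac{1}{2}} {\bf X}$, and the branch of the square root is chosen so that $\operatorname{Im} s(z) > 0$ for $v > 0$.

$$ m_n(z) = \frac{1}{n}\sum_{j=1}^{n}\frac{1}{λ_j - z}, \qquad s(z) = \int_{-2}^{2}\frac{1}{x - z}\,\frac{\sqrt{4 - x^2}}{2π}\,dx = \frac{-z + \sqrt{z^2 - 4}}{2}, \qquad z = u + iv,\ v > 0. $$

In this notation, the stated result is that $|m_n(z) - s(z)|$ is typically of order $(nv)^{-1}$.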

preprint, 2016, arXiv

Local semicircle law under moment conditions. Part II: Localization and delocalization

We consider a random symmetric matrix ${\bf X} = [X_{jk}]_{j,k=1}^n$ with upper triangular entries being independent identically distributed random variables with mean zero and unit variance. We additionally suppose that $\mathbb E |X_{11}|^{4 + δ} =: μ_{4+δ} < C$ for some $δ> 0$ and some absolute constant $C$. Under these conditions we show that the typical Kolmogorov distance between the empirical spectral distribution function of eigenvalues of $n^{-1/2} {\bf X}$ and Wigner's semicircle law is of order $1/n$ up to some logarithmic correction factor. As a direct consequence of this result we establish that the semicircle law holds on a short scale. Furthermore, we show for this finite moment ensemble rigidity of eigenvalues and delocalization properties of the eigenvectors. Some numerical experiments are included illustrating the influence of the tail behavior of the matrix entries when only a small number of moments exist.

preprint, 2015, arXiv

Distribution of Linear Statistics of Singular Values of the Product of Random Matrices

In this paper we consider the product of two independent random matrices $\mathbb X^{(1)}$ and $\mathbb X^{(2)}$. Assume that $X_{jk}^{(q)}, 1 \le j,k \le n, q = 1, 2,$ are i.i.d. random variables with $\mathbb E X_{jk}^{(q)} = 0, \mathbb E (X_{jk}^{(q)})^2 = 1$. Denote by $s_1, \ldots, s_n$ the singular values of $\mathbb W := \frac{1}{n} \mathbb X^{(1)} \mathbb X^{(2)}$. We prove the central limit theorem for linear statistics of the squared singular values $s_1^2, \ldots, s_n^2$, showing that the limiting variance depends on $κ_4 := \mathbb E (X_{11}^{(1)})^4 - 3$.

preprint, 2014, arXiv

On one generalization of the elliptic law for random matrices

We consider the products of $m\ge 2$ independent large real random matrices with independent vectors $(X_{jk}^{(q)},X_{kj}^{(q)})$ of entries. The entries $X_{jk}^{(q)},X_{kj}^{(q)}$ are correlated with $ρ=\mathbb E X_{jk}^{(q)}X_{kj}^{(q)}$. The limit of the empirical spectral distribution of the eigenvalues of such products does not depend on $ρ$ and equals the distribution of the $m$th power of a random variable uniformly distributed on the unit disc.
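The limiting law named in this abstract can be made explicit with a short computation. The symbols $Z$ and $W$ are assumptions introduced here: $Z$ is uniform on the unit disc and $W = Z^m$. Since $\mathbb P(|Z| \le r) = r^2$ and the argument of $Z^m$ remains uniform on $[0, 2π)$,

$$ \mathbb P(|W| \le r) = \mathbb P(|Z| \le r^{1/m}) = r^{2/m}, \qquad 0 \le r \le 1, $$

so $W$ has the rotationally invariant density

$$ p_W(w) = \frac{1}{π m}\,|w|^{\frac{2}{m} - 2}, \qquad |w| \le 1. $$

For $m = 1$ this reduces to the uniform density $1/π$ on the unit disc.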

preprint, 2013, arXiv

On minimal singular values of random matrices with correlated entries

Let $\mathbf X$ be a random matrix whose pairs of entries $X_{jk}$ and $X_{kj}$ are correlated and whose vectors $(X_{jk},X_{kj})$, for $1\le j<k\le n$, are mutually independent. Assume that the diagonal entries are independent of the off-diagonal entries as well. We assume that $\mathbb{E} X_{jk}=0$, $\mathbb{E} X_{jk}^2=1$, for any $j,k=1,\ldots,n$ and $\mathbb{E} X_{jk}X_{kj}=ρ$ for $1\le j<k\le n$. Let $\mathbf M_n$ be a non-random $n\times n$ matrix with $\|\mathbf M_n\|\le Kn^Q$, for some constants $K>0$ and $Q\ge 0$. Let $s_n(\mathbf X+\mathbf M_n)$ denote the least singular value of the matrix $\mathbf X+\mathbf M_n$. It is shown that there exist positive constants $A$ and $B$ depending on $K,Q,ρ$ only such that $$ \mathbb{P}(s_n(\mathbf X+\mathbf M_n)\le n^{-A})\le n^{-B}. $$ As an application of this result we prove the elliptic law for this class of matrices with non-identically distributed correlated entries.

preprint, 2012, arXiv

Elliptic law for real random matrices

In this paper we consider an ensemble of random matrices $\mathbf X_n$ with independent identically distributed vectors $(X_{ij}, X_{ji})_{i \neq j}$ of entries. Under the assumption of a finite fourth moment of the matrix entries, it is proved that the empirical spectral distribution of eigenvalues converges in probability to the uniform distribution on an ellipse. The axes of the ellipse are determined by the correlation between $X_{12}$ and $X_{21}$. This result is called the Elliptic Law. The limit distribution does not depend on the distribution of the matrix elements, and in this sense the result is universal.
