Paper detail

Central Limit Theorem for Two-Time-Scale Approximate Distributionally Robust RL

Designing model-free algorithms for distributionally robust reinforcement learning (DRRL) poses fundamental challenges. The robust Bellman operator is nonlinear in the transition kernel, which makes one-sample Bellman updates biased, while the adversarial optimization underlying robustness makes robust evaluation computationally demanding. To address these difficulties, we consider the natural small-ambiguity regime under Kullback--Leibler ambiguity sets and propose an approximate DRRL framework based on a first-order expansion of the relevant robust functional. This yields an approximate robust Bellman equation that removes the adversarial optimization while remaining first-order accurate in the ambiguity radius. To learn the fixed point of this approximate equation, we propose Mean-Variance Stochastic Approximation (MVSA), a model-free algorithm that uses only one-sample updates. This is achieved via a lifted stochastic approximation dynamics and a two-time-scale design. We then prove convergence and a central limit theorem for MVSA: its main iterate satisfies a central limit theorem at the canonical $n^{-1/2}$ scale, with explicitly characterized asymptotic covariances. Finally, we validate our theoretical findings with a numerical experiment.

preprint2026arXivOpen access
0citations
0reviews
0saves
Nocode
Nodataset
0institutions

Next steps

Decide what to do with this paper

Use like or dislike for the fast social read. The more specific scholarly feedback stays available below when needed.

Log in to curate

Reading frame

Keep the important context close to the paper

Keep the important signals around this paper in one place: votes, save state, collection context, reviews and the metadata you need before deciding what to do next.

Institutions

Add specific reaction

Move through the context

Research map

Open full explorer

Move through nearby people, institutions, topics and adjacent work without leaving the paper page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Structured reviews

0 review(s)

ContributeLeave structured feedbackUse the review template when you have a concrete strength, concern or method question.Open review form

No structured reviews yet. High-signal critique starts here.

Work discussion

0 comment(s)

DiscussAdd a high-signal commentKeep quick notes, caveats and replication pointers separate from formal reviews.Open comment form

No discussion yet. The first strong comment sets the tone.