Researcher profile

Stefana-Lucia Anita

Stefana-Lucia Anita contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 13 - Baseline
2 works
0 followers
3 topics
1 close collaborator

Published work

2 published items

Preprint, 2026, arXiv

Vanishing L2 regularization for the softmax Multi Armed Bandit

Multi-Armed Bandit (MAB) algorithms are a cornerstone of reinforcement learning and have been studied both theoretically and numerically. One of the most commonly used implementations uses a softmax mapping to prescribe the optimal policy and has served as the foundation for downstream algorithms, including REINFORCE. In contrast to vanilla approaches, we consider here the L2-regularized softmax policy gradient, where a quadratic term is subtracted from the mean reward. Previous studies exploiting convexity failed to identify a suitable theoretical framework to analyze its convergence when the regularization parameter vanishes. We prove theoretical convergence results and confirm empirically that this regime makes L2 regularization numerically advantageous on standard benchmarks.
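To make the setting concrete, here is a minimal sketch of an L2-regularized softmax policy gradient on a small bandit. It is not the paper's algorithm or analysis: it assumes the mean rewards are known so the exact gradient can be used, and the decay schedule `lam0 / (1 + t)`, the step size, and all function names are hypothetical choices made for illustration.

```python
import math


def softmax(theta):
    """Numerically stable softmax over a list of preferences."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def l2_softmax_gradient(theta, mean_rewards, lam):
    """Exact gradient of J(theta) = sum_a pi(a) r(a) - (lam / 2) * ||theta||^2."""
    pi = softmax(theta)
    value = sum(p * r for p, r in zip(pi, mean_rewards))
    return [p * (r - value) - lam * t
            for p, r, t in zip(pi, mean_rewards, theta)]


def run(mean_rewards, steps=5000, lr=0.5, lam0=0.1):
    """Gradient ascent with a regularization weight that decays to zero."""
    theta = [0.0] * len(mean_rewards)
    for t in range(steps):
        lam = lam0 / (1 + t)  # assumed schedule, not taken from the paper
        grad = l2_softmax_gradient(theta, mean_rewards, lam)
        theta = [th + lr * g for th, g in zip(theta, grad)]
    return softmax(theta)


# Three arms with assumed mean rewards; the last arm is the best one.
policy = run([0.2, 0.5, 0.9])
print(policy)
```

In this sketch the quadratic penalty keeps the preferences `theta` from drifting early on, and its shrinking weight lets the policy concentrate on the best arm as training proceeds. A stochastic variant would replace the exact gradient with a sampled REINFORCE-style estimate.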

Preprint, 2022, arXiv

Controlling a nonlinear Fokker-Planck equation via inputs with nonlocal action

This paper concerns an optimal control problem $(P)$ related to a nonlinear Fokker-Planck equation. The problem is closely related to a stochastic optimal control problem $(P_S)$ for a McKean-Vlasov equation. The existence of an optimal control is obtained for the deterministic problem $(P)$. The existence of an optimal control is established, and necessary optimality conditions are derived, for a penalized optimal control problem $(P_h)$ related to a backward Euler approximation of the nonlinear Fokker-Planck equation (with a constant discretization step $h$). Passing to the limit ($h\rightarrow 0$), one derives the necessary optimality conditions for problem $(P)$.
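As a generic illustration of the discretization mentioned in the abstract (not the paper's exact formulation), write the controlled equation abstractly as $\partial_t \rho = F(\rho, u)$ on $[0,T]$. Here $F$ is a placeholder for the nonlinear Fokker-Planck operator with control $u$, and this notation is an assumption. A backward Euler scheme with constant step $h = T/N$ would then read:

```latex
\frac{\rho_{k} - \rho_{k-1}}{h} = F(\rho_{k}, u_{k}),
\qquad k = 1, \dots, N,
\qquad \rho_{0} \text{ given}.
```

The scheme is implicit: each step solves for $\rho_{k}$ given $\rho_{k-1}$, and the limit $h \rightarrow 0$ is taken on the resulting family of discrete problems.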