Source author record

Stefana-Lucia Anita

Stefana-Lucia Anita appears in the imported research catalog. Authorship, coauthor, and topic links are available, although ownership of the profile has not yet been claimed.

Researcher · Unclaimed source record

Machine Learning · math.OC · math.ST

Catalog footprint

What is connected

2 works

3 topics

1 close collaborator


Preprint · 2026 · arXiv

Vanishing L2 regularization for the softmax Multi Armed Bandit

Multi Armed Bandit (MAB) algorithms are a cornerstone of reinforcement learning and have been studied both theoretically and numerically. One of the most commonly used implementations relies on a softmax mapping to prescribe the optimal policy and has served as the foundation for downstream algorithms, including REINFORCE. Distinct from vanilla approaches, we consider here the L2-regularized softmax policy gradient, in which a quadratic term is subtracted from the mean reward. Previous studies exploiting convexity failed to identify a suitable theoretical framework for analyzing its convergence when the regularization parameter vanishes. Here we prove theoretical convergence results and confirm empirically that this regime makes the L2 regularization numerically advantageous on standard benchmarks.
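
The regularized objective described above can be illustrated with a minimal sketch. It assumes exact (full-information) gradients, a quadratic penalty of the form (λ/2)‖θ‖² on the softmax logits θ, a fixed step size, and a hypothetical vanishing schedule λ_t = λ0/(1+t). The abstract does not specify any of these choices, so this is an illustration of the general technique, not the authors' algorithm or code. The gradient of Σ_a π_a r_a with respect to θ_a is π_a (r_a − π·r), and the penalty contributes −λ θ_a.

```python
import math


def softmax(theta):
    """Numerically stable softmax over a list of logits."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    s = sum(exps)
    return [e / s for e in exps]


def regularized_softmax_pg(rewards, steps=5000, eta=0.5, lam0=1.0):
    """Exact-gradient ascent on J(theta) = sum_a pi_a * r_a - (lam_t / 2) * ||theta||^2.

    Assumptions (hypothetical, not from the paper): the penalty acts on the
    logits, gradients are exact, and lam_t = lam0 / (1 + t) vanishes over time.
    """
    theta = [0.0] * len(rewards)  # uniform initial policy
    for t in range(steps):
        lam = lam0 / (1.0 + t)  # hypothetical vanishing regularization schedule
        pi = softmax(theta)
        mean_reward = sum(p * r for p, r in zip(pi, rewards))
        # d/dtheta_a of the mean reward is pi_a * (r_a - mean); the L2 term adds -lam * theta_a
        grad = [p * (r - mean_reward) - lam * th for p, r, th in zip(pi, rewards, theta)]
        theta = [th + eta * g for th, g in zip(theta, grad)]
    return softmax(theta)


pi = regularized_softmax_pg([0.2, 0.5, 0.9])
print([round(p, 3) for p in pi])
```

With vanishing λ_t, the penalty matters mainly in early iterations; as λ_t shrinks, the update approaches the unregularized softmax policy gradient and the policy concentrates on the highest-reward arm.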

Preprint · 2022 · arXiv

Controlling a nonlinear Fokker-Planck equation via inputs with nonlocal action

This paper concerns an optimal control problem $(P)$ related to a nonlinear Fokker-Planck equation. The problem is closely related to a stochastic optimal control problem $(P_S)$ for a McKean-Vlasov equation. The existence of an optimal control is obtained for the deterministic problem $(P)$. For a penalized optimal control problem $(P_h)$, associated with a backward Euler approximation of the nonlinear Fokker-Planck equation with a constant discretization step $h$, the existence of an optimal control is established and necessary optimality conditions are derived. Passing to the limit $h \rightarrow 0$, one derives the necessary optimality conditions for problem $(P)$.
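
The backward Euler approximation with constant step $h$ mentioned above replaces the time derivative by $(\rho^{n+1} - \rho^n)/h$ and evaluates the spatial operator at the new time level, so each step solves a linear system $(I - hA)\rho^{n+1} = \rho^n$. The sketch below shows only that time discretization, assuming a simplified linear 1D Fokker-Planck equation $\partial_t \rho = \nu\,\partial_{xx}\rho - \partial_x(b\rho)$ with constant drift $b$ and diffusion $\nu$, an upwind finite-volume flux, and zero-flux boundaries. It has no control and no nonlinearity, so it is not the paper's equation or scheme; all names and parameters are hypothetical.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm: row i reads lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = rhs[i]."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / m
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def backward_euler_step(rho, dx, h, nu, b):
    """One implicit step (I - h*A) rho_new = rho for a linear 1D Fokker-Planck operator A.

    Flux through interface i+1/2 (upwind drift plus diffusion): F = p*rho[i] - q*rho[i+1].
    Zero flux through the outer boundaries conserves total mass.
    """
    n = len(rho)
    p = max(b, 0.0) + nu / dx
    q = -min(b, 0.0) + nu / dx
    r = h / dx
    lower, diag, upper = [0.0] * n, [1.0] * n, [0.0] * n
    for i in range(n):
        if i < n - 1:  # flux through the right interface of cell i
            diag[i] += r * p
            upper[i] = -r * q
        if i > 0:  # flux through the left interface of cell i
            diag[i] += r * q
            lower[i] = -r * p
    return solve_tridiagonal(lower, diag, upper, rho)


def simulate(rho0, dx, h, nu, b, steps):
    """Apply the backward Euler step repeatedly with the constant time step h."""
    rho = list(rho0)
    for _ in range(steps):
        rho = backward_euler_step(rho, dx, h, nu, b)
    return rho


n = 50
dx = 1.0 / n
rho0 = [0.0] * n
rho0[10] = 1.0 / dx  # unit mass concentrated in one cell
rho = simulate(rho0, dx, h=0.01, nu=0.01, b=1.0, steps=30)
```

The implicit step is unconditionally stable in $h$ for this linear model, and the system matrix has unit column sums, so the scheme preserves total mass and nonnegativity.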