Researcher profile

Gabriel Turinici

Gabriel Turinici contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) · Verification L1 · Unclaimed author
10 works
0 followers
11 topics
4 close collaborators

Published work

10 published items

Preprint · 2026 · arXiv

Convergence of a L2 regularized Policy Gradient Algorithm for the Multi Armed Bandit

Although the Multi Armed Bandit (MAB) on one hand and the policy gradient approach on the other hand are among the most used frameworks of Reinforcement Learning, the theoretical properties of the policy gradient algorithm used for MAB have not been given enough attention. We investigate in this work the convergence of such a procedure for the situation when an $L2$ regularization term is present jointly with the 'softmax' parametrization. We prove convergence under appropriate technical hypotheses and test the procedure numerically, including in situations beyond the theoretical setting. The tests show that a time-dependent regularized procedure can improve over the canonical approach, especially when the initial guess is far from the solution.

Preprint · 2026 · arXiv

Vanishing L2 regularization for the softmax Multi Armed Bandit

Multi Armed Bandit (MAB) algorithms are a cornerstone of reinforcement learning and have been studied both theoretically and numerically. One of the most commonly used implementations uses a softmax mapping to prescribe the optimal policy and has served as the foundation for downstream algorithms, including REINFORCE. Distinct from vanilla approaches, we consider here the L2 regularized softmax policy gradient where a quadratic term is subtracted from the mean reward. Previous studies exploiting convexity failed to identify a suitable theoretical framework to analyze its convergence when the regularization parameter vanishes. We prove here theoretical convergence results and confirm empirically that this regime makes the L2 regularization numerically advantageous on standard benchmarks.
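The objective described in this abstract, the mean reward under a softmax policy minus a quadratic penalty, can be illustrated with a minimal sketch. This is not the paper's algorithm: it uses the exact expected gradient instead of sampled rewards, and the names `run`, `lr`, `gamma0` and the decay schedule `gamma0 / (1 + t)` are assumptions chosen only to show a regularization parameter that vanishes over time.

```python
import math

def softmax(theta):
    """Numerically stable softmax over a list of preferences."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def regularized_gradient(theta, mean_rewards, gamma):
    """Gradient of J(theta) = sum_a pi_a * r_a - (gamma / 2) * ||theta||^2."""
    pi = softmax(theta)
    avg = sum(p * r for p, r in zip(pi, mean_rewards))
    return [p * (r - avg) - gamma * t
            for p, r, t in zip(pi, mean_rewards, theta)]

def run(mean_rewards, steps=5000, lr=0.5, gamma0=0.1):
    """Gradient ascent with a vanishing L2 coefficient (hypothetical schedule)."""
    theta = [0.0] * len(mean_rewards)
    for t in range(steps):
        gamma = gamma0 / (1 + t)  # assumed schedule, not taken from the paper
        grad = regularized_gradient(theta, mean_rewards, gamma)
        theta = [th + lr * g for th, g in zip(theta, grad)]
    return softmax(theta)

# Three arms with illustrative mean rewards; the last arm is the best one.
policy = run([0.2, 0.5, 0.8])
```

With these illustrative settings the returned policy concentrates on the arm with the highest mean reward; a stochastic variant would replace `mean_rewards` by sampled rewards.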

Preprint · 2022 · arXiv

COVID-19 adaptive humoral immunity models: weakly neutralizing versus antibody-disease enhancement scenarios

The interplay between the virus, infected cells and the immune responses to SARS-CoV-2 is still under debate. Extending the basic model of viral dynamics, we propose here a formal approach to describe the neutralizing versus weakly (or non-)neutralizing scenarios and compare with the possible effects of antibody-dependent enhancement (ADE). The theoretical model is consistent with data available from the literature; we show that weakly neutralizing antibodies or ADE can both give rise to either final virus clearance or disease progression, but the immuno-dynamic is different in each case. Given that a significant part of the world population is already naturally immunized or vaccinated, we also discuss the implications for secondary infections following vaccination or in the presence of immune system dysfunctions.

Preprint · 2020 · arXiv

Contact rate epidemic control of COVID-19: an equilibrium view

We consider the control of the COVID-19 pandemic through a standard SIR compartmental model. This control is induced by the aggregation of individuals' decisions to limit their social interactions: when the epidemic is ongoing, an individual can diminish his/her contact rate in order to avoid getting infected, but this effort comes at a social cost. If each individual lowers his/her contact rate, the epidemic vanishes faster, but the effort cost may be high. A Mean Field Nash equilibrium at the population level is formed, resulting in a lower effective transmission rate of the virus. We prove theoretically that equilibrium exists and compute it numerically. However, this equilibrium selects a sub-optimal solution in comparison to the societal optimum (a centralized decision respected fully by all individuals), meaning that the cost of anarchy is strictly positive. We provide numerical examples and a sensitivity analysis, as well as an extension to a SEIR compartmental model to account for the relatively long latent phase of the COVID-19 disease. In all the scenarios considered, the divergence between the individual and societal strategies happens both before the peak of the epidemic, due to individuals' fears, and after, when a significant propagation is still underway.
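To make the role of the contact rate concrete, here is a minimal sketch of the standard SIR model integrated with forward Euler. It does not compute the Mean Field Nash equilibrium discussed in the abstract; it only shows how a uniformly lower transmission rate changes the epidemic peak and final size. The function name `simulate_sir` and all parameter values are hypothetical.

```python
def simulate_sir(beta, gamma=0.1, i0=1e-3, days=300, dt=0.1):
    """Forward-Euler integration of dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I.

    Compartments are population fractions, so S + I + R stays equal to 1.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    peak = i
    for _ in range(int(days / dt)):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        peak = max(peak, i)
    return {"s": s, "i": i, "r": r, "peak": peak, "final_size": i + r}

# Illustrative comparison: baseline contacts versus a reduced contact rate.
baseline = simulate_sir(beta=0.3)
reduced = simulate_sir(beta=0.2)
```

In this sketch the reduction is imposed by hand; in the equilibrium setting of the paper, each individual's contact rate results from trading infection risk against the cost of the effort.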

Preprint · 2020 · arXiv

Heterogeneous social interactions and the COVID-19 lockdown outcome in a multi-group SEIR model

We study variants of the SEIR model for interpreting some qualitative features of the statistics of the Covid-19 epidemic in France. Standard SEIR models distinguish essentially two regimes: either the disease is controlled and the number of infected people rapidly decreases, or the disease spreads and contaminates a significant fraction of the population until herd immunity is achieved. After lockdown, at first sight it seems that social distancing is not enough to control the outbreak. We discuss here a possible explanation, namely that the lockdown is creating social heterogeneity: even if a large majority of the population complies with the lockdown rules, a small fraction of the population still has to maintain a normal or high level of social interactions, such as health workers, providers of essential services, etc. This results in an apparent high level of epidemic propagation as measured through re-estimations of the basic reproduction ratio. However, these measures are limited to averages, while variance inside the population plays an essential role in the peak and the size of the epidemic outbreak and tends to lower these two indicators. We provide theoretical and numerical results to support such a view.

Preprint · 2020 · arXiv

Stochastic Runge-Kutta methods and adaptive SGD-G2 stochastic gradient descent

The minimization of the loss function is of paramount importance in deep neural networks. On the other hand, many popular optimization algorithms have been shown to correspond to some evolution equation of gradient flow type. Inspired by the numerical schemes used for general evolution equations, we introduce a second order stochastic Runge-Kutta method and show that it yields a consistent procedure for the minimization of the loss function. In addition, it can be coupled, in an adaptive framework, with a Stochastic Gradient Descent (SGD) to adjust automatically the learning rate of the SGD, without the need of any additional information on the Hessian of the loss functional. The adaptive SGD, called SGD-G2, is successfully tested on standard datasets.

Preprint · 2012 · arXiv

Control through operators for quantum chemistry

We consider the problem of operator identification in quantum control. The free Hamiltonian and the dipole moment are searched such that a given target state is reached at a given time. A local existence result is obtained. As a by-product, our work reveals necessary conditions on the laser field to make the identification feasible. In the last part of this work, some algorithms are proposed to compute effectively these operators.

Preprint · 2012 · arXiv

Critical points of the optimal quantum control landscape: a propagator approach

Numerical and experimental realizations of quantum control are closely connected to the properties of the mapping from the control to the unitary propagator. For bilinear quantum control problems, no general results are available to fully determine when this mapping is singular or not. In this paper we give sufficient conditions, in terms of elements of the evolution semigroup, for a trajectory to be non-singular. We identify two lists of "way-points" that, when reached, ensure the non-singularity of the control trajectory. It is found that under appropriate hypotheses one of those lists does not depend on the values of the coupling operator matrix.

Preprint · 2010 · arXiv

A monotonic method for solving nonlinear optimal control problems

Initially introduced in the framework of quantum control, the so-called "monotonic algorithms" have demonstrated excellent numerical performance when dealing with bilinear optimal control problems. This paper presents a unified formulation that can be applied to more nonlinear settings compatible with the hypothesis detailed below. In this framework, we show that the well-posedness of the general algorithm is related to a nonlinear evolution equation. We prove the existence of the solution to this equation and give important properties of the optimal control functional. Finally we show how the algorithm works for selected models from the literature and compare it with the gradient algorithm.

Preprint · 2010 · arXiv

Analysis of the Toolkit method for the time-dependent Schrödinger equation

The goal of this paper is to provide an analysis of the "toolkit" method used in the numerical approximation of the time-dependent Schrödinger equation. The "toolkit" method is based on precomputation of elementary propagators and was seen to be very efficient in the optimal control framework. Our analysis shows that this method provides better results than the second order Strang operator splitting. In addition, we present two improvements of the method in the limit of low and large intensity control fields.
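For readers unfamiliar with the baseline mentioned in this abstract, the following is a minimal sketch of second order Strang operator splitting on a two-level system. It does not implement the "toolkit" method. The Hamiltonian H = a·σz + b·σx, the function names, and the parameter values are all assumptions chosen because both partial propagators and the exact propagator have closed forms.

```python
import cmath
import math

def strang_step(psi, a, b, dt):
    """One Strang step for H = a*sigma_z + b*sigma_x: half z, full x, half z."""
    def apply_z(state, tau):
        # exp(-i*tau*a*sigma_z) is diagonal in this basis.
        return (cmath.exp(-1j * a * tau) * state[0],
                cmath.exp(1j * a * tau) * state[1])

    def apply_x(state, tau):
        # exp(-i*tau*b*sigma_x) = cos(b*tau)*I - i*sin(b*tau)*sigma_x
        c, s = math.cos(b * tau), math.sin(b * tau)
        return (c * state[0] - 1j * s * state[1],
                c * state[1] - 1j * s * state[0])

    return apply_z(apply_x(apply_z(psi, dt / 2), dt), dt / 2)

def exact(psi, a, b, t):
    """Exact propagation: since H^2 = (a^2 + b^2)*I, exp(-itH) has a closed form."""
    w = math.hypot(a, b)
    c, s = math.cos(w * t), math.sin(w * t)
    h0 = a * psi[0] + b * psi[1]  # first component of H*psi
    h1 = b * psi[0] - a * psi[1]  # second component of H*psi
    return (c * psi[0] - 1j * s * h0 / w, c * psi[1] - 1j * s * h1 / w)

def splitting_error(n_steps, a=1.0, b=0.7, t_final=1.0):
    """Euclidean distance between the split and the exact state at t_final."""
    psi0 = (1 + 0j, 0j)
    psi = psi0
    dt = t_final / n_steps
    for _ in range(n_steps):
        psi = strang_step(psi, a, b, dt)
    ref = exact(psi0, a, b, t_final)
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(psi, ref)))
```

Halving the time step, for instance comparing `splitting_error(50)` with `splitting_error(100)`, should reduce the error by a factor close to four, which is the signature of a second order scheme.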