Source author record

Gabriel Turinici

Gabriel Turinici appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


math.OC, Machine Learning, math.NA, Populations and Evolution, quant-ph, Numerical Analysis, physics.soc-ph, Artificial Intelligence, Data Structures and Algorithms, math.DS, math.ST, physics.chem-ph

Catalog footprint

What is connected

12 works

12 topics

4 close collaborators


preprint, 2026, arXiv

Convergence of a L2 regularized Policy Gradient Algorithm for the Multi Armed Bandit

Although the Multi Armed Bandit (MAB) on the one hand and the policy gradient approach on the other hand are among the most widely used frameworks of Reinforcement Learning, the theoretical properties of the policy gradient algorithm used for MAB have not been given enough attention. In this work we investigate the convergence of such a procedure when an $L2$ regularization term is present jointly with the 'softmax' parametrization. We prove convergence under appropriate technical hypotheses and test the procedure numerically, including in situations beyond the theoretical setting. The tests show that a time-dependent regularized procedure can improve over the canonical approach, especially when the initial guess is far from the solution.
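As a rough illustration of the kind of procedure this abstract describes, the sketch below runs gradient ascent on the softmax-parametrized mean reward minus a quadratic penalty. It is not the authors' algorithm: it uses the exact expected gradient instead of sampled rewards, and the names `run_bandit` and `regularized_pg_step`, the step size, the reward values and the decay schedule `gamma0 / (1 + n)` are all assumptions chosen for the example.

```python
import math


def softmax(theta):
    """Numerically stable softmax over a list of preferences."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def regularized_pg_step(theta, rewards, lr, gamma):
    """One exact gradient-ascent step on sum_k pi_k r_k - (gamma / 2) * ||theta||^2."""
    pi = softmax(theta)
    mean_reward = sum(p * r for p, r in zip(pi, rewards))
    # Under softmax, d(mean reward)/d theta_k = pi_k * (r_k - mean_reward);
    # the quadratic penalty contributes -gamma * theta_k.
    grad = [p * (r - mean_reward) - gamma * t
            for p, r, t in zip(pi, rewards, theta)]
    return [t + lr * g for t, g in zip(theta, grad)]


def run_bandit(rewards, theta0, steps=20000, lr=0.4, gamma0=0.1):
    """Iterate the regularized step with a time-dependent penalty weight."""
    theta = list(theta0)
    for n in range(steps):
        gamma = gamma0 / (1 + n)  # hypothetical vanishing schedule (assumption)
        theta = regularized_pg_step(theta, rewards, lr, gamma)
    return softmax(theta)


# Initial guess that strongly favours the worst arm (illustrative values).
policy = run_bandit(rewards=[0.2, 0.5, 0.9], theta0=[3.0, 0.0, -3.0])
print([round(p, 3) for p in policy])
```

Early on the penalty pulls the preferences back toward zero, which shortens the time spent near the poor initial policy; as the weight decays, the iteration approaches the unregularized policy gradient.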

preprint, 2026, arXiv

Vanishing L2 regularization for the softmax Multi Armed Bandit

Multi Armed Bandit (MAB) algorithms are a cornerstone of reinforcement learning and have been studied both theoretically and numerically. One of the most commonly used implementations uses a softmax mapping to prescribe the optimal policy and has served as the foundation for downstream algorithms, including REINFORCE. In contrast to vanilla approaches, we consider here the L2 regularized softmax policy gradient, where a quadratic term is subtracted from the mean reward. Previous studies exploiting convexity failed to identify a suitable theoretical framework to analyze its convergence when the regularization parameter vanishes. We prove theoretical convergence results and confirm empirically that this regime makes the L2 regularization numerically advantageous on standard benchmarks.

preprint, 2022, arXiv

COVID-19 adaptive humoral immunity models: weakly neutralizing versus antibody-disease enhancement scenarios

The interplay between the virus, infected cells and the immune responses to SARS-CoV-2 is still under debate. Extending the basic model of viral dynamics, we propose here a formal approach to describe the neutralizing versus weakly (or non-)neutralizing scenarios and compare with the possible effects of antibody-dependent enhancement (ADE). The theoretical model is consistent with data available from the literature; we show that weakly neutralizing antibodies or ADE can both give rise to either final virus clearance or disease progression, but the immuno-dynamic is different in each case. Given that a significant part of the world population is already naturally immunized or vaccinated, we also discuss the implications for secondary infections, infections following vaccination or in the presence of immune system dysfunctions.

preprint, 2020, arXiv

Contact rate epidemic control of COVID-19: an equilibrium view

We consider the control of the COVID-19 pandemic through a standard SIR compartmental model. This control is induced by the aggregation of individuals' decisions to limit their social interactions: when the epidemic is ongoing, an individual can diminish his/her contact rate in order to avoid getting infected, but this effort comes at a social cost. If each individual lowers his/her contact rate, the epidemic vanishes faster, but the effort cost may be high. A Mean Field Nash equilibrium at the population level is formed, resulting in a lower effective transmission rate of the virus. We prove theoretically that equilibrium exists and compute it numerically. However, this equilibrium selects a sub-optimal solution in comparison to the societal optimum (a centralized decision respected fully by all individuals), meaning that the cost of anarchy is strictly positive. We provide numerical examples and a sensitivity analysis, as well as an extension to a SEIR compartmental model to account for the relatively long latent phase of the COVID-19 disease. In all the scenarios considered, the divergence between the individual and societal strategies happens both before the peak of the epidemic, due to individuals' fears, and after, when a significant propagation is still underway.
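To make the role of the effective transmission rate concrete, here is a minimal forward-Euler sketch of the standard SIR model, run once with a baseline rate and once with a reduced one. It does not attempt the Mean Field Nash equilibrium computation from the abstract; the function name `simulate_sir` and all parameter values (`beta`, `gamma`, initial fractions, horizon) are assumptions made for illustration.

```python
def simulate_sir(beta, gamma, s0=0.999, i0=0.001, days=300.0, dt=0.01):
    """Forward-Euler integration of S' = -beta*S*I, I' = beta*S*I - gamma*I, R' = gamma*I.

    Returns the final (S, I, R) fractions and the peak infected fraction.
    """
    s, i, r = s0, i0, 1.0 - s0 - i0
    peak = i
    for _ in range(round(days / dt)):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s = s - new_infections
        i = i + new_infections - new_recoveries
        r = r + new_recoveries
        peak = max(peak, i)
    return s, i, r, peak


# Hypothetical rates: the second run halves the contact (transmission) rate.
baseline = simulate_sir(beta=0.30, gamma=0.10)
reduced = simulate_sir(beta=0.15, gamma=0.10)
print("final size:", round(baseline[2], 3), "vs", round(reduced[2], 3))
print("peak infected:", round(baseline[3], 3), "vs", round(reduced[3], 3))
```

In this sketch the reduction is imposed uniformly and at no cost; the abstract's point is that, once each individual weighs the effort cost against the infection risk, the resulting equilibrium reduction differs from the one a central planner would choose.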

preprint, 2020, arXiv

Heterogeneous social interactions and the COVID-19 lockdown outcome in a multi-group SEIR model

We study variants of the SEIR model for interpreting some qualitative features of the statistics of the Covid-19 epidemic in France. Standard SEIR models distinguish essentially two regimes: either the disease is controlled and the number of infected people rapidly decreases, or the disease spreads and contaminates a significant fraction of the population until herd immunity is achieved. After lockdown, at first sight it seems that social distancing is not enough to control the outbreak. We discuss here a possible explanation, namely that the lockdown is creating social heterogeneity: even if a large majority of the population complies with the lockdown rules, a small fraction of the population, such as health workers and providers of essential services, still has to maintain a normal or high level of social interactions. This results in an apparent high level of epidemic propagation as measured through re-estimations of the basic reproduction ratio. However, these measures are limited to averages, while variance inside the population plays an essential role in the peak and the size of the epidemic outbreak and tends to lower these two indicators. We provide theoretical and numerical results to support such a view.

preprint, 2020, arXiv

Stochastic Runge-Kutta methods and adaptive SGD-G2 stochastic gradient descent

The minimization of the loss function is of paramount importance in deep neural networks. On the other hand, many popular optimization algorithms have been shown to correspond to some evolution equation of gradient flow type. Inspired by the numerical schemes used for general evolution equations, we introduce a second order stochastic Runge-Kutta method and show that it yields a consistent procedure for the minimization of the loss function. In addition, it can be coupled, in an adaptive framework, with a Stochastic Gradient Descent (SGD) to adjust automatically the learning rate of the SGD, without the need for any additional information on the Hessian of the loss functional. The adaptive SGD, called SGD-G2, is successfully tested on standard datasets.
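The idea of treating SGD as a discretization of a gradient flow can be illustrated with a generic two-stage (Heun, explicit trapezoidal) step that evaluates the minibatch gradient twice. This is only a sketch of a second order Runge-Kutta update under assumed choices: it reuses the same minibatch in both stages, it does not reproduce the adaptive learning-rate rule of SGD-G2, and the names `heun_sgd_step` and `minibatch_grad` as well as the toy least-squares data are hypothetical.

```python
import random


def minibatch_grad(w, batch):
    """Gradient in w of the mean of 0.5 * (w * x - y)^2 over the minibatch."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def heun_sgd_step(w, batch, h):
    """Two-stage Runge-Kutta (Heun) step for the gradient flow w' = -grad L(w)."""
    g1 = minibatch_grad(w, batch)       # stage 1: gradient at the current point
    w_pred = w - h * g1                 # plain SGD (explicit Euler) predictor
    g2 = minibatch_grad(w_pred, batch)  # stage 2: gradient at the predicted point
    return w - 0.5 * h * (g1 + g2)      # average the two slopes


# Toy noiseless regression y = 2 x (illustrative data, fixed seed).
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
data = [(x, 2.0 * x) for x in xs]

w = 0.0
for _ in range(500):
    batch = rng.sample(data, 16)
    w = heun_sgd_step(w, batch, h=0.5)
print(w)
```

The difference between the Euler predictor `w_pred` and the Heun update is the kind of quantity an adaptive scheme can monitor to adjust the step size without forming the Hessian; how SGD-G2 does this precisely is not stated in the abstract and is not reproduced here.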

preprint, 2012, arXiv

Control through operators for quantum chemistry

We consider the problem of operator identification in quantum control. The free Hamiltonian and the dipole moment are searched such that a given target state is reached at a given time. A local existence result is obtained. As a by-product, our work reveals necessary conditions on the laser field to make the identification feasible. In the last part of this work, some algorithms are proposed to compute these operators effectively.

preprint, 2012, arXiv

Critical points of the optimal quantum control landscape: a propagator approach

Numerical and experimental realizations of quantum control are closely connected to the properties of the mapping from the control to the unitary propagator. For bilinear quantum control problems, no general results are available to fully determine when this mapping is singular or not. In this paper we give sufficient conditions, in terms of elements of the evolution semigroup, for a trajectory to be non-singular. We identify two lists of "way-points" that, when reached, ensure the non-singularity of the control trajectory. It is found that under appropriate hypotheses one of those lists does not depend on the values of the coupling operator matrix.

preprint, 2011, arXiv

Hamiltonian identification through enhanced observability utilizing quantum control

This paper considers Hamiltonian identification for a controllable quantum system with non-degenerate transitions and a known initial state. We assume that we have at our disposal a single scalar control input and the population measurement of only one state at an (arbitrarily large) final time T. We prove that the quantum dipole moment matrix is locally observable in the following sense: for any two close but distinct dipole moment matrices, we construct discriminating controls giving two different measurements. Such discriminating controls are constructed to have three well defined temporal components, as inspired by Ramsey interferometry. This result suggests that measurements which may at first appear very restrictive are actually rich enough, when combined with well designed discriminating controls, to uniquely identify the complete dipole moment of such systems. The assessment supports the employment of quantum control as a promising means to achieve high quality identification of a Hamiltonian.

preprint, 2010, arXiv

A monotonic method for solving nonlinear optimal control problems

Initially introduced in the framework of quantum control, the so-called "monotonic algorithms" have demonstrated excellent numerical performance when dealing with bilinear optimal control problems. This paper presents a unified formulation that can be applied to more nonlinear settings compatible with the hypothesis detailed below. In this framework, we show that the well-posedness of the general algorithm is related to a nonlinear evolution equation. We prove the existence of the solution to this equation and give important properties of the optimal control functional. Finally we show how the algorithm works for selected models from the literature and compare it with the gradient algorithm.

preprint, 2010, arXiv

A smoothing monotonic convergent optimal control algorithm for NMR pulse sequence design

The past decade has seen increasing interest in using optimal control based methods within coherent quantum controllable systems. The versatility of such methods has been demonstrated with particular elegance within nuclear magnetic resonance (NMR), where natural separation between coherent and dissipative spin dynamics processes has enabled coherent quantum control over long periods of time to shape the experiment to almost ideal adoption to the spin system and external manipulations. This has led to new design principles as well as powerful new experimental methods within magnetic resonance imaging, liquid-state and solid-state NMR spectroscopy. For this development to continue and expand, it is crucially important to constantly improve the underlying numerical algorithms to provide numerical solutions which are optimally compatible with implementation on current instrumentation and at the same time are numerically stable and offer fast monotonic convergence towards the target. Addressing such aims, we here present a smoothing monotonically convergent algorithm for pulse sequence design in magnetic resonance which, with improved optimization stability, leads to smooth pulse sequences that are easier to implement experimentally and potentially to understand within the analytical framework of modern NMR spectroscopy.

preprint, 2010, arXiv

Analysis of the Toolkit method for the time-dependant Schrödinger equation

The goal of this paper is to provide an analysis of the "toolkit" method used in the numerical approximation of the time-dependent Schrödinger equation. The "toolkit" method is based on precomputation of elementary propagators and was seen to be very efficient in the optimal control framework. Our analysis shows that this method provides better results than the second order Strang operator splitting. In addition, we present two improvements of the method in the limit of low and large intensity control fields.
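For context, the second order Strang operator splitting that the abstract uses as its point of comparison can be sketched on a two-level system. The example below assumes units with ħ = 1 and a Hamiltonian of the form H(t) = H0 - ε(t) μ with H0 diagonal and μ taken to be the Pauli matrix σ_x; these modelling choices, the names `strang_step` and `propagate`, and the field values are illustrative assumptions. The "toolkit" method itself (precomputed elementary propagators) is not reproduced.

```python
import cmath
import math


def strang_step(psi, t, dt, energies, field):
    """Advance a two-level state by dt under H(t) = H0 - field(t) * sigma_x.

    Strang splitting: half step with H0, full step with the coupling
    (field sampled at the midpoint), then another half step with H0.
    """
    half = [cmath.exp(-0.5j * e * dt) for e in energies]
    a, b = psi[0] * half[0], psi[1] * half[1]
    # exp(+i * theta * sigma_x) = cos(theta) * I + i * sin(theta) * sigma_x
    theta = field(t + 0.5 * dt) * dt
    c, s = math.cos(theta), 1j * math.sin(theta)
    a, b = c * a + s * b, s * a + c * b
    return [a * half[0], b * half[1]]


def propagate(psi0, energies, field, t_final, n_steps):
    """Apply n_steps Strang steps from t = 0 to t = t_final."""
    dt = t_final / n_steps
    psi = list(psi0)
    for k in range(n_steps):
        psi = strang_step(psi, k * dt, dt, energies, field)
    return psi


# Hypothetical test case: weak resonant drive of a two-level system.
energies = [0.0, 1.0]
psi = propagate([1.0 + 0j, 0.0 + 0j], energies,
                lambda t: 0.1 * math.cos(t), t_final=10.0, n_steps=1000)
populations = [abs(z) ** 2 for z in psi]
print(populations)
```

Each factor in the step is unitary, so the total population stays equal to one up to rounding error, whatever the step size; the accuracy of the individual populations is what degrades as `dt` grows.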
