Researcher profile

Pavel Osinenko

Pavel Osinenko contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
8 works
0 followers
5 topics
4 close collaborators

Published work

8 published items

preprint | 2025 | arXiv

Some remarks on stochastic converse Lyapunov theorems

In this brief note, we investigate some constructions of Lyapunov functions for stochastic discrete-time stabilizable dynamical systems, in other words, controlled Markov chains. The main question here is whether a Lyapunov function in some statistical sense exists if the respective controlled Markov chain admits a stabilizing policy. We demonstrate some constructions extending the classical results for deterministic systems. Some limitations of the constructed Lyapunov functions for stabilization are discussed, particularly for stabilization in mean. Although results for deterministic systems are well known, the stochastic case has been addressed in less detail, which the current paper remarks on. A distinguishing feature of this work is the study of stabilizers that possess computationally tractable convergence certificates.

preprint | 2022 | arXiv

A note on stabilizing reinforcement learning

Reinforcement learning is a general methodology of adaptive optimal control that has attracted much attention in various fields ranging from the video game industry to robot manipulators. Despite its remarkable performance demonstrations, plain reinforcement learning controllers do not guarantee stability, which compromises their applicability in industry. To provide such guarantees, measures have to be taken. This gives rise to what could generally be called stabilizing reinforcement learning. Concrete approaches range from employment of human overseers to filter out unsafe actions to formally verified shields and fusion with classical stabilizing controllers. A line of attack that utilizes elements of adaptive control has become fairly popular in recent years. In this note, we critically address such an approach in a fairly general actor-critic setup for nonlinear continuous-time environments. The actor network utilizes a so-called robustifying term that is supposed to compensate for the neural network errors. The corresponding stability analysis is based on the value function itself. We indicate a problem in such a stability analysis and provide a counterexample to the overall control scheme. Implications for such a line of attack in stabilizing reinforcement learning are discussed. Unfortunately, the said problem possesses no fix without a substantial reconsideration of the whole approach. As a positive message, we derive a stochastic critic neural network weight convergence analysis provided that the environment was stabilized.

preprint | 2022 | arXiv

A stabilizing reinforcement learning approach for sampled systems with partially unknown models

Reinforcement learning is commonly associated with training of reward-maximizing (or cost-minimizing) agents, in other words, controllers. It can be applied in a model-free or model-based fashion, using a priori or online collected system data to train the parametric architectures involved. In general, online reinforcement learning does not guarantee closed-loop stability unless special measures are taken, for instance, through learning constraints or tailored training rules. Particularly promising are hybrids of reinforcement learning with "classical" control approaches. In this work, we suggest a method to guarantee practical stability of the system-controller closed loop in a purely online learning setting, i.e., without offline training. Moreover, we assume only partial knowledge of the system model. To achieve the claimed results, we employ techniques of classical adaptive control. The implementation of the overall control scheme is provided explicitly in a digital, sampled setting. That is, the controller receives the state of the system and computes the control action at discrete, specifically equidistant, moments in time. The method is tested in adaptive traction control and cruise control, where it proved to significantly reduce the cost.
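The digital, sampled setting mentioned in this abstract can be illustrated with a minimal sketch. It is not the method of the paper: the scalar system, the linear feedback gain, the sampling period and the helper name `simulate_sample_and_hold` are all assumptions chosen for illustration. The sketch only shows the sample-and-hold mechanics, where the action is computed at equidistant sampling instants and held constant in between.

```python
def simulate_sample_and_hold(x0, policy, dynamics, sampling_period, horizon, substeps=100):
    """Simulate a scalar system under sample-and-hold control.

    The action is recomputed only at equidistant sampling instants and
    held constant in between; the dynamics are integrated with explicit
    Euler steps on a finer grid.
    """
    x = x0
    trajectory = [x]
    n_samples = int(round(horizon / sampling_period))
    dt = sampling_period / substeps
    for _ in range(n_samples):
        u = policy(x)  # action computed from the sampled state
        for _ in range(substeps):
            x = x + dt * dynamics(x, u)  # action u is held during the period
        trajectory.append(x)
    return trajectory


# Hypothetical example: unstable system dx/dt = x + u with feedback u = -2x.
trajectory = simulate_sample_and_hold(
    x0=1.0,
    policy=lambda x: -2.0 * x,
    dynamics=lambda x, u: x + u,
    sampling_period=0.1,
    horizon=5.0,
)
```

In this toy case the sampled state shrinks by a constant factor per sampling period, so `trajectory` decays towards zero even though the action is updated only ten times per unit of time.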

preprint | 2021 | arXiv

On inf-convolution-based robust practical stabilization under computational uncertainty

This work is concerned with practical stabilization of nonlinear systems by means of inf-convolution-based sample-and-hold control. It is a fairly general stabilization technique based on a generic non-smooth control Lyapunov function (CLF) and robust to actuator uncertainty, measurement noise, etc. The stabilization technique itself involves computation of descent directions of the CLF. It turns out that non-exact realization of this computation leads not just to a quantitative but also to a qualitative obstruction, in the sense that the result of the computation might fail to be a descent direction altogether, and there is also no straightforward way to relate it to a descent direction. Disturbances, primarily measurement noise, complicate the described issue even more. This work suggests a modified inf-convolution-based control that is robust w.r.t. system and measurement noise, as well as computational uncertainty. The assumptions on the CLF are mild: e.g., any piecewise smooth function, which often results from a numerical LF/CLF construction, satisfies them. A computational study with a three-wheel robot with dynamical steering and throttle under various tolerances w.r.t. computational uncertainty demonstrates the relevance of the addressed issue and the necessity of modifying the used stabilization technique. Similar analyses may be extended to other methods which involve optimization, such as Dini aiming or steepest descent.

preprint | 2020 | arXiv

A reinforcement learning method with closed-loop stability guarantee

Reinforcement learning (RL) in the context of control systems offers wide possibilities of controller adaptation. Given an infinite-horizon cost function, the so-called critic of RL approximates it with a neural net and sends this information to the controller (called "actor"). However, the issue of closed-loop stability under an RL method is still not fully addressed. Since the critic delivers merely an approximation to the value function of the corresponding infinite-horizon problem, no guarantee can be given in general as to whether the actor's actions stabilize the system. Different approaches to this issue exist. The current work offers a particular one, which, starting with a (not necessarily smooth) control Lyapunov function (CLF), derives an online RL scheme in such a way that a practical semi-global stability property of the closed loop can be established. The approach logically continues the work of the authors on parameterized controllers and Lyapunov-like constraints for RL, whereas the CLF now appears merely in one of the constraints of the control scheme. The analysis of the closed-loop behavior is done in a sample-and-hold (SH) manner, thus offering a certain insight into the digital realization. The case study with a non-holonomic integrator shows the capabilities of the derived method to optimize the given cost function compared to a nominal stabilizing controller.

preprint | 2020 | arXiv

Model predictive control with stage cost shaping inspired by reinforcement learning

This work presents a suboptimality study of a particular model predictive control with a stage cost shaping based on the ideas of reinforcement learning. The focus of the suboptimality study is to derive quantities relating the infinite-horizon cost function under the said variant of model predictive control to the respective infinite-horizon value function. The basic control scheme involves the usual stabilizing constraints comprising a terminal set and a terminal cost in the form of a local Lyapunov function. The stage cost is adapted using the principles of Q-learning, a particular approach to reinforcement learning. The work is concluded by case studies with two systems for wide ranges of initial conditions.
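For readers unfamiliar with Q-learning, the following minimal sketch shows a generic tabular Q-learning update in cost-minimization form. It is not the stage cost shaping scheme of the paper; the two-state toy problem, the learning rate, the discount factor and the function name `q_learning_update` are assumptions for illustration only.

```python
import random


def q_learning_update(q, state, action, cost, next_state, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step for cost minimization.

    Moves Q(state, action) towards the target
    cost + gamma * min over a' of Q(next_state, a').
    """
    target = cost + gamma * min(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q


# Hypothetical toy problem: states 0 and 1, actions "stay" and "move".
# Being in state 1 costs 1 per step, state 0 is free.
states, actions = [0, 1], ["stay", "move"]


def step(state, action):
    next_state = state if action == "stay" else 1 - state
    return float(state), next_state  # cost is incurred in the current state


random.seed(0)
q = {(s, a): 0.0 for s in states for a in actions}
state = 1
for _ in range(2000):
    action = random.choice(actions)  # purely exploratory behaviour policy
    cost, next_state = step(state, action)
    q_learning_update(q, state, action, cost, next_state, actions)
    state = next_state

# Greedy (cost-minimizing) action in each state under the learned table.
greedy = {s: min(actions, key=lambda a: q[(s, a)]) for s in states}
```

In this toy problem the learned greedy policy leaves the costly state 1 and stays in the free state 0.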

preprint | 2020 | arXiv

Nonsmooth stabilization and its computational aspects

This work briefly surveys some key stabilization techniques for general nonlinear systems, for which, as is well known, a smooth control Lyapunov function may fail to exist. A general overview of the situation with smooth and nonsmooth stabilization is provided, followed by a concise summary of basic tools and techniques, including general stabilization, sliding-mode control and nonsmooth backstepping. Their presentation is accompanied by examples. The survey is concluded with some remarks on computational aspects related to determination of sampling times and control actions.

preprint | 2020 | arXiv

Stacked adaptive dynamic programming with unknown system model

Adaptive dynamic programming is a collective term for a variety of approaches to infinite-horizon optimal control. Common to all approaches is approximation of the infinite-horizon cost function based on the dynamic programming philosophy. Typically, they also require knowledge of a dynamical model of the system. The current work addresses the application of adaptive dynamic programming to a system whose dynamical model is unknown to the controller. In order to realize the control algorithm, a model of the system dynamics is estimated with a Kalman filter. A stacked control scheme to boost the controller performance is suggested. The functioning of the new approach was verified in simulation and compared to the baseline represented by gradient descent on the running cost.