Source author record

Pavel Osinenko

Pavel Osinenko appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Topics: math.OC, math.DS, eess.SY, Systems and Control, Machine Learning, math.MG

Catalog footprint

10 works, 6 topics, 4 close collaborators


preprint, 2025, arXiv

Some remarks on stochastic converse Lyapunov theorems

In this brief note, we investigate some constructions of Lyapunov functions for stochastic discrete-time stabilizable dynamical systems, in other words, controlled Markov chains. The main question here is whether a Lyapunov function in some statistical sense exists if the respective controlled Markov chain admits a stabilizing policy. We demonstrate some constructions extending the classical results for deterministic systems. Some limitations of the constructed Lyapunov functions for stabilization are discussed, particularly for stabilization in mean. Although results for deterministic systems are well known, the stochastic case has been addressed in less detail, which the current paper remarks on. A distinguishing feature of this work is the study of stabilizers that possess computationally tractable convergence certificates.

preprint, 2022, arXiv

A note on stabilizing reinforcement learning

Reinforcement learning is a general methodology of adaptive optimal control that has attracted much attention in various fields ranging from the video game industry to robot manipulators. Despite its remarkable performance demonstrations, plain reinforcement learning controllers do not guarantee stability, which compromises their applicability in industry. To provide such guarantees, measures have to be taken. This gives rise to what could generally be called stabilizing reinforcement learning. Concrete approaches range from employment of human overseers to filter out unsafe actions to formally verified shields and fusion with classical stabilizing controllers. A line of attack that utilizes elements of adaptive control has become fairly popular in recent years. In this note, we critically address such an approach in a fairly general actor-critic setup for nonlinear continuous-time environments. The actor network utilizes a so-called robustifying term that is supposed to compensate for the neural network errors. The corresponding stability analysis is based on the value function itself. We indicate a problem in such a stability analysis and provide a counterexample to the overall control scheme. Implications for such a line of attack in stabilizing reinforcement learning are discussed. Unfortunately, the said problem possesses no fix without a substantial reconsideration of the whole approach. As a positive message, we derive a stochastic critic neural network weight convergence analysis provided that the environment was stabilized.

preprint, 2022, arXiv

A stabilizing reinforcement learning approach for sampled systems with partially unknown models

Reinforcement learning is commonly associated with training of reward-maximizing (or cost-minimizing) agents, in other words, controllers. It can be applied in a model-free or model-based fashion, using a priori or online collected system data to train the involved parametric architectures. In general, online reinforcement learning does not guarantee closed-loop stability unless special measures are taken, for instance, through learning constraints or tailored training rules. Particularly promising are hybrids of reinforcement learning with "classical" control approaches. In this work, we suggest a method to guarantee practical stability of the system-controller closed loop in a purely online learning setting, i.e., without offline training. Moreover, we assume only partial knowledge of the system model. To achieve the claimed results, we employ techniques of classical adaptive control. The implementation of the overall control scheme is provided explicitly in a digital, sampled setting. That is, the controller receives the state of the system and computes the control action at discrete, specifically equidistant, moments in time. The method is tested in adaptive traction control and cruise control, where it proved to significantly reduce the cost.

preprint, 2021, arXiv

On inf-convolution-based robust practical stabilization under computational uncertainty

This work is concerned with practical stabilization of nonlinear systems by means of inf-convolution-based sample-and-hold control. It is a fairly general stabilization technique based on a generic non-smooth control Lyapunov function (CLF) and robust to actuator uncertainty, measurement noise, etc. The stabilization technique itself involves computation of descent directions of the CLF. It turns out that non-exact realization of this computation leads not just to a quantitative but also to a qualitative obstruction, in the sense that the result of the computation might fail to be a descent direction altogether, and there is also no straightforward way to relate it to a descent direction. Disturbances, primarily measurement noise, complicate the described issue even more. This work suggests a modified inf-convolution-based control that is robust w.r.t. system and measurement noise, as well as computational uncertainty. The assumptions on the CLF are mild, as, e.g., any piecewise smooth function, which often results from a numerical LF/CLF construction, satisfies them. A computational study with a three-wheel robot with dynamical steering and throttle under various tolerances w.r.t. computational uncertainty demonstrates the relevance of the addressed issue and the necessity of modifying the used stabilization technique. Similar analyses may be extended to other methods which involve optimization, such as Dini aiming or steepest descent.

preprint, 2020, arXiv

A reinforcement learning method with closed-loop stability guarantee

Reinforcement learning (RL) in the context of control systems offers wide possibilities of controller adaptation. Given an infinite-horizon cost function, the so-called critic of RL approximates it with a neural net and sends this information to the controller (called "actor"). However, the issue of closed-loop stability under an RL method is still not fully addressed. Since the critic delivers merely an approximation to the value function of the corresponding infinite-horizon problem, no guarantee can be given in general as to whether the actor's actions stabilize the system. Different approaches to this issue exist. The current work offers a particular one, which, starting with a (not necessarily smooth) control Lyapunov function (CLF), derives an online RL scheme in such a way that a practical semi-global stability property of the closed loop can be established. The approach logically continues the work of the authors on parameterized controllers and Lyapunov-like constraints for RL, whereas the CLF now appears merely in one of the constraints of the control scheme. The analysis of the closed-loop behavior is done in a sample-and-hold (SH) manner, thus offering a certain insight into the digital realization. The case study with a non-holonomic integrator shows the capabilities of the derived method to optimize the given cost function compared to a nominal stabilizing controller.
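To make the sample-and-hold idea concrete, here is a minimal sketch of simulating a non-holonomic (Brockett) integrator under a controller whose action is computed only at sampling instants and held constant in between. It is not the RL scheme from the paper: the function names, the explicit Euler integration, the sub-step count and the constant test policy are all assumptions made for illustration.

```python
def nonholonomic_integrator(x, u):
    """Right-hand side of the Brockett integrator:
    x1' = u1, x2' = u2, x3' = x1*u2 - x2*u1."""
    x1, x2, _x3 = x
    u1, u2 = u
    return (u1, u2, x1 * u2 - x2 * u1)


def simulate_sample_and_hold(policy, x0, sampling_time, n_samples, substeps=100):
    """Simulate the closed loop in sample-and-hold mode.

    `policy` (hypothetical) maps a state tuple to a control tuple. The control
    is evaluated once per sampling period and held constant while the state is
    integrated with explicit Euler sub-steps (an illustrative choice).
    """
    x = tuple(x0)
    h = sampling_time / substeps
    trajectory = [x]
    for _ in range(n_samples):
        u = policy(x)  # computed only at the sampling instant
        for _ in range(substeps):
            dx = nonholonomic_integrator(x, u)
            x = tuple(xi + h * dxi for xi, dxi in zip(x, dx))
        trajectory.append(x)
    return trajectory


# Illustrative run: a constant control, sampled every 0.1 time units.
traj = simulate_sample_and_hold(lambda x: (1.0, 0.0), (0.0, 0.0, 0.0), 0.1, 10)
```

The returned list holds the state at each sampling instant only, which is the information a digital controller would actually see.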

preprint, 2020, arXiv

Model predictive control with stage cost shaping inspired by reinforcement learning

This work presents a suboptimality study of a particular model predictive control with a stage cost shaping based on the ideas of reinforcement learning. The focus of the suboptimality study is to derive quantities relating the infinite-horizon cost function under the said variant of model predictive control to the respective infinite-horizon value function. The underlying control scheme involves the usual stabilizing constraints consisting of a terminal set and a terminal cost in the form of a local Lyapunov function. The stage cost is adapted using the principles of Q-learning, a particular approach to reinforcement learning. The work is concluded by case studies with two systems for wide ranges of initial conditions.
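For readers unfamiliar with Q-learning, the following is a minimal sketch of the generic tabular update in its cost-minimizing form. It is not the stage cost shaping studied in the paper; the dictionary-based table, the function name, and the step size and discount values are assumptions chosen for illustration.

```python
def q_learning_update(Q, state, action, cost, next_state, actions,
                      step_size=0.5, discount=0.9):
    """One tabular Q-learning step for cost minimization:
    Q(s, a) <- Q(s, a) + step_size * (cost + discount * min_a' Q(s', a') - Q(s, a)).

    `Q` is a dict keyed by (state, action); missing entries count as 0.0.
    """
    best_next = min(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + step_size * (cost + discount * best_next - old)
    return Q[(state, action)]


# Illustrative use on a toy problem with two actions.
Q = {}
first = q_learning_update(Q, 0, "a", 2.0, 1, ["a", "b"])
```

With an empty table, the first update moves the entry halfway from 0 toward the observed cost, since the estimated cost-to-go from the next state is still zero.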

preprint, 2020, arXiv

Nonsmooth stabilization and its computational aspects

This work has the goal of briefly surveying some key stabilization techniques for general nonlinear systems, for which, as is well known, a smooth control Lyapunov function may fail to exist. A general overview of the situation with smooth and nonsmooth stabilization is provided, followed by a concise summary of basic tools and techniques, including general stabilization, sliding-mode control and nonsmooth backstepping. Their presentation is accompanied by examples. The survey is concluded with some remarks on computational aspects related to determination of sampling times and control actions.

preprint, 2020, arXiv

Stacked adaptive dynamic programming with unknown system model

Adaptive dynamic programming is a collective term for a variety of approaches to infinite-horizon optimal control. Common to all approaches is approximation of the infinite-horizon cost function based on dynamic programming philosophy. Typically, they also require knowledge of a dynamical model of the system. In the current work, application of adaptive dynamic programming to a system whose dynamical model is unknown to the controller is addressed. In order to realize the control algorithm, a model of the system dynamics is estimated with a Kalman filter. A stacked control scheme to boost the controller performance is suggested. The functioning of the new approach was verified in simulation and compared to the baseline represented by gradient descent on the running cost.

preprint, 2016, arXiv

A note on Brehm's extension theorem

Brehm's extension theorem states that a non-expansive map on a finite subset of a Euclidean space can be extended to a piecewise-linear map on the entire space. In this note, it is verified that the proof of the theorem is constructive provided that the finite subset consists of points with rational coordinates. Additionally, the initial non-expansive map needs to send points with rational coordinates to points with rational coordinates. The two-dimensional case is considered.
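The hypothesis of the theorem can be checked exactly when all coordinates are rational. The sketch below tests whether a map given on a finite set of rational points is non-expansive, using exact arithmetic. It only checks the premise and does not construct the piecewise-linear extension; the function names and sample points are assumptions for illustration.

```python
from fractions import Fraction
from itertools import combinations


def sq_dist(p, q):
    """Exact squared Euclidean distance between two rational points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def is_nonexpansive(points, images):
    """Return True if the map points[i] -> images[i] never increases distances.

    Comparing squared distances avoids square roots, so the test stays exact
    for Fraction coordinates.
    """
    return all(
        sq_dist(images[i], images[j]) <= sq_dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    )


# Illustrative data in the plane: the map halves every coordinate.
pts = [(Fraction(0), Fraction(0)), (Fraction(1), Fraction(0)), (Fraction(0), Fraction(1))]
imgs = [(Fraction(0), Fraction(0)), (Fraction(1, 2), Fraction(0)), (Fraction(0), Fraction(1, 2))]
contracting = is_nonexpansive(pts, imgs)
expanding = is_nonexpansive(imgs, pts)
```

Here the halving map passes the check, while its inverse, which doubles distances, does not.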

preprint, 2016, arXiv

A note on constructive treatment of eigenvectors

The eigenvalue problem plays a central role in linear algebra and its applications in control and optimization methods. In particular, many matrix decompositions rely upon computation of eigenvalue-eigenvector pairs, such as diagonal or Jordan normal forms. Unfortunately, numerical algorithms computing eigenvectors are prone to errors. Due to uncomputability of eigenpairs, perturbation theory and various regularization techniques only help if the matrix at hand possesses certain properties such as the absence of non-zero singular values, or the presence of a distinguishable gap between the large and small singular values. Posing such a requirement might be restrictive in some practical applications. In this note, we propose an alternative treatment of eigenvectors which is approximate and constructive. In comparison to classical eigenvectors whose computation is often prone to numerical instability, a constructive treatment allows addressing the computational uncertainty in a controlled way.
