Researcher profile

Nikolay Atanasov

Nikolay Atanasov contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
25 works
0 followers
6 topics
4 close collaborators

Published work

25 published item(s)

preprint · 2026 · arXiv

Certifying Stability of Reinforcement Learning Policies using Generalized Lyapunov Functions

Establishing stability certificates for closed-loop systems under reinforcement learning (RL) policies is essential to move beyond empirical performance and offer guarantees of system behavior. Classical Lyapunov methods require a strict stepwise decrease in the Lyapunov function but such certificates are difficult to construct for learned policies. The RL value function is a natural candidate but it is not well understood how it can be adapted for this purpose. To gain intuition, we first study the linear quadratic regulator (LQR) problem and make two key observations. First, a Lyapunov function can be obtained from the value function of an LQR policy by augmenting it with a residual term related to the system dynamics and stage cost. Second, the classical Lyapunov decrease requirement can be relaxed to a generalized Lyapunov condition requiring only decrease on average over multiple time steps. Using this intuition, we consider the nonlinear setting and formulate an approach to learn generalized Lyapunov functions by augmenting RL value functions with neural network residual terms. Our approach successfully certifies the stability of RL policies trained on Gymnasium and DeepMind Control benchmarks. We also extend our method to jointly train neural controllers and stability certificates using a multi-step Lyapunov loss, resulting in larger certified inner approximations of the region of attraction compared to the classical Lyapunov approach. Overall, our formulation enables stability certification for a broad class of systems with learned policies by making certificates easier to construct, thereby bridging classical control theory and modern learning-based methods.
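
As a rough illustration of the second observation in this abstract, the sketch below compares a classical one-step decrease test with a multi-step averaged decrease test on a toy linear system. The dynamics in `step`, the candidate function `V`, and the uniform averaging over the horizon are assumptions chosen for illustration only; they are not taken from the paper, which may weight the steps differently and uses learned residual terms.

```python
def step(x):
    """Hypothetical stable closed-loop map x_{k+1} = A x_k with A = [[0, 1.2], [0.3, 0]]."""
    a, b = x
    return (1.2 * b, 0.3 * a)


def V(x):
    """Candidate function: squared Euclidean norm (an assumption, not a learned certificate)."""
    return x[0] ** 2 + x[1] ** 2


def classical_decrease(x):
    """Classical condition: V must drop at every single step."""
    return V(step(x)) < V(x)


def averaged_decrease(x, horizon):
    """Relaxed condition: the mean of V over the next `horizon` steps is below V(x)."""
    total = 0.0
    current = x
    for _ in range(horizon):
        current = step(current)
        total += V(current)
    return total / horizon < V(x)
```

For the state (0.0, 1.0), V rises from 1 to 1.44 after one step, so the classical test fails, while the two-step average is roughly 0.78, so the relaxed test passes.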

preprint · 2023 · arXiv

Feasibility Analysis and Regularity Characterization of Distributionally Robust Safe Stabilizing Controllers

This paper studies the well-posedness and regularity of safe stabilizing optimization-based controllers for control-affine systems in the presence of model uncertainty. When the system dynamics contain unknown parameters, a finite set of samples can be used to formulate distributionally robust versions of control barrier function and control Lyapunov function constraints. Control synthesis with such distributionally robust constraints can be achieved by solving a (convex) second-order cone program (SOCP). We provide one necessary and two sufficient conditions to check the feasibility of such optimization problems, characterize their computational complexity and numerically show that they are significantly faster to check than direct use of SOCP solvers. Finally, we also analyze the regularity of the resulting control laws.

preprint · 2022 · arXiv

Active Mapping via Gradient Ascent Optimization of Shannon Mutual Information over Continuous SE(3) Trajectories

The problem of active mapping aims to plan an informative sequence of sensing views given a limited budget such as distance traveled. This paper considers active occupancy grid mapping using a range sensor, such as LiDAR or depth camera. State-of-the-art methods optimize information-theoretic measures relating the occupancy grid probabilities with the range sensor measurements. The non-smooth nature of ray-tracing within a grid representation makes the objective function non-differentiable, forcing existing methods to search over a discrete space of candidate trajectories. This work proposes a differentiable approximation of the Shannon mutual information between a grid map and ray-based observations that enables gradient ascent optimization in the continuous space of SE(3) sensor poses. Our gradient-based formulation leads to more informative sensing trajectories, while avoiding occlusions and collisions. The proposed method is demonstrated in simulated and real-world experiments in 2-D and 3-D environments.

preprint · 2022 · arXiv

Adaptive Control of SE(3) Hamiltonian Dynamics with Learned Disturbance Features

Adaptive control is a critical component of reliable robot autonomy in rapidly changing operational conditions. Adaptive control designs benefit from a disturbance model, which is often unavailable in practice. This motivates the use of machine learning techniques to learn disturbance features from training data offline, which can subsequently be employed to compensate for the disturbances online. This paper develops geometric adaptive control with a learned disturbance model for rigid-body systems, such as ground, aerial, and underwater vehicles, that satisfy Hamilton's equations of motion over the SE(3) manifold. Our design consists of an offline disturbance model identification stage, using a Hamiltonian-based neural ordinary differential equation (ODE) network trained from state-control trajectory data, and an online adaptive control stage, estimating and compensating the disturbances based on geometric tracking errors. We demonstrate our adaptive geometric controller in trajectory tracking simulations of fully-actuated pendulum and under-actuated quadrotor systems.

preprint · 2022 · arXiv

Control Synthesis for Stability and Safety by Differential Complementarity Problem

This paper develops a novel control synthesis method for safe stabilization of control-affine systems as a Differential Complementarity Problem (DCP). Our design uses a control Lyapunov function (CLF) and a control barrier function (CBF) to define complementarity constraints in the DCP formulation to certify stability and safety, respectively. The CLF-CBF-DCP controller imposes stability as a soft constraint, which is automatically relaxed when the safety constraint is active, without the need for parameter tuning or optimization. We study the closed-loop system behavior with the CLF-CBF-DCP controller and identify conditions on the existence of local equilibria. Although in certain cases the controller yields undesirable local equilibria, those can be confined to a small subset of the safe set boundary by proper choice of the control parameters. Then, our method can avoid undesirable equilibria that CLF-CBF quadratic programming techniques encounter.

preprint · 2022 · arXiv

Latent Policies for Adversarial Imitation Learning

This paper considers learning robot locomotion and manipulation tasks from expert demonstrations. Generative adversarial imitation learning (GAIL) trains a discriminator that distinguishes expert from agent transitions, and in turn uses a reward defined by the discriminator output to optimize a policy generator for the agent. This generative adversarial training approach is very powerful but depends on a delicate balance between the discriminator and the generator training. In high-dimensional problems, the discriminator training may easily overfit or exploit associations with task-irrelevant features for transition classification. A key insight of this work is that performing imitation learning in a suitable latent task space makes the training process stable, even in challenging high-dimensional problems. We use an action encoder-decoder model to obtain a low-dimensional latent action space and train a LAtent Policy using Adversarial imitation Learning (LAPAL). The encoder-decoder model can be trained offline from state-action pairs to obtain a task-agnostic latent action representation or online, simultaneously with the discriminator and generator training, to obtain a task-aware latent action representation. We demonstrate that LAPAL training is stable, with near-monotonic performance improvement, and achieves expert performance in most locomotion and manipulation tasks, while a GAIL baseline converges slower and does not achieve expert performance in high-dimensional environments.

preprint · 2022 · arXiv

Physics-guided Learning-based Adaptive Control on the SE(3) Manifold

In real-world robotics applications, accurate models of robot dynamics are critical for safe and stable control in rapidly changing operational conditions. This motivates the use of machine learning techniques to approximate robot dynamics and their disturbances over a training set of state-control trajectories. This paper demonstrates that inductive biases arising from physics laws can be used to improve the data efficiency and accuracy of the approximated dynamics model. For example, the dynamics of many robots, including ground, aerial, and underwater vehicles, are described using their SE(3) pose and satisfy conservation of energy principles. We design a physically plausible model of the robot dynamics by imposing the structure of Hamilton's equations of motion in the design of a neural ordinary differential equation (ODE) network. The Hamiltonian structure guarantees satisfaction of SE(3) kinematic constraints and energy conservation by construction. It also allows us to derive an energy-based adaptive controller that achieves trajectory tracking while compensating for disturbances. Our learning-based adaptive controller is verified on an under-actuated quadrotor robot.

preprint · 2022 · arXiv

Robust and Safe Autonomous Navigation for Systems with Learned SE(3) Hamiltonian Dynamics

Stability and safety are critical properties for successful deployment of automatic control systems. As a motivating example, consider autonomous mobile robot navigation in a complex environment. A control design that generalizes to different operational conditions requires a model of the system dynamics, robustness to modeling errors, and satisfaction of safety constraints, such as collision avoidance. This paper develops a neural ordinary differential equation network to learn the dynamics of a Hamiltonian system from trajectory data. The learned Hamiltonian model is used to synthesize an energy-shaping passivity-based controller and analyze its robustness to uncertainty in the learned model and its safety with respect to constraints imposed by the environment. Given a desired reference path for the system, we extend our design using a virtual reference governor to achieve tracking control. The governor state serves as a regulation point that moves along the reference path adaptively, balancing the system energy level, model uncertainty bounds, and distance to safety violation to guarantee robustness and safety. Our Hamiltonian dynamics learning and tracking control techniques are demonstrated on simulated hexarotor and quadrotor robots navigating in cluttered 3D environments.

preprint · 2022 · arXiv

Safe Autonomous Navigation for Systems with Learned SE(3) Hamiltonian Dynamics

Safe autonomous navigation in unknown environments is an important problem for mobile robots. This paper proposes techniques to learn the dynamics model of a mobile robot from trajectory data and synthesize a tracking controller with safety and stability guarantees. The state of a rigid-body robot usually contains its position, orientation, and generalized velocity and satisfies Hamilton's equations of motion. Instead of a hand-derived dynamics model, we use a dataset of state-control trajectories to train a translation-equivariant nonlinear Hamiltonian model represented as a neural ordinary differential equation (ODE) network. The learned Hamiltonian model is used to synthesize an energy-shaping passivity-based controller and derive conditions which guarantee safe regulation to a desired reference pose. We enable adaptive tracking of a desired path, subject to safety constraints obtained from obstacle distance measurements. The trade-off between the robot's energy and the distance to safety constraint violation is used to adaptively govern a reference pose along the desired path. Our safe adaptive controller is demonstrated on a simulated hexarotor robot navigating in unknown environments.

preprint · 2021 · arXiv

Active Bayesian Multi-class Mapping from Range and Semantic Segmentation Observations

Many robot applications call for autonomous exploration and mapping of unknown and unstructured environments. Information-based exploration techniques, such as Cauchy-Schwarz quadratic mutual information (CSQMI) and fast Shannon mutual information (FSMI), have successfully achieved active binary occupancy mapping with range measurements. However, as we envision robots performing complex tasks specified with semantically meaningful objects, it is necessary to capture semantic categories in the measurements, map representation, and exploration objective. This work develops a Bayesian multi-class mapping algorithm utilizing range-category measurements. We derive a closed-form efficiently computable lower bound for the Shannon mutual information between the multi-class map and the measurements. The bound allows rapid evaluation of many potential robot trajectories for autonomous exploration and mapping. We compare our method against frontier-based and FSMI exploration and apply it in a 3-D photo-realistic simulation environment.

preprint · 2021 · arXiv

Coding for Distributed Multi-Agent Reinforcement Learning

This paper aims to mitigate straggler effects in synchronous distributed learning for multi-agent reinforcement learning (MARL) problems. Stragglers arise frequently in a distributed learning system, due to the existence of various system disturbances such as slow-downs or failures of compute nodes and communication bottlenecks. To resolve this issue, we propose a coded distributed learning framework, which speeds up the training of MARL algorithms in the presence of stragglers, while maintaining the same accuracy as the centralized approach. As an illustration, a coded distributed version of the multi-agent deep deterministic policy gradient (MADDPG) algorithm is developed and evaluated. Different coding schemes, including maximum distance separable (MDS) code, random sparse code, replication-based code, and regular low density parity check (LDPC) code, are also investigated. Simulations in several multi-robot problems demonstrate the promising performance of the proposed framework.
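
To make the straggler-tolerance idea concrete, here is a minimal sketch of a (3, 2) single-parity code over real-valued vectors, one of the simplest MDS codes. The three-worker setup, the function names `encode` and `decode`, and the use of plain Python lists as stand-ins for gradient vectors are all assumptions for illustration; the abstract does not describe the framework at this level of detail.

```python
def encode(g1, g2):
    """Return what each of three hypothetical workers computes: g1, g2, and g1 + g2."""
    parity = [a + b for a, b in zip(g1, g2)]
    return {0: list(g1), 1: list(g2), 2: parity}


def decode(results):
    """Recover (g1, g2) from the results of any two workers, keyed by worker id."""
    if len(results) < 2:
        raise ValueError("need results from at least two workers")
    if 0 in results and 1 in results:
        return results[0], results[1]
    if 0 in results:
        # Workers 0 and 2 responded: g2 = parity - g1.
        g1 = results[0]
        return g1, [p - a for p, a in zip(results[2], g1)]
    # Workers 1 and 2 responded: g1 = parity - g2.
    g2 = results[1]
    return [p - b for p, b in zip(results[2], g2)], g2
```

With this encoding, the learner can proceed as soon as any two of the three workers respond, so a single slow or failed worker does not delay the update.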

preprint · 2021 · arXiv

Control Barriers in Bayesian Learning of System Dynamics

This paper focuses on learning a model of system dynamics online while satisfying safety constraints. Our objective is to avoid offline system identification or hand-specified models and allow a system to safely and autonomously estimate and adapt its own model during operation. Given streaming observations of the system state, we use Bayesian learning to obtain a distribution over the system dynamics. Specifically, we propose a new matrix variate Gaussian process (MVGP) regression approach with an efficient covariance factorization to learn the drift and input gain terms of a nonlinear control-affine system. The MVGP distribution is then used to optimize the system behavior and ensure safety with high probability, by specifying control Lyapunov function (CLF) and control barrier function (CBF) chance constraints. We show that a safe control policy can be synthesized for systems with arbitrary relative degree and probabilistic CLF-CBF constraints by solving a second order cone program (SOCP). Finally, we extend our design to a self-triggering formulation, adaptively determining the time at which a new control input needs to be applied in order to guarantee safety.

preprint · 2021 · arXiv

CORSAIR: Convolutional Object Retrieval and Symmetry-AIded Registration

This paper considers online object-level mapping using partial point-cloud observations obtained in an unknown environment. We develop an approach for fully Convolutional Object Retrieval and Symmetry-AIded Registration (CORSAIR). Our model extends the Fully Convolutional Geometric Features model to learn a global object-shape embedding in addition to local point-wise features from the point-cloud observations. The global feature is used to retrieve a similar object from a category database, and the local features are used for robust pose registration between the observed and the retrieved object. Our formulation also leverages symmetries, present in the object shapes, to obtain promising local-feature pairs from different symmetry classes for matching. We present results from synthetic and real-world datasets with different object categories to verify the robustness of our method.

preprint · 2021 · arXiv

ELLIPSDF: Joint Object Pose and Shape Optimization with a Bi-level Ellipsoid and Signed Distance Function Description

Autonomous systems need to understand the semantics and geometry of their surroundings in order to comprehend and safely execute object-level task specifications. This paper proposes an expressive yet compact model for joint object pose and shape optimization, and an associated optimization algorithm to infer an object-level map from multi-view RGB-D camera observations. The model is expressive because it captures the identities, positions, orientations, and shapes of objects in the environment. It is compact because it relies on a low-dimensional latent representation of implicit object shape, allowing onboard storage of large multi-category object maps. Different from other works that rely on a single object representation format, our approach has a bi-level object model that captures both the coarse level scale as well as the fine level shape details. Our approach is evaluated on the large-scale real-world ScanNet dataset and compared against state-of-the-art methods.

preprint · 2021 · arXiv

Fully Convolutional Geometric Features for Category-level Object Alignment

This paper focuses on pose registration of different object instances from the same category. This is required in online object mapping because object instances detected at test time usually differ from the training instances. Our approach transforms instances of the same category to a normalized canonical coordinate frame and uses metric learning to train fully convolutional geometric features. The resulting model is able to generate pairs of matching points between the instances, allowing category-level registration. Evaluation on both synthetic and real-world data shows that our method provides robust features, leading to accurate alignment of instances with different shapes.

preprint · 2021 · arXiv

Inverse reinforcement learning for autonomous navigation via differentiable semantic mapping and planning

This paper focuses on inverse reinforcement learning for autonomous navigation using distance and semantic category observations. The objective is to infer a cost function that explains demonstrated behavior while relying only on the expert's observations and state-control trajectory. We develop a map encoder, that infers semantic category probabilities from the observation sequence, and a cost encoder, defined as a deep neural network over the semantic features. Since the expert cost is not directly observable, the model parameters can only be optimized by differentiating the error between demonstrated controls and a control policy computed from the cost estimate. We propose a new model of expert behavior that enables error minimization using a closed-form subgradient computed only over a subset of promising states via a motion planning algorithm. Our approach allows generalizing the learned behavior to new environments with new spatial configurations of the semantic categories. We analyze the different components of our model in a minigrid environment. We also demonstrate that our approach learns to follow traffic rules in the autonomous driving CARLA simulator by relying on semantic observations of buildings, sidewalks, and road lanes.

preprint · 2021 · arXiv

Learning Barrier Functions with Memory for Robust Safe Navigation

Control barrier functions are widely used to enforce safety properties in robot motion planning and control. However, the problem of constructing barrier functions online and synthesizing safe controllers that can deal with the associated uncertainty has received little attention. This paper investigates safe navigation in unknown environments, using onboard range sensing to construct control barrier functions online. To represent different objects in the environment, we use the distance measurements to train neural network approximations of the signed distance functions incrementally with replay memory. This allows us to formulate a novel robust control barrier safety constraint which takes into account the error in the estimated distance fields and its gradient. Our formulation leads to a second-order cone program, enabling safe and stable control synthesis in a priori unknown environments.
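
For readers unfamiliar with control barrier functions, the sketch below shows the basic idea on a single-integrator robot (x_dot = u) with one circular obstacle whose signed distance is known exactly. It is a textbook safety filter with a closed-form solution, not the paper's robust formulation: the learned distance fields, the error bounds, and the second-order cone program are all omitted, and the names `signed_distance`, `safe_control`, and `alpha` are hypothetical.

```python
import math


def signed_distance(x, center, radius):
    """h(x) = ||x - center|| - radius and its gradient for a circular obstacle."""
    dx, dy = x[0] - center[0], x[1] - center[1]
    dist = math.hypot(dx, dy)  # assumes x is not exactly at the obstacle center
    return dist - radius, (dx / dist, dy / dist)


def safe_control(x, u_nom, center, radius, alpha=1.0):
    """Minimally modify u_nom so that grad_h . u + alpha * h >= 0 for x_dot = u."""
    h, grad = signed_distance(x, center, radius)
    margin = grad[0] * u_nom[0] + grad[1] * u_nom[1] + alpha * h
    if margin >= 0.0:
        return u_nom  # nominal input already satisfies the barrier condition
    # Project onto the constraint boundary; grad has unit norm, so no rescaling is needed.
    return (u_nom[0] - margin * grad[0], u_nom[1] - margin * grad[1])
```

A robot at (-1.5, 0) driving toward a unit-radius obstacle at the origin with nominal input (1, 0) has its input scaled back to (0.5, 0), while an input that points away from the obstacle is returned unchanged.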

preprint · 2021 · arXiv

Localization and Mapping using Instance-specific Mesh Models

This paper focuses on building semantic maps, containing object poses and shapes, using a monocular camera. This is an important problem because robots need rich understanding of geometry and context if they are to shape the future of transportation, construction, and agriculture. Our contribution is an instance-specific mesh model of object shape that can be optimized online based on semantic information extracted from camera images. Multi-view constraints on the object shape are obtained by detecting objects and extracting category-specific keypoints and segmentation masks. We show that the errors between projections of the mesh model and the observed keypoints and masks can be differentiated in order to obtain accurate instance-specific object shapes. We evaluate the performance of the proposed approach in simulation and on the KITTI dataset by building maps of car poses and shapes.

preprint · 2021 · arXiv

Mesh Reconstruction from Aerial Images for Outdoor Terrain Mapping Using Joint 2D-3D Learning

This paper addresses outdoor terrain mapping using overhead images obtained from an unmanned aerial vehicle. Dense depth estimation from aerial images during flight is challenging. While feature-based localization and mapping techniques can deliver real-time odometry and sparse points reconstruction, a dense environment model is generally recovered offline with significant computation and storage. This paper develops a joint 2D-3D learning approach to reconstruct local meshes at each camera keyframe, which can be assembled into a global environment model. Each local mesh is initialized from sparse depth measurements. We associate image features with the mesh vertices through camera projection and apply graph convolution to refine the mesh vertices based on joint 2-D reprojected depth and 3-D mesh supervision. Quantitative and qualitative evaluations using real aerial images show the potential of our method to support environmental monitoring and surveillance applications.

preprint · 2020 · arXiv

Autonomous Navigation in Unknown Environments using Sparse Kernel-based Occupancy Mapping

This paper focuses on real-time occupancy mapping and collision checking onboard an autonomous robot navigating in an unknown environment. We propose a new map representation, in which occupied and free space are separated by the decision boundary of a kernel perceptron classifier. We develop an online training algorithm that maintains a very sparse set of support vectors to represent obstacle boundaries in configuration space. We also derive conditions that allow complete (without sampling) collision-checking for piecewise-linear and piecewise-polynomial robot trajectories. We demonstrate the effectiveness of our mapping and collision checking algorithms for autonomous navigation of an Ackermann-drive robot in unknown environments.
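
The map representation described here can be illustrated with a textbook kernel perceptron, sketched below under simplifying assumptions. The RBF kernel, the `gamma` value, the 2-D points, and the class name `KernelPerceptronMap` are illustrative choices; the paper's online training algorithm and its sparsification strategy are not specified in the abstract and may differ.

```python
import math


def rbf(a, b, gamma=1.0):
    """Radial basis function kernel between two 2-D points."""
    return math.exp(-gamma * ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2))


class KernelPerceptronMap:
    """Textbook kernel perceptron: label +1 means occupied, -1 means free."""

    def __init__(self, gamma=1.0):
        self.gamma = gamma
        self.support = []  # list of (point, label) pairs kept as support vectors

    def score(self, x):
        """Signed kernel sum; its zero level set is the decision boundary."""
        return sum(y * rbf(p, x, self.gamma) for p, y in self.support)

    def update(self, x, label):
        """Store the sample only if the current map misclassifies it."""
        if label * self.score(x) <= 0.0:
            self.support.append((x, label))

    def is_occupied(self, x):
        return self.score(x) > 0.0
```

Because a sample is stored only when it is misclassified, observations that agree with the current map add no support vectors, which is what keeps the representation sparse.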

preprint · 2020 · arXiv

Fast and Safe Path-Following Control using a State-Dependent Directional Metric

This paper considers the problem of fast and safe autonomous navigation in partially known environments. Our main contribution is a control policy design based on ellipsoidal trajectory bounds obtained from a quadratic state-dependent distance metric. The ellipsoidal bounds are used to embed directional preference in the control design, leading to system behavior that is adapted to the local environment geometry, carefully considering medial obstacles while paying less attention to lateral ones. We use a virtual reference governor system to adaptively follow a desired navigation path, slowing down when system safety may be violated and speeding up otherwise. The resulting controller is able to navigate complex environments faster than common Euclidean-norm and Lyapunov-function-based designs, while retaining stability and collision avoidance guarantees.

preprint · 2020 · arXiv

Learning Navigation Costs from Demonstration in Partially Observable Environments

This paper focuses on inverse reinforcement learning (IRL) to enable safe and efficient autonomous navigation in unknown partially observable environments. The objective is to infer a cost function that explains expert-demonstrated navigation behavior while relying only on the observations and state-control trajectory used by the expert. We develop a cost function representation composed of two parts: a probabilistic occupancy encoder, with recurrent dependence on the observation sequence, and a cost encoder, defined over the occupancy features. The representation parameters are optimized by differentiating the error between demonstrated controls and a control policy computed from the cost encoder. Such differentiation is typically computed by dynamic programming through the value function over the whole state space. We observe that this is inefficient in large partially observable environments because most states are unexplored. Instead, we rely on a closed-form subgradient of the cost-to-go obtained only over a subset of promising states via an efficient motion-planning algorithm such as A* or RRT. Our experiments show that our model exceeds the accuracy of baseline IRL algorithms in robot navigation tasks, while substantially improving the efficiency of training and test-time inference.

preprint · 2020 · arXiv

Learning Navigation Costs from Demonstration with Semantic Observations

This paper focuses on inverse reinforcement learning (IRL) for autonomous robot navigation using semantic observations. The objective is to infer a cost function that explains demonstrated behavior while relying only on the expert's observations and state-control trajectory. We develop a map encoder, which infers semantic class probabilities from the observation sequence, and a cost encoder, defined as a deep neural network over the semantic features. Since the expert cost is not directly observable, the representation parameters can only be optimized by differentiating the error between demonstrated controls and a control policy computed from the cost estimate. The error is optimized using a closed-form subgradient computed only over a subset of promising states via a motion planning algorithm. We show that our approach learns to follow traffic rules in the autonomous driving CARLA simulator by relying on semantic observations of cars, sidewalks and road lanes.

preprint · 2020 · arXiv

Probabilistic Safety Constraints for Learned High Relative Degree System Dynamics

This paper focuses on learning a model of system dynamics online while satisfying safety constraints. Our motivation is to avoid offline system identification or hand-specified dynamics models and allow a system to safely and autonomously estimate and adapt its own model during online operation. Given streaming observations of the system state, we use Bayesian learning to obtain a distribution over the system dynamics. In turn, the distribution is used to optimize the system behavior and ensure safety with high probability, by specifying a chance constraint over a control barrier function.

preprint · 2020 · arXiv

Safe Robot Navigation in Cluttered Environments using Invariant Ellipsoids and a Reference Governor

This paper considers the problem of safe autonomous navigation in unknown environments, relying on local obstacle sensing. We consider a control-affine nonlinear robot system subject to bounded input noise and rely on feedback linearization to determine ellipsoid output bounds on the closed-loop robot trajectory under stabilizing control. A virtual governor system is developed to adaptively track a desired navigation path, while relying on the robot trajectory bounds to slow down if safety is endangered and speed up otherwise. The main contribution is the derivation of theoretical guarantees for safe nonlinear system path-following control and its application to autonomous robot navigation in unknown environments.