Source author record

Manan Gandhi

Manan Gandhi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

eess.SY, Systems and Control, math.OC, Machine Learning

Catalog footprint

What is connected

4 works

4 topics

4 close collaborators

preprint, 2022, arXiv

Robustifying Reinforcement Learning Policies with $\mathcal{L}_1$ Adaptive Control

A reinforcement learning (RL) policy trained in a nominal environment could fail in a new/perturbed environment due to the existence of dynamic variations. Existing robust methods try to obtain a fixed policy for all envisioned dynamic variation scenarios through robust or adversarial training. These methods could lead to conservative performance due to emphasis on the worst case, and often involve tedious modifications to the training environment. We propose an approach to robustifying a pre-trained non-robust RL policy with $\mathcal{L}_1$ adaptive control. Leveraging the capability of an $\mathcal{L}_1$ control law in the fast estimation of and active compensation for dynamic variations, our approach can significantly improve the robustness of an RL policy trained in a standard (i.e., non-robust) way, either in a simulator or in the real world. Numerical experiments are provided to validate the efficacy of the proposed approach.

preprint, 2022, arXiv

Safety in Augmented Importance Sampling: Performance Bounds for Robust MPPI

This work explores the nature of augmented importance sampling in safety-constrained model predictive control problems. When operating in a constrained environment, sampling-based model predictive control and motion planning typically utilize penalty functions or expensive optimization-based control barrier algorithms to maintain feasibility of forward sampling. In contrast, the presented algorithm utilizes discrete embedded barrier states in augmented importance sampling to apply feedback with respect to a nominal state when sampling. We will demonstrate that using discrete embedded barrier states for safety in augmented importance sampling is more sample efficient as measured by collision-free trajectories, is computationally feasible to perform per sample, and results in better safety performance on a cluttered navigation task with extreme unmodeled disturbances. In addition, we will utilize the theoretical properties of augmented importance sampling and safety control to derive a new bound on the free energy of the system.

preprint, 2021, arXiv

Robust Model Predictive Path Integral Control: Analysis and Performance Guarantees

In this paper we propose a novel decision-making architecture for Robust Model Predictive Path Integral control (RMPPI) and investigate its performance guarantees and applicability to off-road navigation. Key building blocks of the proposed architecture are an augmented state space representation of the system consisting of nominal and actual dynamics, a placeholder for different types of tracking controllers, a safety logic for nominal state propagation, and an importance sampling scheme that takes into account the capabilities of the underlying tracking control. Using these ingredients, we derive a bound on the free energy growth of the dynamical system which is a function of task constraint satisfaction level, the performance of the underlying tracking controller, and the sampling error of the stochastic optimization used within RMPPI. To validate the bound on free energy growth, we perform experiments in simulation using two types of tracking controllers, namely the iterative Linear Quadratic Gaussian and contraction-metric-based control. We further demonstrate the applicability of RMPPI on real hardware using the GT AutoRally vehicle. Our experiments demonstrate that RMPPI outperforms MPPI and Tube-MPPI by alleviating the issues of those model predictive controllers related to either lack of robustness or excessive conservatism. RMPPI provides the best of both worlds in terms of agility and robustness to disturbances.
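For readers unfamiliar with the terms in this abstract, the following is a minimal sketch of a plain MPPI importance-sampling update together with a Monte Carlo free energy estimate. It is not the RMPPI architecture from the paper: there is no augmented state, tracking controller, or safety logic. The function names (`mppi_weights`, `free_energy`, `mppi_step`), the scalar state and control, the Gaussian noise model, and the convention F = -lambda * log E[exp(-S / lambda)] are all assumptions made for illustration.

```python
import math
import random


def mppi_weights(costs, lam):
    """Softmin importance weights, w_k proportional to exp(-(S_k - min S) / lam)."""
    s_min = min(costs)  # shift by the minimum cost for numerical stability
    unnorm = [math.exp(-(s - s_min) / lam) for s in costs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def free_energy(costs, lam):
    """Monte Carlo estimate of F = -lam * log E[exp(-S / lam)] (assumed convention)."""
    s_min = min(costs)
    mean_exp = sum(math.exp(-(s - s_min) / lam) for s in costs) / len(costs)
    return s_min - lam * math.log(mean_exp)


def mppi_step(x0, u_nom, dynamics, cost, lam=1.0, sigma=0.5,
              num_samples=256, rng=None):
    """One MPPI update of a scalar control sequence.

    dynamics(x, u) -> next state and cost(x, u) -> running cost are
    hypothetical user-supplied callables.
    """
    rng = rng or random.Random(0)
    horizon = len(u_nom)
    noises, costs = [], []
    for _ in range(num_samples):
        # Sample a perturbation of the nominal controls and roll it out.
        eps = [rng.gauss(0.0, sigma) for _ in range(horizon)]
        x, total = x0, 0.0
        for t in range(horizon):
            u = u_nom[t] + eps[t]
            x = dynamics(x, u)
            total += cost(x, u)
        noises.append(eps)
        costs.append(total)
    weights = mppi_weights(costs, lam)
    # The updated controls are the nominal controls plus the weighted noise average.
    u_new = [
        u_nom[t] + sum(weights[k] * noises[k][t] for k in range(num_samples))
        for t in range(horizon)
    ]
    return u_new, free_energy(costs, lam)
```

With this convention the free energy estimate always lies between the lowest sampled cost and the average sampled cost, which is one reason it is a convenient quantity to bound.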

preprint, 2015, arXiv

Trajectory Optimization Algorithm Studies

In complex engineered systems, completing an objective is sometimes not enough. The system must be able to reach a set performance characteristic, such as an unmanned aerial vehicle flying from point A to point B in under 10 seconds. This introduces the notion of optimality: what is the most efficient, the fastest, or the cheapest way to complete a task? This report explores the two pillars of optimal control, Bellman's Dynamic Programming and Pontryagin's Maximum Principle, and compares implementations of both theories on simulated systems. Dynamic Programming is realized through a Differential Dynamic Programming (DDP) algorithm, which utilizes a forward-backward pass to iteratively optimize a control sequence and trajectory. The Maximum Principle is realized via Gauss Pseudospectral Optimal Control, where the optimal control problem is first approximated through polynomial basis functions, then solved, with optimality being achieved through the costate equations of the Maximum Principle. The results of the report show that, for short time horizons, DDP can optimize quickly and can generate a trajectory that utilizes less control effort for the same problem formulation. On the other hand, Pseudospectral methods can optimize faster for longer time horizons, but require a key understanding of the problem structure. Future work involves completing an implementation of DDP in C++ and testing the speed of convergence for both methods, as well as extending the Pseudospectral Optimal Control framework into the world of stochastic optimal control.
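To make the forward-backward structure mentioned in this abstract concrete, here is a minimal sketch of dynamic programming on a scalar linear-quadratic problem. It is a simplified stand-in, not the report's DDP implementation: DDP applies a backward pass of this kind to local quadratic approximations of nonlinear dynamics and cost, and iterates. The function names and the problem setup (dynamics x_next = a*x + b*u, stage cost q*x^2 + r*u^2, terminal cost q_final*x^2) are assumptions chosen for illustration.

```python
def lqr_backward_pass(a, b, q, r, q_final, horizon):
    """Backward Riccati recursion: returns feedback gains and value coefficients.

    values[t] is the coefficient p_t of the optimal cost-to-go p_t * x^2.
    """
    p = q_final
    gains = [0.0] * horizon
    values = [0.0] * (horizon + 1)
    values[horizon] = p
    for t in reversed(range(horizon)):
        k = (b * p * a) / (r + b * b * p)   # optimal gain, u_t = -k * x_t
        p = q + a * a * p - a * b * p * k   # Bellman update of the value coefficient
        gains[t] = k
        values[t] = p
    return gains, values


def lqr_forward_pass(x0, a, b, q, r, q_final, gains):
    """Roll the feedback policy forward and accumulate the total cost."""
    x, total = x0, 0.0
    states = [x0]
    for k in gains:
        u = -k * x
        total += q * x * x + r * u * u
        x = a * x + b * u
        states.append(x)
    total += q_final * x * x
    return states, total


if __name__ == "__main__":
    gains, values = lqr_backward_pass(1.0, 1.0, 1.0, 1.0, 1.0, horizon=50)
    states, total = lqr_forward_pass(2.0, 1.0, 1.0, 1.0, 1.0, 1.0, gains)
    # The rollout cost should match the value function at the initial state.
    print(total, values[0] * 2.0 ** 2)
```

The backward pass computes the policy from the end of the horizon to the start, and the forward pass applies it from the initial state. The rollout cost agreeing with `values[0] * x0**2` is a quick check that the two passes are consistent.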