Researcher profile

Pan Zhao

Pan Zhao contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
9works
0followers
8topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

9 published item(s)

preprint2023arXiv

Efficient and robust transfer learning of optimal individualized treatment regimes with right-censored survival data

An individualized treatment regime (ITR) is a decision rule that assigns treatments based on patients' characteristics. The value function of an ITR is the expected outcome in a counterfactual world had this ITR been implemented. Recently, there has been increasing interest in combining heterogeneous data sources, such as leveraging the complementary features of randomized controlled trial (RCT) data and a large observational study (OS). Usually, a covariate shift exists between the source and target population, rendering the source-optimal ITR unnecessarily optimal for the target population. We present an efficient and robust transfer learning framework for estimating the optimal ITR with right-censored survival data that generalizes well to the target population. The value function accommodates a broad class of functionals of survival distributions, including survival probabilities and restrictive mean survival times (RMSTs). We propose a doubly robust estimator of the value function, and the optimal ITR is learned by maximizing the value function within a pre-specified class of ITRs. We establish the $N^{-1/3}$ rate of convergence for the estimated parameter indexing the optimal ITR, and show that the proposed optimal value estimator is consistent and asymptotically normal even with flexible machine learning methods for nuisance parameter estimation. We evaluate the empirical performance of the proposed method by simulation studies and a real data application of sodium bicarbonate therapy for patients with severe metabolic acidaemia in the intensive care unit (ICU), combining a RCT and an observational study with heterogeneity.

preprint2022arXiv

$\mathcal{L}_1$ Adaptive Control with Switched Reference Models: Application to Learn-to-Fly

Learn-to-Fly (L2F) is a new framework that aims to replace the traditional iterative development paradigm for aerial vehicles with a combination of real-time aerodynamic modeling, guidance, and learning control. To ensure safe learning of the vehicle dynamics on the fly, this paper presents an $\mathcal{L}_1$ adaptive control ($\mathcal{L}_1$AC) based scheme, which actively estimates and compensates for the discrepancy between the intermediately learned dynamics and the actual dynamics. First, to incorporate the periodic update of the learned model within the L2F framework, this paper extends the $\mathcal{L}_1$AC architecture to handle a switched reference system subject to unknown time-varying parameters and disturbances. The paper also includes an analysis of both transient and steady-state performance of the $\mathcal{L}_1$AC architecture in the presence of non-zero initialization error for the state predictor. Second, the paper presents how the proposed $\mathcal{L}_1$AC scheme is integrated into the L2F framework, including its interaction with the baseline controller and the real-time modeling module. Finally, flight tests on an unmanned aerial vehicle (UAV) validate the efficacy of the proposed control and learning scheme.

preprint2022arXiv

CVFNet: Real-time 3D Object Detection by Learning Cross View Features

In recent years 3D object detection from LiDAR point clouds has made great progress thanks to the development of deep learning technologies. Although voxel or point based methods are popular in 3D object detection, they usually involve time-consuming operations such as 3D convolutions on voxels or ball query among points, making the resulting network inappropriate for time critical applications. On the other hand, 2D view-based methods feature high computing efficiency while usually obtaining inferior performance than the voxel or point based methods. In this work, we present a real-time view-based single stage 3D object detector, namely CVFNet to fulfill this task. To strengthen the cross-view feature learning under the condition of demanding efficiency, our framework extracts the features of different views and fuses them in an efficient progressive way. We first propose a novel Point-Range feature fusion module that deeply integrates point and range view features in multiple stages. Then, a special Slice Pillar is designed to well maintain the 3D geometry when transforming the obtained deep point-view features into bird's eye view. To better balance the ratio of samples, a sparse pillar detection head is presented to focus the detection on the nonempty grids. We conduct experiments on the popular KITTI and NuScenes benchmark, and state-of-the-art performances are achieved in terms of both accuracy and speed.

preprint2022arXiv

Integrated Adaptive Control and Reference Governors for Constrained Systems with State-Dependent Uncertainties

This paper presents an adaptive reference governor (RG) framework for a linear system with matched nonlinear uncertainties that can depend on both time and states, subject to both state and input constraints. The proposed framework leverages an L1 adaptive controller (L1AC) that estimates and compensates for the uncertainties, and provides guaranteed transient performance, in terms of uniform bounds on the error between actual states and inputs and those of a nominal (i.e., uncertainty-free) system. The uniform performance bounds provided by the L1AC are used to tighten the pre-specified state and control constraints. A reference governor is then designed for the nominal system using the tightened constraints, and guarantees robust constraint satisfaction. Moreover, the conservatism introduced by the constraint tightening can be systematically reduced by tuning some parameters within the L1AC. Compared with existing solutions, the proposed adaptive RG framework can potentially yield less conservative results for constraint enforcement due to the removal of uncertainty propagation along a prediction horizon, and improved tracking performance due to the inherent uncertainty compensation mechanism. Simulation results for a flight control example illustrate the efficacy of the proposed framework.

preprint2022arXiv

Optimizing Nitrogen Management with Deep Reinforcement Learning and Crop Simulations

Nitrogen (N) management is critical to sustain soil fertility and crop production while minimizing the negative environmental impact, but is challenging to optimize. This paper proposes an intelligent N management system using deep reinforcement learning (RL) and crop simulations with Decision Support System for Agrotechnology Transfer (DSSAT). We first formulate the N management problem as an RL problem. We then train management policies with deep Q-network and soft actor-critic algorithms, and the Gym-DSSAT interface that allows for daily interactions between the simulated crop environment and RL agents. According to the experiments on the maize crop in both Iowa and Florida in the US, our RL-trained policies outperform previous empirical methods by achieving higher or similar yield while using less fertilizers

preprint2022arXiv

Robustifying Reinforcement Learning Policies with $\mathcal{L}_1$ Adaptive Control

A reinforcement learning (RL) policy trained in a nominal environment could fail in a new/perturbed environment due to the existence of dynamic variations. Existing robust methods try to obtain a fixed policy for all envisioned dynamic variation scenarios through robust or adversarial training. These methods could lead to conservative performance due to emphasis on the worst case, and often involve tedious modifications to the training environment. We propose an approach to robustifying a pre-trained non-robust RL policy with $\mathcal{L}_1$ adaptive control. Leveraging the capability of an $\mathcal{L}_1$ control law in the fast estimation of and active compensation for dynamic variations, our approach can significantly improve the robustness of an RL policy trained in a standard (i.e., non-robust) way, either in a simulator or in the real world. Numerical experiments are provided to validate the efficacy of the proposed approach.

preprint2021arXiv

$\mathcal{L}_1$ Adaptive Control for Switching Reference Systems: Application to Flight Control

This paper presents a framework for the design and analysis of an $\mathcal{L}_1$ adaptive controller with a switching reference system. The use of a switching reference system allows the desired behavior to be scheduled across the operating envelope, which is often required in aerospace applications. The analysis uses a switched reference system that assumes perfect knowledge of uncertainties and uses a corresponding non-adaptive controller. Provided that this switched reference system is stable, it is shown that the closed-loop system with unknown parameters and disturbances and the $\mathcal{L}_1$ adaptive controller can behave arbitrarily close to this reference system. Simulations of the short period dynamics of a transport class aircraft during the approach phase illustrate the theoretical results.

preprint2020arXiv

$\mathcal{L}_1$-$\mathcal{GP}$: $\mathcal{L}_1$ Adaptive Control with Bayesian Learning

We present $\mathcal{L}_1$-$\mathcal{GP}$, an architecture based on $\mathcal{L}_1$ adaptive control and Gaussian Process Regression (GPR) for safe simultaneous control and learning. On one hand, the $\mathcal{L}_1$ adaptive control provides stability and transient performance guarantees, which allows for GPR to efficiently and safely learn the uncertain dynamics. On the other hand, the learned dynamics can be conveniently incorporated into the $\mathcal{L}_1$ control architecture without sacrificing robustness and tracking performance. Subsequently, the learned dynamics can lead to less conservative designs for performance/robustness tradeoff. We illustrate the efficacy of the proposed architecture via numerical simulations.

preprint2020arXiv

Novel Stealthy Attack and Defense Strategies for Networked Control Systems

This paper studies novel attack and defense strategies, based on a class of stealthy attacks, namely the zero-dynamics attack (ZDA), for multi-agent control systems. ZDA poses a formidable security challenge since its attack signal is hidden in the null-space of the state-space representation of the control system and hence it can evade conventional detection methods. An intuitive defense strategy builds on changing the aforementioned representation via switching through a set of carefully crafted topologies. In this paper, we propose realistic ZDA variations where the attacker is aware of this topology-switching strategy, and hence employs the following policies to avoid detection: (i) pause, update and resume ZDA according to the knowledge of switching topologies; (ii) cooperate with a concurrent stealthy topology attack that alters network topology at switching times, such that the original ZDA is feasible under the corrupted topology. We first systematically study the proposed ZDA variations, and then develop defense strategies against them under the realistic assumption that the defender has no knowledge of attack starting, pausing, and resuming times and the number of misbehaving agents. Particularly, we characterize conditions for detectability of the proposed ZDA variations, in terms of the network topologies to be maintained, the set of agents to be monitored, and the measurements of the monitored agents that should be extracted, while simultaneously preserving the privacy of the states of the non-monitored agents. We then propose an attack detection algorithm based on the Luenberger observer, using the characterized detectability conditions. We provide numerical simulation results to demonstrate our theoretical findings.