Researcher profile

Yibiao Zhao

Yibiao Zhao contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 13 - UnverifiedVerification L1Unclaimed author
2works
0followers
3topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

2 published item(s)

preprint2022arXiv

Energy-Based Continuous Inverse Optimal Control

The problem of continuous inverse optimal control (over finite time horizon) is to learn the unknown cost function over the sequence of continuous control variables from expert demonstrations. In this article, we study this fundamental problem in the framework of energy-based model, where the observed expert trajectories are assumed to be random samples from a probability density function defined as the exponential of the negative cost function up to a normalizing constant. The parameters of the cost function are learned by maximum likelihood via an "analysis by synthesis" scheme, which iterates (1) synthesis step: sample the synthesized trajectories from the current probability density using the Langevin dynamics via back-propagation through time, and (2) analysis step: update the model parameters based on the statistical difference between the synthesized trajectories and the observed trajectories. Given the fact that an efficient optimization algorithm is usually available for an optimal control problem, we also consider a convenient approximation of the above learning method, where we replace the sampling in the synthesis step by optimization. Moreover, to make the sampling or optimization more efficient, we propose to train the energy-based model simultaneously with a top-down trajectory generator via cooperative learning, where the trajectory generator is used to fast initialize the synthesis step of the energy-based model. We demonstrate the proposed methods on autonomous driving tasks, and show that they can learn suitable cost functions for optimal control.

preprint2019arXiv

PODDP: Partially Observable Differential Dynamic Programming for Latent Belief Space Planning

Autonomous agents are limited in their ability to observe the world state. Partially observable Markov decision processes (POMDPs) formally model the problem of planning under world state uncertainty, but POMDPs with continuous actions and nonlinear dynamics suitable for robotics applications are challenging to solve. In this paper, we present an efficient differential dynamic programming (DDP) algorithm for belief space planning in POMDPs with uncertainty over a discrete latent state, and continuous states, actions, observations, and nonlinear dynamics. This representation allows planning of dynamic trajectories which are sensitive to structured uncertainty over discrete latent world states. We develop dynamic programming techniques to optimize a contingency plan over a tree of possible observations and belief space trajectories, and also derive a hierarchical version of the algorithm. Our method is applicable to problems with uncertainty over the cost or reward function (e.g., the configuration of goals or obstacles), uncertainty over the dynamics (e.g., the dynamical mode of a hybrid system), and uncertainty about interactions, where other agents' behavior is conditioned on latent intentions. Benchmarks show that our algorithm outperforms popular heuristic approaches to planning under uncertainty, and results from an autonomous lane changing task demonstrate that our algorithm can synthesize robust interactive trajectories.