Researcher profile

Huizhen Yu

Huizhen Yu contributes to research discovery and scholarly infrastructure.

Researcher; affiliation not imported; open to collaborate

Trust snapshot

Quick read

Trust 15 (Unverified); Verification L1; Unclaimed author
3 works
0 followers
4 topics
1 close collaborator

Published work

3 published items

Preprint, 2020, arXiv

On the Minimum Pair Approach for Average-Cost Markov Decision Processes with Countable Discrete Action Spaces and Strictly Unbounded Costs

We consider average-cost Markov decision processes (MDPs) with Borel state spaces, countable, discrete action spaces, and strictly unbounded one-stage costs. For the minimum pair approach, we introduce a new majorization condition on the state transition stochastic kernel, in place of the commonly required continuity conditions on the MDP model. We combine this majorization condition with Lusin's theorem to prove the existence of a stationary minimum pair, i.e., a stationary policy paired with an invariant probability measure induced on the state space, with the property that the pair attains the minimum long-run average cost over all policies and initial distributions. We also establish other optimality properties of a stationary minimum pair, and for the stationary policy in such a pair, under additional recurrence or regularity conditions, we prove its pathwise optimality and strong optimality. Our results can be applied to a class of countable action space MDPs in which the dynamics and one-stage costs are discontinuous with respect to the state variable.

Preprint, 2012, arXiv

A Function Approximation Approach to Estimation of Policy Gradient for POMDP with Structured Policies

We consider the estimation of the policy gradient in partially observable Markov decision processes (POMDP) with a special class of structured policies that are finite-state controllers. We show that the gradient estimation can be done in the Actor-Critic framework, by making the critic compute a "value" function that does not depend on the states of POMDP. This function is the conditional mean of the true value function that depends on the states. We show that the critic can be implemented using temporal difference (TD) methods with linear function approximations, and the analytical results on TD and Actor-Critic can be transferred to this case. Although Actor-Critic algorithms have been used extensively in Markov decision processes (MDP), up to now they have not been proposed for POMDP as an alternative to the earlier proposed GPOMDP algorithm, an actor-only method. Furthermore, we show that the same idea applies to semi-Markov problems with a subset of finite-state controllers.
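For readers unfamiliar with the TD methods mentioned in this abstract, the following is a minimal, generic sketch of TD(0) policy evaluation with linear function approximation on a small Markov chain. It is not the critic constructed in the paper: the function name `td0_linear`, the discounted-cost setting, the constant step size, and the toy two-state chain are all assumptions chosen for illustration.

```python
import random


def td0_linear(transitions, costs, features, gamma=0.5, alpha=0.1,
               steps=5000, seed=0):
    """Generic TD(0) with a linear value estimate V(s) = theta . features[s].

    transitions[s][t] is the probability of moving from s to t under a
    fixed policy, and costs[s] is the one-stage cost in state s.
    All names here are hypothetical and used only for illustration.
    """
    rng = random.Random(seed)
    n_states = len(transitions)
    theta = [0.0] * len(features[0])
    state = 0
    for _ in range(steps):
        # Sample the next state from the current row of the transition matrix.
        next_state = rng.choices(range(n_states), weights=transitions[state])[0]
        v = sum(t * f for t, f in zip(theta, features[state]))
        v_next = sum(t * f for t, f in zip(theta, features[next_state]))
        # Temporal-difference error for the observed transition.
        delta = costs[state] + gamma * v_next - v
        # Move the weights along the feature vector of the current state.
        theta = [t + alpha * delta * f for t, f in zip(theta, features[state])]
        state = next_state
    return theta


# Toy chain (an assumption): state 0 -> state 1 -> state 0, deterministically.
transitions = [[0.0, 1.0], [1.0, 0.0]]
costs = [1.0, 0.0]
features = [[1.0, 0.0], [0.0, 1.0]]  # one-hot features, i.e. the tabular case
theta = td0_linear(transitions, costs, features)
```

With one-hot features the weights are the state values themselves, so on this chain they should approach the solution of V(0) = 1 + 0.5 V(1) and V(1) = 0.5 V(0).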

Preprint, 2012, arXiv

Discretized Approximations for POMDP with Average Cost

In this paper, we propose a new lower approximation scheme for POMDP with discounted and average cost criteria. The approximating functions are determined by their values at a finite number of belief points, and can be computed efficiently using value iteration algorithms for finite-state MDP. While for discounted problems several lower approximation schemes have been proposed earlier, ours seems the first of its kind for average cost problems. We focus primarily on the average cost case, and we show that the corresponding approximation can be computed efficiently using multi-chain algorithms for finite-state MDP. We give a preliminary analysis showing that regardless of the existence of the optimal average cost J in the POMDP, the approximation obtained is a lower bound of the liminf optimal average cost function, and can also be used to calculate an upper bound on the limsup optimal average cost function, as well as bounds on the cost of executing the stationary policy associated with the approximation. We show the convergence of the cost approximation, when the optimal average cost is constant and the optimal differential cost is continuous.
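As background for the finite-state value iteration this abstract refers to, here is a minimal sketch of value iteration for a discounted-cost finite-state MDP. It is not the paper's lower approximation scheme, and it does not cover the average cost or multi-chain case; the function name `value_iteration`, the array layout, and the toy two-state model are assumptions made for illustration.

```python
def value_iteration(P, c, gamma=0.9, tol=1e-10, max_iter=10000):
    """Discounted-cost value iteration on a finite-state, finite-action MDP.

    P[a][s][t] is the probability of moving from s to t under action a,
    and c[s][a] is the one-stage cost. Returns (V, policy), where policy[s]
    is a cost-minimizing action with respect to the final Q-values.
    All names here are hypothetical and used only for illustration.
    """
    n_actions = len(P)
    n_states = len(c)
    V = [0.0] * n_states
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(max_iter):
        # Bellman backup: one-stage cost plus discounted expected cost-to-go.
        Q = [[c[s][a] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
              for a in range(n_actions)]
             for s in range(n_states)]
        V_new = [min(row) for row in Q]
        diff = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if diff < tol:
            break
    policy = [min(range(n_actions), key=lambda a, s=s: Q[s][a])
              for s in range(n_states)]
    return V, policy


# Toy model (an assumption): action 0 stays put, action 1 switches state.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
]
c = [[1.0, 1.0], [0.0, 0.0]]  # state 0 costs 1 per stage, state 1 costs 0
V, policy = value_iteration(P, c)
```

In this toy model the cheapest behavior is to leave state 0 once and then remain in state 1, so the computed policy switches in state 0 and stays in state 1.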