Source author record

Yijie Peng

Yijie Peng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning, Artificial Intelligence, math.NA, math.OC, math.PR, Numerical Analysis

Catalog footprint

What is connected

3 works

6 topics

4 close collaborators

Preprint, 2022, arXiv

Quantile-Based Policy Optimization for Reinforcement Learning

Classical reinforcement learning (RL) aims to optimize the expected cumulative reward. In this work, we consider the RL setting where the goal is to optimize a quantile of the cumulative reward. We parameterize the policy controlling actions by neural networks and propose a novel policy gradient algorithm called Quantile-Based Policy Optimization (QPO), together with its variant Quantile-Based Proximal Policy Optimization (QPPO), to solve deep RL problems with quantile objectives. QPO uses two coupled iterations running at different time scales to estimate quantiles and policy parameters simultaneously, and is shown to converge to the globally optimal policy under certain conditions. Our numerical results demonstrate that the proposed algorithms outperform the existing baseline algorithms under the quantile criterion.
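The two coupled iterations mentioned in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the one-step environment `sample_return`, the single-logit Bernoulli policy, the step-size exponents, and the use of the likelihood-ratio surrogate `-1{R <= q} * grad log pi` as the ascent direction for the quantile. The quantile estimate `q` is updated with the larger step size (fast time scale) and the policy parameter `theta` with the smaller one (slow time scale).

```python
import math
import random

def sample_return(action, rng):
    # Hypothetical one-step environment: action 0 is safe, action 1 is risky.
    if action == 0:
        return rng.gauss(1.0, 0.1)
    return rng.gauss(2.0, 3.0)

def two_timescale_quantile_policy_search(alpha=0.1, iters=20000, seed=0):
    rng = random.Random(seed)
    theta = 0.0  # logit of choosing the risky action
    q = 0.0      # running estimate of the alpha-quantile of the return
    for k in range(1, iters + 1):
        p_risky = 1.0 / (1.0 + math.exp(-theta))
        action = 1 if rng.random() < p_risky else 0
        ret = sample_return(action, rng)
        below = 1.0 if ret <= q else 0.0
        # Score function d/dtheta log pi(action) for a Bernoulli(sigmoid(theta)) policy.
        score = (1.0 - p_risky) if action == 1 else -p_risky
        # Fast time scale: stochastic-approximation tracking of the quantile.
        q += (k ** -0.6) * (alpha - below)
        # Slow time scale: ascend a likelihood-ratio estimate of the
        # quantile-gradient direction (assumed surrogate, see lead-in).
        theta += (k ** -0.8) * (-below * score)
    return theta, q
```

In this toy setting the risky action has the higher mean but the lower 0.1-quantile, so a quantile-seeking policy should shift probability toward the safe action, that is, drive `theta` negative. An expectation-maximizing policy would prefer the risky action instead.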

Preprint, 2021, arXiv

Noise Optimization for Artificial Neural Networks

Adding noise to an artificial neural network (ANN) has been shown in previous work to improve robustness. In this work, we propose a new technique to compute the pathwise stochastic gradient estimate with respect to the standard deviation of the Gaussian noise added to each neuron of the ANN. With the proposed technique, the gradient estimate with respect to the noise levels is a byproduct of the backpropagation algorithm used to estimate the gradient with respect to the synaptic weights of the ANN. Thus, the noise level for each neuron can be optimized simultaneously with the training of the synaptic weights at nearly no extra computational cost. In numerical experiments, the proposed method achieves significant improvements in the robustness of several popular ANN structures under both black-box and white-box attacks, tested on various computer vision datasets.
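The "byproduct of backpropagation" claim can be seen on a single neuron. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: the noise is assumed to enter the pre-activation as `sigma * eps` with `eps` drawn from a standard normal, the activation is `tanh`, the loss is squared error, and the function name `forward_backward` is hypothetical. Because `z` depends on `sigma` only through `sigma * eps`, the pathwise gradient with respect to `sigma` is the backpropagated term `delta = dL/dz` multiplied by the sampled `eps`, so it reuses the quantity already computed for the weight gradient.

```python
import math
import random

def forward_backward(w, b, sigma, x, target, eps):
    """One noisy neuron: returns the loss and its pathwise gradients."""
    z = w * x + b + sigma * eps   # assumed placement of the Gaussian noise
    y = math.tanh(z)
    loss = (y - target) ** 2
    # delta = dL/dz, the quantity backpropagation already computes.
    delta = 2.0 * (y - target) * (1.0 - y * y)
    grads = {
        "w": delta * x,        # usual weight gradient
        "b": delta,            # usual bias gradient
        "sigma": delta * eps,  # noise-level gradient, reusing delta
    }
    return loss, grads

rng = random.Random(0)
eps = rng.gauss(0.0, 1.0)  # one noise sample, shared by forward and backward pass
loss, grads = forward_backward(w=0.5, b=0.1, sigma=0.3, x=1.2, target=0.7, eps=eps)
```

The only extra work beyond ordinary backpropagation is one multiplication per neuron, which is consistent with the abstract's "nearly no extra computational cost".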

Preprint, 2020, arXiv

Optimal Unbiased Estimation for Expected Cumulative Cost

We consider estimating an expected infinite-horizon cumulative discounted cost/reward contingent on an underlying stochastic process by Monte Carlo simulation. An unbiased estimator based on truncating the cumulative cost at a random horizon is proposed. Explicit forms for the optimal distributions of the random horizon are given, and explicit expressions for the optimal random truncation level are obtained, leading to a full analysis of the bias-variance tradeoff when comparing this new class of randomized estimators with traditional fixed truncation estimators. Moreover, we characterize when the optimal randomized estimator is preferred over a fixed truncation estimator by considering the tradeoff between bias and variance. This comparison provides guidance on when to choose randomized estimators over fixed truncation estimators in practice. Numerical experiments substantiate the theoretical results.
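A random-horizon truncation estimator of this kind can be sketched as follows. This is a generic construction under stated assumptions, not the optimal design from the paper: the horizon `N` is assumed geometric with `P(N >= t) = q**t` and independent of the cost process, and each discounted cost term is divided by that survival probability, which makes the truncated sum unbiased for the infinite-horizon total. The names `sample_geometric_horizon`, `randomized_truncation_estimate`, and `cost_at` are hypothetical.

```python
import math
import random

def sample_geometric_horizon(q, rng):
    """Draw N with P(N >= t) = q**t for t = 0, 1, 2, ..."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return int(math.floor(math.log(u) / math.log(q)))

def randomized_truncation_estimate(cost_at, gamma, q, rng):
    """One unbiased sample of sum_{t>=0} gamma**t * cost_at(t).

    cost_at(t) returns the cost at step t along one simulated path.
    Each term is reweighted by 1 / P(N >= t) = 1 / q**t.
    """
    n = sample_geometric_horizon(q, rng)
    total = 0.0
    for t in range(n + 1):
        total += (gamma ** t) * cost_at(t) / (q ** t)
    return total

def monte_carlo_mean(cost_at, gamma, q, n_samples, seed=0):
    rng = random.Random(seed)
    draws = [randomized_truncation_estimate(cost_at, gamma, q, rng)
             for _ in range(n_samples)]
    return sum(draws) / n_samples
```

For a constant unit cost and `gamma = 0.9`, the target value is `1 / (1 - 0.9) = 10`, and `monte_carlo_mean(lambda t: 1.0, 0.9, 0.9, 20000)` should land close to it. For bounded costs the variance is finite when `q > gamma**2`. A smaller `q` shortens the expected horizon but inflates the variance, which is the tradeoff the abstract analyzes.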