Source author record

Daniel R. Jiang

Daniel R. Jiang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Topics: math.OC, Machine Learning, math.NA, Numerical Analysis

Catalog footprint

What is connected

4 works

4 topics

4 close collaborators


Preprint, 2020, arXiv

Efficient Nonmyopic Bayesian Optimization via One-Shot Multi-Step Trees

Bayesian optimization is a sequential decision-making framework for optimizing expensive-to-evaluate black-box functions. Computing a full lookahead policy amounts to solving a highly intractable stochastic dynamic program. Myopic approaches, such as expected improvement, are often adopted in practice, but they ignore the long-term impact of the immediate decision. Existing nonmyopic approaches are mostly heuristic and/or computationally expensive. In this paper, we provide the first efficient implementation of general multi-step lookahead Bayesian optimization, formulated as a sequence of nested optimization problems within a multi-step scenario tree. Instead of solving these problems in a nested way, we equivalently optimize all decision variables in the full tree jointly, in a "one-shot" fashion. Combining this with an efficient method for implementing multi-step Gaussian process "fantasization," we demonstrate that multi-step expected improvement is computationally tractable and exhibits performance superior to existing methods on a wide range of benchmarks.
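
For orientation, the myopic baseline named in the abstract, expected improvement, has a well-known closed form when the posterior at a candidate point is Gaussian. The sketch below shows only that one-step quantity for a maximization problem; it is not the paper's multi-step tree method. The function name and arguments (`mu`, `sigma`, `best`) are illustrative assumptions.

```python
import math


def expected_improvement(mu, sigma, best):
    """One-step (myopic) expected improvement for maximization.

    Assumes the posterior at the candidate point is N(mu, sigma^2) and
    `best` is the best value observed so far. Names are hypothetical.
    """
    if sigma <= 0.0:
        # No posterior uncertainty: improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (mu - best) * cdf + sigma * pdf
```

A multi-step lookahead policy would nest this kind of computation across future decisions, which is the part the abstract describes optimizing jointly.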

Preprint, 2020, arXiv

Lookahead-Bounded Q-Learning

We introduce the lookahead-bounded Q-learning (LBQL) algorithm, a new, provably convergent variant of Q-learning that seeks to improve the performance of standard Q-learning in stochastic environments through the use of "lookahead" upper and lower bounds. To do this, LBQL employs previously collected experience and each iteration's state-action values as dual feasible penalties to construct a sequence of sampled information relaxation problems. The solutions to these problems provide estimated upper and lower bounds on the optimal value, which we track via stochastic approximation. These quantities are then used to constrain the iterates to stay within the bounds at every iteration. Numerical experiments on benchmark problems show that LBQL exhibits faster convergence and more robustness to hyperparameters when compared to standard Q-learning and several related techniques. Our approach is particularly appealing in problems that require expensive simulations or real-world interactions.
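
The step of constraining iterates to stay within bounds can be illustrated with a minimal sketch: a standard Q-learning update followed by clipping to an interval. Here the bounds `lower` and `upper` are simply passed in as assumptions; the abstract describes estimating them from sampled information relaxation problems, which this sketch does not attempt. All names are hypothetical.

```python
def bounded_q_update(q, lower, upper, reward, next_max, alpha, gamma):
    """Single Q-learning update, then projection onto [lower, upper].

    q        : current estimate Q(s, a)
    next_max : max over a' of Q(s', a') at the sampled next state
    alpha    : step size, gamma : discount factor
    The bounds are assumed inputs, not computed here.
    """
    target = reward + gamma * next_max
    updated = (1.0 - alpha) * q + alpha * target
    # Clip so the iterate never leaves the current bound interval.
    return min(max(updated, lower), upper)
```

For example, with `q=0.0`, `reward=10.0`, `alpha=0.5`, and bounds `[-1.0, 1.0]`, the unclipped update would be 5.0, and the projection returns 1.0.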

Preprint, 2015, arXiv

An Approximate Dynamic Programming Algorithm for Monotone Value Functions

Many sequential decision problems can be formulated as Markov Decision Processes (MDPs) where the optimal value function (or cost-to-go function) can be shown to satisfy a monotone structure in some or all of its dimensions. When the state space becomes large, traditional techniques, such as the backward dynamic programming algorithm (i.e., backward induction or value iteration), may no longer be effective in finding a solution within a reasonable time frame, and thus we are forced to consider other approaches, such as approximate dynamic programming (ADP). We propose a provably convergent ADP algorithm called Monotone-ADP that exploits the monotonicity of the value functions in order to increase the rate of convergence. In this paper, we describe a general finite-horizon problem setting where the optimal value function is monotone, present a convergence proof for Monotone-ADP under various technical assumptions, and show numerical results for three application domains: optimal stopping, energy storage/allocation, and glycemic control for diabetes patients. The empirical results indicate that by taking advantage of monotonicity, we can attain high quality solutions within a relatively small number of iterations, using up to two orders of magnitude less computation than is needed to compute the optimal solution exactly.
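
One simple way to exploit monotonicity, sketched here under stated assumptions, is to repair a value-function estimate after each update so that it stays nondecreasing in the state. The example uses a one-dimensional state indexed by list position; the function name and the exact repair rule are illustrative and are not taken from the abstract.

```python
def monotone_projection(values, i, z):
    """Set values[i] = z, then restore a nondecreasing ordering around it.

    Assumes `values` approximates a value function that should be
    nondecreasing in the state index. States above i are raised to z if
    they fall below it; states below i are lowered to z if they exceed it.
    Returns a new list and leaves the input unchanged.
    """
    out = list(values)
    out[i] = z
    for j in range(len(out)):
        if j > i and out[j] < z:
            out[j] = z
        elif j < i and out[j] > z:
            out[j] = z
    return out
```

A single observation at one state can therefore correct estimates at many other states, which is one intuition for why exploiting monotonicity may speed up convergence.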

Preprint, 2015, arXiv

Optimal Hour-Ahead Bidding in the Real-Time Electricity Market with Battery Storage using Approximate Dynamic Programming

There is growing interest in the use of grid-level storage to smooth variations in supply that are likely to arise with increased use of wind and solar energy. Energy arbitrage, the process of buying, storing, and selling electricity to exploit variations in electricity spot prices, is becoming an important way of paying for expensive investments into grid-level storage. Independent system operators such as the NYISO (New York Independent System Operator) require that battery storage operators place bids into an hour-ahead market (although settlements may occur in increments as small as 5 minutes, which is considered near "real-time"). The operator has to place these bids without knowing the energy level in the battery at the beginning of the hour, while simultaneously accounting for the value of leftover energy at the end of the hour. The problem is formulated as a dynamic program. We describe and employ a convergent approximate dynamic programming (ADP) algorithm that exploits monotonicity of the value function to find a revenue-generating bidding policy; using optimal benchmarks, we empirically show the computational benefits of the algorithm. Furthermore, we propose a distribution-free variant of the ADP algorithm that does not require any knowledge of the distribution of the price process (and makes no assumptions regarding a specific real-time price model). We demonstrate that a policy trained on historical real-time price data from the NYISO using this distribution-free approach is indeed effective.
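
Energy arbitrage as described above can be illustrated with a toy simulation: buy one unit when the price is at or below a low threshold, sell one unit when it is at or above a high threshold. This is a minimal sketch with hypothetical names and a fixed threshold policy; it ignores efficiency losses and the hour-ahead bidding structure, and it is not the ADP policy from the paper.

```python
def simulate_arbitrage(prices, buy_below, sell_above, capacity, start_level=0):
    """Simulate a fixed-threshold buy/sell policy over a price sequence.

    Trades one unit of energy per period. Returns (revenue, final_level).
    All parameters are illustrative assumptions.
    """
    level = start_level
    revenue = 0.0
    for p in prices:
        if p <= buy_below and level < capacity:
            level += 1      # charge the battery, pay the spot price
            revenue -= p
        elif p >= sell_above and level > 0:
            level -= 1      # discharge the battery, earn the spot price
            revenue += p
    return revenue, level
```

With prices `[20, 50, 10, 60]`, thresholds 20 and 50, and a capacity of one unit, the policy buys at 20, sells at 50, buys at 10, and sells at 60, for a revenue of 80.0 and an empty battery at the end.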

