Source author record

Warren B. Powell

Warren B. Powell appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

math.OC, Machine Learning, Artificial Intelligence, Systems and Control, Applications, Computation

Catalog footprint

What is connected

15 works

6 topics

4 close collaborators


preprint, 2023, arXiv

The Information-Collecting Vehicle Routing Problem: Stochastic Optimization for Emergency Storm Response

We address the problem of mitigating damage to a power grid following a storm by managing a vehicle that has to be routed while simultaneously performing two tasks: learning about damage from the grid (which requires direct observation) and repairing damage that it observes. The learning process is assisted by calls from customers notifying the utility that they have lost power ("lights-out calls"). However, when a tree falls and damages a line, it triggers the first upstream circuit breaker, which results in power outages for everyone on the grid below the circuit breaker. We present a dynamic routing model that captures observable state variables such as the location of the truck and the state of the grid on segments the truck has visited, and beliefs about outages on segments that have not been visited. Trucks are routed over a physical transportation network, but the pattern of outages is governed by the structure of the power grid. We introduce a form of Monte Carlo tree search based on information relaxation that we call optimistic MCTS, which improves its application to problems with larger action spaces. We show that the method significantly outperforms standard escalation heuristics used in industry.

preprint, 2022, arXiv

Stochastic Search for a Parametric Cost Function Approximation: Energy storage with rolling forecasts

Rolling forecasts have been almost overlooked in the renewable energy storage literature. In this paper, we provide a new approach for handling uncertainty not just in the accuracy of a forecast, but in the evolution of forecasts over time. Our approach shifts the focus from modeling the uncertainty in a lookahead model to accurate simulations in a stochastic base model. We develop a robust policy for making energy storage decisions by creating a parametrically modified lookahead model, where the parameters are tuned in the stochastic base model. Since computing unbiased stochastic gradients with respect to the parameters requires restrictive assumptions, we propose a simulation-based stochastic approximation algorithm based on numerical derivatives to optimize these parameters. While numerical derivatives, calculated from noisy function evaluations, provide biased gradient estimates, an online variance reduction technique built into the framework of our proposed algorithm enables us to control the accumulated bias errors and establish the finite-time rate of convergence of the algorithm. Our numerical experiments show that this algorithm finds policies that outperform the deterministic benchmark policy.
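The idea of tuning a parameter with numerical derivatives of a noisy simulation can be illustrated with a generic finite-difference stochastic approximation loop. This is only a minimal sketch and not the algorithm from the paper: the simulator `noisy_cost`, its optimum at 2.0, the stepsize and perturbation schedules, and all names are assumptions chosen for illustration, and the paper's variance reduction step is not reproduced.

```python
import random


def noisy_cost(theta, rng):
    # Hypothetical stand-in for a simulation of the stochastic base model:
    # the true cost is (theta - 2)^2, observed with Gaussian noise.
    return (theta - 2.0) ** 2 + rng.gauss(0.0, 0.1)


def fd_stochastic_approximation(theta0, iterations=2000, seed=0):
    """Tune one parameter using central-difference gradient estimates."""
    rng = random.Random(seed)
    theta = theta0
    for n in range(1, iterations + 1):
        step = 0.5 / n             # declining stepsize (assumed schedule)
        delta = 0.5 / n ** 0.25    # shrinking perturbation width (assumed)
        # Numerical derivative built from two noisy function evaluations.
        grad = (noisy_cost(theta + delta, rng)
                - noisy_cost(theta - delta, rng)) / (2.0 * delta)
        theta -= step * grad
    return theta


tuned = fd_stochastic_approximation(theta0=10.0)
```

The perturbation width shrinks more slowly than the stepsize, a common choice in this family of methods, so that the noise in the difference quotient does not overwhelm the update.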

preprint, 2020, arXiv

Backward Approximate Dynamic Programming with Hidden Semi-Markov Stochastic Models in Energy Storage Optimization

We consider an energy storage problem involving a wind farm with a forecasted power output, a stochastic load, an energy storage device, and a connection to the larger power grid with stochastic prices. Electricity prices and wind power forecast errors are modeled using a novel hidden semi-Markov model that accurately replicates not just the distribution of the errors, but also crossing times, capturing the amount of time each process stays above or below some benchmark such as the forecast. This is an important property of stochastic processes involved in storage problems. We show that we achieve more robust solutions using this model than when more common stochastic models are considered. The new model introduces some additional complexity to the problem as its information states are partially hidden, forming a partially observable Markov decision process. We derive a near-optimal time-dependent policy using backward approximate dynamic programming, which overcomes the computational hurdles of classical (exact) backward dynamic programming, with higher quality solutions than the more familiar forward approximate dynamic programming methods.

preprint, 2020, arXiv

Reinforcement Learning via Parametric Cost Function Approximation for Multistage Stochastic Programming

The most common approaches for solving stochastic resource allocation problems in the research literature are to use either value functions ("dynamic programming") or scenario trees ("stochastic programming") to approximate the impact of a decision now on the future. By contrast, common industry practice is to use a deterministic approximation of the future which is easier to understand and solve, but which is criticized for ignoring uncertainty. We show that a parameterized version of a deterministic lookahead can be an effective way of handling uncertainty, while enjoying the computational simplicity of a deterministic lookahead. We present the parameterized lookahead model as a form of policy for solving a stochastic base model, which is used as the basis for optimizing the parameterized policy. This approach can handle complex, high-dimensional state variables, and avoids the usual approximations associated with scenario trees. We formalize this approach and demonstrate its use in the context of a complex, nonstationary energy storage problem.

preprint, 2020, arXiv

Risk Directed Importance Sampling in Stochastic Dual Dynamic Programming with Hidden Markov Models for Grid Level Energy Storage

Power systems that need to integrate renewables at a large scale must account for the high levels of uncertainty introduced by these power sources. This can be accomplished with a system of many distributed grid-level storage devices. However, developing a cost-effective and robust control policy in this setting is a challenge due to the high dimensionality of the resource state and the highly volatile stochastic processes involved. We first model the problem using a carefully calibrated power grid model and a specialized hidden Markov stochastic model for wind power which replicates crossing times. We then base our control policy on a variant of stochastic dual dynamic programming, an algorithm well suited for certain high dimensional control problems, that is modified to accommodate hidden Markov uncertainty in the stochastics. However, the algorithm may be impractical to use as it exhibits relatively slow convergence. To accelerate the algorithm, we apply both quadratic regularization and a risk-directed importance sampling technique for sampling the outcome space at each time step in the backward pass of the algorithm. We show that the resulting policies are more robust than those developed using classical SDDP modeling assumptions and algorithms.

preprint, 2016, arXiv

Optimal Learning for Stochastic Optimization with Nonlinear Parametric Belief Models

We consider the problem of estimating the expected value of information (the knowledge gradient) for Bayesian learning problems where the belief model is nonlinear in the parameters. Our goal is to maximize some metric, while simultaneously learning the unknown parameters of the nonlinear belief model, by guiding a sequential experimentation process which is expensive. We overcome the problem of computing the expected value of an experiment, which is computationally intractable, by using a sampled approximation, which helps to guide experiments but does not provide an accurate estimate of the unknown parameters. We then introduce a resampling process which allows the sampled model to adapt to new information, exploiting past experiments. We show theoretically that the method converges asymptotically to the true parameters, while simultaneously maximizing our metric. We show empirically that the process exhibits rapid convergence, yielding good results with a very small number of experiments.

preprint, 2016, arXiv

SDDP vs. ADP: The Effect of Dimensionality in Multistage Stochastic Optimization for Grid Level Energy Storage

There has been widespread interest in the use of grid-level storage to handle the variability from increasing penetrations of wind and solar energy. This problem setting requires optimizing energy storage and release decisions for anywhere from a half-dozen to potentially hundreds of storage devices spread around the grid as new technologies evolve. We approach this problem using two competing algorithmic strategies. The first, developed within the stochastic programming literature, is stochastic dual dynamic programming (SDDP), which uses Benders decomposition to create multidimensional value function approximations, which have been widely used to manage hydro reservoirs. The second approach, which has evolved using the language of approximate dynamic programming, uses separable, piecewise linear value function approximations, a method which has been successfully applied to high-dimensional fleet management problems. This paper brings these two approaches together using a common notational system, and contrasts the algorithmic strategies (which are both a form of approximate dynamic programming) used by each approach. The methods are then subjected to rigorous testing using the context of optimizing grid level storage.

preprint, 2016, arXiv

The Information-Collecting Vehicle Routing Problem: Stochastic Optimization for Emergency Storm Response

Utilities face the challenge of responding to power outages due to storms and ice damage, but most power grids are not equipped with sensors to pinpoint the precise location of the faults causing the outage. Instead, utilities have to depend primarily on phone calls (trouble calls) from customers who have lost power to guide the dispatching of utility trucks. In this paper, we develop a policy that routes a utility truck to restore outages in the power grid as quickly as possible, using phone calls to create beliefs about outages, but also using utility trucks as a mechanism for collecting additional information. This means that routing decisions change not only the physical state of the truck (as it moves from one location to another) and the grid (as the truck performs repairs), but also our belief about the network, creating the first stochastic vehicle routing problem that explicitly models information collection and belief modeling. We address the problem of managing a single utility truck, which we start by formulating as a sequential stochastic optimization model which captures our belief about the state of the grid. We propose a stochastic lookahead policy, and use Monte Carlo tree search (MCTS) to produce a practical policy that is asymptotically optimal. Simulation results show that the developed policy restores the power grid much faster compared to standard industry heuristics.

preprint, 2015, arXiv

A Knowledge Gradient Policy for Sequencing Experiments to Identify the Structure of RNA Molecules Using a Sparse Additive Belief Model

We present a sparse knowledge gradient (SpKG) algorithm for adaptively selecting the targeted regions within a large RNA molecule to identify which regions are most amenable to interactions with other molecules. Experimentally, such regions can be inferred from fluorescence measurements obtained by binding a complementary probe with fluorescence markers to the targeted regions. We use a biophysical model which shows that the fluorescence ratio under the log scale has a sparse linear relationship with the coefficients describing the accessibility of each nucleotide, since not all sites are accessible (due to the folding of the molecule). The SpKG algorithm uniquely combines the Bayesian ranking and selection problem with the frequentist $\ell_1$ regularized regression approach Lasso. We use this algorithm to identify the sparsity pattern of the linear model as well as sequentially decide the best regions to test before the experimental budget is exhausted. We also develop two other new algorithms: the batch SpKG algorithm, which generates more suggestions sequentially to run parallel experiments; and batch SpKG with a procedure which we call length mutagenesis, which dynamically adds new alternatives, in the form of types of probes, that are created by inserting, deleting or mutating nucleotides within existing probes. In simulation, we demonstrate these algorithms on the Group I intron (a mid-size RNA molecule), showing that they efficiently learn the correct sparsity pattern, identify the most accessible region, and outperform several other policies.

preprint, 2015, arXiv

An Approximate Dynamic Programming Algorithm for Monotone Value Functions

Many sequential decision problems can be formulated as Markov Decision Processes (MDPs) where the optimal value function (or cost-to-go function) can be shown to satisfy a monotone structure in some or all of its dimensions. When the state space becomes large, traditional techniques, such as the backward dynamic programming algorithm (i.e., backward induction or value iteration), may no longer be effective in finding a solution within a reasonable time frame, and thus we are forced to consider other approaches, such as approximate dynamic programming (ADP). We propose a provably convergent ADP algorithm called Monotone-ADP that exploits the monotonicity of the value functions in order to increase the rate of convergence. In this paper, we describe a general finite-horizon problem setting where the optimal value function is monotone, present a convergence proof for Monotone-ADP under various technical assumptions, and show numerical results for three application domains: optimal stopping, energy storage/allocation, and glycemic control for diabetes patients. The empirical results indicate that by taking advantage of monotonicity, we can attain high quality solutions within a relatively small number of iterations, using up to two orders of magnitude less computation than is needed to compute the optimal solution exactly.
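One simple way to exploit monotonicity in a value function approximation is to smooth each new observation into the visited state and then push any neighbouring values that violate the ordering up or down to the updated value. The sketch below assumes a one-dimensional state index and a nondecreasing value function; the function name `monotone_update` and its arguments are hypothetical, and the code is an illustration of the general idea, not the Monotone-ADP algorithm as specified in the paper.

```python
def monotone_update(values, state, observation, stepsize):
    """Smooth an observation into values[state], then restore
    nondecreasing order over the 1-D state index."""
    updated = list(values)
    # Standard smoothing of the old estimate with the new observation.
    z = (1.0 - stepsize) * values[state] + stepsize * observation
    updated[state] = z
    for s in range(len(updated)):
        if s > state and updated[s] < z:
            # Larger states may not have a smaller value: raise them to z.
            updated[s] = z
        elif s < state and updated[s] > z:
            # Smaller states may not have a larger value: lower them to z.
            updated[s] = z
    return updated


# Observing a high value at state 1 also lifts state 2, which was too low.
print(monotone_update([0.0, 1.0, 2.0, 3.0], state=1, observation=5.0, stepsize=0.5))
```

A single observation therefore updates many states at once, which is the intuition behind the faster convergence the abstract reports.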

preprint, 2015, arXiv

Optimal Hour-Ahead Bidding in the Real-Time Electricity Market with Battery Storage using Approximate Dynamic Programming

There is growing interest in the use of grid-level storage to smooth variations in supply that are likely to arise with increased use of wind and solar energy. Energy arbitrage, the process of buying, storing, and selling electricity to exploit variations in electricity spot prices, is becoming an important way of paying for expensive investments into grid-level storage. Independent system operators such as the NYISO (New York Independent System Operator) require that battery storage operators place bids into an hour-ahead market (although settlements may occur in increments as small as 5 minutes, which is considered near "real-time"). The operator has to place these bids without knowing the energy level in the battery at the beginning of the hour, while simultaneously accounting for the value of leftover energy at the end of the hour. The problem is formulated as a dynamic program. We describe and employ a convergent approximate dynamic programming (ADP) algorithm that exploits monotonicity of the value function to find a revenue-generating bidding policy; using optimal benchmarks, we empirically show the computational benefits of the algorithm. Furthermore, we propose a distribution-free variant of the ADP algorithm that does not require any knowledge of the distribution of the price process (and makes no assumptions regarding a specific real-time price model). We demonstrate that a policy trained on historical real-time price data from the NYISO using this distribution-free approach is indeed effective.

preprint, 2014, arXiv

A New Optimal Stepsize For Approximate Dynamic Programming

Approximate dynamic programming (ADP) has proven itself in a wide range of applications spanning large-scale transportation problems, health care, revenue management, and energy systems. The design of effective ADP algorithms has many dimensions, but one crucial factor is the stepsize rule used to update a value function approximation. Many operations research applications are computationally intensive, and it is important to obtain good results quickly. Furthermore, the most popular stepsize formulas use tunable parameters and can produce very poor results if tuned improperly. We derive a new stepsize rule that optimizes the prediction error in order to improve the short-term performance of an ADP algorithm. With only one, relatively insensitive tunable parameter, the new rule adapts to the level of noise in the problem and produces faster convergence in numerical experiments.

preprint, 2014, arXiv

Least Squares Policy Iteration with Instrumental Variables vs. Direct Policy Search: Comparison Against Optimal Benchmarks Using Energy Storage

This paper studies approximate policy iteration (API) methods which use least-squares Bellman error minimization for policy evaluation. We address several of its enhancements, namely, Bellman error minimization using instrumental variables, least-squares projected Bellman error minimization, and projected Bellman error minimization using instrumental variables. We prove that for a general discrete-time stochastic control problem, Bellman error minimization using instrumental variables is equivalent to both variants of projected Bellman error minimization. An alternative to these API methods is direct policy search based on the knowledge gradient. The practical performance of these three approximate dynamic programming methods is then investigated in the context of an application in energy storage, integrated with an intermittent wind energy supply to fully serve a stochastic time-varying electricity demand. We create a library of test problems using real-world data and apply value iteration to find their optimal policies. These benchmarks are then used to compare the developed policies. Our analysis indicates that API with instrumental variables Bellman error minimization prominently outperforms API with least-squares Bellman error minimization. However, these approaches underperform our direct policy search implementation.

preprint, 2010, arXiv

Dirichlet Process Mixtures of Generalized Linear Models

We propose Dirichlet Process mixtures of Generalized Linear Models (DP-GLM), a new method of nonparametric regression that accommodates continuous and categorical inputs, and responses that can be modeled by a generalized linear model. We prove conditions for the asymptotic unbiasedness of the DP-GLM regression mean function estimate. We also give examples for when those conditions hold, including models for compactly supported continuous distributions and a model with continuous covariates and categorical response. We empirically analyze the properties of the DP-GLM and why it provides better results than existing Dirichlet process mixture regression models. We evaluate DP-GLM on several data sets, comparing it to modern methods of nonparametric regression like CART, Bayesian trees and Gaussian processes. Compared to existing techniques, the DP-GLM provides a single model (and corresponding inference algorithms) that performs well in many regression settings.

preprint, 2010, arXiv

Stochastic Search with an Observable State Variable

In this paper we study convex stochastic search problems where a noisy objective function value is observed after a decision is made. There are many stochastic search problems whose behavior depends on an exogenous state variable which affects the shape of the objective function. Currently, there is no general purpose algorithm to solve this class of problems. We use nonparametric density estimation to take observations from the joint state-outcome distribution and use them to infer the optimal decision for a given query state. We propose two solution methods that depend on the problem characteristics: function-based and gradient-based optimization. We examine two weighting schemes, kernel-based weights and Dirichlet process-based weights, for use with the solution methods. The weights and solution methods are tested on a synthetic multi-product newsvendor problem and the hour-ahead wind commitment problem. Our results show that in some cases Dirichlet process weights offer substantial benefits over kernel based weights and more generally that nonparametric estimation methods provide good solutions to otherwise intractable problems.
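The role of state-dependent weights can be illustrated on a single-product newsvendor: past (state, demand) observations are weighted by how close their state is to the query state, and the order quantity is the weighted quantile at the critical ratio. This is a minimal sketch under assumptions, not the paper's method: the Gaussian kernel, the bandwidth, the single-product setting, and every function and parameter name are chosen here for illustration, and the Dirichlet process weights are not shown.

```python
import math


def kernel_weights(states, query, bandwidth):
    """Normalized Gaussian kernel weights centred on the query state."""
    raw = [math.exp(-0.5 * ((s - query) / bandwidth) ** 2) for s in states]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_newsvendor(states, demands, query, price, cost, bandwidth=1.0):
    """Order quantity: weighted demand quantile at the critical ratio."""
    weights = kernel_weights(states, query, bandwidth)
    critical_ratio = (price - cost) / price
    cumulative = 0.0
    # Walk through demands in increasing order, accumulating weight.
    for demand, weight in sorted(zip(demands, weights)):
        cumulative += weight
        if cumulative >= critical_ratio:
            return demand
    return max(demands)


# Hypothetical history: low demand was seen in state 0, high demand in state 10.
states = [0, 0, 0, 10, 10, 10]
demands = [1, 2, 3, 101, 102, 103]
order_low = weighted_newsvendor(states, demands, query=0, price=4.0, cost=1.0)
order_high = weighted_newsvendor(states, demands, query=10, price=4.0, cost=1.0)
```

With a small bandwidth, the decision for each query state is driven almost entirely by observations made in similar states, so the two queries yield very different order quantities.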
