Source author record

Peter I. Frazier

Peter I. Frazier appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning, math.OC, Artificial Intelligence, Information Retrieval, math.ST, Methodology, Statistics Theory, Applications, Computation, Computer Science and Game Theory, Information Theory, math.IT, math.PR, Quantitative Methods, Social and Information Networks, Systems and Control

Catalog footprint

What is connected

21 works

16 topics

4 close collaborators

preprint 2026 arXiv

Better Protein Function Prediction by Modeling Survivorship Bias

Protein sequence data from nature exhibits survivorship bias: we only observe data from those organisms that survive and reproduce, while non-functional protein mutations are eliminated by natural selection. Thus, predicting whether a protein sequence is functional often requires learning from positive examples alone. While positive-unlabeled (PU) learning frameworks offer a generic solution to this problem, existing PU methods ignore the evolutionary processes that shape sequence observability and cause survivorship bias. Consider a sequence that is one mutation away from a commonly-observed protein variant in a well-surveilled organism. If the sequence were functional, it would likely be observed. If it is not observed, this suggests non-functionality. In contrast, sequences that are unlikely to arise through mutation may be missing simply because they never arose. Thus, these two kinds of missing sequences should be treated differently when training models. In this work, we propose Evo-PU, a PU learning framework that uses a scientific understanding of nucleotide mutation to model survivorship bias for well-surveilled single-organism sequence data. On three prediction tasks using single-organism uniform-coverage surveillance data -- predicting results from held-out influenza and respiratory syncytial virus (RSV) mutagenesis studies, and predicting future SARS-CoV-2 variants -- Evo-PU outperforms standard PU learning, one-class classification (OCC), and protein language models (PLMs). On prediction tasks from multi-organism ProteinGym datasets with more heterogeneous surveillance coverage, we identify opportunities to generalize our approach.

preprint 2022 arXiv

Dynamic Pricing Provides Robust Equilibria in Stochastic Ridesharing Networks

Ridesharing markets are complex: drivers are strategic, rider demand and driver availability are stochastic, and complex city-scale phenomena like weather induce large-scale correlation across space and time. At the same time, past work has focused on a subset of these challenges. We propose a model of ridesharing networks with strategic drivers, spatiotemporal dynamics, and stochasticity. Supporting both computational tractability and better modeling flexibility than classical fluid limits, we use a two-level stochastic model that allows correlated shocks caused by weather or large public events. Using this model, we propose a novel pricing mechanism: stochastic spatiotemporal pricing (SSP). We show that the SSP mechanism is asymptotically incentive-compatible and that all (approximate) equilibria of the resulting game are asymptotically welfare-maximizing when the market is large enough. The SSP mechanism iteratively recomputes prices based on realized demand and supply, and in this sense prices dynamically. We show that this is critical: while a static variant of the SSP mechanism (whose prices vary with the market-level stochastic scenario but not individual rider and driver decisions) has a sequence of asymptotically welfare-optimal approximate equilibria, we demonstrate that it also has other equilibria producing extremely low social welfare. Thus, we argue that dynamic pricing is important for ensuring robustness in stochastic ridesharing networks.

preprint 2022 arXiv

Near-optimality for infinite-horizon restless bandits with many arms

Restless bandits are an important class of problems with applications in recommender systems, active learning, revenue management and other areas. We consider infinite-horizon discounted restless bandits with many arms where a fixed proportion of arms may be pulled in each period and where arms share a finite state space. Although an average-case-optimal policy can be computed via stochastic dynamic programming, the computation required grows exponentially with the number of arms $N$. Thus, it is important to find scalable policies that can be computed efficiently for large $N$ and that are near optimal in this regime, in the sense that the optimality gap (i.e. the loss of expected performance against an optimal policy) per arm vanishes for large $N$. However, the most popular approach, the Whittle index, requires a hard-to-verify indexability condition to be well-defined and another hard-to-verify condition to guarantee a $o(N)$ optimality gap. We present a method resolving these difficulties. By replacing a global Lagrange multiplier used by the Whittle index with a sequence of Lagrangian multipliers, one per time period up to a finite truncation point, we derive a class of policies, called fluid-balance policies, that have a $O(\sqrt{N})$ optimality gap. Unlike the Whittle index, fluid-balance policies do not require indexability to be well-defined and their $O(\sqrt{N})$ optimality gap bound holds universally without sufficient conditions. We also demonstrate empirically that fluid-balance policies provide state-of-the-art performance on specific problems.

preprint 2022 arXiv

Thinking inside the box: A tutorial on grey-box Bayesian optimization

Bayesian optimization (BO) is a framework for global optimization of expensive-to-evaluate objective functions. Classical BO methods assume that the objective function is a black box. However, internal information about objective function computation is often available. For example, when optimizing a manufacturing line's throughput with simulation, we observe the number of parts waiting at each workstation, in addition to the overall throughput. Recent BO methods leverage such internal information to dramatically improve performance. We call these "grey-box" BO methods because they treat objective computation as partially observable and even modifiable, blending the black-box approach with so-called "white-box" first-principles knowledge of objective function computation. This tutorial describes these methods, focusing on BO of composite objective functions, where one can observe and selectively evaluate individual constituents that feed into the overall objective; and multi-fidelity BO, where one can evaluate cheaper approximations of the objective function by varying parameters of the evaluation oracle.

preprint 2021 arXiv

Bayesian Optimization of Function Networks

We consider Bayesian optimization of the output of a network of functions, where each function takes as input the output of its parent nodes, and where the network takes significant time to evaluate. Such problems arise, for example, in reinforcement learning, engineering design, and manufacturing. While the standard Bayesian optimization approach observes only the final output, our approach delivers greater query efficiency by leveraging information that the former ignores: intermediate output within the network. This is achieved by modeling the nodes of the network using Gaussian processes and choosing the points to evaluate using, as our acquisition function, the expected improvement computed with respect to the implied posterior on the objective. Although the non-Gaussian nature of this posterior prevents computing our acquisition function in closed form, we show that it can be efficiently maximized via sample average approximation. In addition, we prove that our method is asymptotically consistent, meaning that it finds a globally optimal solution as the number of evaluations grows to infinity, thus generalizing previously known convergence results for the expected improvement. Notably, this holds even though our method might not evaluate the domain densely, instead leveraging problem structure to leave regions unexplored. Finally, we show that our approach dramatically outperforms standard Bayesian optimization methods in several synthetic and real-world problems.
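
As a rough illustration of the sample average approximation idea in the abstract above, the sketch below estimates expected improvement for a hypothetical two-node chain. The functions `mu1`, `sd1`, `mu2`, `sd2` and all parameter names are assumptions made for this example, and each node is given a simple pointwise Gaussian posterior rather than the Gaussian process models the paper describes. Drawing the base samples once and reusing them makes the estimate a deterministic function of `x`, which is what allows an ordinary optimizer (here a plain grid search) to maximize it.

```python
import random


def make_saa_expected_improvement(mu1, sd1, mu2, sd2, best, n_samples=256, seed=0):
    """Return a deterministic Monte Carlo estimate of expected improvement.

    Hypothetical setup: node 1 maps x to y1 with posterior N(mu1(x), sd1(x)^2),
    and node 2 maps y1 to the objective with posterior N(mu2(y1), sd2(y1)^2).
    `best` is the best objective value observed so far (maximization).
    """
    rng = random.Random(seed)
    # Base samples are drawn once and reused for every x (the SAA step).
    base = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_samples)]

    def expected_improvement(x):
        total = 0.0
        for z1, z2 in base:
            y1 = mu1(x) + sd1(x) * z1      # sample of the intermediate output
            y2 = mu2(y1) + sd2(y1) * z2    # sample of the final output
            total += max(y2 - best, 0.0)   # improvement over the incumbent
        return total / n_samples

    return expected_improvement


if __name__ == "__main__":
    ei = make_saa_expected_improvement(
        mu1=lambda x: x,
        sd1=lambda x: 0.1,
        mu2=lambda y: -(y - 1.0) ** 2,
        sd2=lambda y: 0.05,
        best=-0.5,
    )
    grid = [i / 50.0 for i in range(101)]  # candidate inputs in [0, 2]
    print("candidate with largest estimated EI:", max(grid, key=ei))
```

Because the improvement is a nonlinear function of two chained Gaussian draws, its distribution is not Gaussian, which is why the estimate is formed by sampling instead of a closed-form expression.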

preprint 2020 arXiv

Multi-Attribute Bayesian Optimization With Interactive Preference Learning

We consider black-box global optimization of time-consuming-to-evaluate functions on behalf of a decision-maker (DM) whose preferences must be learned. Each feasible design is associated with a time-consuming-to-evaluate vector of attributes and each vector of attributes is assigned a utility by the DM's utility function, which may be learned approximately using preferences expressed over pairs of attribute vectors. Past work has used a point estimate of this utility function as if it were error-free within single-objective optimization. However, utility estimation errors may yield a poor suggested design. Furthermore, this approach produces a single suggested "best" design, whereas DMs often prefer to choose from a menu. We propose a novel multi-attribute Bayesian optimization with preference learning approach. Our approach acknowledges the uncertainty in preference estimation and implicitly chooses designs to evaluate that are good not just for a single estimated utility function but a range of likely ones. The outcome of our approach is a menu of designs and evaluated attributes from which the DM makes a final selection. We demonstrate the value and flexibility of our approach in a variety of experiments.

preprint 2016 arXiv

Multi-Information Source Optimization

We consider Bayesian optimization of an expensive-to-evaluate black-box objective function, where we also have access to cheaper approximations of the objective. In general, such approximations arise in applications such as reinforcement learning, engineering, and the natural sciences, and are subject to an inherent, unknown bias. This model discrepancy is caused by an inadequate internal model that deviates from reality and can vary over the domain, making the utilization of these approximations a non-trivial task. We present a novel algorithm that provides a rigorous mathematical treatment of the uncertainties arising from model discrepancies and noisy observations. Its optimization decisions rely on a value of information analysis that extends the Knowledge Gradient factor to the setting of multiple information sources that vary in cost: each sampling decision maximizes the predicted benefit per unit cost. We conduct an experimental evaluation that demonstrates that the method consistently outperforms other state-of-the-art techniques: it finds designs of considerably higher objective value and additionally inflicts less cost in the exploration process.

preprint 2016 arXiv

Multi-Step Bayesian Optimization for One-Dimensional Feasibility Determination

Bayesian optimization methods allocate limited sampling budgets to maximize expensive-to-evaluate functions. One-step-lookahead policies are often used, but computing optimal multi-step-lookahead policies remains a challenge. We consider a specialized Bayesian optimization problem: finding the superlevel set of an expensive one-dimensional function, with a Markov process prior. We compute the Bayes-optimal sampling policy efficiently, and characterize the suboptimality of one-step lookahead. Our numerical experiments demonstrate that the one-step lookahead policy is close to optimal in this problem, performing within 98% of optimal in the experimental settings considered.

preprint 2016 arXiv

Probabilistic Bisection Converges Almost as Quickly as Stochastic Approximation

The probabilistic bisection algorithm (PBA) solves a class of stochastic root-finding problems in one dimension by successively updating a prior belief on the location of the root based on noisy responses to queries at chosen points. The responses indicate the direction of the root from the queried point, and are incorrect with a fixed probability. The fixed-probability assumption is problematic in applications, and so we extend the PBA to apply when this assumption is relaxed. The extension involves the use of a power-one test at each queried point. We explore the convergence behavior of the extended PBA, showing that it converges at a rate arbitrarily close to, but slower than, the canonical "square root" rate of stochastic approximation.
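
The belief update described above can be sketched for the classic fixed-probability setting. This is not the paper's extended algorithm (there are no power-one tests here); it is a minimal illustration that assumes a uniform prior on a discretized interval and a known probability `p` that each response is correct. The function name, the grid discretization, and the choice to query at the right edge of the median cell are assumptions made for this example.

```python
import random


def probabilistic_bisection(noisy_sign, p, n_queries, lo=0.0, hi=1.0, grid_size=1000):
    """Classic probabilistic bisection on a uniform grid over [lo, hi].

    noisy_sign(x) returns +1 if it reports the root to the right of x and -1
    otherwise; each report is assumed correct with known probability p > 1/2.
    """
    width = (hi - lo) / grid_size
    belief = [1.0 / grid_size] * grid_size  # uniform prior over grid cells

    def median_cell():
        cumulative = 0.0
        for i, mass in enumerate(belief):
            cumulative += mass
            if cumulative >= 0.5:
                return i
        return grid_size - 1

    for _ in range(n_queries):
        m = median_cell()
        x = lo + (m + 1) * width  # query at the right edge of the median cell
        z = noisy_sign(x)
        # Bayes update: cells on the reported side are weighted by p,
        # cells on the other side by 1 - p, then the belief is renormalized.
        for i in range(grid_size):
            on_reported_side = (i > m) if z > 0 else (i <= m)
            belief[i] *= p if on_reported_side else 1.0 - p
        total = sum(belief)
        belief = [mass / total for mass in belief]

    return lo + (median_cell() + 0.5) * width  # posterior median as the estimate


if __name__ == "__main__":
    rng = random.Random(0)
    root = 0.3037

    def oracle(x):
        truth = 1 if root > x else -1
        return truth if rng.random() < 0.8 else -truth

    print("estimated root:", probabilistic_bisection(oracle, p=0.8, n_queries=200))
```

Querying at the posterior median splits the belief mass roughly in half, so each response is as informative as a single noisy bit can be under this model.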

preprint 2016 arXiv

Stratified Bayesian Optimization

We consider derivative-free black-box global optimization of expensive noisy functions, when most of the randomness in the objective is produced by a few influential scalar random inputs. We present a new Bayesian global optimization algorithm, called Stratified Bayesian Optimization (SBO), which uses this strong dependence to improve performance. Our algorithm is similar in spirit to stratification, a technique from simulation, which uses strong dependence on a categorical representation of the random input to reduce variance. We demonstrate in numerical experiments that SBO outperforms state-of-the-art Bayesian optimization benchmarks that do not leverage this dependence.

preprint 2016 arXiv

The Bayesian Linear Information Filtering Problem

We present a Bayesian sequential decision-making formulation of the information filtering problem, in which an algorithm presents items (news articles, scientific papers, tweets) arriving in a stream, and learns relevance from user feedback on presented items. We model user preferences using a Bayesian linear model, similar in spirit to a Bayesian linear bandit. We compute a computational upper bound on the value of the optimal policy, which allows computing an optimality gap for implementable policies. We then use this analysis as motivation in introducing a pair of new Decompose-Then-Decide (DTD) heuristic policies, DTD-Dynamic-Programming (DTD-DP) and DTD-Upper-Confidence-Bound (DTD-UCB). We compare DTD-DP and DTD-UCB against several benchmarks on real and simulated data, demonstrating significant improvement, and show that the achieved performance is close to the upper bound.

preprint 2016 arXiv

Warm Starting Bayesian Optimization

We develop a framework for warm-starting Bayesian optimization that reduces the solution time required to solve an optimization problem that is one in a sequence of related problems. This is useful when optimizing the output of a stochastic simulator that fails to provide derivative information, for which Bayesian optimization methods are well-suited. Solving sequences of related optimization problems arises when making several business decisions using one optimization model and input data collected over different time periods or markets. While many gradient-based methods can be warm started by initiating optimization at the solution to the previous problem, this warm start approach does not apply to Bayesian optimization methods, which carry a full metamodel of the objective function from iteration to iteration. Our approach builds a joint statistical model of the entire collection of related objective functions, and uses a value of information calculation to recommend points to evaluate.

preprint 2015 arXiv

Asymptotic Validity of the Bayes-Inspired Indifference Zone Procedure: The Non-Normal Known Variance Case

We consider the indifference-zone (IZ) formulation of the ranking and selection problem in which the goal is to choose an alternative with the largest mean with guaranteed probability, as long as the difference between this mean and the second largest exceeds a threshold. Conservatism leads classical IZ procedures to take too many samples in problems with many alternatives. The Bayes-inspired Indifference Zone (BIZ) procedure, proposed in Frazier (2014), is less conservative than previous procedures, but its proof of validity requires strong assumptions, specifically that samples are normal, and variances are known with an integer multiple structure. In this paper, we show asymptotic validity of a slight modification of the original BIZ procedure as the difference between the best alternative and the second best goes to zero, when the variances are known and finite, and samples are independent and identically distributed, but not necessarily normal.

preprint 2015 arXiv

Bayes-Optimal Effort Allocation in Crowdsourcing: Bounds and Index Policies

We consider effort allocation in crowdsourcing, where we wish to assign labeling tasks to imperfect homogeneous crowd workers to maximize overall accuracy in a continuous-time Bayesian setting, subject to budget and time constraints. The Bayes-optimal policy for this problem is the solution to a partially observable Markov decision process, but the curse of dimensionality renders the computation infeasible. Based on the Lagrangian Relaxation technique in Adelman & Mersereau (2008), we provide a computationally tractable instance-specific upper bound on the value of this Bayes-optimal policy, which can in turn be used to bound the optimality gap of any other sub-optimal policy. In an approach similar in spirit to the Whittle index for restless multiarmed bandits, we provide an index policy for effort allocation in crowdsourcing and demonstrate numerically that it outperforms other state-of-the-art methods and performs close to the optimal solution.

preprint 2015 arXiv

Clustering via Content-Augmented Stochastic Blockmodels

Much of the data being created on the web contains interactions between users and items. Stochastic blockmodels, and other methods for community detection and clustering of bipartite graphs, can infer latent user communities and latent item clusters from this interaction data. These methods, however, typically ignore the items' contents and the information they provide about item clusters, despite the tendency of items in the same latent cluster to share commonalities in content. We introduce content-augmented stochastic blockmodels (CASB), which use item content together with user-item interaction data to enhance the user communities and item clusters learned. Comparisons to several state-of-the-art benchmark methods, on datasets arising from scientists interacting with scientific articles, show that content-augmented stochastic blockmodels provide highly accurate clusters with respect to metrics representative of the underlying community structure.

preprint 2015 arXiv

Exploration vs. Exploitation in the Information Filtering Problem

We consider information filtering, in which we face a stream of items too voluminous to process by hand (e.g., scientific articles, blog posts, emails), and must rely on a computer system to automatically filter out irrelevant items. Such systems face the exploration vs. exploitation tradeoff, in which it may be beneficial to present an item despite a low probability of relevance, just to learn about future items with similar content. We present a Bayesian sequential decision-making model of this problem, show how it may be solved to optimality using a decomposition to a collection of two-armed bandit problems, and show structural results for the optimal policy. We show that the resulting method is especially useful when facing the cold start problem, i.e., when filtering items for new users without a long history of past interactions. We then present an application of this information filtering method to a historical dataset from the arXiv.org repository of scientific articles.

preprint 2015 arXiv

Probabilistic Group Testing under Sum Observations: A Parallelizable 2-Approximation for Entropy Loss

We consider the problem of group testing with sum observations and noiseless answers, in which we aim to locate multiple objects by querying the number of objects in each of a sequence of chosen sets. We study a probabilistic setting with entropy loss, in which we assume a joint Bayesian prior density on the locations of the objects and seek to choose the sets queried to minimize the expected entropy of the Bayesian posterior distribution after a fixed number of questions. We present a new non-adaptive policy, called the dyadic policy, show it is optimal among non-adaptive policies, and is within a factor of two of optimal among adaptive policies. This policy is quick to compute, its nonadaptive nature makes it easy to parallelize, and our bounds show it performs well even when compared with adaptive policies. We also study an adaptive greedy policy, which maximizes the one-step expected reduction in entropy, and show that it performs at least as well as the dyadic policy, offering greater query efficiency but reduced parallelism. Numerical experiments demonstrate that both procedures outperform a divide-and-conquer benchmark policy from the literature, called sequential bifurcation, and show how these procedures may be applied in a stylized computer vision problem.

preprint 2014 arXiv

A Markov Decision Process Analysis of the Cold Start Problem in Bayesian Information Filtering

We consider the information filtering problem, in which we face a stream of items, and must decide which ones to forward to a user to maximize the number of relevant items shown, minus a penalty for each irrelevant item shown. Forwarding decisions are made separately in a personalized way for each user. We focus on the cold-start setting for this problem, in which we have limited historical data on the user's preferences, and must rely on feedback from forwarded articles to learn the fraction of items relevant to the user in each of several item categories. Performing well in this setting requires trading exploration vs. exploitation, forwarding items that are likely to be irrelevant, to allow learning that will improve later performance. In a Bayesian setting, and using Markov decision processes, we show how the Bayes-optimal forwarding algorithm can be computed efficiently when the user will examine each forwarded article, and how an upper bound on the Bayes-optimal procedure and a heuristic index policy can be obtained for the setting when the user will examine only a limited number of forwarded items. We present results from simulation experiments using parameters estimated using historical data from arXiv.org.

preprint 2014 arXiv

A New Optimal Stepsize For Approximate Dynamic Programming

Approximate dynamic programming (ADP) has proven itself in a wide range of applications spanning large-scale transportation problems, health care, revenue management, and energy systems. The design of effective ADP algorithms has many dimensions, but one crucial factor is the stepsize rule used to update a value function approximation. Many operations research applications are computationally intensive, and it is important to obtain good results quickly. Furthermore, the most popular stepsize formulas use tunable parameters and can produce very poor results if tuned improperly. We derive a new stepsize rule that optimizes the prediction error in order to improve the short-term performance of an ADP algorithm. With only one, relatively insensitive tunable parameter, the new rule adapts to the level of noise in the problem and produces faster convergence in numerical experiments.

preprint 2012 arXiv

Distance Dependent Infinite Latent Feature Models

Latent feature models are widely used to decompose data into a small number of components. Bayesian nonparametric variants of these models, which use the Indian buffet process (IBP) as a prior over latent features, allow the number of features to be determined from the data. We present a generalization of the IBP, the distance dependent Indian buffet process (dd-IBP), for modeling non-exchangeable data. It relies on distances defined between data points, biasing nearby data to share more features. The choice of distance measure allows for many kinds of dependencies, including temporal and spatial. Further, the original IBP is a special case of the dd-IBP. In this paper, we develop the dd-IBP and theoretically characterize its feature-sharing properties. We derive a Markov chain Monte Carlo sampler for a linear Gaussian model with a dd-IBP prior and study its performance on several non-exchangeable data sets.

preprint 2011 arXiv

Distance Dependent Chinese Restaurant Processes

We develop the distance dependent Chinese restaurant process (CRP), a flexible class of distributions over partitions that allows for non-exchangeability. This class can be used to model many kinds of dependencies between data in infinite clustering models, including dependencies across time or space. We examine the properties of the distance dependent CRP, discuss its connections to Bayesian nonparametric mixture models, and derive a Gibbs sampler for both observed and mixture settings. We study its performance with three text corpora. We show that relaxing the assumption of exchangeability with distance dependent CRPs can provide a better fit to sequential data. We also show its alternative formulation of the traditional CRP leads to a faster-mixing Gibbs sampling algorithm than the one based on the original formulation.
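
To make the construction concrete, the sketch below draws one partition from a distance dependent CRP prior. It relies on the standard formulation in which each customer links to another customer with probability proportional to a decay function of their distance, or to itself with probability proportional to a concentration parameter, and clusters are the connected components of the resulting link graph. The function name, argument names, and the example decay function are assumptions made for this illustration, and the Gibbs sampler mentioned in the abstract is not shown.

```python
import math
import random


def sample_ddcrp_partition(distances, alpha, decay, rng):
    """Draw cluster labels from a distance dependent CRP prior.

    distances[i][j] is the distance between customers i and j, alpha > 0 is the
    self-link weight, and decay maps a distance to a nonnegative link weight.
    """
    n = len(distances)

    # Step 1: each customer independently picks a customer to link to.
    links = []
    for i in range(n):
        weights = [alpha if j == i else decay(distances[i][j]) for j in range(n)]
        links.append(rng.choices(range(n), weights=weights)[0])

    # Step 2: clusters are the connected components of the link graph,
    # computed here with a small union-find structure.
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i, j in enumerate(links):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j

    # Step 3: relabel components as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Customers placed at times 0..9, so the distance is the time gap.
    dist = [[abs(i - j) for j in range(10)] for i in range(10)]
    print(sample_ddcrp_partition(dist, alpha=1.0, decay=lambda d: math.exp(-d), rng=rng))
```

With a rapidly decaying weight, nearby customers tend to end up in the same cluster, which is the kind of non-exchangeable dependence across time or space that the abstract refers to.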
