Researcher profile

Enlu Zhou

Enlu Zhou contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
10works
0followers
8topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

10 published item(s)

preprint2022arXiv

Contextual Ranking and Selection with Gaussian Processes

In many real world problems, we are faced with the problem of selecting the best among a finite number of alternatives, where the best alternative is determined based on context specific information. In this work, we study the contextual Ranking and Selection problem under a finite-alternative-finite-context setting, where we aim to find the best alternative for each context. We use a separate Gaussian process to model the reward for each alternative, and derive the large deviations rate function for both the expected and worst-case contextual probability of correct selection. We propose the GP-C-OCBA sampling policy, which uses the Gaussian process posterior to iteratively allocate observations to maximize the rate function. We prove its consistency and show that it achieves the optimal convergence rate under the assumption of a non-informative prior. Numerical experiments show that our algorithm is highly competitive in terms of sampling efficiency, while having significantly smaller computational overhead.

preprint2022arXiv

Data-driven Ranking and Selection under Input Uncertainty

We consider a simulation-based Ranking and Selection (R&S) problem with input uncertainty, where unknown input distributions can be estimated using input data arriving in batches of varying sizes over time. Each time a batch arrives, additional simulations can be run using updated input distribution estimates. The goal is to confidently identify the best design after collecting as few batches as possible. We first introduce a moving average estimator for aggregating simulation outputs generated under heterogenous input distributions. Then, based on a Sequential Elimination framework, we devise two major R&S procedures by establishing exact and asymptotic confidence bands for the estimator. In deriving the latter confidence bands, we incorporate the result of "Multiple Comparison with Best" and establish an asymptotic normality result which explicitly characterizes the tradeoff between input uncertainty and stochastic uncertainty in an online environment. We also extend our procedures to the indifference zone setting, which helps save simulation effort for practical usage. Numerical results show the effectiveness and necessity of our procedures. Moreover, the efficiency can be further boosted through optimizing the "drop rate" parameter of the estimator.

preprint2022arXiv

Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably

We investigate the role of noise in optimization algorithms for learning over-parameterized models. Specifically, we consider the recovery of a rank one matrix $Y^*\in R^{d\times d}$ from a noisy observation $Y$ using an over-parameterization model. We parameterize the rank one matrix $Y^*$ by $XX^\top$, where $X\in R^{d\times d}$. We then show that under mild conditions, the estimator, obtained by the randomly perturbed gradient descent algorithm using the square loss function, attains a mean square error of $O(σ^2/d)$, where $σ^2$ is the variance of the observational noise. In contrast, the estimator obtained by gradient descent without random perturbation only attains a mean square error of $O(σ^2)$. Our result partially justifies the implicit regularization effect of noise when learning over-parameterized models, and provides new understanding of training over-parameterized neural networks.

preprint2022arXiv

Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs

In this paper we consider the contextual multi-armed bandit problem for linear payoffs under a risk-averse criterion. At each round, contexts are revealed for each arm, and the decision maker chooses one arm to pull and receives the corresponding reward. In particular, we consider mean-variance as the risk criterion, and the best arm is the one with the largest mean-variance reward. We apply the Thompson Sampling algorithm for the disjoint model, and provide a comprehensive regret analysis for a variant of the proposed algorithm. For $T$ rounds, $K$ actions, and $d$-dimensional feature vectors, we prove a regret bound of $O((1+ρ+\frac{1}ρ) d\ln T \ln \frac{K}δ\sqrt{d K T^{1+2ε} \ln \frac{K}δ \frac{1}ε})$ that holds with probability $1-δ$ under the mean-variance criterion with risk tolerance $ρ$, for any $0<ε<\frac{1}{2}$, $0<δ<1$. The empirical performance of our proposed algorithms is demonstrated via a portfolio selection problem.

preprint2022arXiv

Robust Multi-Objective Bayesian Optimization Under Input Noise

Bayesian optimization (BO) is a sample-efficient approach for tuning design parameters to optimize expensive-to-evaluate, black-box performance metrics. In many manufacturing processes, the design parameters are subject to random input noise, resulting in a product that is often less performant than expected. Although BO methods have been proposed for optimizing a single objective under input noise, no existing method addresses the practical scenario where there are multiple objectives that are sensitive to input perturbations. In this work, we propose the first multi-objective BO method that is robust to input noise. We formalize our goal as optimizing the multivariate value-at-risk (MVaR), a risk measure of the uncertain objectives. Since directly optimizing MVaR is computationally infeasible in many settings, we propose a scalable, theoretically-grounded approach for optimizing MVaR using random scalarizations. Empirically, we find that our approach significantly outperforms alternative methods and efficiently identifies optimal robust designs that will satisfy specifications across multiple metrics with high probability.

preprint2021arXiv

A Diffusion Approximation Theory of Momentum SGD in Nonconvex Optimization

Momentum Stochastic Gradient Descent (MSGD) algorithm has been widely applied to many nonconvex optimization problems in machine learning, e.g., training deep neural networks, variational Bayesian inference, and etc. Despite its empirical success, there is still a lack of theoretical understanding of convergence properties of MSGD. To fill this gap, we propose to analyze the algorithmic behavior of MSGD by diffusion approximations for nonconvex optimization problems with strict saddle points and isolated local optima. Our study shows that the momentum helps escape from saddle points, but hurts the convergence within the neighborhood of optima (if without the step size annealing or momentum annealing). Our theoretical discovery partially corroborates the empirical success of MSGD in training deep neural networks.

preprint2021arXiv

Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix Factorization

Numerous empirical evidences have corroborated the importance of noise in nonconvex optimization problems. The theory behind such empirical observations, however, is still largely unknown. This paper studies this fundamental problem through investigating the nonconvex rectangular matrix factorization problem, which has infinitely many global minima due to rotation and scaling invariance. Hence, gradient descent (GD) can converge to any optimum, depending on the initialization. In contrast, we show that a perturbed form of GD with an arbitrary initialization converges to a global optimum that is uniquely determined by the injected noise. Our result implies that the noise imposes implicit bias towards certain optima. Numerical experiments are provided to support our theory.

preprint2021arXiv

Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization

Asynchronous momentum stochastic gradient descent algorithms (Async-MSGD) is one of the most popular algorithms in distributed machine learning. However, its convergence properties for these complicated nonconvex problems is still largely unknown, because of the current technical limit. Therefore, in this paper, we propose to analyze the algorithm through a simpler but nontrivial nonconvex problem - streaming PCA, which helps us to understand Aync-MSGD better even for more general problems. Specifically, we establish the asymptotic rate of convergence of Async-MSGD for streaming PCA by diffusion approximation. Our results indicate a fundamental tradeoff between asynchrony and momentum: To ensure convergence and acceleration through asynchrony, we have to reduce the momentum (compared with Sync-MSGD). To the best of our knowledge, this is the first theoretical attempt on understanding Async-MSGD for distributed nonconvex stochastic optimization. Numerical experiments on both streaming PCA and training deep neural networks are provided to support our findings for Async-MSGD.

preprint2020arXiv

Online Quantification of Input Model Uncertainty by Two-Layer Importance Sampling

Stochastic simulation has been widely used to analyze the performance of complex stochastic systems and facilitate decision making in those systems. Stochastic simulation is driven by the input model, which is a collection of probability distributions that model the stochasticity in the system. The input model is usually estimated using a finite amount of data, which introduces the so-called input model uncertainty to the simulation output. How to quantify input uncertainty has been studied extensively, and many methods have been proposed for the batch data setting, i.e., when all the data are available at once. However, methods for &#34;streaming data&#34; arriving sequentially in time are still in demand, despite that streaming data have become increasingly prevalent in modern applications. To fill this gap, we propose a two-layer importance sampling framework that incorporates streaming data for online input uncertainty quantification. Under this framework, we develop two algorithms that suit different application scenarios: the first scenario is when data come at a fast speed and there is no time for any new simulation in between updates; the second is when data come at a moderate speed and a few but limited simulations are allowed at each time stage. We prove the consistency and asymptotic convergence rate results, which theoretically show the efficiency of our proposed approach. We further demonstrate the proposed algorithms on a numerical example of the news vendor problem.

preprint2020arXiv

Solving Bayesian Risk Optimization via Nested Stochastic Gradient Estimation

In this paper, we aim to solve Bayesian Risk Optimization (BRO), which is a recently proposed framework that formulates simulation optimization under input uncertainty. In order to efficiently solve the BRO problem, we derive nested stochastic gradient estimators and propose corresponding stochastic approximation algorithms. We show that our gradient estimators are asymptotically unbiased and consistent, and that the algorithms converge asymptotically. We demonstrate the empirical performance of the algorithms on a two-sided market model. Our estimators are of independent interest in extending the literature of stochastic gradient estimation to the case of nested risk functions.