Source author record

Enlu Zhou

Enlu Zhou appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

math.OC, Machine Learning, Applications, Artificial Intelligence, Computation, Information Theory, math.IT, math.ST, Methodology, q-fin.CP, q-fin.RM, Statistics Theory

Catalog footprint

What is connected

18 works

12 topics

4 close collaborators


preprint · 2022 · arXiv

Contextual Ranking and Selection with Gaussian Processes

In many real-world problems, we must select the best among a finite number of alternatives, where the best alternative is determined by context-specific information. In this work, we study the contextual Ranking and Selection problem under a finite-alternative-finite-context setting, where we aim to find the best alternative for each context. We use a separate Gaussian process to model the reward for each alternative, and derive the large deviations rate function for both the expected and worst-case contextual probability of correct selection. We propose the GP-C-OCBA sampling policy, which uses the Gaussian process posterior to iteratively allocate observations to maximize the rate function. We prove its consistency and show that it achieves the optimal convergence rate under the assumption of a non-informative prior. Numerical experiments show that our algorithm is highly competitive in terms of sampling efficiency, while having significantly smaller computational overhead.

preprint · 2022 · arXiv

Data-driven Ranking and Selection under Input Uncertainty

We consider a simulation-based Ranking and Selection (R&S) problem with input uncertainty, where unknown input distributions can be estimated using input data arriving in batches of varying sizes over time. Each time a batch arrives, additional simulations can be run using updated input distribution estimates. The goal is to confidently identify the best design after collecting as few batches as possible. We first introduce a moving average estimator for aggregating simulation outputs generated under heterogeneous input distributions. Then, based on a Sequential Elimination framework, we devise two major R&S procedures by establishing exact and asymptotic confidence bands for the estimator. In deriving the latter confidence bands, we incorporate the result of "Multiple Comparison with Best" and establish an asymptotic normality result which explicitly characterizes the tradeoff between input uncertainty and stochastic uncertainty in an online environment. We also extend our procedures to the indifference zone setting, which helps save simulation effort for practical usage. Numerical results show the effectiveness and necessity of our procedures. Moreover, the efficiency can be further boosted through optimizing the "drop rate" parameter of the estimator.

preprint · 2022 · arXiv

Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably

We investigate the role of noise in optimization algorithms for learning over-parameterized models. Specifically, we consider the recovery of a rank one matrix $Y^*\in \mathbb{R}^{d\times d}$ from a noisy observation $Y$ using an over-parameterized model. We parameterize the rank one matrix $Y^*$ by $XX^\top$, where $X\in \mathbb{R}^{d\times d}$. We then show that under mild conditions, the estimator obtained by the randomly perturbed gradient descent algorithm using the square loss function attains a mean square error of $O(\sigma^2/d)$, where $\sigma^2$ is the variance of the observational noise. In contrast, the estimator obtained by gradient descent without random perturbation only attains a mean square error of $O(\sigma^2)$. Our result partially justifies the implicit regularization effect of noise when learning over-parameterized models, and provides new understanding of training over-parameterized neural networks.

preprint · 2022 · arXiv

Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs

In this paper we consider the contextual multi-armed bandit problem for linear payoffs under a risk-averse criterion. At each round, contexts are revealed for each arm, and the decision maker chooses one arm to pull and receives the corresponding reward. In particular, we consider mean-variance as the risk criterion, and the best arm is the one with the largest mean-variance reward. We apply the Thompson Sampling algorithm for the disjoint model, and provide a comprehensive regret analysis for a variant of the proposed algorithm. For $T$ rounds, $K$ actions, and $d$-dimensional feature vectors, we prove a regret bound of $O\left(\left(1+\rho+\frac{1}{\rho}\right) d \ln T \ln\frac{K}{\delta} \sqrt{d K T^{1+2\epsilon} \ln\frac{K}{\delta} \frac{1}{\epsilon}}\right)$ that holds with probability $1-\delta$ under the mean-variance criterion with risk tolerance $\rho$, for any $0<\epsilon<\frac{1}{2}$, $0<\delta<1$. The empirical performance of our proposed algorithms is demonstrated via a portfolio selection problem.

preprint · 2022 · arXiv

Robust Multi-Objective Bayesian Optimization Under Input Noise

Bayesian optimization (BO) is a sample-efficient approach for tuning design parameters to optimize expensive-to-evaluate, black-box performance metrics. In many manufacturing processes, the design parameters are subject to random input noise, resulting in a product that is often less performant than expected. Although BO methods have been proposed for optimizing a single objective under input noise, no existing method addresses the practical scenario where there are multiple objectives that are sensitive to input perturbations. In this work, we propose the first multi-objective BO method that is robust to input noise. We formalize our goal as optimizing the multivariate value-at-risk (MVaR), a risk measure of the uncertain objectives. Since directly optimizing MVaR is computationally infeasible in many settings, we propose a scalable, theoretically-grounded approach for optimizing MVaR using random scalarizations. Empirically, we find that our approach significantly outperforms alternative methods and efficiently identifies optimal robust designs that will satisfy specifications across multiple metrics with high probability.

preprint · 2021 · arXiv

A Diffusion Approximation Theory of Momentum SGD in Nonconvex Optimization

The Momentum Stochastic Gradient Descent (MSGD) algorithm has been widely applied to many nonconvex optimization problems in machine learning, e.g., training deep neural networks and variational Bayesian inference. Despite its empirical success, there is still a lack of theoretical understanding of the convergence properties of MSGD. To fill this gap, we propose to analyze the algorithmic behavior of MSGD by diffusion approximations for nonconvex optimization problems with strict saddle points and isolated local optima. Our study shows that the momentum helps escape from saddle points, but hurts the convergence within the neighborhood of optima (absent step size annealing or momentum annealing). Our theoretical discovery partially corroborates the empirical success of MSGD in training deep neural networks.
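For orientation, the following is a minimal sketch of a heavy-ball momentum update on a toy one-dimensional quadratic. The function name `momentum_sgd`, the objective, the step size and the momentum value are illustrative assumptions, not taken from the paper, and the sketch does not reproduce its diffusion analysis.

```python
import random

def momentum_sgd(grad, x0, lr=0.1, momentum=0.9, noise_std=0.0, steps=200, seed=0):
    """Heavy-ball iteration: v <- momentum * v - lr * g, then x <- x + v.

    `grad` returns the exact gradient; Gaussian noise with standard deviation
    `noise_std` is added to mimic a stochastic gradient (an assumption of this sketch).
    """
    rng = random.Random(seed)
    x, v = float(x0), 0.0
    for _ in range(steps):
        g = grad(x) + noise_std * rng.gauss(0.0, 1.0)
        v = momentum * v - lr * g   # velocity accumulates past gradients
        x = x + v                   # move along the velocity
    return x

# Toy objective f(x) = 0.5 * x**2, whose gradient is x and whose minimizer is 0.
x_final = momentum_sgd(lambda x: x, x0=1.0)
```

With `noise_std=0.0` the iterate spirals toward the minimizer at 0. A positive `noise_std` leaves it fluctuating around the optimum unless the step size or momentum is annealed.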

preprint · 2021 · arXiv

Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix Factorization

Ample empirical evidence has corroborated the importance of noise in nonconvex optimization problems. The theory behind such empirical observations, however, is still largely unknown. This paper studies this fundamental problem by investigating the nonconvex rectangular matrix factorization problem, which has infinitely many global minima due to rotation and scaling invariance. Hence, gradient descent (GD) can converge to any optimum, depending on the initialization. In contrast, we show that a perturbed form of GD with an arbitrary initialization converges to a global optimum that is uniquely determined by the injected noise. Our result implies that the noise imposes implicit bias towards certain optima. Numerical experiments are provided to support our theory.
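The invariance mentioned in the abstract can be made explicit with a short worked identity. The symbols $X$, $Y$, $A$ and the dimensions $n$, $m$, $r$ are notation assumed here for illustration; the abstract itself does not name them.

```latex
% Assumed notation: factors X \in \mathbb{R}^{n \times r}, Y \in \mathbb{R}^{m \times r},
% and any invertible A \in \mathbb{R}^{r \times r}.
\[
  (XA)\,(YA^{-\top})^{\top} \;=\; X A A^{-1} Y^{\top} \;=\; X Y^{\top}.
\]
% Orthogonal A gives the rotation invariance; A = cI with c \neq 0 gives the
% scaling invariance (X, Y) \mapsto (cX, Y/c).
```

Every factor pair that attains the optimum therefore generates a whole family of optima with the same product, which is why the set of global minima is infinite.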

preprint · 2021 · arXiv

Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization

Asynchronous momentum stochastic gradient descent (Async-MSGD) is one of the most popular algorithms in distributed machine learning. However, its convergence properties for complicated nonconvex problems are still largely unknown because of current technical limitations. Therefore, in this paper, we propose to analyze the algorithm through a simpler but nontrivial nonconvex problem, streaming PCA, which helps us understand Async-MSGD better even for more general problems. Specifically, we establish the asymptotic rate of convergence of Async-MSGD for streaming PCA by diffusion approximation. Our results indicate a fundamental tradeoff between asynchrony and momentum: to ensure convergence and acceleration through asynchrony, we have to reduce the momentum (compared with Sync-MSGD). To the best of our knowledge, this is the first theoretical attempt at understanding Async-MSGD for distributed nonconvex stochastic optimization. Numerical experiments on both streaming PCA and training deep neural networks are provided to support our findings for Async-MSGD.

preprint · 2020 · arXiv

Online Quantification of Input Model Uncertainty by Two-Layer Importance Sampling

Stochastic simulation has been widely used to analyze the performance of complex stochastic systems and facilitate decision making in those systems. Stochastic simulation is driven by the input model, which is a collection of probability distributions that model the stochasticity in the system. The input model is usually estimated using a finite amount of data, which introduces the so-called input model uncertainty to the simulation output. How to quantify input uncertainty has been studied extensively, and many methods have been proposed for the batch data setting, i.e., when all the data are available at once. However, methods for "streaming data" arriving sequentially in time are still in demand, even though streaming data have become increasingly prevalent in modern applications. To fill this gap, we propose a two-layer importance sampling framework that incorporates streaming data for online input uncertainty quantification. Under this framework, we develop two algorithms that suit different application scenarios: the first scenario is when data come at a fast speed and there is no time for any new simulation in between updates; the second is when data come at a moderate speed and a few but limited simulations are allowed at each time stage. We prove consistency and asymptotic convergence rate results, which theoretically show the efficiency of our proposed approach. We further demonstrate the proposed algorithms on a numerical example of the news vendor problem.
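As a rough illustration of why importance sampling suits the setting where no new simulation can be run between updates, the sketch below reuses samples drawn under an old input parameter to estimate a mean under an updated one. It is a single likelihood-ratio reweighting step, not the two-layer framework of the paper. The exponential input model, the rates and the name `reweighted_mean` are assumptions made for the example.

```python
import math
import random

def reweighted_mean(samples, h, old_rate, new_rate):
    """Estimate E[h(X)] under Exp(new_rate) from samples drawn under Exp(old_rate)."""
    total = 0.0
    for x in samples:
        # Likelihood ratio p_new(x) / p_old(x) for exponential densities.
        weight = (new_rate / old_rate) * math.exp(-(new_rate - old_rate) * x)
        total += weight * h(x)
    return total / len(samples)

rng = random.Random(0)
old_rate, new_rate = 1.0, 1.25   # hypothetical input-parameter estimates before/after an update
samples = [rng.expovariate(old_rate) for _ in range(50_000)]   # simulated once, under the old rate
estimate = reweighted_mean(samples, lambda x: x, old_rate, new_rate)
# The exact mean under Exp(1.25) is 1 / 1.25 = 0.8, so `estimate` should be close to that.
```

The old samples are never regenerated: only the weights change when the input estimate is updated.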

preprint · 2020 · arXiv

Solving Bayesian Risk Optimization via Nested Stochastic Gradient Estimation

In this paper, we aim to solve Bayesian Risk Optimization (BRO), which is a recently proposed framework that formulates simulation optimization under input uncertainty. In order to efficiently solve the BRO problem, we derive nested stochastic gradient estimators and propose corresponding stochastic approximation algorithms. We show that our gradient estimators are asymptotically unbiased and consistent, and that the algorithms converge asymptotically. We demonstrate the empirical performance of the algorithms on a two-sided market model. Our estimators are of independent interest in extending the literature of stochastic gradient estimation to the case of nested risk functions.

preprint · 2016 · arXiv

Simulation Optimization of Risk Measures with Adaptive Risk Levels

Optimizing risk measures such as Value-at-Risk (VaR) and Conditional Value-at-Risk (CVaR) of a general loss distribution is usually difficult, because 1) the loss function might lack structural properties such as convexity or differentiability since it is often generated via black-box simulation of a stochastic system; 2) evaluation of risk measures often requires rare-event simulation, which is computationally expensive. In this paper, we study the extension of the recently proposed gradient-based adaptive stochastic search (GASS) to the optimization of risk measures VaR and CVaR. Instead of optimizing VaR or CVaR at the target risk level directly, we incorporate an adaptive updating scheme on the risk level, by initializing the algorithm at a small risk level and adaptively increasing it until the target risk level is achieved while the algorithm converges at the same time. This enables us to adaptively reduce the number of samples required to estimate the risk measure at each iteration, thus improving the overall efficiency of the algorithm.
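To make the two risk measures concrete, here is a minimal sketch of standard sample-based estimators, assuming losses where larger is worse and a risk level `alpha` in (0, 1). The function name `empirical_var_cvar` and the order-statistic convention are choices made for this example; the paper's own estimators and its adaptive risk-level scheme are not reproduced.

```python
import math

def empirical_var_cvar(losses, alpha):
    """Return (VaR, CVaR) estimates at level alpha from a list of simulated losses."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    xs = sorted(losses)
    n = len(xs)
    k = max(1, math.ceil(alpha * n))      # 1-based index of the alpha-quantile
    var = xs[k - 1]
    # CVaR as VaR plus the average excess over VaR, scaled by the tail mass (1 - alpha).
    excess = sum(max(x - var, 0.0) for x in xs)
    cvar = var + excess / ((1.0 - alpha) * n)
    return var, cvar

var_half, cvar_half = empirical_var_cvar([float(i) for i in range(1, 11)], alpha=0.5)
```

Only about a fraction `1 - alpha` of the samples exceed the VaR estimate, so a risk level close to 1 leaves few tail samples. This is the cost that starting from a small risk level avoids in early iterations.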

preprint · 2016 · arXiv

Solving Multi-Objective Optimization via Adaptive Stochastic Search with Domination Measure

For general multi-objective optimization problems, we propose a novel performance metric called domination measure to measure the quality of a solution, which can be intuitively interpreted as the size of the portion of the solution space that dominates that solution. As a result, we reformulate the original multi-objective problem into a stochastic single-objective one and propose a model-based approach to solve it. We show that an idealized version of the proposed algorithm converges to a set representation of the global optima of the reformulated problem. We also investigate the numerical performance of an implementable version of the algorithm by comparing it with numerous existing multi-objective optimization methods on popular benchmark test functions. The numerical results show that the proposed approach is effective in generating a finite and uniformly spread approximation of the Pareto optimal set of the original multi-objective problem, and is competitive with the tested existing methods.
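The intuition behind the domination measure can be sketched on a small finite solution set. This sketch assumes that all objectives are minimized and that the "size" of a portion of the solution space is the fraction of candidate solutions it contains. The names `dominates` and `domination_measure` and the sample values are hypothetical.

```python
def dominates(fy, fx):
    """True if objective vector fy Pareto-dominates fx (minimization)."""
    no_worse = all(a <= b for a, b in zip(fy, fx))
    strictly_better = any(a < b for a, b in zip(fy, fx))
    return no_worse and strictly_better

def domination_measure(index, objective_values):
    """Fraction of the finite solution set that dominates the solution at `index`."""
    fx = objective_values[index]
    count = sum(dominates(fy, fx) for fy in objective_values)
    return count / len(objective_values)

# Hypothetical bi-objective values for five candidate solutions.
values = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0), (5.0, 5.0)]
measures = [domination_measure(i, values) for i in range(len(values))]
```

The first three solutions are not dominated by anything, so their measure is 0. Minimizing the measure therefore targets the Pareto optimal set, which is what turns the multi-objective problem into a single-objective one.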

preprint · 2016 · arXiv

Solving the Dual Problems of Dynamic Programs via Regression

In recent years, information relaxation and duality in dynamic programs have been studied extensively, and the resulting primal-dual approach has become a powerful procedure in solving dynamic programs by providing lower-upper bounds on the optimal value function. Theoretically, with the so-called value-based optimal dual penalty, the optimal value function could be recovered exactly via strong duality. However, in practice, obtaining tight dual bounds usually requires good approximations of the optimal dual penalty, which could be time-consuming due to the conditional expectations that need to be estimated via nested simulation. In this paper, we develop a regression-based framework for approximating the optimal dual penalty in a non-nested manner, by exploring the structure of the function space consisting of all feasible dual penalties. The resulting approximations remain feasible dual penalties, and thus yield valid dual bounds on the optimal value function. We show that the proposed framework is computationally efficient, and the resulting dual penalties lead to numerically tractable dual problems. Finally, we apply the framework to a high-dimensional dynamic trading problem to demonstrate its effectiveness in solving the dual problems of complex dynamic programs.

preprint · 2016 · arXiv

The Empirical Likelihood Approach to Quantifying Uncertainty in Sample Average Approximation

We study the empirical likelihood approach to construct confidence intervals for the optimal value and the optimality gap of a given solution, and hence to quantify the statistical uncertainty of sample average approximation, for optimization problems with expected value objectives and constraints where the underlying probability distributions are observed via limited data. This approach relies on two distributionally robust optimization problems posited over the uncertain distribution, with a divergence-based uncertainty set that is suitably calibrated to provide asymptotic statistical guarantees.

preprint · 2014 · arXiv

Information Relaxation and Dual Formulation of Controlled Markov Diffusions

Information relaxation and duality in Markov decision processes have been studied recently by several researchers with the goal of deriving dual bounds on the value function. In this paper we extend this dual formulation to controlled Markov diffusions: in a similar way we relax the constraint that the decision should be made based on the current information and impose a penalty that punishes access to future information. We establish the weak duality, strong duality and complementary slackness results in parallel with those in Markov decision processes. We explore the structure of the optimal penalties and expose the connection between Markov decision processes and controlled Markov diffusions. We demonstrate the use of the dual representation for controlled Markov diffusions in a classic dynamic portfolio choice problem. We evaluate the lower bounds on the expected utility by Monte Carlo simulation under a sub-optimal policy, and we propose a new class of penalties to derive upper bounds with little extra computation. The small gaps between the lower bounds and upper bounds indicate both that the available policy is near optimal and that our proposed penalty is effective in the dual method.

preprint · 2014 · arXiv

Weakly Coupled Dynamic Program: Information and Lagrangian Relaxations

"Weakly coupled dynamic program" describes a broad class of stochastic optimization problems in which multiple controlled stochastic processes evolve independently but are subject to a set of linking constraints imposed on the controls. One feature of the weakly coupled dynamic program is that it decouples into lower-dimensional dynamic programs by dualizing the linking constraint via the Lagrangian relaxation, which also yields a bound on the optimal value of the original dynamic program. Together with the Lagrangian bound, we utilize the information relaxation approach that relaxes the non-anticipative constraint on the controls to obtain a tighter dual bound. We also investigate other combinations of the relaxations and place the resulting bounds in order. To tackle large-scale problems, we further propose a computationally tractable method based on information relaxation, and provide an interpretation and a performance guarantee. We implement our method and demonstrate its use through two numerical examples.

preprint · 2013 · arXiv

Fast Estimation of True Bounds on Bermudan Option Prices under Jump-diffusion Processes

Fast pricing of American-style options has been a difficult problem since they were first introduced to financial markets in the 1970s, especially when the underlying stocks' prices follow jump-diffusion processes. In this paper, we propose a new algorithm to generate tight upper bounds on the Bermudan option price without nested simulation, under the jump-diffusion setting. By exploiting the martingale representation theorem for jump processes on the dual martingale, we are able to explore the unique structure of the optimal dual martingale and construct an approximation that preserves the martingale property. The resulting upper bound estimator avoids the nested Monte Carlo simulation suffered by the original primal-dual algorithm, and therefore significantly improves the computational efficiency. Theoretical analysis is provided to guarantee the quality of the martingale approximation. Numerical experiments are conducted to verify the efficiency of our proposed algorithm.

preprint · 2013 · arXiv

Gradient-Based Adaptive Stochastic Search for Non-Differentiable Optimization

In this paper, we propose a stochastic search algorithm for solving general optimization problems with little structure. The algorithm iteratively finds high-quality solutions by randomly sampling candidate solutions from a parameterized distribution model over the solution space. The basic idea is to convert the original (possibly non-differentiable) problem into a differentiable optimization problem on the parameter space of the parameterized sampling distribution, and then use a direct gradient search method to find improved sampling distributions. Thus, the algorithm combines the robustness of stochastic search, which comes from considering a population of candidate solutions, with the relatively fast convergence speed of classical gradient methods by exploiting local differentiable structures. We analyze the convergence and convergence rate properties of the proposed algorithm, and carry out a numerical study to illustrate its performance.
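The basic idea of moving the search to the parameter space of a sampling distribution can be sketched with a plain score-function gradient ascent. This is a simplified stand-in, not the GASS algorithm itself, whose update rule the abstract does not spell out. The Gaussian sampling model with fixed `sigma`, the mean-baseline, the step size and the name `sampling_gradient_search` are all assumptions of this sketch.

```python
import random

def sampling_gradient_search(objective, theta0, sigma=1.0, lr=0.1,
                             n_samples=200, iterations=500, seed=0):
    """Maximize E[objective(X)], X ~ N(theta, sigma^2), by gradient ascent on theta."""
    rng = random.Random(seed)
    theta = float(theta0)
    for _ in range(iterations):
        xs = [rng.gauss(theta, sigma) for _ in range(n_samples)]   # candidate solutions
        hs = [objective(x) for x in xs]
        baseline = sum(hs) / n_samples    # subtracting the mean reduces estimator variance
        # Score-function estimate of d/dtheta E[objective(X)]; no derivative of objective needed.
        grad = sum((h - baseline) * (x - theta) for h, x in zip(hs, xs)) / (n_samples * sigma ** 2)
        theta += lr * grad
    return theta

# Non-differentiable toy objective with its maximum at x = 2.
theta_hat = sampling_gradient_search(lambda x: -abs(x - 2.0), theta0=-3.0)
```

The objective is only ever evaluated, never differentiated. The gradient is taken with respect to the sampling mean `theta`, where the smoothed problem is differentiable even though the toy objective has a kink at its optimum.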
