Source author record

Xuefeng Gao

Xuefeng Gao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning math.OC math.PR q-fin.TR

Catalog footprint

What is connected

7 works

4 topics

4 close collaborators

preprint 2022 arXiv

Sublinear Regret for Learning POMDPs

We study model-based undiscounted reinforcement learning for partially observable Markov decision processes (POMDPs). The oracle we consider is the optimal policy of the POMDP with a known environment in terms of the average reward over an infinite horizon. We propose a learning algorithm for this problem, building on spectral method-of-moments estimation for hidden Markov models, belief error control in POMDPs and upper-confidence-bound methods for online learning. We establish a regret bound of $O(T^{2/3}\sqrt{\log T})$ for the proposed learning algorithm, where $T$ is the learning horizon. This is, to the best of our knowledge, the first algorithm achieving sublinear regret with respect to our oracle for learning general POMDPs.

preprint 2021 arXiv

Regime Switching Bandits

We study a multi-armed bandit problem where the rewards exhibit regime switching. Specifically, the distributions of the random rewards generated from all arms are modulated by a common underlying state modeled as a finite-state Markov chain. The agent does not observe the underlying state and has to learn the transition matrix and the reward distributions. We propose a learning algorithm for this problem, building on spectral method-of-moments estimation for hidden Markov models, belief error control in partially observable Markov decision processes and upper-confidence-bound methods for online learning. We also establish a regret upper bound of $O(T^{2/3}\sqrt{\log T})$ for the proposed learning algorithm, where $T$ is the learning horizon. Finally, we conduct proof-of-concept experiments to illustrate the performance of the learning algorithm.
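To make the setting concrete, here is a minimal sketch of a regime-switching bandit environment together with the Bayesian belief update over the hidden state. It is not the algorithm from the paper: it assumes the transition matrix `P` and the Bernoulli reward means `mu` are known (the paper estimates them), and it uses a simple myopic arm choice. All names and numerical values are hypothetical.

```python
import numpy as np

def belief_update(belief, P, mu, arm, reward):
    """One filtering step: condition on the observed reward, then propagate.

    belief[i] is the probability that the hidden state is i before the pull.
    mu[i, a] is the (assumed known) Bernoulli reward mean of arm a in state i.
    """
    belief = np.asarray(belief, dtype=float)
    # Likelihood of the observed 0/1 reward under each hidden state.
    lik = mu[:, arm] if reward == 1 else 1.0 - mu[:, arm]
    posterior = belief * lik
    posterior = posterior / posterior.sum()
    # Push the posterior through the Markov chain to get the next-step belief.
    return posterior @ P

# Hypothetical two-state, two-arm instance.
P = np.array([[0.9, 0.1], [0.2, 0.8]])    # hidden-state transition matrix
mu = np.array([[0.8, 0.2], [0.3, 0.7]])   # rows: states, columns: arms

rng = np.random.default_rng(0)
state = 0
belief = np.array([0.5, 0.5])
total_reward = 0
for t in range(1000):
    # Myopic rule (an illustrative assumption): maximize expected reward under the belief.
    arm = int(np.argmax(belief @ mu))
    reward = int(rng.random() < mu[state, arm])
    total_reward += reward
    belief = belief_update(belief, P, mu, arm, reward)
    state = rng.choice(2, p=P[state])     # the hidden regime switches
```

The agent never reads `state` directly; it only sees rewards, which is why the belief vector is the quantity a policy acts on.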

preprint 2020 arXiv

Non-Convex Optimization via Non-Reversible Stochastic Gradient Langevin Dynamics

Stochastic Gradient Langevin Dynamics (SGLD) is a powerful algorithm for optimizing a non-convex objective, where a controlled and properly scaled Gaussian noise is added to the stochastic gradients to steer the iterates towards a global minimum. SGLD is based on the overdamped Langevin diffusion, which is reversible in time. By adding an anti-symmetric matrix to the drift term of the overdamped Langevin diffusion, one gets a non-reversible diffusion that converges to the same stationary distribution with a faster convergence rate. In this paper, we study the non-reversible Stochastic Gradient Langevin Dynamics (NSGLD), which is based on discretization of the non-reversible Langevin diffusion. We provide finite-time performance bounds for the global convergence of NSGLD for solving stochastic non-convex optimization problems. Our results lead to non-asymptotic guarantees for both population and empirical risk minimization problems. Numerical experiments for Bayesian independent component analysis and neural network models show that NSGLD can outperform SGLD with proper choices of the anti-symmetric matrix.
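The update described above can be sketched as an Euler discretization in which the stochastic gradient is multiplied by $I + J$ for an anti-symmetric matrix $J$. This is a minimal illustration, assuming the update form $x_{k+1} = x_k - \eta (I + J) g(x_k) + \sqrt{2\eta/\beta}\,\xi_k$; the step size, inverse temperature `beta`, toy objective and choice of `J` are hypothetical and are not taken from the paper's experiments.

```python
import numpy as np

def nsgld(stoch_grad, x0, J, step=1e-2, beta=100.0, n_iter=5000, seed=0):
    """Non-reversible SGLD sketch; setting J = 0 gives plain SGLD."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    drift_matrix = np.eye(x.size) + J            # I + J, with J anti-symmetric
    noise_scale = np.sqrt(2.0 * step / beta)     # properly scaled Gaussian noise
    for _ in range(n_iter):
        g = stoch_grad(x, rng)                   # noisy gradient estimate
        x = x - step * (drift_matrix @ g) + noise_scale * rng.standard_normal(x.size)
    return x

def noisy_quadratic_grad(x, rng):
    # Toy objective f(x) = 0.5 * ||x||^2 with additive gradient noise (an assumption).
    return x + 0.1 * rng.standard_normal(x.size)

# A hypothetical anti-symmetric matrix: J equals -J transposed.
J = 1.0 * np.array([[0.0, 1.0], [-1.0, 0.0]])
x_final = nsgld(noisy_quadratic_grad, x0=[3.0, -2.0], J=J)
```

On this convex toy problem the iterates settle near the minimizer at the origin; it only illustrates the mechanics of the update, not the speed-up the abstract reports for non-convex problems.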

preprint 2020 arXiv

Optimal Market Making in the Presence of Latency

This paper studies optimal market making for large-tick assets in the presence of latency. We consider a random walk model for the asset price, and formulate the market maker's optimization problem using Markov Decision Processes (MDP). We characterize the value of an order and show that it plays the role of one-period reward in the MDP model. Based on this characterization, we provide explicit criteria for assessing the profitability of market making when there is latency. Under our model, we show that a market maker can earn a positive expected profit if there are sufficient uninformed market orders hitting the market maker's limit orders compared with the rate of price jumps, and the trading horizon is sufficiently long. In addition, our theoretical and numerical results suggest that latency can be an additional source of risk and that latency negatively impacts the performance of market makers.

preprint 2016 arXiv

Hydrodynamic limit of order book dynamics

In this paper, we establish a fluid limit for a two-sided Markov order book model. Our main result states that in a certain asymptotic regime, a pair of measure-valued processes representing the "sell-side shape" and "buy-side shape" of an order book converges to a pair of deterministic measure-valued processes in a certain sense. We also test our fluid approximation on data. The empirical results suggest that the approximation is reasonably good for liquidly traded stocks in certain time periods.

preprint 2014 arXiv

Validity of heavy-traffic steady-state approximations in many-server queues with abandonment

We consider GI/Ph/n+M parallel-server systems with a renewal arrival process, a phase-type service time distribution, n homogeneous servers, and an exponential patience time distribution with positive rate. We show that in the Halfin-Whitt regime, the sequence of stationary distributions corresponding to the normalized state processes is tight. As a consequence, we establish an interchange of heavy traffic and steady state limits for GI/Ph/n+M queues.

preprint 2013 arXiv

Positive recurrence of piecewise Ornstein-Uhlenbeck processes and common quadratic Lyapunov functions

We study the positive recurrence of piecewise Ornstein-Uhlenbeck (OU) diffusion processes, which arise from many-server queueing systems with phase-type service requirements. These diffusion processes exhibit different behavior in two regions of the state space, corresponding to "overload" (service demand exceeds capacity) and "underload" (service capacity exceeds demand). The two regimes cause standard techniques for proving positive recurrence to fail. Using and extending the framework of common quadratic Lyapunov functions from the theory of control, we construct Lyapunov functions for the diffusion approximations corresponding to systems with and without abandonment. With these Lyapunov functions, we prove that piecewise OU processes have a unique stationary distribution.
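For readers unfamiliar with the term, the standard control-theoretic notion of a common quadratic Lyapunov function can be written as follows. The notation is an assumption made for illustration and is not taken from the paper: suppose the drift of the diffusion is $b(x) = c + A_1 x$ in one region of the state space and $b(x) = c + A_2 x$ in the other. A common quadratic Lyapunov function for the pair $(A_1, A_2)$ is then a single quadratic form that decreases along both linear dynamics:

$$
V(x) = x^{\top} Q x, \qquad Q \succ 0, \qquad Q A_i + A_i^{\top} Q \prec 0 \quad \text{for } i = 1, 2 .
$$

The point of requiring one matrix $Q$ for both regions is that the same function $V$ can be used on the whole state space, even though the drift switches between the two regimes.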