Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
13topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2023arXiv

Provably Efficient Model-Free Constrained RL with Linear Function Approximation

We study the constrained reinforcement learning problem, in which an agent aims to maximize the expected cumulative reward subject to a constraint on the expected total value of a utility function. In contrast to existing model-based approaches or model-free methods accompanied with a `simulator', we aim to develop the first model-free, simulator-free algorithm that achieves a sublinear regret and a sublinear constraint violation even in large-scale systems. To this end, we consider the episodic constrained Markov decision processes with linear function approximation, where the transition dynamics and the reward function can be represented as a linear function of some known feature mapping. We show that $\tilde{\mathcal{O}}(\sqrt{d^3H^3T})$ regret and $\tilde{\mathcal{O}}(\sqrt{d^3H^3T})$ constraint violation bounds can be achieved, where $d$ is the dimension of the feature mapping, $H$ is the length of the episode, and $T$ is the total number of steps. Our bounds are attained without explicitly estimating the unknown transition model or requiring a simulator, and they depend on the state space only through the dimension of the feature mapping. Hence our bounds hold even when the number of states goes to infinity. Our main results are achieved via novel adaptations of the standard LSVI-UCB algorithms. In particular, we first introduce primal-dual optimization into the LSVI-UCB algorithm to balance between regret and constraint violation. More importantly, we replace the standard greedy selection with respect to the state-action function in LSVI-UCB with a soft-max policy. This turns out to be key in establishing uniform concentration for the constrained case via its approximation-smoothness trade-off. We also show that one can achieve an even zero constraint violation while still maintaining the same order with respect to $T$.

preprint2022arXiv

Performance Analysis of Modified SRPT in Multiple-Processor Multitask Scheduling

In this paper we study the multiple-processor multitask scheduling problem in both deterministic and stochastic models, where each job have several tasks and is complete only when all its tasks are finished. We consider and analyze Modified Shortest Remaining Processing Time (M-SRPT) scheduling algorithm, a simple modification of SRPT, which always schedules jobs according to SRPT whenever possible, while processes tasks in an arbitrary order. The M-SRPT algorithm is proved to achieve a competitive ratio of $Θ(\log α+β)$ for minimizing response time, where $α$ denotes the ratio between maximum job workload and minimum job workload, $β$ represents the ratio between maximum non-preemptive task workload and minimum job workload. In addition, the competitive ratio achieved is shown to be optimal (up to a constant factor), when there are constant number of machines. We further consider the problem under Poisson arrival and general workload distribution (\ie, M/GI/$N$ system), and show that M-SRPT achieves asymptotic optimal mean response time when the traffic intensity $ρ$ approaches $1$, if job size distribution has finite support. Beyond finite job workload, the asymptotic optimality of M-SRPT also holds for infinite job size distributions with certain probabilistic assumptions, for example, M/M/$N$ system with finite task workload. As a special case, we show that M-SRPT is asymptotic optimal in M/M/$1$ model, in which the task size distribution is allowed to have infinite support.

preprint2022arXiv

Weighted Gaussian Process Bandits for Non-stationary Environments

In this paper, we consider the Gaussian process (GP) bandit optimization problem in a non-stationary environment. To capture external changes, the black-box function is allowed to be time-varying within a reproducing kernel Hilbert space (RKHS). To this end, we develop WGP-UCB, a novel UCB-type algorithm based on weighted Gaussian process regression. A key challenge is how to cope with infinite-dimensional feature maps. To that end, we leverage kernel approximation techniques to prove a sublinear regret bound, which is the first (frequentist) sublinear regret guarantee on weighted time-varying bandits with general nonlinear rewards. This result generalizes both non-stationary linear bandits and standard GP-UCB algorithms. Further, a novel concentration inequality is achieved for weighted Gaussian process regression with general weights. We also provide universal upper bounds and weight-dependent upper bounds for weighted maximum information gains. These results are of independent interest for applications such as news ranking and adaptive pricing, where weights can be adopted to capture the importance or quality of data. Finally, we conduct experiments to highlight the favorable gains of the proposed algorithm in many cases when compared to existing methods.

preprint2020arXiv

A Note on Load Balancing in Many-Server Heavy-Traffic Regime

In this note, we apply Stein's method to analyze the performance of general load balancing schemes in the many-server heavy-traffic regime. In particular, consider a load balancing system of $N$ servers and the distance of arrival rate to the capacity region is given by $N^{1-α}$ with $α> 1$. We are interested in the performance as $N$ goes to infinity under a large class of policies. We establish different asymptotics under different scalings and conditions. Specifically, (i) If the second moments linearly increase with $N$ with coefficients $σ_a^2$ and $ν_s^2$, then for any $α> 4$, the distribution of the sum queue length scaled by $N^{-α}$ converges to an exponential random variable with mean $\frac{σ_a^2 + ν_s^2}{2}$. (3) If the second moments quadratically increase with $N$ with coefficients $\tildeσ_a^2$ and $\tildeν_s^2$, then for any $α> 3$, the distribution of the sum queue length scaled by $N^{-α-1}$ converges to an exponential random variable with mean $\frac{\tildeσ_a^2 + \tildeν_s^2}{2}$. Both results are simple applications of our previously developed framework of Stein's method for heavy-traffic analysis in \cite{zhou2020note}.

preprint2020arXiv

ACOUSTIC-TURF: Acoustic-based Privacy-Preserving COVID-19 Contact Tracing

In this paper, we propose a new privacy-preserving, automated contact tracing system, ACOUSTIC-TURF, to fight COVID-19 using acoustic signals sent from ubiquitous mobile devices. At a high level, ACOUSTIC-TURF adaptively broadcasts inaudible ultrasonic signals with randomly generated IDs in the vicinity. Simultaneously, the system receives other ultrasonic signals sent from nearby (e.g., 6 feet) users. In such a system, individual user IDs are not disclosed to others and the system can accurately detect encounters in physical proximity with 6-foot granularity. We have implemented a prototype of ACOUSTIC-TURF on Android and evaluated its performance in terms of acoustic-signal-based encounter detection accuracy and power consumption at different ranges and under various occlusion scenarios. Experimental results show that ACOUSTIC-TURF can detect multiple contacts within a 6-foot range for mobile phones placed in pockets and outside pockets. Furthermore, our acoustic-signal-based system achieves greater precision than wireless-signal-based approaches when contact tracing is performed through walls. ACOUSTIC-TURF correctly determines that people on opposite sides of a wall are not in contact with one another, whereas the Bluetooth-based approaches detect nonexistent contacts among them.

preprint2020arXiv

Asymptotically Optimal Load Balancing in Large-scale Heterogeneous Systems with Multiple Dispatchers

We consider the load balancing problem in large-scale heterogeneous systems with multiple dispatchers. We introduce a general framework called Local-Estimation-Driven (LED). Under this framework, each dispatcher keeps local (possibly outdated) estimates of queue lengths for all the servers, and the dispatching decision is made purely based on these local estimates. The local estimates are updated via infrequent communications between dispatchers and servers. We derive sufficient conditions for LED policies to achieve throughput optimality and delay optimality in heavy-traffic, respectively. These conditions directly imply delay optimality for many previous local-memory based policies in heavy traffic. Moreover, the results enable us to design new delay optimal policies for heterogeneous systems with multiple dispatchers. Finally, the heavy-traffic delay optimality of the LED framework directly resolves a recent open problem on how to design optimal load balancing schemes using delayed information.

preprint2020arXiv

Is Deadline Oblivious Scheduling Efficient for Controlling Real-Time Traffic in Cellular Downlink Systems?

The emergence of bandwidth-intensive latency-critical traffic in 5G Networks, such as Virtual Reality, has motivated interest in wireless resource allocation problems for flows with hard-deadlines. Attempting to solve this problem brings about two challenges: (i) The flow arrival and the channel state are not known to the Base Station (BS) apriori, thus, the allocation decisions need to be made online. (ii) Wireless resource allocation algorithms that attempt to maximize a reward will likely be unfair, causing unacceptable service for some users. We model the problem as an online convex optimization problem. We propose a primal-dual Deadline-Oblivious (DO) algorithm, and show it is approximately 3.6-competitive. Furthermore, we show via simulations that our algorithm tracks the prescient offline solution very closely, significantly outperforming several existing algorithms. In the second part, we impose a stochastic constraint on the allocation, requiring a guarantee that each user achieves a certain timely throughput (amount of traffic delivered within the deadline over a period of time). We propose the Long-term Fair Deadline Oblivious (LFDO) algorithm for that setup. We combine the Lyapunov framework with analysis of online algorithms, to show that LFDO retains the high-performance of DO, while satisfying the long-term stochastic constraints.