Source author record

Longbo Huang

Longbo Huang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


math.OC; Machine Learning; Computer Science and Game Theory; Networking and Internet Architecture; Systems and Control; Performance; Artificial Intelligence; Data Structures and Algorithms; Distributed, Parallel, and Cluster Computing; Information Theory; math.IT; math.ST; Statistics Theory

Catalog footprint

What is connected

33 works

13 topics

4 close collaborators

preprint 2026 arXiv

Real-Time Parallel Counterfactual Regret Minimization

Counterfactual Regret Minimization (CFR) is the dominant algorithmic family for solving large imperfect-information games, underpinning breakthroughs such as Libratus and Pluribus in No-Limit Texas Hold'em poker. In real-time game-playing systems, the solver must compute a near-equilibrium strategy within a strict time budget of only a few seconds per decision, and the number of CFR iterations completed in this window directly determines play strength. We present \textbf{Parallel CFR}, the first parallelization framework for real-time depth-limited CFR solving that seamlessly integrates pruning, abstraction, and advanced CFR variants. We decompose each CFR iteration into a pipeline of seven stages and identify two orthogonal dimensions of parallelism: \emph{by information set} and \emph{by tree node}. Leaf node evaluation is offloaded to GPUs via batched neural network inference, creating a heterogeneous CPU--GPU pipeline. Experiments on Heads-Up No-Limit Texas Hold'em demonstrate that Parallel CFR achieves $3.3$--$3.4\times$ speedup over the single-threaded baseline on postflop streets, with per-iteration time of ${\sim}47$--$54$~ms on a depth-limited game tree with over $1$ billion histories. All experiments run on a single desktop-class device (NVIDIA DGX Spark), enabling hundreds of CFR iterations within a typical real-time decision budget without requiring datacenter-scale infrastructure.
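
As background for the abstract above, the following is a minimal sketch of regret matching, the per-information-set update at the core of vanilla CFR. It is not the Parallel CFR pipeline described in the paper; the function names and the toy payoff vector (rock-paper-scissors against an opponent assumed to always play rock) are illustrative assumptions.

```python
from typing import List


def regret_matching(cum_regret: List[float]) -> List[float]:
    """Return a strategy proportional to the positive cumulative regrets."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total <= 0.0:
        # No action has positive regret: fall back to the uniform strategy.
        return [1.0 / len(cum_regret)] * len(cum_regret)
    return [p / total for p in positive]


def update_regrets(cum_regret: List[float], action_values: List[float]) -> List[float]:
    """Add each action's regret against the value of the current strategy."""
    strategy = regret_matching(cum_regret)
    expected = sum(s * v for s, v in zip(strategy, action_values))
    return [r + v - expected for r, v in zip(cum_regret, action_values)]


# Hypothetical toy problem: payoffs of (rock, paper, scissors) when the
# opponent always plays rock.
values = [0.0, 1.0, -1.0]
regrets = [0.0, 0.0, 0.0]
for _ in range(3):
    regrets = update_regrets(regrets, values)
strategy = regret_matching(regrets)
print(strategy)  # all probability mass ends up on "paper"
```

In a full CFR solver this update runs at every information set in every iteration, which is why the number of iterations that fit into the time budget matters.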

preprint 2022 arXiv

Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits

In this paper, we generalize the concept of heavy-tailed multi-armed bandits to adversarial environments, and develop robust best-of-both-worlds algorithms for heavy-tailed multi-armed bandits (MAB), where losses have $α$-th ($1<α\le 2$) moments bounded by $σ^α$, while the variances may not exist. Specifically, we design an algorithm \texttt{HTINF}: when the heavy-tail parameters $α$ and $σ$ are known to the agent, \texttt{HTINF} simultaneously achieves the optimal regret for both stochastic and adversarial environments, without knowing the actual environment type a priori. When $α,σ$ are unknown, \texttt{HTINF} achieves a $\log T$-style instance-dependent regret in stochastic cases and $o(T)$ no-regret guarantee in adversarial cases. We further develop an algorithm \texttt{AdaTINF}, achieving $\mathcal O(σK^{1-\nicefrac 1α}T^{\nicefrac{1}α})$ minimax optimal regret even in adversarial settings, without prior knowledge of $α$ and $σ$. This result matches the known regret lower bound (Bubeck et al., 2013), which assumed a stochastic environment in which $α$ and $σ$ are both known. To our knowledge, the proposed \texttt{HTINF} algorithm is the first to enjoy a best-of-both-worlds regret guarantee, and \texttt{AdaTINF} is the first algorithm that can adapt to both $α$ and $σ$ to achieve the optimal gap-independent regret bound in the classical heavy-tailed stochastic MAB setting and our novel adversarial formulation.

preprint 2022 arXiv

Effective Multi-User Delay-Constrained Scheduling with Deep Recurrent Reinforcement Learning

Multi-user delay-constrained scheduling is important in many real-world applications including wireless communication, live streaming, and cloud computing. Yet, it poses a critical challenge since the scheduler needs to make real-time decisions to guarantee the delay and resource constraints simultaneously without prior information of system dynamics, which can be time-varying and hard to estimate. Moreover, many practical scenarios suffer from partial observability issues, e.g., due to sensing noise or hidden correlation. To tackle these challenges, we propose a deep reinforcement learning (DRL) algorithm, named Recurrent Softmax Delayed Deep Double Deterministic Policy Gradient ($\mathtt{RSD4}$), which is a data-driven method based on a Partially Observed Markov Decision Process (POMDP) formulation. $\mathtt{RSD4}$ guarantees resource and delay constraints by Lagrangian dual and delay-sensitive queues, respectively. It also efficiently tackles partial observability with a memory mechanism enabled by the recurrent neural network (RNN) and introduces user-level decomposition and node-level merging to ensure scalability. Extensive experiments on simulated/real-world datasets demonstrate that $\mathtt{RSD4}$ is robust to system dynamics and partially observable environments, and achieves superior performance over existing DRL and non-DRL-based methods.

preprint 2022 arXiv

Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably)

Despite the remarkable success of deep multi-modal learning in practice, it has not been well-explained in theory. Recently, it has been observed that the best uni-modal network outperforms the jointly trained multi-modal network, which is counter-intuitive since multiple signals generally bring more information. This work provides a theoretical explanation for the emergence of such a performance gap in neural networks for the prevalent joint training framework. Based on a simplified data distribution that captures the realistic property of multi-modal data, we prove that for the multi-modal late-fusion network with (smoothed) ReLU activation trained jointly by gradient descent, different modalities will compete with each other. The encoder networks will learn only a subset of modalities. We refer to this phenomenon as modality competition. The losing modalities, which fail to be discovered, are the source of the sub-optimality of joint training. Experimentally, we illustrate that modality competition matches the intrinsic behavior of late-fusion joint training.

preprint 2022 arXiv

Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification

Conservatism has led to significant progress in offline reinforcement learning (RL) where an agent learns from pre-collected datasets. However, as many real-world scenarios involve interaction among multiple agents, it is important to resolve offline RL in the multi-agent setting. Given the recent success of transferring online RL algorithms to the multi-agent setting, one may expect that offline RL algorithms will also transfer to multi-agent settings directly. Surprisingly, we empirically observe that conservative offline RL algorithms do not work well in the multi-agent setting -- the performance degrades significantly with an increasing number of agents. Towards mitigating the degradation, we identify a key issue that non-concavity of the value function makes the policy gradient improvements prone to local optima. Multiple agents exacerbate the problem severely, since the suboptimal policy by any agent can lead to uncoordinated global failure. Following this intuition, we propose a simple yet effective method, Offline Multi-Agent RL with Actor Rectification (OMAR), which combines the first-order policy gradients and zeroth-order optimization methods to better optimize the conservative value functions over the actor parameters. Despite the simplicity, OMAR achieves state-of-the-art results in a variety of multi-agent control tasks.

preprint 2022 arXiv

Provable Generalization of Overparameterized Meta-learning Trained with SGD

Despite the superior empirical success of deep meta-learning, theoretical understanding of overparameterized meta-learning is still limited. This paper studies the generalization of a widely used meta-learning approach, Model-Agnostic Meta-Learning (MAML), which aims to find a good initialization for fast adaptation to new tasks. Under a mixed linear regression model, we analyze the generalization properties of MAML trained with SGD in the overparameterized regime. We provide both upper and lower bounds for the excess risk of MAML, which captures how SGD dynamics affect these generalization bounds. With such sharp characterizations, we further explore how various learning parameters impact the generalization capability of overparameterized MAML, including explicitly identifying typical data and task distributions that can achieve diminishing generalization error with overparameterization, and characterizing the impact of adaptation learning rate on both excess risk and the early stopping time. Our theoretical findings are further validated by experiments.

preprint 2020 arXiv

Combinatorial Pure Exploration of Dueling Bandit

In this paper, we study combinatorial pure exploration for dueling bandits (CPE-DB): we have multiple candidates for multiple positions as modeled by a bipartite graph, and in each round we sample a duel of two candidates on one position and observe who wins in the duel, with the goal of finding the best candidate-position matching with high probability after multiple rounds of samples. CPE-DB is an adaptation of the original combinatorial pure exploration for multi-armed bandit (CPE-MAB) problem to the dueling bandit setting. We consider both the Borda winner and the Condorcet winner cases. For Borda winner, we establish a reduction of the problem to the original CPE-MAB setting and design PAC and exact algorithms that achieve both the sample complexity similar to that in the CPE-MAB setting (which is nearly optimal for a subclass of problems) and polynomial running time per round. For Condorcet winner, we first design a fully polynomial time approximation scheme (FPTAS) for the offline problem of finding the Condorcet winner with known winning probabilities, and then use the FPTAS as an oracle to design a novel pure exploration algorithm ${\sf CAR}$-${\sf Cond}$ with sample complexity analysis. ${\sf CAR}$-${\sf Cond}$ is the first algorithm with polynomial running time per round for identifying the Condorcet winner in CPE-DB.
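
To make the two winner notions in the abstract above concrete, here is a small sketch that computes the Borda winner and the Condorcet winner from a known pairwise winning-probability matrix. It assumes a single position with three candidates and known probabilities, which is far simpler than the paper's combinatorial, sample-based setting; the matrix `p` and the function names are hypothetical.

```python
from typing import List, Optional


def borda_scores(p: List[List[float]]) -> List[float]:
    """Average probability that candidate i beats each other candidate."""
    k = len(p)
    return [sum(p[i][j] for j in range(k) if j != i) / (k - 1) for i in range(k)]


def condorcet_winner(p: List[List[float]]) -> Optional[int]:
    """Index of the candidate that beats every other one with probability > 1/2."""
    k = len(p)
    for i in range(k):
        if all(p[i][j] > 0.5 for j in range(k) if j != i):
            return i
    return None  # a Condorcet winner need not exist


# Hypothetical matrix: p[i][j] is the probability that i wins a duel against j.
p = [
    [0.5, 0.6, 0.6],
    [0.4, 0.5, 1.0],
    [0.4, 0.0, 0.5],
]
scores = borda_scores(p)
borda_winner = max(range(len(p)), key=scores.__getitem__)
print(borda_winner, condorcet_winner(p))
```

In this example the two notions disagree: candidate 0 beats both rivals head to head, while candidate 1 has the highest average winning probability.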

preprint 2020 arXiv

Multi-Path Policy Optimization

Recent years have witnessed a tremendous improvement of deep reinforcement learning. However, a challenging problem is that an agent may suffer from inefficient exploration, particularly for on-policy methods. Previous exploration methods either rely on complex structure to estimate the novelty of states, or incur sensitive hyper-parameters causing instability. We propose an efficient exploration method, Multi-Path Policy Optimization (MPPO), which does not incur high computation cost and ensures stability. MPPO maintains an efficient mechanism that effectively utilizes a population of diverse policies to enable better exploration, especially in sparse environments. We also give a theoretical guarantee of the stable performance. We build our scheme upon two widely-adopted on-policy methods, the Trust-Region Policy Optimization algorithm and Proximal Policy Optimization algorithm. We conduct extensive experiments on several MuJoCo tasks and their sparsified variants to fairly evaluate the proposed method. Results show that MPPO significantly outperforms state-of-the-art exploration methods in terms of both sample efficiency and final performance.

preprint 2016 arXiv

A multi-layer market for vehicle-to-grid energy trading in the smart grid

In this work, we propose a multi-layer market for vehicle-to-grid energy trading. In the macro layer, we consider a double auction mechanism, under which the utility company acts as an auctioneer and energy buyers and sellers interact. This double auction mechanism is strategy-proof and converges asymptotically. In the micro layer, the aggregators, which are the sellers in the macro layer, are paid with commissions to sell the energy of plug-in hybrid electric vehicles (PHEVs) and to maximize their utilities. We analyze the interaction between the macro and micro layers and study some simplified cases. Depending on the elasticity of supply and demand, the utility is analyzed under different scenarios. Simulation results show that our approach can significantly increase the utility of PHEVs.

preprint 2016 arXiv

Age-of-Information in the Presence of Error

We consider the peak age-of-information (PAoI) in an M/M/1 queueing system with packet delivery error, i.e., update packets can get lost during transmissions to their destination. We focus on two types of policies: one adopts Last-Come-First-Served (LCFS) scheduling, and the other utilizes retransmissions, i.e., keeps transmitting the most recent packet. Both policies can effectively avoid the queueing delay of a busy channel and ensure a small PAoI. Exact PAoI expressions under both policies with different error probabilities are derived, including First-Come-First-Served (FCFS), LCFS with preemptive priority, LCFS with non-preemptive priority, Retransmission with preemptive priority, and Retransmission with non-preemptive priority. Numerical results obtained from analysis and simulation are presented to validate our results.

preprint 2016 arXiv

Market Share Analysis with Brand Effect

In this paper, we investigate the effect of brand in market competition. Specifically, we propose a variant Hotelling model where companies and customers are represented by points in a Euclidean space, with axes being product features. $N$ companies compete to maximize their own profits by optimally choosing their prices, while each customer in the market, when choosing sellers, considers the sum of product price, discrepancy between product feature and his preference, and a company's brand name, which is modeled by a function of its market area of the form $-β\cdot\text{(Market Area)}^q$, where $β$ captures the brand influence and $q$ captures how market share affects the brand. By varying the parameters $β$ and $q$, we derive existence results of Nash equilibrium and equilibrium market prices and shares. In particular, we prove that pure Nash equilibrium always exists when $q=0$ for markets with either one or two dominating features, and it always exists in a single dominating feature market when market area affects brand name linearly, i.e., $q=1$. Moreover, we show that at equilibrium, a company's price is proportional to its market area over the competition intensity with its neighbors, a result that quantitatively reconciles the common belief of a company's pricing power. We also study an interesting "wipe out" phenomenon that only appears when $q>0$, which is similar to the "undercut" phenomenon in the Hotelling model, where companies may suddenly lose the entire market area with a small price increment. Our results offer novel insight into market pricing and positioning under competition with brand effect.

preprint 2016 arXiv

Prices and Subsidies in the Sharing Economy

The growth of the sharing economy is driven by the emergence of sharing platforms, e.g., Uber and Lyft, that match owners looking to share their resources with customers looking to rent them. The design of such platforms is a complex mixture of economics and engineering, and how to "optimally" design such platforms is still an open problem. In this paper, we focus on the design of prices and subsidies in sharing platforms. Our results provide insights into the tradeoff between revenue maximizing prices and social welfare maximizing prices. Specifically, we introduce a novel model of sharing platforms and characterize the profit and social welfare maximizing prices in this model. Further, we bound the efficiency loss under profit maximizing prices, showing that there is a strong alignment between profit and efficiency in practical settings. Our results highlight that the revenue of platforms may be limited in practice due to supply shortages; thus platforms have a strong incentive to encourage sharing via subsidies. We provide an analytic characterization of when such subsidies are valuable and show how to optimize the size of the subsidy provided. Finally, we validate the insights from our analysis using data from Didi Chuxing, the largest ridesharing platform in China.

preprint 2016 arXiv

System Intelligence: Model, Bounds and Algorithms

We present a general framework for understanding system intelligence, i.e., the level of system smartness perceived by users, and propose a novel metric for measuring intelligence levels of dynamical human-in-the-loop systems, defined to be the maximum average reward obtained by proactively serving user demands, subject to a resource constraint. Our metric captures two important elements of smartness, i.e., being able to know what users want and pre-serve them, and achieving good resource management while doing so. We provide an explicit characterization of the system intelligence, and show that it is jointly determined by user demand volume (opportunity to impress), demand correlation (user predictability), and system resource and action costs (flexibility to pre-serve). We then propose an online learning-aided control algorithm called Learning-aided Budget-limited Intelligent System Control ($\mathtt{LBISC}$). We show that $\mathtt{LBISC}$ achieves an intelligence level that is within $O(N(T)^{-\frac{1}{2}}+ε)$ of the highest level, where $N(T)$ represents the number of data samples collected within a learning period $T$ and is proportional to the user population size in the system, while guaranteeing an $O(\max( N(T)^{-\frac{1}{2}}/ε, \log(1/ε)^2))$ average resource deficit. Moreover, we show that $\mathtt{LBISC}$ possesses an $O(\max( N(T)^{-\frac{1}{2}}/ε, \log(1/ε)^2)+T)$ convergence time, which is much smaller than the $Θ(1/ε)$ time required for non-learning based algorithms. The analysis of $\mathtt{LBISC}$ rigorously quantifies the impacts of data and user population (captured by $N(T)$), learning (captured by our learning method), and control (captured by $\mathtt{LBISC}$) on achievable system intelligence, and provides novel insights and guidelines for designing future smart systems.

preprint 2016 arXiv

Time-Average Optimization with Non-Convex Decision Set and Its Convergence

This paper considers time-average optimization, where a decision vector is chosen every time step within a (possibly non-convex) set, and the goal is to minimize a convex function of the time averages subject to convex constraints on these averages. Such problems have applications in networking, multi-agent systems, and operations research, where decisions are constrained to a discrete set and the decision average can represent average bit rates or average agent actions. This time-average optimization extends traditional convex formulations to allow a non-convex decision set. This class of problems can be solved by Lyapunov optimization. A simple drift-based algorithm, related to a classical dual subgradient algorithm, converges to an $ε$-optimal solution within $O(1/ε^2)$ time steps. Further, the algorithm is shown to have a transient phase and a steady state phase which can be exploited to improve convergence rates to $O(1/ε)$ and $O(1/{ε^{1.5}})$ when vectors of Lagrange multipliers satisfy locally-polyhedral and locally-smooth assumptions respectively. Practically, this improved convergence suggests that decisions should be implemented after the transient period.
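
The idea of meeting a constraint on a time average while choosing from a discrete set can be illustrated with a generic drift-plus-penalty sketch. This is a simplified linear special case, not the paper's algorithm or its convergence analysis: each slot the decision `x` is 0 or 1, the cost is `x` itself, and the time average of `x` must cover an assumed constant arrival rate. The function name and the parameters `arrival`, `v`, and `steps` are illustrative assumptions.

```python
from typing import Tuple


def drift_plus_penalty(arrival: float, v: float, steps: int) -> Tuple[float, float]:
    """Run a virtual-queue controller and return (average decision, peak queue)."""
    queue = 0.0      # virtual queue tracking accumulated constraint violation
    served = 0
    max_queue = 0.0
    for _ in range(steps):
        # Choose x in {0, 1} to minimize v * x - queue * x:
        # pay the cost only when the backlog outweighs the penalty weight v.
        x = 1 if queue > v else 0
        queue = max(queue + arrival - x, 0.0)
        served += x
        max_queue = max(max_queue, queue)
    return served / steps, max_queue


avg_decision, peak_queue = drift_plus_penalty(arrival=0.3, v=5.0, steps=10_000)
print(avg_decision, peak_queue)
```

Although every individual decision is 0 or 1, the time average settles near 0.3 while the virtual queue stays bounded by roughly `v`, which is the mechanism that lets a non-convex decision set achieve points in its convex hull on average.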

preprint 2016 arXiv

Two-Scale Stochastic Control for Multipoint Communication Systems with Renewables

Increasing threats of global warming and climate changes call for an energy-efficient and sustainable design of future wireless communication systems. To this end, a novel two-scale stochastic control framework is put forth for smart-grid powered coordinated multi-point (CoMP) systems. Taking into account renewable energy sources (RES), dynamic pricing, two-way energy trading facilities and imperfect energy storage devices, the energy management task is formulated as an infinite-horizon optimization problem minimizing the time-average energy transaction cost, subject to the users' quality of service (QoS) requirements. Leveraging the Lyapunov optimization approach as well as the stochastic subgradient method, a two-scale online control (TS-OC) approach is developed for the resultant smart-grid powered CoMP systems. Using only historical data, the proposed TS-OC makes online control decisions at two timescales, and features a provably feasible and asymptotically near-optimal solution. Numerical tests further corroborate the theoretical analysis, and demonstrate the merits of the proposed approach.

preprint 2015 arXiv

Fast-Convergent Learning-aided Control in Energy Harvesting Networks

In this paper, we present a novel learning-aided energy management scheme ($\mathtt{LEM}$) for multihop energy harvesting networks. Different from prior works on this problem, our algorithm explicitly incorporates information learning into system control via a step called \emph{perturbed dual learning}. $\mathtt{LEM}$ does not require any statistical information of the system dynamics for implementation, and efficiently resolves the challenging energy outage problem. We show that $\mathtt{LEM}$ achieves the near-optimal $[O(ε), O(\log(1/ε)^2)]$ utility-delay tradeoff with $O(1/ε^{1-c/2})$ energy buffers ($c\in(0,1)$). More interestingly, $\mathtt{LEM}$ possesses a \emph{convergence time} of $O(1/ε^{1-c/2} +1/ε^c)$, which is much faster than the $Θ(1/ε)$ time of pure queue-based techniques or the $Θ(1/ε^2)$ time of approaches that rely purely on learning the system statistics. This fast convergence property makes $\mathtt{LEM}$ more adaptive and efficient in resource allocation in dynamic environments. The design and analysis of $\mathtt{LEM}$ demonstrate how system control algorithms can be augmented by learning and what the benefits are. The methodology and algorithm can also be applied to similar problems, e.g., processing networks, where nodes require a nonzero amount of content to support their actions.

preprint 2015 arXiv

Optimizing Age-of-Information in a Multi-class Queueing System

We consider the age-of-information in a multi-class $M/G/1$ queueing system, where each class generates packets containing status information. Age of information is a relatively new metric that measures the amount of time that elapsed between status updates, thus accounting for both the queueing delay and the delay between packet generation. This gives rise to a tradeoff between frequency of status updates and queueing delay. In this paper, we study this tradeoff in a system with heterogeneous users modeled as a multi-class $M/G/1$ queue. To this end, we derive the exact peak age-of-information (PAoI) profile of the system, which measures the "freshness" of the status information. We then seek to optimize the age of information, by formulating the problem using quasiconvex optimization, and obtain structural properties of the optimal solution.
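
The tradeoff between update frequency and queueing delay can be seen in the simplest special case. The sketch below assumes a single-class M/M/1 FCFS queue rather than the multi-class $M/G/1$ system studied in the paper, and uses the standard expression for the mean peak age as the mean inter-arrival time plus the mean system time, $1/λ + 1/(μ-λ)$. The function name and the grid search are illustrative.

```python
def peak_aoi_mm1(lam: float, mu: float) -> float:
    """Mean peak age for a stable M/M/1 FCFS queue (assumed special case)."""
    if not 0.0 < lam < mu:
        raise ValueError("need 0 < lam < mu for a stable queue")
    # Mean inter-arrival time plus mean time in system.
    return 1.0 / lam + 1.0 / (mu - lam)


mu = 1.0
# Sweep the update rate over a grid of utilizations 0.01, 0.02, ..., 0.99.
grid = [i / 100 for i in range(1, 100)]
best_rate = min(grid, key=lambda lam: peak_aoi_mm1(lam, mu))
print(best_rate, peak_aoi_mm1(best_rate, mu))
```

Updating too rarely inflates the first term and updating too often inflates the second; in this special case the two balance at half the service rate.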

preprint 2015 arXiv

Power-Delay Tradeoff with Predictive Scheduling in Integrated Cellular and Wi-Fi Networks

The explosive growth of global mobile traffic has led to a rapid growth in the energy consumption in communication networks. In this paper, we focus on the energy-aware design of the network selection, subchannel, and power allocation in cellular and Wi-Fi networks, while taking into account the traffic delay of mobile users. The problem is particularly challenging due to the two-timescale operations for the network selection (large timescale) and subchannel and power allocation (small timescale). Based on the two-timescale Lyapunov optimization technique, we first design an online Energy-Aware Network Selection and Resource Allocation (ENSRA) algorithm. The ENSRA algorithm yields a power consumption within an O(1/V) bound of the optimal value, and guarantees an O(V) traffic delay for any positive control parameter V. Motivated by the recent advancement in the accurate estimation and prediction of user mobility, channel conditions, and traffic demands, we further develop a novel predictive Lyapunov optimization technique to utilize the predictive information, and propose a Predictive Energy-Aware Network Selection and Resource Allocation (P-ENSRA) algorithm. We characterize the performance bounds of P-ENSRA in terms of the power-delay tradeoff theoretically. To reduce the computational complexity, we finally propose a Greedy Predictive Energy-Aware Network Selection and Resource Allocation (GP-ENSRA) algorithm, where the operator solves the problem in P-ENSRA approximately and iteratively. Numerical results show that GP-ENSRA significantly improves the power-delay performance over ENSRA in the large delay regime. For a wide range of system parameters, GP-ENSRA reduces the traffic delay over ENSRA by 20% to 30% under the same power consumption.

preprint 2015 arXiv

The Value-of-Information in Matching with Queues

We consider the problem of \emph{optimal matching with queues} in dynamic systems and investigate the value-of-information. In such systems, the operators match tasks and resources stored in queues, with the objective of maximizing the system utility of the matching reward profile, minus the average matching cost. This problem appears in many practical systems and the main challenges are the no-underflow constraints, and the lack of matching-reward information and system dynamics statistics. We develop two online matching algorithms: Learning-aided Reward optimAl Matching ($\mathtt{LRAM}$) and Dual-$\mathtt{LRAM}$ ($\mathtt{DRAM}$) to effectively resolve both challenges. Both algorithms are equipped with a learning module for estimating the matching-reward information, while $\mathtt{DRAM}$ incorporates an additional module for learning the system dynamics. We show that both algorithms achieve an $O(ε+δ_r)$ close-to-optimal utility performance for any $ε>0$, while $\mathtt{DRAM}$ achieves a faster convergence speed and a better delay compared to $\mathtt{LRAM}$, i.e., $O(δ_{z}/ε+ \log(1/ε)^2)$ delay and $O(δ_z/ε)$ convergence under $\mathtt{DRAM}$ compared to $O(1/ε)$ delay and convergence under $\mathtt{LRAM}$ ($δ_r$ and $δ_z$ are maximum estimation errors for reward and system dynamics). Our results reveal that information of different system components can play very different roles in algorithm performance and provide a systematic way for designing joint learning-control algorithms for dynamic systems.

preprint 2014 arXiv

Optimizing Your Online-Advertisement Asynchronously

We consider the problem of designing optimal online-ad investment strategies for a single advertiser, who invests at multiple sponsored search sites simultaneously, with the objective of maximizing his average revenue subject to the advertising budget constraint. A greedy online investment scheme is developed to achieve an average revenue that can be pushed to within $O(ε)$ of the optimal, for any $ε>0$, with a tradeoff that the temporal budget violation is $O(1/ε)$. Different from many existing algorithms, our scheme allows the advertiser to \emph{asynchronously} update his investments on each search engine site, hence applies to systems where the timescales of action update intervals are heterogeneous for different sites. We also quantify the impact of inaccurate estimation of the system dynamics and show that the algorithm is robust against imperfect system knowledge.

preprint 2014 arXiv

The Multi-shop Ski Rental Problem

We consider the {\em multi-shop ski rental} problem. This problem generalizes the classic ski rental problem to a multi-shop setting, in which each shop has different prices for renting and purchasing a pair of skis, and a \emph{consumer} has to make decisions on when and where to buy. We are interested in the {\em optimal online (competitive-ratio minimizing) mixed strategy} from the consumer's perspective. For our problem in its basic form, we obtain exciting closed-form solutions and a linear time algorithm for computing them. We further demonstrate the generality of our approach by investigating three extensions of our basic problem, namely ones that consider costs incurred by entering a shop or switching to another shop. Our solutions to these problems suggest that the consumer must assign positive probability to \emph{exactly one} shop at any buying time. Our results apply to many real-world applications, ranging from cost management in \texttt{IaaS} cloud to scheduling in distributed computing.
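
For readers unfamiliar with the classic problem that this work generalizes, the sketch below evaluates the textbook deterministic break-even strategy for a single shop: rent until the rent paid would reach the purchase price, then buy. It assumes a rental price of 1 per day and an integer purchase price, and it is not the paper's randomized multi-shop strategy; all names are illustrative.

```python
def break_even_cost(days: int, buy_price: int) -> int:
    """Cost of renting for buy_price - 1 days and buying on day buy_price."""
    if days < buy_price:
        return days                      # the season ended before buying
    return (buy_price - 1) + buy_price   # rent paid so far plus the purchase


def offline_optimum(days: int, buy_price: int) -> int:
    """Cost when the number of skiing days is known in advance."""
    return min(days, buy_price)


def competitive_ratio(buy_price: int, horizon: int) -> float:
    """Worst-case ratio of online cost to offline cost over 1..horizon days."""
    return max(
        break_even_cost(d, buy_price) / offline_optimum(d, buy_price)
        for d in range(1, horizon + 1)
    )


print(competitive_ratio(buy_price=10, horizon=100))
```

With a purchase price of $B$ the worst case is $(2B-1)/B = 2 - 1/B$; randomized (mixed) strategies of the kind the abstract considers can do better than this deterministic baseline.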

preprint 2014 arXiv

The Power of Online Learning in Stochastic Network Optimization

In this paper, we investigate the power of online learning in stochastic network optimization with unknown system statistics {\it a priori}. We are interested in understanding how information and learning can be efficiently incorporated into system control techniques, and what are the fundamental benefits of doing so. We propose two \emph{Online Learning-Aided Control} techniques, $\mathtt{OLAC}$ and $\mathtt{OLAC2}$, that explicitly utilize the past system information in current system control via a learning procedure called \emph{dual learning}. We prove strong performance guarantees of the proposed algorithms: $\mathtt{OLAC}$ and $\mathtt{OLAC2}$ achieve the near-optimal $[O(ε), O([\log(1/ε)]^2)]$ utility-delay tradeoff and $\mathtt{OLAC2}$ possesses an $O(ε^{-2/3})$ convergence time. $\mathtt{OLAC}$ and $\mathtt{OLAC2}$ are probably the first algorithms that simultaneously possess explicit near-optimal delay guarantee and sub-linear convergence time. Simulation results also confirm the superior performance of the proposed algorithms in practice. To the best of our knowledge, our attempt is the first to explicitly incorporate online learning into stochastic network optimization and to demonstrate its power in both theory and practice.

preprint 2014 arXiv

When Queueing Meets Coding: Optimal-Latency Data Retrieving Scheme in Storage Clouds

In this paper, we study the problem of reducing the delay of downloading data from cloud storage systems by leveraging multiple parallel threads, assuming that the data has been encoded and stored in the clouds using fixed rate forward error correction (FEC) codes with parameters (n, k). That is, each file is divided into k equal-sized chunks, which are then expanded into n chunks such that any k chunks out of the n are sufficient to successfully restore the original file. The model can be depicted as a multiple-server queue with arrivals of data retrieving requests and a server corresponding to a thread. However, this is not a typical queueing model because a server can terminate its operation, depending on when other servers complete their service (due to the redundancy that is spread across the threads). Hence, to the best of our knowledge, the analysis of this queueing model remains quite uncharted. Recent traces from Amazon S3 show that the time to retrieve a fixed size chunk is random and can be approximated as a constant delay plus an i.i.d. exponentially distributed random variable. For the tractability of the theoretical analysis, we assume that the chunk downloading time is i.i.d. exponentially distributed. Under this assumption, we show that any work-conserving scheme is delay-optimal among all on-line scheduling schemes when k = 1. When k > 1, we find that a simple greedy scheme, which allocates all available threads to the head of line request, is delay optimal among all on-line scheduling schemes. We also provide some numerical results that point to the limitations of the exponential assumption, and suggest further research directions.
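
The benefit of redundancy under the exponential assumption can be worked out for one isolated request. The sketch below assumes all $n$ chunk downloads start at the same time on $n$ threads, each taking an i.i.d. exponential time with a given rate, and computes the expected time until any $k$ of them finish (the $k$-th order statistic). It ignores queueing between requests, so it is only a building block and not the paper's delay analysis; the function name and `rate` parameter are assumptions.

```python
def expected_k_of_n_time(n: int, k: int, rate: float) -> float:
    """Expected time until k of n i.i.d. exponential downloads complete."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # With j downloads already done, n - j are still running, so by
    # memorylessness the next completion takes 1 / ((n - j) * rate) on average.
    return sum(1.0 / ((n - j) * rate) for j in range(k))


# Compare an uncoded download (k chunks on k threads) with a coded one.
print(expected_k_of_n_time(n=2, k=2, rate=1.0))  # wait for both chunks
print(expected_k_of_n_time(n=4, k=2, rate=1.0))  # any 2 of 4 chunks suffice
```

Adding redundant chunks shortens the wait because the slowest downloads no longer have to finish.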

preprint2013arXiv

When Backpressure Meets Predictive Scheduling

Motivated by the increasing popularity of learning and predicting human user behavior in communication and computing systems, in this paper, we investigate the fundamental benefit of predictive scheduling, i.e., predicting and pre-serving arrivals, in controlled queueing systems. Based on a lookahead window prediction model, we first establish a novel equivalence between the predictive queueing system with a \emph{fully-efficient} scheduling scheme and an equivalent queueing system without prediction. This connection allows us to analytically demonstrate that predictive scheduling necessarily improves system delay performance and can drive it to zero with increasing prediction power. We then propose the \textsf{Predictive Backpressure (PBP)} algorithm for achieving optimal utility performance in such predictive systems. \textsf{PBP} efficiently incorporates prediction into stochastic system control and avoids the great complication due to the exponential state space growth in the prediction window size. We show that \textsf{PBP} can achieve a utility performance that is within $O(ε)$ of the optimal, for any $ε>0$, while guaranteeing that the system delay distribution is a \emph{shifted-to-the-left} version of that under the original Backpressure algorithm. Hence, the average packet delay under \textsf{PBP} is strictly better than that under Backpressure, and vanishes with increasing prediction window size. This implies that the resulting utility-delay tradeoff with predictive scheduling beats the known optimal $[O(ε), O(\log(1/ε))]$ tradeoff for systems without prediction.
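
The idea of pre-serving arrivals inside a lookahead window can be sketched with a toy single-server queue in Python. This is not the PBP algorithm and involves no utility optimization; it assumes perfect prediction, a FIFO server that handles one packet per slot, and hypothetical arrival slots chosen only for illustration. The function name `fifo_delays` is likewise an assumption.

```python
from typing import List


def fifo_delays(arrival_slots: List[int], lookahead: int = 0) -> List[int]:
    """Per-packet delays for a FIFO server that serves one packet per slot.

    A packet arriving in slot t may be pre-served as early as slot
    t - lookahead (never before slot 0). The delay counts the slots between
    arrival and service and is 0 for packets that are pre-served.
    """
    delays = []
    next_free = 0  # earliest slot in which the server is available
    for t in sorted(arrival_slots):
        service_slot = max(next_free, t - lookahead, 0)
        delays.append(max(0, service_slot - t))
        next_free = service_slot + 1
    return delays


arrivals = [0, 0, 0, 5, 5, 6, 10]  # hypothetical arrival slots
for window in (0, 2):
    d = fifo_delays(arrivals, window)
    print(f"lookahead={window}: delays={d}, mean={sum(d) / len(d):.2f}")
```

In this toy run, packets that arrive after the initial burst are served before or at their arrival once the window is 2, while the burst at slot 0 cannot benefit because nothing can be served before time starts.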

preprint2012arXiv

A Benes Packet Network

Benes networks are constructed with simple switch modules and have many advantages, including small latency and requiring only an almost linear number of switch modules. As circuit-switches, Benes networks are rearrangeably non-blocking, which implies that they are full-throughput as packet switches, with suitable routing. Routing in Benes networks can be done by time-sharing permutations. However, this approach requires centralized control of the switch modules and statistical knowledge of the traffic arrivals. We propose a backpressure-based routing scheme for Benes networks, combined with end-to-end congestion control. This approach achieves the maximal utility of the network and requires only four queues per module, independently of the size of the network.

preprint2012arXiv

Codes Can Reduce Queueing Delay in Data Centers

In this paper, we quantify how much codes can reduce the data retrieval latency in storage systems. By combining a simple linear code with a novel request scheduling algorithm, which we call Blocking-one Scheduling (BoS), we show analytically that it is possible to reduce data retrieval delay by up to 17% over currently popular replication-based strategies. Although in this work we focus on a simplified setting where the storage system stores a single content, the methodology developed can be applied to more general settings with multiple contents. The results also offer insightful guidance to the design of storage systems in data centers and content distribution networks.

preprint2012arXiv

Optimal Demand Response with Energy Storage Management

In this paper, we consider the problem of optimal demand response and energy storage management for a power-consuming entity. The entity's objective is to find an optimal control policy for deciding how much load to consume, how much power to purchase from or sell to the power grid, and how to use the finite-capacity energy storage device and renewable energy, so as to minimize its average cost, which consists of the disutility due to load shedding and the cost of purchasing power. Due to the coupling effect of the finite-size energy storage, such problems are challenging and are typically tackled using dynamic programming, which is often computationally complex and requires substantial statistical information about the system dynamics. We instead develop a low-complexity algorithm called Demand Response with Energy Storage Management (DR-ESM). DR-ESM does not require any statistical knowledge of the system dynamics, including the renewable energy and the power prices. It only requires the entity to solve a small convex optimization program with 6 variables and 6 linear constraints each time a decision is made. We prove that DR-ESM is able to achieve near-optimal performance and explicitly compute the required energy storage size.

preprint2011arXiv

LIFO-Backpressure Achieves Near Optimal Utility-Delay Tradeoff

There has been considerable recent work developing a new stochastic network utility maximization framework using Backpressure algorithms, also known as MaxWeight. A key open problem has been the development of utility-optimal algorithms that are also delay-efficient. In this paper, we show that the Backpressure algorithm, when combined with the LIFO queueing discipline (called LIFO-Backpressure), is able to achieve a utility that is within $O(1/V)$ of the optimal value, while maintaining an average delay of $O([\log(V)]^2)$ for all but a tiny fraction of the network traffic. This result holds for general stochastic network optimization problems and general Markovian dynamics. Remarkably, the performance of LIFO-Backpressure can be achieved by simply changing the queueing discipline; it requires no other modifications of the original Backpressure algorithm. We validate the results through empirical measurements from a sensor network testbed, which show a good match between theory and practice.
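
The effect of swapping the queueing discipline can be seen in a toy single-queue simulation in Python. The sketch below is an assumption-laden illustration, not the paper's setting: it uses one queue, i.i.d. arrivals of 0 or 2 packets per slot, a hypothetical convex service cost, and a Backpressure-style rule that picks the service level minimizing V * cost - backlog * service. The service decisions are identical in both runs; only the order in which waiting packets are removed differs.

```python
import random
from collections import deque
from statistics import median

SERVICE_COST = {0: 0.0, 1: 1.0, 2: 3.0}  # hypothetical convex cost per service level


def median_delay(V: float, lifo: bool, slots: int = 20000, seed: int = 0) -> float:
    """Median delay (in slots) of served packets under FIFO or LIFO removal."""
    rng = random.Random(seed)
    queue = deque()  # arrival slot of each waiting packet, oldest on the left
    delays = []
    for t in range(slots):
        backlog = len(queue)
        # Backpressure-style rule: minimise V * cost - backlog * service.
        mu = min(SERVICE_COST, key=lambda m: V * SERVICE_COST[m] - backlog * m)
        for _ in range(min(mu, backlog)):
            arrived = queue.pop() if lifo else queue.popleft()
            delays.append(t - arrived)
        for _ in range(rng.choice((0, 2))):  # i.i.d. arrivals, mean 1 per slot
            queue.append(t)
    return median(delays)


V = 50
print("FIFO median delay:", median_delay(V, lifo=False))
print("LIFO median delay:", median_delay(V, lifo=True))
```

Because the rule keeps roughly V or more packets queued, every FIFO packet waits behind that backlog, whereas under LIFO most packets leave shortly after arriving and only the packets stuck at the bottom of the stack wait a long time. Packets still queued when the run ends are not counted in the median.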

preprint2011arXiv

Optimal Power Procurement and Demand Response with Quality-of-Usage Guarantees

In this paper, we propose a general operating scheme which allows the utility company to jointly perform power procurement and demand response so as to maximize the social welfare. Our model takes into consideration the effect of renewable energy and the multi-stage nature of the power procurement process. It also enables the utility company to provide quality-of-usage (QoU) guarantees to the power consumers, which ensure that the average power usage level meets the target value for each user. To maximize the social welfare, we develop a low-complexity algorithm called the \emph{welfare maximization algorithm} (WMA), which performs joint power procurement and dynamic pricing. WMA is constructed based on a two-timescale Lyapunov optimization technique. We prove that WMA achieves a close-to-optimal utility and ensures that the QoU requirement is met with bounded deficit. WMA can be implemented in a distributed manner and is robust with respect to system dynamics uncertainty.

preprint2010arXiv

Dynamic Product Assembly and Inventory Control for Maximum Profit

We consider a manufacturing plant that purchases raw materials for product assembly and then sells the final products to customers. There are M types of raw materials and K types of products, and each product uses a certain subset of raw materials for assembly. The plant operates in slotted time, and every slot it makes decisions about re-stocking materials and pricing the existing products in reaction to (possibly time-varying) material costs and consumer demands. We develop a dynamic purchasing and pricing policy that yields time average profit within epsilon of optimality, for any given epsilon>0, with a worst case storage buffer requirement that is O(1/epsilon). The policy can be implemented easily for large M, K, yields fast convergence times, and is robust to non-ergodic system dynamics.

preprint2010arXiv

Max-Weight Achieves the Exact $[O(1/V), O(V)]$ Utility-Delay Tradeoff Under Markov Dynamics

In this paper, we show that the Quadratic Lyapunov function based Algorithm (QLA, also known as MaxWeight or Backpressure) achieves an exact $[O(1/V), O(V)]$ utility-delay tradeoff in stochastic network optimization problems with Markovian network dynamics. Note that although the QLA algorithm has been extensively studied, most of the performance results are obtained under i.i.d. network randomness, and it has not been formally proven that QLA achieves the exact $[O(1/V), O(V)]$ utility-delay tradeoff under Markov dynamics. Our analysis uses a combination of duality theory and a variable multi-slot Lyapunov drift argument. The variable multi-slot Lyapunov drift argument here is different from previous multi-slot drift analysis, in that the slot number is a random variable corresponding to the renewal time of the network randomness. This variable multi-slot drift argument not only allows us to obtain an exact $[O(1/V), O(V)]$ tradeoff, but also allows us to state the performance of QLA in terms of explicit parameters of the network dynamic process.
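
To make the $[O(1/V), O(V)]$ tradeoff concrete, here is a minimal Python sketch of a drift-plus-penalty rule on a single queue. It is a toy under stated assumptions rather than the paper's model: arrivals are i.i.d. (0 or 2 packets per slot) instead of Markovian, the convex service cost table is hypothetical, and the function name `simulate` is invented for this example. In each slot the controller picks the service level mu that minimizes V * cost(mu) - Q(t) * mu.

```python
import random

SERVICE_COST = {0: 0.0, 1: 1.0, 2: 3.0}  # hypothetical convex cost per service level


def simulate(V: float, slots: int = 20000, seed: int = 0):
    """Return (average cost, average backlog) for one queue under the rule
    mu(t) = argmin over mu of [V * cost(mu) - Q(t) * mu]."""
    rng = random.Random(seed)
    backlog = 0
    total_cost = 0.0
    total_backlog = 0
    for _ in range(slots):
        mu = min(SERVICE_COST, key=lambda m: V * SERVICE_COST[m] - backlog * m)
        total_cost += SERVICE_COST[mu]
        total_backlog += backlog
        arrivals = rng.choice((0, 2))  # i.i.d. arrivals, mean 1 per slot
        backlog = max(backlog - mu, 0) + arrivals
    return total_cost / slots, total_backlog / slots


for V in (1, 10, 50):
    cost, backlog = simulate(V)
    print(f"V={V:>2}: average cost={cost:.3f}, average backlog={backlog:.1f}")
```

In this toy, the cheapest stabilizing policy serves one packet per slot at cost 1. A small V reacts aggressively to backlog and pays for frequent two-packet service, while a large V tolerates a backlog that grows roughly in proportion to V and pushes the average cost toward 1.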

preprint2010arXiv

Utility Optimal Scheduling in Energy Harvesting Networks

In this paper, we show how to achieve close-to-optimal utility performance in energy harvesting networks with only finite-capacity energy storage devices. In these networks, nodes are capable of harvesting energy from the environment. The amount of energy that can be harvested is time-varying and evolves according to some probability law. We develop an \emph{online} algorithm, called the Energy-limited Scheduling Algorithm (ESA), which jointly manages the energy and makes power allocation decisions for packet transmissions. ESA only has to keep track of the amount of energy left at the network nodes and \emph{does not require any knowledge} of the harvestable energy process. We show that ESA achieves a utility that is within $O(ε)$ of the optimal, for any $ε>0$, while ensuring that the network congestion and the required capacity of the energy storage devices are \emph{deterministically} upper bounded by bounds of size $O(1/ε)$. We then also develop the Modified-ESA algorithm (MESA) to achieve the same $O(ε)$ close-to-optimal utility performance, with the average network congestion and the required capacity of the energy storage devices being only $O([\log(1/ε)]^2)$.

preprint2010arXiv

Utility Optimal Scheduling in Processing Networks

We consider the problem of utility optimal scheduling in general \emph{processing networks} with random arrivals and network conditions. These are generalizations of traditional data networks where commodities in one or more queues can be combined to produce new commodities that are delivered to other parts of the network. This can be used to model problems such as in-network data fusion, stream processing, and grid computing. Scheduling actions are complicated by the \emph{underflow problem} that arises when some queues with required components go empty. In this paper, we develop the Perturbed Max-Weight algorithm (PMW) to achieve optimal utility. The idea of PMW is to perturb the weights used by the usual Max-Weight algorithm to ``push'' queue levels towards non-zero values (avoiding underflows). We show that when the perturbations are carefully chosen, PMW is able to achieve a utility that is within $O(1/V)$ of the optimal value for any $V\geq1$, while ensuring an average network backlog of $O(V)$.

33 published item(s)

Real-Time Parallel Counterfactual Regret Minimization

Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits

Effective Multi-User Delay-Constrained Scheduling with Deep Recurrent Reinforcement Learning

Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably)

Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification

Provable Generalization of Overparameterized Meta-learning Trained with SGD

Combinatorial Pure Exploration of Dueling Bandit

Multi-Path Policy Optimization

A multi-layer market for vehicle-to-grid energy trading in the smart grid

Age-of-Information in the Presence of Error

Market Share Analysis with Brand Effect

Prices and Subsidies in the Sharing Economy

System Intelligence: Model, Bounds and Algorithms

Time-Average Optimization with Non-Convex Decision Set and Its Convergence

Two-Scale Stochastic Control for Multipoint Communication Systems with Renewables

Fast-Convergent Learning-aided Control in Energy Harvesting Networks

Optimizing Age-of-Information in a Multi-class Queueing System

Power-Delay Tradeoff with Predictive Scheduling in Integrated Cellular and Wi-Fi Networks

The Value-of-Information in Matching with Queues

Optimizing Your Online-Advertisement Asynchronously

The Multi-shop Ski Rental Problem

The Power of Online Learning in Stochastic Network Optimization

When Queueing Meets Coding: Optimal-Latency Data Retrieving Scheme in Storage Clouds

When Backpressure Meets Predictive Scheduling

A Benes Packet Network

Codes Can Reduce Queueing Delay in Data Centers

Optimal Demand Response with Energy Storage Management

LIFO-Backpressure Achieves Near Optimal Utility-Delay Tradeoff

Optimal Power Procurement and Demand Response with Quality-of-Usage Guarantees

Dynamic Product Assembly and Inventory Control for Maximum Profit

Max-Weight Achieves the Exact $[O(1/V), O(V)]$ Utility-Delay Tradeoff Under Markov Dynamics

Utility Optimal Scheduling in Energy Harvesting Networks

Utility Optimal Scheduling in Processing Networks