Researcher profile

Keqin Liu

Keqin Liu contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
13works
0followers
14topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

13 published item(s)

preprint2013arXiv

Deterministic Sequencing of Exploration and Exploitation for Multi-Armed Bandit Problems

In the Multi-Armed Bandit (MAB) problem, there is a given set of arms with unknown reward models. At each time, a player selects one arm to play, aiming to maximize the total expected reward over a horizon of length T. An approach based on a Deterministic Sequencing of Exploration and Exploitation (DSEE) is developed for constructing sequential arm selection policies. It is shown that for all light-tailed reward distributions, DSEE achieves the optimal logarithmic order of the regret, where regret is defined as the total expected reward loss against the ideal case with known reward models. For heavy-tailed reward distributions, DSEE achieves O(T^1/p) regret when the moments of the reward distributions exist up to the pth order for 1<p<=2 and O(T^1/(1+p/2)) for p>2. With the knowledge of an upperbound on a finite moment of the heavy-tailed reward distributions, DSEE offers the optimal logarithmic regret order. The proposed DSEE approach complements existing work on MAB by providing corresponding results for general reward distributions. Furthermore, with a clearly defined tunable parameter-the cardinality of the exploration sequence, the DSEE approach is easily extendable to variations of MAB, including MAB with various objectives, decentralized MAB with multiple players and incomplete reward observations under collisions, MAB with unknown Markov dynamics, and combinatorial MAB with dependent arms that often arise in network optimization problems such as the shortest path, the minimum spanning, and the dominating set problems under unknown random weights.

preprint2012arXiv

Adaptive Shortest-Path Routing under Unknown and Stochastically Varying Link States

We consider the adaptive shortest-path routing problem in wireless networks under unknown and stochastically varying link states. In this problem, we aim to optimize the quality of communication between a source and a destination through adaptive path selection. Due to the randomness and uncertainties in the network dynamics, the quality of each link varies over time according to a stochastic process with unknown distributions. After a path is selected for communication, the aggregated quality of all links on this path (e.g., total path delay) is observed. The quality of each individual link is not observable. We formulate this problem as a multi-armed bandit with dependent arms. We show that by exploiting arm dependencies, a regret polynomial with network size can be achieved while maintaining the optimal logarithmic order with time. This is in sharp contrast with the exponential regret order with network size offered by a direct application of the classic MAB policies that ignore arm dependencies. Furthermore, our results are obtained under a general model of link-quality distributions (including heavy-tailed distributions) and find applications in cognitive radio and ad hoc networks with unknown and dynamic communication environments.

preprint2012arXiv

Distributed Flow Scheduling in an Unknown Environment

Flow scheduling tends to be one of the oldest and most stubborn problems in networking. It becomes more crucial in the next generation network, due to fast changing link states and tremendous cost to explore the global structure. In such situation, distributed algorithms often dominate. In this paper, we design a distributed virtual game to solve the flow scheduling problem and then generalize it to situations of unknown environment, where online learning schemes are utilized. In the virtual game, we use incentives to stimulate selfish users to reach a Nash Equilibrium Point which is valid based on the analysis of the `Price of Anarchy&#39;. In the unknown-environment generalization, our ultimate goal is the minimization of cost in the long run. In order to achieve balance between exploration of routing cost and exploitation based on limited information, we model this problem based on Multi-armed Bandit Scenario and combined newly proposed DSEE with the virtual game design. Armed with these powerful tools, we find a totally distributed algorithm to ensure the logarithmic growing of regret with time, which is optimum in classic Multi-armed Bandit Problem. Theoretical proof and simulation results both affirm this claim. To our knowledge, this is the first research to combine multi-armed bandit with distributed flow scheduling.

preprint2011arXiv

Dynamic Intrusion Detection in Resource-Constrained Cyber Networks

We consider a large-scale cyber network with N components (e.g., paths, servers, subnets). Each component is either in a healthy state (0) or an abnormal state (1). Due to random intrusions, the state of each component transits from 0 to 1 over time according to certain stochastic process. At each time, a subset of K (K < N) components are checked and those observed in abnormal states are fixed. The objective is to design the optimal scheduling for intrusion detection such that the long-term network cost incurred by all abnormal components is minimized. We formulate the problem as a special class of Restless Multi-Armed Bandit (RMAB) process. A general RMAB suffers from the curse of dimensionality (PSPACE-hard) and numerical methods are often inapplicable. We show that, for this class of RMAB, Whittle index exists and can be obtained in closed form, leading to a low-complexity implementation of Whittle index policy with a strong performance. For homogeneous components, Whittle index policy is shown to have a simple structure that does not require any prior knowledge on the intrusion processes. Based on this structure, Whittle index policy is further shown to be optimal over a finite time horizon with an arbitrary length. Beyond intrusion detection, these results also find applications in queuing networks with finite-size buffers.

preprint2011arXiv

Learning in A Changing World: Restless Multi-Armed Bandit with Unknown Dynamics

We consider the restless multi-armed bandit (RMAB) problem with unknown dynamics in which a player chooses M out of N arms to play at each time. The reward state of each arm transits according to an unknown Markovian rule when it is played and evolves according to an arbitrary unknown random process when it is passive. The performance of an arm selection policy is measured by regret, defined as the reward loss with respect to the case where the player knows which M arms are the most rewarding and always plays the M best arms. We construct a policy with an interleaving exploration and exploitation epoch structure that achieves a regret with logarithmic order when arbitrary (but nontrivial) bounds on certain system parameters are known. When no knowledge about the system is available, we show that the proposed policy achieves a regret arbitrarily close to the logarithmic order. We further extend the problem to a decentralized setting where multiple distributed players share the arms without information exchange. Under both an exogenous restless model and an endogenous restless model, we show that a decentralized extension of the proposed policy preserves the logarithmic regret order as in the centralized setting. The results apply to adaptive learning in various dynamic systems and communication networks, as well as financial investment.