Source author record

Mingyan Liu

Mingyan Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher: unclaimed source record

Machine Learning; Computer Science and Game Theory; math.OC; Systems and Control; Networking and Internet Architecture; Information Theory; math.IT; Cryptography and Security; math.PR; Artificial Intelligence; Computational Engineering, Finance, and Science; Social and Information Networks; econ.TH; Multiagent Systems; physics.soc-ph

Catalog footprint

What is connected

35 works

15 topics

4 close collaborators


preprint, 2026, arXiv

Sequential Strategic Classification with Multi-Stage Selective Classifiers

Strategic classification studies the problem where self-interested individuals or agents manipulate their response to obtain favorable decision outcomes made by classifiers, typically turning to dishonest actions when they are less costly than genuine efforts. Prior works have demonstrated a fundamental inability to get out of this conundrum by only focusing on the design of a classifier. We note that prior work also heavily focuses on either one-shot settings or repeated interaction with the same classifier. Real-world decision making is often multi-stage, involving a sequence of potentially different classifiers as an agent progresses. This paper introduces a sequential, stochastic, multi-stage model of strategic classification, by capturing how agents adapt their behavior, through improvement actions (enhancing both observable features and true attributes) and gaming actions (enhancing only observable features), over multiple levels of classification with increasing difficulty as well as reward. For each level, we adopt a selective classifier that can abstain from making a prediction at low confidence. Consequently, a positive (resp. negative) outcome leads to promotion (resp. demotion) of the agent to the next higher (resp. lower) level, while abstention keeps the agent at the same level. We fully characterize the agent's optimal instantaneous action under selective classifiers and compare the long-term properties and utility of the agent repeatedly following an optimal myopic policy of either no-improvement (never choose the improvement action) or no-gaming (never choose the gaming action). We further examine design principles over the sequence of classifiers that yield higher long-term utility for the latter policy, thereby effectively incentivizing genuine effort in the long run.
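The level dynamics described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's model: the confidence score, the two thresholds `accept_at` and `reject_at`, and the clamping at the lowest and highest levels are all assumptions made here for illustration.

```python
def next_level(level, confidence, accept_at, reject_at, top_level):
    """Return the agent's next level under a selective classifier.

    Hypothetical convention: confidence >= accept_at is a positive outcome,
    confidence <= reject_at is a negative outcome, anything between is an
    abstention. Levels are clamped to the range 0..top_level (an assumption).
    """
    if not reject_at < accept_at:
        raise ValueError("reject_at must be strictly below accept_at")
    if confidence >= accept_at:
        return min(level + 1, top_level)  # positive outcome: promotion
    if confidence <= reject_at:
        return max(level - 1, 0)          # negative outcome: demotion
    return level                          # abstention: stay at the same level
```

For example, with `accept_at=0.8` and `reject_at=0.2`, an agent at level 1 with confidence 0.5 stays at level 1, because the classifier abstains.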

preprint, 2022, arXiv

Characterizing Attacks on Deep Reinforcement Learning

Recent studies show that Deep Reinforcement Learning (DRL) models are vulnerable to adversarial attacks that add small perturbations to the observations. However, some attacks assume full availability of the victim model, and some require a huge amount of computation, making them less feasible for real-world applications. In this work, we make further explorations of the vulnerabilities of DRL by studying other aspects of attacks on DRL using realistic and efficient attacks. First, we adapt and propose efficient black-box attacks when we do not have access to DRL model parameters. Second, to address the high computational demands of existing attacks, we introduce efficient online sequential attacks that exploit temporal consistency across consecutive steps. Third, we explore the possibility of an attacker perturbing other aspects in the DRL setting, such as the environment dynamics. Finally, to account for imperfections in how an attacker would inject perturbations in the physical world, we devise a method for generating robust physical perturbations to be printed. The attack is evaluated on a real-world robot under various conditions. We conduct extensive experiments both in simulation (Atari games, robotics, and autonomous driving) and on real-world robotics, to compare the effectiveness of the proposed attacks with baseline approaches. To the best of our knowledge, we are the first to apply adversarial attacks on DRL systems to physical robots.

preprint, 2022, arXiv

Impact of Community Structure on Cascades

We study cascades under the threshold model on sparse random graphs with community structure. In this model, individuals adopt the new behavior based on how many neighbors have already chosen it. Specifically, we consider the permanent adoption model wherein individuals that have adopted the new behavior (or opinion) cannot change their state. We present a differential-equation-based tight approximation to the stochastic process of adoption and prove the validity of the mean-field equations. In addition, we characterize both necessary and sufficient conditions for contagion to happen no matter how small the set of initial adopters is. Finally, we study the problem of optimum seeding given budget constraints and propose a gradient-based heuristic seeding strategy. Our algorithm, numerically, dispels commonly held beliefs in the literature that suggest the best seeding strategy is to seed over the vertices with the highest number of neighbors.
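The permanent-adoption threshold model in this abstract can be sketched in a few lines of Python. This is a deterministic toy simulation on a fixed graph, not the paper's mean-field analysis; the adjacency-dict input format and the function name `run_cascade` are assumptions made for illustration, and the graph is assumed to be simple and undirected.

```python
from collections import deque


def run_cascade(neighbors, thresholds, seeds):
    """Simulate a threshold cascade with permanent adoption.

    neighbors:  dict mapping each node to a list of its neighbors
    thresholds: dict mapping each node to the number of adopted
                neighbors it needs before it adopts
    seeds:      iterable of initially adopted nodes
    Returns the final set of adopters.
    """
    adopted = set(seeds)
    adopted_neighbors = {v: 0 for v in neighbors}
    queue = deque(adopted)
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v in adopted:
                continue  # adoption is permanent, nothing to update
            adopted_neighbors[v] += 1
            if adopted_neighbors[v] >= thresholds[v]:
                adopted.add(v)
                queue.append(v)
    return adopted
```

On the path graph 0-1-2-3 with every threshold equal to 1, seeding node 0 spreads to all four nodes; raising the threshold of node 2 to 2 stops the cascade at nodes 0 and 1.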

preprint, 2021, arXiv

Multi-Scale Games: Representing and Solving Games on Networks with Group Structure

Network games provide a natural machinery to compactly represent strategic interactions among agents whose payoffs exhibit sparsity in their dependence on the actions of others. Besides encoding interaction sparsity, however, real networks often exhibit a multi-scale structure, in which agents can be grouped into communities, those communities further grouped, and so on, and where interactions among such groups may also exhibit sparsity. We present a general model of multi-scale network games that encodes such multi-level structure. We then develop several algorithmic approaches that leverage this multi-scale structure, and derive sufficient conditions for convergence of these to a Nash equilibrium. Our numerical experiments demonstrate that the proposed approaches enable orders of magnitude improvements in scalability when computing Nash equilibria in such games. For example, we can solve previously intractable instances involving up to 1 million agents in under 15 minutes.

preprint, 2020, arXiv

Fairness in Learning-Based Sequential Decision Algorithms: A Survey

Algorithmic fairness in decision-making has been studied extensively in static settings where one-shot decisions are made on tasks such as classification. However, in practice most decision-making processes are of a sequential nature, where decisions made in the past may have an impact on future data. This is particularly the case when decisions affect the individuals or users generating the data used for future decisions. In this survey, we review existing literature on the fairness of data-driven sequential decision-making. We will focus on two types of sequential decisions: (1) past decisions have no impact on the underlying user population and thus no impact on future data; (2) past decisions have an impact on the underlying user population and therefore the future data, which can then impact future decisions. In each case the impact of various fairness interventions on the underlying population is examined.

preprint, 2020, arXiv

Using Private and Public Assessments in Security Information Sharing Agreements

Information sharing among organizations has been gaining attention as a method for improving cybersecurity. However, the associated disclosure costs act as deterrents for firms' voluntary cooperation. In this work, we take a game-theoretic approach to understanding firms' incentives in these agreements. We propose the design of inter-temporal incentives (i.e. conditioning future cooperation on past interactions). Specifically, we show that incentives for full cooperation can be designed if firms share their private assessments of other firms' disclosure decisions through a common communication platform. We further show that similar incentives can be designed based on outcomes of a public rating/assessment system.

preprint, 2016, arXiv

Provision of Public Goods on Networks: On Existence, Uniqueness, and Centralities

We consider the provision of public goods on networks of strategic agents. We study different effort outcomes of these network games, namely, the Nash equilibria, Pareto efficient effort profiles, and semi-cooperative equilibria (effort profiles resulting from interactions among coalitions of agents). We identify necessary and sufficient conditions on the structure of the network for the uniqueness of the Nash equilibrium. We show that our finding unifies (and strengthens) existing results in the literature. We also identify conditions for the existence of Nash equilibria for the subclasses of games at the two extremes of our model, namely games of strategic complements and games of strategic substitutes. We provide a graph-theoretical interpretation of agents' efforts at the Nash equilibrium, as well as the Pareto efficient outcomes and semi-cooperative equilibria, by linking an agent's decision to her centrality in the interaction network. Using this connection, we separate the effects of incoming and outgoing edges on agents' efforts and uncover an alternating effect over walks of different length in the network.

preprint, 2015, arXiv

A Tale of Two Mechanisms: Incentivizing Investments in Security Games

In a system of interdependent users, the security of a user is affected not only by that user's investment in security measures, but also by the positive externality of the security decisions of (some of) the other users. The provision of security in such a system is therefore modeled as a public good provision problem, and is referred to as a security game. In this paper, we compare two well-known incentive mechanisms in this context for incentivizing optimal security investments among users, namely the Pivotal and the Externality mechanisms. The taxes in a Pivotal mechanism are designed to ensure users' voluntary participation, while those in an Externality mechanism are devised to maintain a balanced budget. We first show the more general result that, due to the non-excludable nature of security, no mechanism can incentivize the socially optimal investment profile, while at the same time ensuring voluntary participation and maintaining a balanced budget for all instances of security games. To further illustrate, we apply the Pivotal and Externality mechanisms to the special case of weighted total effort interdependence models, and identify some of the effects of varying interdependency between users on the budget deficit in the Pivotal mechanism, as well as on the participation incentives in the Externality mechanism.

preprint, 2015, arXiv

An Online Approach to Dynamic Channel Access and Transmission Scheduling

Making judicious channel access and transmission scheduling decisions is essential for improving performance as well as energy and spectral efficiency in multichannel wireless systems. This problem has been a subject of extensive study in the past decade, and the resulting dynamic and opportunistic channel access schemes can bring potentially significant improvement over traditional schemes. However, a common and severe limitation of these dynamic schemes is that they almost always require some form of a priori knowledge of the channel statistics. A natural remedy is a learning framework, which has also been extensively studied in the same context, but a typical learning algorithm in this literature seeks only the best static policy, with performance measured by weak regret, rather than learning a good dynamic channel access policy. There is thus a clear disconnect between what an optimal channel access policy can achieve with known channel statistics that actively exploits temporal, spatial and spectral diversity, and what a typical existing learning algorithm aims for, which is the static use of a single channel devoid of diversity gain. In this paper we bridge this gap by designing learning algorithms that track known optimal or sub-optimal dynamic channel access and transmission scheduling policies, thereby yielding performance measured by a form of strong regret, the accumulated difference between the reward returned by an optimal solution when a priori information is available and that by our online algorithm. We do so in the context of two specific algorithms that appeared in [1] and [2], respectively, the former for a multiuser single-channel setting and the latter for a single-user multichannel setting. In both cases we show that our algorithms achieve sub-linear regret uniform in time and outperform the standard weak-regret learning algorithms.

preprint, 2015, arXiv

Efficient Sensor Fault Detection Using Group Testing

When faulty sensors are rare in a network, diagnosing sensors individually is inefficient. This study introduces a novel use of concepts from group testing and Kalman filtering in detecting these rare faulty sensors with significantly fewer tests. By assigning sensors to groups and performing Kalman filter-based fault detection over these groups, we obtain binary detection outcomes, which can then be used to recover the fault state of all sensors. We first present this method using combinatorial group testing. We then present a novel adaptive group testing method based on Bayesian inference. This adaptive method further reduces the number of required tests and is suitable for noisy group test systems. Compared to non-group testing methods, our algorithm achieves similar detection accuracy with fewer tests and thus lower computational complexity. Compared to other adaptive group testing methods, the proposed method achieves higher accuracy when test results are noisy. We perform extensive numerical analysis using a set of real vibration data collected from the New Carquinez Bridge in California using an 18-sensor network mounted on the bridge. We also discuss how the features of the Kalman filter-based group test can be exploited in forming groups and further improving the detection accuracy.
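How binary group outcomes can be turned back into per-sensor fault states is easiest to see in the noiseless combinatorial case. The Python sketch below uses a standard elimination decoder: every sensor that appears in a negative group is cleared, and whatever remains is a suspect. It is an illustration only. It does not implement the paper's Kalman filter-based group test or its Bayesian adaptive method, and the function name and input layout are assumptions.

```python
def decode_noiseless(groups, outcomes, n_sensors):
    """Recover suspect sensors from noiseless group-test outcomes.

    groups:    list of lists of sensor indices, one list per test
    outcomes:  list of booleans, True if the group test flagged a fault
    n_sensors: total number of sensors, indexed 0..n_sensors-1
    Returns the set of sensors not cleared by any negative test. With
    noiseless tests this set always contains every faulty sensor.
    """
    cleared = set()
    for members, positive in zip(groups, outcomes):
        if not positive:
            cleared.update(members)  # a negative group has no faulty sensor
    return set(range(n_sensors)) - cleared
```

With four sensors, groups `[[0, 1], [2, 3], [1, 2], [0, 3]]` and outcomes `[False, True, True, False]`, the decoder clears sensors 0, 1 and 3 and leaves sensor 2 as the only suspect.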

preprint, 2015, arXiv

Optimal Adaptive Learning in Uncontrolled Restless Bandit Problems

In this paper we consider the problem of learning the optimal policy for uncontrolled restless bandit problems. In an uncontrolled restless bandit problem, there is a finite set of arms, each of which when pulled yields a positive reward. There is a player who sequentially selects one of the arms at each time step. The goal of the player is to maximize its undiscounted reward over a time horizon T. The reward process of each arm is a finite state Markov chain, whose transition probabilities are unknown to the player. State transitions of each arm are independent of the selection of the player. We propose a learning algorithm with logarithmic regret uniformly over time with respect to the optimal finite horizon policy. Our results extend the optimal adaptive learning of MDPs to POMDPs.

preprint, 2015, arXiv

Optimal Relay Selection with Non-negligible Probing Time

In this paper an optimal relay selection algorithm with non-negligible probing time is proposed and analyzed for cooperative wireless networks. Relay selection has been introduced to solve the degraded bandwidth efficiency problem in cooperative communication. Yet complete information about relay channels often remains unavailable in complex networks, which makes optimal selection strategies impossible for the transmission source without probing the relay channels. Particularly when the number of relay candidates is large, even though probing all relay channels guarantees the finding of the best relays at any time instant, the degradation of bandwidth efficiency due to non-negligible probing times, which was often neglected in past literature, is also significant. In this work, a stopping-rule-based relay selection strategy is determined for the source node to decide when to stop the probing process and choose one of the probed relays to cooperate with under wireless channels' stochastic uncertainties. This relay selection strategy is further shown to have a simple threshold structure. Meanwhile, full diversity order and high bandwidth efficiency can be achieved simultaneously. Both analytical and simulation results are provided to verify the claims.

preprint, 2014, arXiv

Closing the Price of Anarchy Gap in the Interdependent Security Game

The reliability and security of a user in an interconnected system depend on all users' collective effort in security. Consequently, investments in security technologies by strategic users are typically modeled as a public good problem, known as the Interdependent Security (IDS) game. The equilibria for such games are often inefficient, as selfish users free-ride on positive externalities of others' contributions. In this paper, we present a mechanism that implements the socially optimal equilibrium in an IDS game through a message exchange process, in which users submit proposals about the security investment and tax/price profiles of one another. This mechanism is different from existing solutions in that (1) it results in socially optimal levels of investment, closing the Price of Anarchy gap in the IDS game, and (2) it is applicable to a general model of user interdependencies. We further consider the issue of individual rationality, often a trivial condition to satisfy in many resource allocation problems, and argue that with positive externality, the incentive to stay out and free-ride on others' investment can make individual rationality much harder to satisfy in designing a mechanism.

preprint, 2013, arXiv

DorFin: WiFi Fingerprint-based Localization Revisited

Although WiFi fingerprint-based indoor localization is attractive, its accuracy remains a primary challenge, especially in mobile environments. Existing approaches either appeal to physical layer information or rely on extra wireless signals for high accuracy. In this paper, we revisit the RSS fingerprint-based localization scheme and reveal crucial observations that act as the root causes of localization errors, yet are surprisingly overlooked or even unseen in previous works. Specifically, we recognize APs' diverse discrimination for fingerprinting a specific location, observe the RSS inconsistency caused by signal fluctuations and human body blockages, and uncover the RSS outdated problem on commodity smartphones. Inspired by these insights, we devise a discrimination factor to quantify different APs' discrimination, incorporate robust regression to tolerate outlier measurements, and reassemble different fingerprints to cope with outdated RSSs. Combining these techniques in a unified solution, we propose DorFin, a novel scheme of fingerprint generation, representation, and matching, which yields remarkable accuracy without incurring extra cost. Extensive experiments demonstrate that DorFin achieves a mean error of 2 meters and, more importantly, bounds the 95th percentile error under 5.5 meters; these are about 56% and 69% lower, respectively, compared with state-of-the-art schemes such as Horus and RADAR.

preprint, 2013, arXiv

Group Learning and Opinion Diffusion in a Broadcast Network

We analyze the following group learning problem in the context of opinion diffusion: Consider a network with $M$ users, each facing $N$ options. In a discrete time setting, at each time step, each user chooses $K$ out of the $N$ options, and receives randomly generated rewards, whose statistics depend on the options chosen as well as the user itself, and are unknown to the users. Each user aims to maximize their expected total rewards over a certain time horizon through an online learning process, i.e., a sequence of exploration (sampling the return of each option) and exploitation (selecting empirically good options) steps. Within this context we consider two group learning scenarios, (1) users with uniform preferences and (2) users with diverse preferences, and examine how a user should construct its learning process to best extract information from others' decisions and experiences so as to maximize its own reward. Performance is measured in weak regret, the difference between the user's total reward and the reward from a user-specific best single-action policy (i.e., always selecting the set of options generating the highest mean rewards for this user). Within each scenario we also consider two cases: (i) when users exchange full information, meaning they share the actual rewards they obtained from their choices, and (ii) when users exchange limited information, e.g., only their choices but not rewards obtained from these choices.
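The weak-regret measure defined in this abstract can be made concrete with a short Python sketch for a single user. It works with mean rewards rather than realized random rewards, which is a simplifying assumption made here; the function name and argument layout are likewise hypothetical.

```python
def weak_regret(mean_rewards, choices, k):
    """Expected weak regret of one user over len(choices) time steps.

    mean_rewards: list of this user's mean reward for each of the N options
    choices:      list of tuples, the K option indices chosen at each step
    k:            number of options chosen per step
    The benchmark is the best single-action policy: always pick the k
    options with the highest mean rewards for this user.
    """
    best_per_step = sum(sorted(mean_rewards, reverse=True)[:k])
    earned = sum(sum(mean_rewards[i] for i in chosen) for chosen in choices)
    return best_per_step * len(choices) - earned
```

With means `[0.9, 0.5, 0.1]`, `k=2`, and choices `[(0, 1), (0, 2), (1, 2)]`, the benchmark earns 1.4 per step while the user earns 1.4, 1.0 and 0.6, so the weak regret is about 1.2.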

preprint, 2013, arXiv

Incentives, Quality, and Risks: A Look Into the NSF Proposal Review Pilot

The National Science Foundation (NSF) will be experimenting with a new distributed approach to reviewing proposals, whereby a group of principal investigators (PIs) or proposers in a subfield act as reviewers for the proposals submitted by the same set of PIs. To encourage honesty, PIs' chances for getting funded are tied to the quality of their reviews (with respect to the reviews provided by the entire group), in addition to the quality of their proposals. Intuitively, this approach can more fairly distribute the review workload, discourage frivolous proposal submission, and encourage high quality reviews. On the other hand, this method has already raised concerns about the integrity of the process and the possibility of strategic manipulation. In this paper, we take a closer look at three specific issues in an attempt to gain a better understanding of the strengths and limitations of the new process beyond first impressions and anecdotal evidence. We start by considering the benefits and drawbacks of bundling the quality of PIs' reviews with the scientific merit of their proposals. We then consider the issue of collusion and favoritism. Finally, we examine whether the new process puts controversial proposals at a disadvantage. We conclude that some benefits of using review quality as an incentive mechanism may outweigh its drawbacks. On the other hand, even a coalition of two PIs can cause significant harm to the process, as the built-in incentives are not strong enough to deter collusion. While we also confirm the common suspicion that the process is skewed toward non-controversial proposals, the more unexpected finding is that among equally controversial proposals, those of lower quality get a leg up through this process. Thus the process not only favors non-controversial proposals, but in some sense, mediocrity. We also discuss possible ways to improve this review process.

preprint, 2013, arXiv

Online Learning in a Contract Selection Problem

In an online contract selection problem there is a seller that offers a set of contracts to sequentially arriving buyers whose types are drawn from an unknown distribution. If there exists a profitable contract for the buyer in the offered set, i.e., a contract with payoff higher than the payoff of not accepting any contracts, the buyer chooses the contract that maximizes its payoff. In this paper we consider the online contract selection problem to maximize the seller's profit. Assuming that a structural property called ordered preferences holds for the buyer's payoff function, we propose online learning algorithms that have sub-linear regret with respect to the best set of contracts given the distribution over the buyer's type. This problem has many applications including spectrum contracts, wireless service provider data plans and recommendation systems.

preprint, 2013, arXiv

Perceptions and Truth: A Mechanism Design Approach to Crowd-Sourcing Reputation

We consider a distributed multi-user system where individual entities possess observations or perceptions of one another, while the truth is only known to themselves, and they might have an interest in withholding or distorting the truth. We ask the question whether it is possible for the system as a whole to arrive at the correct perceptions or assessment of all users, referred to as their reputation, by encouraging or incentivizing the users to participate in a collective effort without violating private information and self-interest. Two specific applications, online shopping and network reputation, are provided to motivate our study and interpret the results. In this paper we investigate this problem using a mechanism design theoretic approach. We introduce a number of utility models representing users' strategic behavior, each consisting of one or both of a truth element and an image element, reflecting the user's desire to obtain an accurate view of the other and an inflated image of itself. For each model, we either design a mechanism that achieves the optimal performance (solution to the corresponding centralized problem), or present individually rational sub-optimal solutions. In the latter case, we demonstrate that even when the centralized solution is not achievable, by using a simple punish-reward mechanism, not only does a user have the incentive to participate and provide information, but this information can also improve the system performance.

preprint, 2013, arXiv

Revisiting Optimal Power Control: its Dual Effect on SNR and Contention

In this paper we study a transmission power tuning problem in densely deployed 802.11 Wireless Local Area Networks (WLANs). While previous papers emphasize tuning transmission power at either the PHY or the MAC layer separately, optimally setting each Access Point's (AP's) transmission power in a densely deployed 802.11 network considering its dual effects on both layers remains an open problem. In this work, we design a measure by evaluating the impact of transmission power on network performance at both the PHY and MAC layers. We show that such an optimization problem is intractable and then we investigate and develop an analytical framework to allow simple yet efficient solutions. Through simulations and numerical results, we observe clear benefits of the dual-effect model compared to solutions optimizing solely on a single layer; therefore, we conclude that tuning transmission power from a dual-layer (PHY-MAC) point of view is essential for dense WLANs. We further form a game-theoretical framework and investigate the above power-tuning problem in a strategic network. We show that with decentralized and strategic users, the Nash Equilibrium (N.E.) of the corresponding game is inefficient, and we then propose a punishment-based mechanism to enforce users to adopt the socially optimal strategy profile under both perfect and imperfect sensing environments.

preprint, 2013, arXiv

Sufficient Conditions on the Optimality of Myopic Sensing in Opportunistic Channel Access: A Unifying Framework

This paper considers a widely studied stochastic control problem arising from opportunistic spectrum access (OSA) in a multi-channel system, with the goal of providing a unifying analytical framework whereby a number of prior results may be viewed as special cases. Specifically, we consider a single wireless transceiver/user with access to $N$ channels, each modeled as an iid discrete-time two-state Markov chain. In each time step the user is allowed to sense $k\leq N$ channels, and subsequently use up to $m\leq k$ channels out of those sensed to be available. Channel sensing is assumed to be perfect, and for each channel use in each time step the user gets a unit reward. The user's objective is to maximize its total discounted or average reward over a finite or infinite horizon. This problem has previously been studied in various special cases including $k=1$ and $m=k\leq N$, often cast as a restless bandit problem, with optimality results derived for a myopic policy that seeks to maximize the immediate one-step reward when the two-state Markov chain model is positively correlated. In this paper we study the general problem with $1\leq m\leq k\leq N$, and derive sufficient conditions under which the myopic policy is optimal for the finite and infinite horizon reward criteria, respectively. It is shown that these results reduce to those derived in prior studies under the corresponding special cases, and thus may be viewed as a set of unifying optimality conditions. Numerical examples are also presented to highlight how and why an optimal policy may deviate from the otherwise-optimal myopic sensing given additional exploration opportunities, i.e., when $m<k$.
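The myopic sensing policy discussed in this abstract can be sketched in Python under the usual belief-state formulation, which is assumed here rather than taken from the abstract: each channel carries a belief (probability of being available), the myopic policy senses the $k$ channels with the highest beliefs, and beliefs evolve through the two-state Markov chain. The parameter names `p11` (available to available) and `p01` (unavailable to available) and both function names are hypothetical.

```python
def myopic_choice(beliefs, k):
    """Pick the k channels with the highest availability belief."""
    order = sorted(range(len(beliefs)), key=lambda i: beliefs[i], reverse=True)
    return order[:k]


def update_beliefs(beliefs, observed, p11, p01):
    """Advance every channel's belief by one time step.

    beliefs:  list of probabilities that each channel is available now
    observed: dict mapping a sensed channel index to True (available)
              or False (unavailable); sensing is assumed perfect
    p11, p01: transition probabilities into the available state from the
              available and unavailable states, identical for all channels
    """
    updated = []
    for i, w in enumerate(beliefs):
        if i in observed:
            # A sensed channel's state is known, so the belief resets.
            updated.append(p11 if observed[i] else p01)
        else:
            # An unsensed channel's belief propagates through the chain.
            updated.append(w * p11 + (1 - w) * p01)
    return updated
```

With beliefs `[0.5, 0.8, 0.2]` and `k=2`, the policy senses channels 1 and 0. The positively correlated case mentioned in the abstract corresponds to `p11 >= p01`.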

preprint, 2013, arXiv

To Stay Or To Switch: Multiuser Dynamic Channel Access

In this paper we study opportunistic spectrum access (OSA) policies in a multiuser multichannel random access cognitive radio network, where users perform channel probing and switching in order to obtain better channel condition or higher instantaneous transmission quality. However, unlike many prior works in this area, including those on channel probing and switching policies for a single user to exploit spectral diversity, and on probing and access policies for multiple users over a single channel to exploit temporal and multiuser diversity, in this study we consider the collective switching of multiple users over multiple channels. In addition, we consider finite arrivals, i.e., users are not assumed to always have data to send and the demand for channels follows a certain arrival process. Under such a scenario, the users' ability to opportunistically exploit temporal diversity (the temporal variation in channel quality over a single channel) and spectral diversity (quality variation across multiple channels at a given time) is greatly affected by the level of congestion in the system. We investigate the optimal decision process in this case, and evaluate the extent to which congestion affects potential gains from opportunistic dynamic channel switching.

preprint, 2012, arXiv

Dynamic Pricing of Power in Smart-Grid Networks

In this paper we introduce the problem of dynamic pricing of power for smart-grid networks. This is studied within a network utility maximization (NUM) framework in a deterministic setting with a single provider, multiple users and a finite horizon. The provider produces power or buys power in a (deterministic) spot market, and determines a dynamic price to charge the users. The users then adjust their demand in response to the time-varying prices. This is typically categorized as the demand response problem, and we study a progression of related models by focusing on two aspects: 1) the characterization of the structure of the optimal dynamic prices in the Smart Grid and the optimal demand and supply under various interactions with a spot market; 2) a greedy approach to facilitate the solution process of the aggregate NUM problem and the optimality gap between the greedy solution and the optimal one.

preprint, 2012, arXiv

Online Learning in Decentralized Multiuser Resource Sharing Problems

In this paper, we consider the general scenario of resource sharing in a decentralized system when the resource rewards/qualities are time-varying and unknown to the users, and using the same resource by multiple users leads to reduced quality due to resource sharing. Firstly, we consider a user-independent reward model with no communication between the users, where a user gets feedback about the congestion level in the resource it uses. Secondly, we consider user-specific rewards and allow costly communication between the users. The users have a cooperative goal of achieving the highest system utility. There are multiple obstacles in achieving this goal such as the decentralized nature of the system, unknown resource qualities, communication, computation and switching costs. We propose distributed learning algorithms with logarithmic regret with respect to the optimal allocation. Our logarithmic regret result holds under both i.i.d. and Markovian reward models, as well as under communication, computation and switching costs.

preprint2012arXiv

Profit Incentive In A Secondary Spectrum Market: A Contract Design Approach

In this paper we formulate a contract design problem where a primary license holder wishes to profit from its excess spectrum capacity by selling it to potential secondary users/buyers. It needs to determine how to optimally price the excess spectrum so as to maximize its profit, knowing that this excess capacity is stochastic in nature, does not come with exclusive access, and cannot provide deterministic service guarantees to a buyer. At the same time, buyers are of different types, characterized by different communication needs, tolerance for the channel uncertainty, and so on, all of which are a buyer's private information. The license holder must then try to design different contracts catered to different types of buyers in order to maximize its profit. We address this problem by adopting as a reference a traditional spectrum market where the buyer can purchase exclusive access with fixed/deterministic guarantees. We fully characterize the optimal solution in the cases where there is a single buyer type, and when multiple types of buyers share the same, known channel condition as a result of the primary user activity. In the most general case we construct an algorithm that generates a set of contracts in a computationally efficient manner, and show that this set is optimal when the buyer types satisfy a monotonicity condition.

preprint2012arXiv

Throughput Optimal Switching in Multi-channel WLANs

We observe that in a multi-channel wireless system, an opportunistic channel/spectrum access scheme that solely focuses on channel quality sensing measured by received SNR may induce users to use channels that, while providing better signals, are more congested. Ultimately the notion of channel quality should include both the signal quality and the level of congestion, and a good multi-channel access scheme should take both into account in deciding which channel to use and when. Motivated by this, we focus on the congestion aspect and examine what type of dynamic channel switching schemes may result in the best system throughput performance. Specifically we derive the stability region of a multi-user multi-channel WLAN system and determine the throughput optimal channel switching scheme within a certain class of schemes.

preprint2011arXiv

On the Combinatorial Multi-Armed Bandit Problem with Markovian Rewards

We consider a combinatorial generalization of the classical multi-armed bandit problem that is defined as follows. There is a given bipartite graph of $M$ users and $N \geq M$ resources. For each user-resource pair $(i,j)$, there is an associated state that evolves as an aperiodic irreducible finite-state Markov chain with unknown parameters, with transitions occurring each time the particular user $i$ is allocated resource $j$. The user $i$ receives a reward that depends on the corresponding state each time it is allocated the resource $j$. The system objective is to learn the best matching of users to resources so that the long-term sum of the rewards received by all users is maximized. This corresponds to minimizing regret, defined here as the gap between the expected total reward that can be obtained by the best-possible static matching and the expected total reward that can be achieved by a given algorithm. We present a polynomial-storage and polynomial-complexity-per-step matching-learning algorithm for this problem. We show that this algorithm can achieve a regret that is uniformly arbitrarily close to logarithmic in time and polynomial in the number of users and resources. This formulation is broadly applicable to scheduling and switching problems in networks and significantly extends prior results in the area.
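The regret described in this abstract can be written out as a worked equation. The notation is an assumption made here for illustration and is not taken from the paper: $\mathcal{M}$ is the set of feasible matchings of users to resources, $\mu_{i,j}$ is the stationary mean reward of the pair $(i,j)$, $m(t)$ is the matching chosen at time $t$, and $r_{i,j}(t)$ is the reward user $i$ collects from resource $j$ at time $t$. Replacing the expected reward of the best static matching by $n$ times its stationary mean is itself an approximation, accurate up to a constant that depends on the initial states.

```latex
R(n) \;=\; n \max_{m \in \mathcal{M}} \sum_{(i,j) \in m} \mu_{i,j}
\;-\; \mathbb{E}\!\left[ \sum_{t=1}^{n} \sum_{(i,j) \in m(t)} r_{i,j}(t) \right]
```

Under this reading, a regret that is close to logarithmic in $n$ means the time-averaged reward $\frac{1}{n}\,\mathbb{E}[\cdot]$ approaches that of the best static matching as $n$ grows.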

preprint2011arXiv

Online Learning for Combinatorial Network Optimization with Restless Markovian Rewards

Combinatorial network optimization algorithms that compute optimal structures taking into account edge weights form the foundation for many network protocols. Examples include shortest path routing, minimal spanning tree computation, maximum weighted matching on bipartite graphs, etc. We present CLRMR, the first online learning algorithm that efficiently solves the stochastic version of these problems where the underlying edge weights vary as independent Markov chains with unknown dynamics. The performance of an online learning algorithm is characterized in terms of regret, defined as the cumulative difference in rewards between a suitably-defined genie, and that obtained by the given algorithm. We prove that, compared to a genie that knows the Markov transition matrices and uses the single-best structure at all times, CLRMR yields regret that is polynomial in the number of edges and nearly-logarithmic in time.

preprint2011arXiv

Online Learning of Rested and Restless Bandits

In this paper we study the online learning problem involving rested and restless multiarmed bandits with multiple plays. The system consists of a single player/user and a set of K finite-state discrete-time Markov chains (arms) with unknown state spaces and statistics. At each time step the player can play M arms. The objective of the user is to decide for each step which M of the K arms to play over a sequence of trials so as to maximize its long term reward. The restless multiarmed bandit is particularly relevant to the application of opportunistic spectrum access (OSA), where a (secondary) user has access to a set of K channels, each of time-varying condition as a result of random fading and/or certain primary users' activities.

preprint2011arXiv

Performance and Convergence of Multi-user Online Learning

We study the problem of allocating multiple users to a set of wireless channels in a decentralized manner when the channel qualities are time-varying and unknown to the users, and accessing the same channel by multiple users leads to reduced quality due to interference. In such a setting the users need to learn not only the inherent channel quality but also, at the same time, the best allocations of users to channels so as to maximize the social welfare. Assuming that the users adopt a certain online learning algorithm, we investigate under what conditions the socially optimal allocation is achievable. In particular we examine the effect of different levels of knowledge the users may have and the amount of communications and cooperation. The general conclusion is that when the cooperation of users decreases and the uncertainty about channel payoffs increases it becomes harder to achieve the socially optimal allocation.

preprint2010arXiv

CapEst: A Measurement-based Approach to Estimating Link Capacity in Wireless Networks

Estimating link capacity in a wireless network is a complex task because the available capacity at a link is a function of not only the current arrival rate at that link, but also of the arrival rate at links which interfere with that link as well as of the nature of interference between these links. Models which accurately characterize this dependence are either too computationally complex to be useful or lack accuracy. Further, they have a high implementation overhead and make restrictive assumptions, which makes them inapplicable to real networks. In this paper, we propose CapEst, a general, simple yet accurate, measurement-based approach to estimating link capacity in a wireless network. To be computationally light, CapEst allows inaccuracy in estimation; however, using measurements, it can correct this inaccuracy in an iterative fashion and converge to the correct estimate. Our evaluation shows that CapEst always converged to within 5% of the correct value in less than 18 iterations. CapEst is model-independent, hence, is applicable to any MAC/PHY layer and works with auto-rate adaptation. Moreover, it has a low implementation overhead, can be used with any application which requires an estimate of residual capacity on a wireless link and can be implemented completely at the network layer without any support from the underlying chipset.

preprint2010arXiv

Channel Estimation for Opportunistic Spectrum Access: Uniform and Random Sensing

The knowledge of channel statistics can be very helpful in making sound opportunistic spectrum access decisions. It is therefore desirable to be able to efficiently and accurately estimate channel statistics. In this paper we study the problem of optimally placing sensing times over a time window so as to get the best estimate on the parameters of an on-off renewal channel. We are particularly interested in a sparse sensing regime with a small number of samples relative to the time window size. Using Fisher information as a measure, we analytically derive the best and worst sensing sequences under a sparsity condition. We also present a way to derive the best/worst sequences without this condition using a dynamic programming approach. In both cases the worst turns out to be the uniform sensing sequence, where sensing times are evenly spaced within the window. With these results we argue that without a priori knowledge, a robust sensing strategy should be a randomized strategy. We then compare different random schemes using a family of distributions generated by the circular $β$ ensemble, and propose an adaptive sensing scheme to effectively track time-varying channel parameters. We further discuss the applicability of compressive sensing for this problem.

preprint2010arXiv

Energy-Efficient Transmission Scheduling with Strict Underflow Constraints

We consider a single source transmitting data to one or more receivers/users over a shared wireless channel. Due to random fading, the wireless channel conditions vary with time and from user to user. Each user has a buffer to store received packets before they are drained. At each time step, the source determines how much power to use for transmission to each user. The source's objective is to allocate power in a manner that minimizes an expected cost measure, while satisfying strict buffer underflow constraints and a total power constraint in each slot. The expected cost measure is composed of costs associated with power consumption from transmission and packet holding costs. The primary application motivating this problem is wireless media streaming. For this application, the buffer underflow constraints prevent the user buffers from emptying, so as to maintain playout quality. In the case of a single user with linear power-rate curves, we show that a modified base-stock policy is optimal under the finite horizon, infinite horizon discounted, and infinite horizon average expected cost criteria. For a single user with piecewise-linear convex power-rate curves, we show that a finite generalized base-stock policy is optimal under all three expected cost criteria. We also present the sequences of critical numbers that complete the characterization of the optimal control laws in each of these cases when some additional technical conditions are satisfied. We then analyze the structure of the optimal policy for the case of two users. We conclude with a discussion of methods to identify implementable near-optimal policies for the most general case of M users.

preprint2010arXiv

Networked Computing in Wireless Sensor Networks for Structural Health Monitoring

This paper studies the problem of distributed computation over a network of wireless sensors. While this problem applies to many emerging applications, to keep our discussion concrete we will focus on sensor networks used for structural health monitoring. Within this context, the heaviest computation is to determine the singular value decomposition (SVD) to extract mode shapes (eigenvectors) of a structure. Compared to collecting raw vibration data and performing SVD at a central location, computing SVD within the network can result in significantly lower energy consumption and delay. Using recent results on decomposing SVD, a well-known centralized operation, into components, we seek to determine a near-optimal communication structure that enables the distribution of this computation and the reassembly of the final results, with the objective of minimizing energy consumption subject to a computational delay constraint. We show that this reduces to a generalized clustering problem; a cluster forms a unit on which a component of the overall computation is performed. We establish that this problem is NP-hard. By relaxing the delay constraint, we derive a lower bound to this problem. We then propose an integer linear program (ILP) to solve the constrained problem exactly as well as an approximate algorithm with a proven approximation ratio. We further present a distributed version of the approximate algorithm. We present both simulation and experimentation results to demonstrate the effectiveness of these algorithms.

preprint2010arXiv

Online Learning in Opportunistic Spectrum Access: A Restless Bandit Approach

We consider an opportunistic spectrum access (OSA) problem where the time-varying condition of each channel (e.g., as a result of random fading or certain primary users' activities) is modeled as an arbitrary finite-state Markov chain. At each time instant, a (secondary) user probes a channel and collects a certain reward as a function of the state of the channel (e.g., good channel condition results in higher data rate for the user). Each channel has potentially different state space and statistics, both unknown to the user, who tries to learn which one is the best as it goes and maximizes its usage of the best channel. The objective is to construct a good online learning algorithm so as to minimize the difference between the user's performance in total rewards and that of using the best channel (on average) had it known which one is the best from a priori knowledge of the channel statistics (also known as the regret). This is a classic exploration and exploitation problem and results abound when the reward processes are assumed to be iid. Compared to prior work, the biggest difference is that in our case the reward process is assumed to be Markovian, of which iid is a special case. In addition, the reward processes are restless in that the channel conditions will continue to evolve independent of the user's actions. This leads to a restless bandit problem, for which, to the best of our knowledge, there are few results on either algorithms or performance bounds in this learning context. In this paper we introduce an algorithm that utilizes regenerative cycles of a Markov chain and computes a sample-mean based index policy, and show that under mild conditions on the state transition probabilities of the Markov chains this algorithm achieves logarithmic regret uniformly over time, and that this regret bound is also optimal.
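For readers unfamiliar with sample-mean based index policies, the following minimal sketch shows the standard UCB1 index for the iid special case mentioned in the abstract. It is not the paper's algorithm: it omits the regenerative-cycle construction used for restless Markovian channels. The function name `ucb1`, the `pull` callback, and the assumption that rewards lie in [0, 1] are all illustrative choices made here.

```python
import math
from typing import Callable, List


def ucb1(pull: Callable[[int], float], num_arms: int, horizon: int) -> List[int]:
    """Run UCB1 for `horizon` steps and return how often each arm was played.

    `pull(arm)` is a hypothetical callback returning a reward in [0, 1].
    """
    counts = [0] * num_arms    # number of plays per arm
    means = [0.0] * num_arms   # running sample mean per arm
    for t in range(1, horizon + 1):
        if t <= num_arms:
            arm = t - 1  # initialization: play every arm once
        else:
            # Index = sample mean + exploration bonus that shrinks with plays.
            arm = max(
                range(num_arms),
                key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts
```

A channel-access experiment could pass a `pull` function that samples the observed data rate of the probed channel, scaled into [0, 1]. The exploration bonus is what keeps poorly sampled channels from being abandoned too early.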

preprint2010arXiv

Spectrum Sharing as Spatial Congestion Games

In this paper, we present and analyze the properties of a new class of games - the spatial congestion game (SCG), which is a generalization of the classical congestion game (CG). In a classical congestion game, multiple users share the same set of resources and a user's payoff for using any resource is a function of the total number of users sharing it. As a potential game, this game enjoys some very appealing properties, including the existence of a pure strategy Nash equilibrium (NE) and that every improvement path is finite and leads to such a NE (also called the finite improvement property or FIP). While it's tempting to use this model to study spectrum sharing, it does not capture the spatial reuse feature of wireless communication, where resources (interpreted as channels) may be reused without increasing congestion provided that users are located far away from each other. This motivates us to study an extended form of the congestion game where a user's payoff for using a resource is a function of the number of its interfering users sharing it. This naturally results in a spatial congestion game (SCG), where users are placed over a network (or a conflict graph). We study fundamental properties of a spatial congestion game; in particular, we seek to answer under what conditions this game possesses the finite improvement property or a Nash equilibrium. We also discuss the implications of these results when applied to wireless spectrum sharing.
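The payoff structure described in this abstract can be illustrated with a small best-response simulation on a conflict graph. This is a sketch under stated assumptions, not the paper's method: the names `interferers_on` and `best_response_dynamics`, the `payoff(channel, n)` signature, and the round cap are all hypothetical. The cap is there because, as the abstract notes, the finite improvement property is not guaranteed for every spatial congestion game.

```python
from typing import Callable, Dict, List, Set, Tuple


def interferers_on(user: int, channel: int, profile: List[int],
                   neighbors: Dict[int, Set[int]]) -> int:
    """Count users on `channel` that interfere with `user`, including the user."""
    return 1 + sum(1 for v in neighbors[user] if profile[v] == channel)


def best_response_dynamics(neighbors: Dict[int, Set[int]], num_channels: int,
                           payoff: Callable[[int, int], float], start: List[int],
                           max_rounds: int = 1000) -> Tuple[List[int], bool]:
    """Let users switch to a strictly better channel, one at a time.

    Returns the final channel profile and whether a pure Nash equilibrium
    (no user can strictly improve) was reached within `max_rounds`.
    """
    profile = list(start)
    for _ in range(max_rounds):
        moved = False
        for u in range(len(profile)):
            best_c = profile[u]
            best_val = payoff(best_c, interferers_on(u, best_c, profile, neighbors))
            for c in range(num_channels):
                val = payoff(c, interferers_on(u, c, profile, neighbors))
                if val > best_val:  # strict improvement only
                    best_c, best_val = c, val
            if best_c != profile[u]:
                profile[u] = best_c
                moved = True
        if not moved:
            return profile, True
    return profile, False
```

With a complete conflict graph every user interferes with every other, which recovers the classical congestion game. With an empty conflict graph, users can share a channel without any loss, which is the spatial-reuse effect the abstract refers to.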

35 published item(s)

Sequential Strategic Classification with Multi-Stage Selective Classifiers

Characterizing Attacks on Deep Reinforcement Learning

Impact of Community Structure on Cascades

Multi-Scale Games: Representing and Solving Games on Networks with Group Structure

Fairness in Learning-Based Sequential Decision Algorithms: A Survey

Using Private and Public Assessments in Security Information Sharing Agreements

Provision of Public Goods on Networks: On Existence, Uniqueness, and Centralities

A Tale of Two Mechanisms: Incentivizing Investments in Security Games

An Online Approach to Dynamic Channel Access and Transmission Scheduling

Efficient Sensor Fault Detection Using Group Testing

Optimal Adaptive Learning in Uncontrolled Restless Bandit Problems

Optimal Relay Selection with Non-negligible Probing Time

Closing the Price of Anarchy Gap in the Interdependent Security Game

DorFin: WiFi Fingerprint-based Localization Revisited

Group Learning and Opinion Diffusion in a Broadcast Network

Incentives, Quality, and Risks: A Look Into the NSF Proposal Review Pilot

Online Learning in a Contract Selection Problem

Perceptions and Truth: A Mechanism Design Approach to Crowd-Sourcing Reputation

Revisiting Optimal Power Control: its Dual Effect on SNR and Contention

Sufficient Conditions on the Optimality of Myopic Sensing in Opportunistic Channel Access: A Unifying Framework

To Stay Or To Switch: Multiuser Dynamic Channel Access

Dynamic Pricing of Power in Smart-Grid Networks

Online Learning in Decentralized Multiuser Resource Sharing Problems

Profit Incentive In A Secondary Spectrum Market: A Contract Design Approach

Throughput Optimal Switching in Multi-channel WLANs

On the Combinatorial Multi-Armed Bandit Problem with Markovian Rewards

Online Learning for Combinatorial Network Optimization with Restless Markovian Rewards

Online Learning of Rested and Restless Bandits

Performance and Convergence of Multi-user Online Learning

CapEst: A Measurement-based Approach to Estimating Link Capacity in Wireless Networks

Channel Estimation for Opportunistic Spectrum Access: Uniform and Random Sensing

Energy-Efficient Transmission Scheduling with Strict Underflow Constraints

Networked Computing in Wireless Sensor Networks for Structural Health Monitoring

Online Learning in Opportunistic Spectrum Access: A Restless Bandit Approach

Spectrum Sharing as Spatial Congestion Games