Source author record

Tamer Başar

Tamer Başar appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

math.OC; Systems and Control; Computer Science and Game Theory; Machine Learning; eess.SY; Information Theory; math.IT; Multiagent Systems; Artificial Intelligence; math.DS; Cryptography and Security; Distributed, Parallel, and Cluster Computing; eess.SP; math.ST; Networking and Internet Architecture; Social and Information Networks; Statistics Theory

Catalog footprint

What is connected

42 works

17 topics

4 close collaborators

preprint · 2022 · arXiv

Distributed Adaptive Newton Methods with Global Superlinear Convergence

This paper considers the distributed optimization problem where each node of a peer-to-peer network minimizes a finite sum of objective functions by communicating with its neighboring nodes. In sharp contrast to the existing literature where the fastest distributed algorithms converge either with a global linear or a local superlinear rate, we propose a distributed adaptive Newton (DAN) algorithm with a global quadratic convergence rate. Our key idea lies in the design of a finite-time set-consensus method with Polyak's adaptive stepsize. Moreover, we introduce a low-rank matrix approximation (LA) technique to compress the innovation of the Hessian matrix so that each node only needs to transmit a message of dimension $\mathcal{O}(p)$ (where $p$ is the dimension of decision vectors) per iteration, which is essentially the same as that of first-order methods. Nevertheless, the resulting DAN-LA converges to an optimal solution with a global superlinear rate. Numerical experiments on logistic regression problems are conducted to validate their advantages over existing methods.

preprint · 2022 · arXiv

How does a Rational Agent Act in an Epidemic?

The evolution of a disease in a large population is a function of the top-down policy measures from a centralized planner, as well as the self-interested decisions (to be socially active) of individual agents in a large heterogeneous population. This paper is concerned with understanding the latter based on a mean-field type optimal control model. Specifically, the model is used to investigate the role of partial information in an agent's decision-making, and to study the impact of such decisions by a large number of agents on the spread of the virus in the population. The motivation comes from the presymptomatic and asymptomatic spread of the COVID-19 virus, where an agent unwittingly spreads the virus. We show that even in a setting with fully rational agents, limited information on the viral state can result in epidemic growth.

preprint · 2022 · arXiv

Linear Quadratic Mean-Field Games with Communication Constraints

In this paper, we study a large population game with heterogeneous dynamics and cost functions solving a consensus problem. Moreover, the agents have communication constraints which appear as: (1) an Additive-White Gaussian Noise (AWGN) channel, and (2) asynchronous data transmission via a fixed scheduling policy. Since the complexity of solving the game increases with the number of agents, we use the Mean-Field Game paradigm to solve it. Under standard assumptions on the information structure of the agents, we prove that the control of the agent in the MFG setting is free of the dual effect. This allows us to obtain an equilibrium control policy for the generic agent, which is a function of only the local observation of the agent. Furthermore, the equilibrium mean-field trajectory is shown to follow linear dynamics, hence making it computable. We show that in the finite population game, the equilibrium control policy prescribed by the MFG analysis constitutes an $ε$-Nash equilibrium, where $ε$ tends to zero as the number of agents goes to infinity. The paper is concluded with simulations demonstrating the performance of the equilibrium control policy.

preprint · 2022 · arXiv

Model-Free Non-Stationary RL: Near-Optimal Regret and Applications in Multi-Agent RL and Inventory Control

We consider model-free reinforcement learning (RL) in non-stationary Markov decision processes. Both the reward functions and the state transition functions are allowed to vary arbitrarily over time as long as their cumulative variations do not exceed certain variation budgets. We propose Restarted Q-Learning with Upper Confidence Bounds (RestartQ-UCB), the first model-free algorithm for non-stationary RL, and show that it outperforms existing solutions in terms of dynamic regret. Specifically, RestartQ-UCB with Freedman-type bonus terms achieves a dynamic regret bound of $\widetilde{O}(S^{\frac{1}{3}} A^{\frac{1}{3}} \Delta^{\frac{1}{3}} H T^{\frac{2}{3}})$, where $S$ and $A$ are the numbers of states and actions, respectively, $\Delta>0$ is the variation budget, $H$ is the number of time steps per episode, and $T$ is the total number of time steps. We further present a parameter-free algorithm named Double-Restart Q-UCB that does not require prior knowledge of the variation budget. We show that our algorithms are nearly optimal by establishing an information-theoretical lower bound of $\Omega(S^{\frac{1}{3}} A^{\frac{1}{3}} \Delta^{\frac{1}{3}} H^{\frac{2}{3}} T^{\frac{2}{3}})$, the first lower bound in non-stationary RL. Numerical experiments validate the advantages of RestartQ-UCB in terms of both cumulative rewards and computational efficiency. We demonstrate the power of our results in examples of multi-agent RL and inventory control across related products.
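
As a quick reading aid for the "nearly optimal" claim, the stated upper bound can be divided by the stated lower bound. Ignoring constants and the logarithmic factors hidden in $\widetilde{O}$ (an assumption of this sketch, not a statement from the paper), every factor cancels except a power of $H$:

$$
\frac{S^{\frac{1}{3}} A^{\frac{1}{3}} \Delta^{\frac{1}{3}} H \, T^{\frac{2}{3}}}{S^{\frac{1}{3}} A^{\frac{1}{3}} \Delta^{\frac{1}{3}} H^{\frac{2}{3}} T^{\frac{2}{3}}} = H^{\frac{1}{3}}.
$$

On this reading, the two bounds agree in their dependence on $S$, $A$, $\Delta$, and $T$, and differ by a factor of $H^{\frac{1}{3}}$ up to logarithmic terms.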

preprint · 2022 · arXiv

On Improving Model-Free Algorithms for Decentralized Multi-Agent Reinforcement Learning

Multi-agent reinforcement learning (MARL) algorithms often suffer from an exponential sample complexity dependence on the number of agents, a phenomenon known as the curse of multiagents. In this paper, we address this challenge by investigating sample-efficient model-free algorithms in decentralized MARL, and aim to improve existing algorithms along this line. For learning (coarse) correlated equilibria in general-sum Markov games, we propose stage-based V-learning algorithms that significantly simplify the algorithmic design and analysis of recent works, and circumvent a rather complicated no-weighted-regret bandit subroutine. For learning Nash equilibria in Markov potential games, we propose an independent policy gradient algorithm with a decentralized momentum-based variance reduction technique. All our algorithms are decentralized in that each agent can make decisions based on only its local information. Neither communication nor centralized coordination is required during learning, leading to a natural generalization to a large number of agents. We also provide numerical simulations to corroborate our theoretical findings.

preprint · 2022 · arXiv

Provably Efficient Reinforcement Learning in Decentralized General-Sum Markov Games

This paper addresses the problem of learning an equilibrium efficiently in general-sum Markov games through decentralized multi-agent reinforcement learning. Given the fundamental difficulty of calculating a Nash equilibrium (NE), we instead aim at finding a coarse correlated equilibrium (CCE), a solution concept that generalizes NE by allowing possible correlations among the agents' strategies. We propose an algorithm in which each agent independently runs optimistic V-learning (a variant of Q-learning) to efficiently explore the unknown environment, while using a stabilized online mirror descent (OMD) subroutine for policy updates. We show that the agents can find an $\varepsilon$-approximate CCE in at most $\widetilde{O}(H^6 S A / \varepsilon^2)$ episodes, where $S$ is the number of states, $A$ is the size of the largest individual action space, and $H$ is the length of an episode. This appears to be the first sample complexity result for learning in generic general-sum Markov games. Our results rely on a novel investigation of an anytime high-probability regret bound for OMD with a dynamic learning rate and weighted regret, which may be of independent interest. One key feature of our algorithm is that it is fully decentralized, in the sense that each agent has access to only its local information, and is completely oblivious to the presence of others. This way, our algorithm can readily scale up to an arbitrary number of agents, without suffering from the exponential dependence on the number of agents.

preprint · 2021 · arXiv

Asynchronous Networked Aggregative Games

We propose a fully asynchronous networked aggregative game (Asy-NAG) where each player minimizes a cost function that depends on its local action and the aggregate of all players' actions. In sharp contrast to the existing NAGs, each player in our Asy-NAG can compute an estimate of the aggregate action at any wall-clock time by only using (possibly stale) information from nearby players of a directed network. Such an asynchronous update does not require any coordination among players. Moreover, we design a novel distributed algorithm with an aggressive mechanism for each player to adaptively adjust the optimization stepsize per update. Particularly, the slow players in terms of updating their estimates smartly increase their stepsizes to catch up with the fast ones. Then, we develop an augmented system approach to address the asynchronicity and the information delays between players, and rigorously show the convergence to a Nash equilibrium of the Asy-NAG via a perturbed coordinate algorithm which is also of independent interest. Finally, we evaluate the performance of the distributed algorithm through numerical simulations.

preprint · 2021 · arXiv

Derivative-Free Policy Optimization for Linear Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity

Direct policy search serves as one of the workhorses in modern reinforcement learning (RL), and its applications in continuous control tasks have recently attracted increasing attention. In this work, we investigate the convergence theory of policy gradient (PG) methods for learning the linear risk-sensitive and robust controller. In particular, we develop PG methods that can be implemented in a derivative-free fashion by sampling system trajectories, and establish both global convergence and sample complexity results in the solutions of two fundamental settings in risk-sensitive and robust control: the finite-horizon linear exponential quadratic Gaussian, and the finite-horizon linear-quadratic disturbance attenuation problems. As a by-product, our results also provide the first sample complexity for the global convergence of PG methods on solving zero-sum linear-quadratic dynamic games, a nonconvex-nonconcave minimax optimization problem that serves as a baseline setting in multi-agent reinforcement learning (MARL) with continuous spaces. One feature of our algorithms is that during the learning phase, a certain level of robustness/risk-sensitivity of the controller is preserved, which we term the implicit regularization property, and which is an essential requirement in safety-critical control systems.

preprint · 2021 · arXiv

Fully Asynchronous Policy Evaluation in Distributed Reinforcement Learning over Networks

This paper proposes a fully asynchronous scheme for the policy evaluation problem of distributed reinforcement learning (DisRL) over directed peer-to-peer networks. Without waiting for any other node of the network, each node can locally update its value function at any time by using (possibly delayed) information from its neighbors. This is in sharp contrast to the gossip-based scheme where a pair of nodes concurrently update. Though the fully asynchronous setting involves a difficult multi-timescale decision problem, we design a novel stochastic average gradient (SAG) based distributed algorithm and develop a push-pull augmented graph approach to prove its exact convergence at a linear rate of $\mathcal{O}(c^k)$, where $c\in(0,1)$ and $k$ increases by one regardless of which node updates. Finally, numerical experiments validate that our method speeds up linearly with respect to the number of nodes, and is robust to straggler nodes.

preprint · 2021 · arXiv

Partial Observability Approach for the Optimal Transparency Problem in Multi-agent Systems

This paper considers a network of agents, where each agent is assumed to take actions optimally with respect to a predefined payoff function involving the latest actions of the agent's neighbors. Neighborhood relationships stem from payoff functions rather than actual communication channels between the agents. A principal is tasked to optimize the network's performance by controlling the information available to each agent with regard to other agents' latest actions. The information control by the principal is done via a partial observability approach, which comprises a static partitioning of agents into blocks and making the mean of agents' latest actions within each block publicly available. While the problem setup is general in terms of the payoff functions and the network's performance metric, this paper has a narrower focus to illuminate the problem and how it can be addressed in practice. In particular, the performance metric is assumed to be a function of the steady-state behavior of the agents. After conducting a comprehensive steady-state analysis of the network, an efficient algorithm finding optimal partitions with respect to various performance metrics is presented and validated via numerical examples.

preprint · 2021 · arXiv

Policy Optimization for $\mathcal{H}_2$ Linear Control with $\mathcal{H}_\infty$ Robustness Guarantee: Implicit Regularization and Global Convergence

Policy optimization (PO) is a key ingredient for reinforcement learning (RL). For control design, certain constraints are usually enforced on the policies to optimize, accounting for either the stability, robustness, or safety concerns on the system. Hence, PO is by nature a constrained (nonconvex) optimization in most cases, whose global convergence is challenging to analyze in general. More importantly, some constraints that are safety-critical, e.g., the $\mathcal{H}_\infty$-norm constraint that guarantees the system robustness, are difficult to enforce as the PO methods proceed. Recently, policy gradient methods have been shown to converge to the global optimum of linear quadratic regulator (LQR), a classical optimal control problem, without regularizing/projecting the control iterates onto the stabilizing set, its (implicit) feasible set. This striking result is built upon the coercive property of the cost, ensuring that the iterates remain feasible as the cost decreases. In this paper, we study the convergence theory of PO for $\mathcal{H}_2$ linear control with $\mathcal{H}_\infty$-norm robustness guarantee. One significant new feature of this problem is the lack of coercivity, i.e., the cost may have finite value around the feasible set boundary, breaking the existing analysis for LQR. Interestingly, we show that two PO methods enjoy the implicit regularization property, i.e., the iterates preserve the $\mathcal{H}_\infty$ robustness constraint as if they are regularized by the algorithms. Furthermore, despite the nonconvexity of the problem, we show that these algorithms converge to the globally optimal policies with globally sublinear rates, avoiding all suboptimal stationary points/local minima, and with locally (super-)linear rates under certain conditions.

preprint · 2021 · arXiv

Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games

We study the global convergence of policy optimization for finding the Nash equilibria (NE) in zero-sum linear quadratic (LQ) games. To this end, we first investigate the landscape of LQ games, viewing it as a nonconvex-nonconcave saddle-point problem in the policy space. Specifically, we show that despite its nonconvexity and nonconcavity, zero-sum LQ games have the property that the stationary point of the objective function with respect to the linear feedback control policies constitutes the NE of the game. Building upon this, we develop three projected nested-gradient methods that are guaranteed to converge to the NE of the game. Moreover, we show that all of these algorithms enjoy both globally sublinear and locally linear convergence rates. Simulation results are also provided to illustrate the satisfactory convergence properties of the algorithms. To the best of our knowledge, this work appears to be the first one to investigate the optimization landscape of LQ games, and provably show the convergence of policy optimization methods to the Nash equilibria. Our work serves as an initial step toward understanding the theoretical aspects of policy-based reinforcement learning algorithms for zero-sum Markov games in general.

preprint · 2020 · arXiv

A Game of Drones: Cyber-Physical Security of Time-Critical UAV Applications with Cumulative Prospect Theory Perceptions and Valuations

In this paper, a novel mathematical framework is introduced for modeling and analyzing the cyber-physical security of time-critical UAV applications. A general UAV security network interdiction game is formulated to model interactions between a UAV operator and an interdictor, each of which can be benign or malicious. In this game, the interdictor chooses the optimal location(s) from which to target the drone system by interdicting the potential paths of the UAVs. Meanwhile, the UAV operator responds by finding an optimal path selection policy that enables its UAVs to evade attacks and minimize their mission completion time. New notions from cumulative prospect theory (PT) are incorporated into the game to capture the operator's and interdictor's subjective valuations of mission completion times and perceptions of the risk levels facing the UAVs. The equilibrium of the game, with and without PT, is then analytically characterized and studied. Novel algorithms are then proposed to reach the game's equilibria under both PT and classical game theory. Simulation results show the properties of the equilibrium for both the rational and PT cases. The results show that the operator's and interdictor's bounded rationality is more likely to be disadvantageous to the UAV operator.

preprint · 2020 · arXiv

A Game-Theoretic Framework for Multi-Period-Multi-Company Demand Response Management in the Smart Grid

By utilizing tools from game theory, we develop a novel multi-period-multi-company demand response framework considering the interactions between companies (sellers of energy) and their consumers (buyers of energy). We model the interactions in terms of a Stackelberg game, where companies set their prices and consumers respond by choosing their demands. We show that the underlying game has a unique equilibrium at which the companies maximize their revenues while the consumers maximize their utilities subject to their local constraints. Closed-form expressions are provided for the optimal strategies of all players. Based on these solutions, a power allocation game has been formulated, which is shown to admit a unique pure-strategy Nash equilibrium, for which closed-form expressions are also provided. This equilibrium is found under the assumption that companies can freely allocate their power across the time horizon, but we also demonstrate that it is possible to relax this assumption. We further provide a fast distributed algorithm for the computation of all optimal strategies using only local information. We also study the effect of variations in the number of periods (subdivisions of the time horizon) and the number of consumers. As a consequence, we are able to find an appropriate company-to-consumer ratio when the number of consumers participating in demand response exceeds some threshold. Furthermore, we show, both analytically and numerically, that the multi-period scheme provides incentives for energy consumers to participate in demand response, compared to the single-period framework studied in the literature. In our framework, we provide a condition for the minimum budgets consumers need, and carry out case studies using real life data to demonstrate the benefits of the approach, which show potential savings of up to $30\%$ and equilibrium prices that have low volatility.

preprint · 2020 · arXiv

Approximate Equilibrium Computation for Discrete-Time Linear-Quadratic Mean-Field Games

While the topic of mean-field games (MFGs) has a relatively long history, heretofore there has been limited work concerning algorithms for the computation of equilibrium control policies. In this paper, we develop a computable policy iteration algorithm for approximating the mean-field equilibrium in linear-quadratic MFGs with discounted cost. Given the mean-field, each agent faces a linear-quadratic tracking problem, the solution of which involves a dynamical system evolving in retrograde time. This makes the development of forward-in-time algorithm updates challenging. By identifying a structural property of the mean-field update operator, namely that it preserves sequences of a particular form, we develop a forward-in-time equilibrium computation algorithm. Bounds that quantify the accuracy of the computed mean-field equilibrium as a function of the algorithm's stopping condition are provided. The optimality of the computed equilibrium is validated numerically. In contrast to the most recent/concurrent results, our algorithm appears to be the first to study infinite-horizon MFGs with non-stationary mean-field equilibria, though with focus on the linear quadratic setting.

preprint · 2020 · arXiv

Controlling a Networked SIS Model via a Single Input over Undirected Graphs

This paper formulates and studies the problem of controlling a networked SIS model using a single input in which the network structure is described by a connected undirected graph. A necessary and sufficient condition on the values of curing and infection rates for the healthy state to be exponentially stable is obtained via the analysis of signed Laplacians when the control input is the curing budget of a single agent. In the case when the healthy state is stabilizable, an explicit expression for the minimum curing budget is provided. The utility of the algorithm is demonstrated using a simulation over a network of cities in the northeastern United States.
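
The setting above can be illustrated with a small simulation. The sketch below is not the paper's algorithm or its minimum-budget expression: it assumes a standard networked SIS model of the form $\dot{x}_i = -\delta_i x_i + (1 - x_i)\,\beta \sum_j a_{ij} x_j$, and the graph, the rates, and the extra curing budget `u` are hypothetical values chosen only for illustration.

```python
def simulate_sis(adjacency, beta, delta, x0, dt=0.01, steps=20000):
    """Forward-Euler integration of an assumed networked SIS model:
    x_i' = -delta[i] * x_i + (1 - x_i) * beta * sum_j adjacency[i][j] * x_j.
    Returns the infection levels after `steps` steps of size `dt`."""
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        # Infection pressure on each node from its neighbors.
        pressure = [beta * sum(adjacency[i][j] * x[j] for j in range(n))
                    for i in range(n)]
        x = [x[i] + dt * (-delta[i] * x[i] + (1.0 - x[i]) * pressure[i])
             for i in range(n)]
    return x


# Hypothetical example: path graph 0 - 1 - 2 (connected, undirected).
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
beta = 0.5          # assumed uniform infection rate
base_delta = 0.2    # assumed uniform baseline curing rate
x0 = [0.5, 0.5, 0.5]

# No control: every node keeps its baseline curing rate.
uncontrolled = simulate_sis(A, beta, [base_delta] * 3, x0)

# Single input: an extra curing budget u applied to the middle node only.
u = 5.0
controlled = simulate_sis(A, beta, [base_delta, base_delta + u, base_delta], x0)

print("final infection levels without control:", uncontrolled)
print("final infection levels with single-node control:", controlled)
```

In this toy example the infection persists without control and dies out once the middle node's curing rate is raised far enough; how large the budget must be in general is exactly what the paper's condition characterizes.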

preprint · 2020 · arXiv

Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies

Policy gradient (PG) methods are a widely used reinforcement learning methodology in many applications such as video games, autonomous driving, and robotics. In spite of its empirical success, a rigorous understanding of the global convergence of PG methods is lacking in the literature. In this work, we close the gap by viewing PG methods from a nonconvex optimization perspective. In particular, we propose a new variant of PG methods for infinite-horizon problems that uses a random rollout horizon for the Monte-Carlo estimation of the policy gradient. This method then yields an unbiased estimate of the policy gradient with bounded variance, which enables the tools from nonconvex optimization to be applied to establish global convergence. Employing this perspective, we first recover the convergence results with rates to the stationary-point policies in the literature. More interestingly, motivated by advances in nonconvex optimization, we modify the proposed PG method by introducing periodically enlarged stepsizes. The modified algorithm is shown to escape saddle points under mild assumptions on the reward and the policy parameterization. Under a further strict saddle points assumption, this result establishes convergence to essentially locally-optimal policies of the underlying problem, and thus bridges the gap in existing literature on the convergence of PG methods. Results from experiments on the inverted pendulum are then provided to corroborate our theory, namely, by slightly reshaping the reward function to satisfy our assumption, unfavorable saddle points can be avoided and better limit points can be attained. Intriguingly, this empirical finding justifies the benefit of reward-reshaping from a nonconvex optimization perspective.
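
The random-rollout-horizon idea can be sketched on a simpler quantity than the policy gradient. The example below is an illustration under stated assumptions, not the paper's estimator: it draws a geometric horizon with continuation probability `gamma` and sums undiscounted rewards, which gives an unbiased estimate of the infinite-horizon discounted return. The reward sequence `reward(t)` and all parameter values are hypothetical.

```python
import random


def random_horizon_return(reward, gamma, rng):
    """One sample: undiscounted reward sum over a geometric random horizon.
    The rollout continues with probability gamma at each step, so
    P(horizon >= t) = gamma**t and the expectation of the sum equals
    sum_t gamma**t * reward(t)."""
    total = reward(0)
    t = 0
    while rng.random() < gamma:
        t += 1
        total += reward(t)
    return total


def discounted_return(reward, gamma, horizon=2000):
    """Reference value: discounted sum truncated at a long fixed horizon."""
    return sum(gamma ** t * reward(t) for t in range(horizon))


def reward(t):
    # Hypothetical deterministic reward sequence, used only for illustration.
    return 1.0 + 0.5 ** t


gamma = 0.9
rng = random.Random(0)
num_samples = 20000

estimate = sum(random_horizon_return(reward, gamma, rng)
               for _ in range(num_samples)) / num_samples
exact = discounted_return(reward, gamma)

print("Monte-Carlo estimate with random horizons:", estimate)
print("truncated discounted sum:", exact)
```

Each rollout has finite length, yet no truncation bias is introduced, which is what makes this style of estimator convenient for infinite-horizon problems.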

preprint · 2020 · arXiv

Graph-Theoretic Framework for Unified Analysis of Observability and Data Injection Attacks in the Smart Grid

In this paper, a novel graph-theoretic framework is proposed to generalize the analysis of a broad set of security attacks, including observability and data injection attacks, that target the state estimator of a smart grid. First, the notion of observability attacks is defined based on a proposed graph-theoretic construct. In this respect, a structured approach is proposed to characterize critical sets, whose removal renders the system unobservable. It is then shown that, for the system to be observable, these critical sets must be part of a maximum matching over a proposed bipartite graph. In addition, it is shown that stealthy data injection attacks (SDIAs) constitute a special case of these observability attacks. Then, various attack strategies and defense policies, for observability and data injection attacks, are shown to be amenable to analysis using the introduced graph-theoretic framework. The proposed framework is then shown to provide a unified basis for analysis of four key security problems (among others), pertaining to the characterization of: 1) The sparsest SDIA; 2) the sparsest SDIA including a certain measurement; 3) a set of measurements which must be defended to thwart all potential SDIAs; and 4) the set of measurements, which when protected, can thwart any SDIA whose cardinality is below a certain threshold. A case study using the IEEE 14-bus system with a set of 17 measurements is used to support the theoretical findings.

preprint · 2020 · arXiv

Information State Embedding in Partially Observable Cooperative Multi-Agent Reinforcement Learning

Multi-agent reinforcement learning (MARL) under partial observability has long been considered challenging, primarily due to the requirement for each agent to maintain a belief over all other agents' local histories -- a domain that generally grows exponentially over time. In this work, we investigate a partially observable MARL problem in which agents are cooperative. To enable the development of tractable algorithms, we introduce the concept of an information state embedding that serves to compress agents' histories. We quantify how the compression error influences the resulting value functions for decentralized control. Furthermore, we propose an instance of the embedding based on recurrent neural networks (RNNs). The embedding is then used as an approximate information state, and can be fed into any MARL algorithm. The proposed embed-then-learn pipeline opens the black-box of existing (partially observable) MARL algorithms, allowing us to establish some theoretical guarantees (error bounds of value functions) while still achieving competitive performance with many end-to-end approaches.

preprint · 2020 · arXiv

Non-Cooperative Inverse Reinforcement Learning

Making decisions in the presence of a strategic opponent requires one to take into account the opponent's ability to actively mask its intended objective. To describe such strategic situations, we introduce the non-cooperative inverse reinforcement learning (N-CIRL) formalism. The N-CIRL formalism consists of two agents with completely misaligned objectives, where only one of the agents knows the true objective function. Formally, we model the N-CIRL formalism as a zero-sum Markov game with one-sided incomplete information. Through interacting with the more informed player, the less informed player attempts to both infer, and act according to, the true objective function. As a result of the one-sided incomplete information, the multi-stage game can be decomposed into a sequence of single-stage games expressed by a recursive formula. Solving this recursive formula yields the value of the N-CIRL game and the more informed player's equilibrium strategy. Another recursive formula, constructed by forming an auxiliary game, termed the dual game, yields the less informed player's strategy. Building upon these two recursive formulas, we develop a computationally tractable algorithm to approximately solve for the equilibrium strategies. Finally, we demonstrate the benefits of our N-CIRL formalism over the existing multi-agent IRL formalism via extensive numerical simulation in a novel cyber security setting.

preprint · 2020 · arXiv

On Spectral Properties of Signed Laplacians with Connections to Eventual Positivity

Signed graphs have appeared in a broad variety of applications, ranging from social networks to biological networks, from distributed control and computation to power systems. In this paper, we investigate spectral properties of signed Laplacians for undirected signed graphs. We find conditions on the negative weights under which a signed Laplacian is positive semidefinite via the Kron reduction and multiport network theory. For signed Laplacians that are indefinite, we characterize their inertias with the same framework. Furthermore, we build connections between signed Laplacians, generalized M-matrices, and eventually exponentially positive matrices.

preprint · 2020 · arXiv

POLY-HOOT: Monte-Carlo Planning in Continuous Space MDPs with Non-Asymptotic Analysis

Monte-Carlo planning, as exemplified by Monte-Carlo Tree Search (MCTS), has demonstrated remarkable performance in applications with finite spaces. In this paper, we consider Monte-Carlo planning in an environment with continuous state-action spaces, a much less understood problem with important applications in control and robotics. We introduce POLY-HOOT, an algorithm that augments MCTS with a continuous armed bandit strategy named Hierarchical Optimistic Optimization (HOO) (Bubeck et al., 2011). Specifically, we enhance HOO by using an appropriate polynomial, rather than logarithmic, bonus term in the upper confidence bounds. Such a polynomial bonus is motivated by its empirical successes in AlphaGo Zero (Silver et al., 2017b), as well as its significant role in achieving theoretical guarantees of finite space MCTS (Shah et al., 2019). We investigate, for the first time, the regret of the enhanced HOO algorithm in non-stationary bandit problems. Using this result as a building block, we establish non-asymptotic convergence guarantees for POLY-HOOT: the value estimate converges to an arbitrarily small neighborhood of the optimal value function at a polynomial rate. We further provide experimental results that corroborate our theoretical findings.

preprint · 2020 · arXiv

Quantifying Market Efficiency Impacts of Aggregated Distributed Energy Resources

We focus on the aggregation of distributed energy resources (DERs) through a profit-maximizing intermediary that enables participation of DERs in wholesale electricity markets. Particularly, we study the market efficiency brought in by the large-scale deployment of DERs and explore to what extent such benefits are offset by the profit-maximizing nature of the aggregator. We deploy a game-theoretic framework to study the strategic interactions between an aggregator and DER owners. The proposed model takes into account the stochastic nature of the DER supply. We explicitly characterize the equilibrium of the game and provide illustrative examples to quantify the efficiency loss due to the strategic incentives of the aggregator. Our numerical experiments illustrate the impact of uncertainty and amount of DER integration on the overall market efficiency.

preprint2016arXiv

Global Stabilization of Triangulated Formations

Formation control deals with the design of decentralized control laws that stabilize mobile, autonomous agents at prescribed distances from each other. We call any configuration of the agents a target configuration if it satisfies the inter-agent distance conditions. It is well known that when the distance conditions are defined by a rigid graph, there is a finite number of target configurations modulo rotations and translations of the entire formation. We can thus recast the objective of formation control as stabilizing one or many of the target configurations. A major issue is that such control laws will also have equilibria corresponding to configurations which do not meet the desired inter-agent distance conditions; we refer to these as undesirable configurations. The undesirable configurations become problematic if they are also stable. Designing decentralized control laws whose stable equilibria are all target configurations in the case of a general rigid graph is still an open problem. We provide here a new point of view on this problem, and propose a partial solution by exhibiting a class of rigid graphs and control laws for which all stable equilibria are target configurations.

preprint2016arXiv

On the Analysis of a Continuous-Time Bi-Virus Model

Motivated by the spread of opinions on different social networks, we study a distributed continuous-time bi-virus model for a system of groups of individuals. An in-depth stability analysis is performed for more general models than have been previously considered, for the healthy and epidemic states. In addition, we investigate sensitivity properties of some nontrivial equilibria and obtain an impossibility result for distributed feedback control.

preprint2015arXiv

A Stackelberg Game for Multi-Period Demand Response Management in the Smart Grid

This paper studies a multi-period demand response management problem in the smart grid where multiple utility companies compete among themselves. The user-utility interactions are modeled by a noncooperative game of a Stackelberg type where the interactions among the utility companies are captured through a Nash equilibrium. It is shown that this game has a unique Stackelberg equilibrium at which the utility companies set prices to maximize their revenues (within a Nash game) while the users respond accordingly to maximize their utilities subject to their budget constraints. Closed-form expressions are provided for the corresponding strategies of the users and the utility companies. It is shown that the multi-period scheme, compared with the single-period case, provides more incentives for the users to participate in the game. A necessary and sufficient condition on the minimum budget needed for a user to participate is provided.

preprint2015arXiv

Context-Aware Wireless Small Cell Networks: How to Exploit User Information for Resource Allocation

In this paper, a novel context-aware approach for resource allocation in two-tier wireless small cell networks (SCNs) is proposed. In particular, the SCN's users are divided into two types: frequent users, who are regular users of certain small cells, and occasional users, who are one-time or infrequent users of a particular small cell. Given such context information, each small cell base station (SCBS) aims to maximize the overall performance provided to its frequent users, while ensuring that occasional users are also well serviced. We formulate the problem as a noncooperative game in which the SCBSs are the players. The strategy of each SCBS is to choose a proper power allocation so as to optimize a utility function that captures the tradeoff between the users' quality-of-service gains and the costs in terms of resource expenditures. We provide a sufficient condition for the existence and uniqueness of a pure strategy Nash equilibrium for the game, and we show that this condition is independent of the number of users in the network. Simulation results show that the proposed context-aware resource allocation game yields significant performance gains, in terms of the average utility per SCBS, compared to conventional techniques such as proportional fair allocation and sum-rate maximization.

preprint2015arXiv

Robust Distributed Averaging: When are Potential-Theoretic Strategies Optimal?

We study the interaction between a network designer and an adversary over a dynamical network. The network consists of nodes performing continuous-time distributed averaging. The adversary strategically disconnects a set of links to prevent the nodes from reaching consensus. Meanwhile, the network designer assists the nodes in reaching consensus by changing the weights of a limited number of links in the network. We formulate two Stackelberg games to describe this competition where the order in which the players act is reversed in the two problems. Although the canonical equations provided by Pontryagin's maximum principle seem to be intractable, we provide an alternative characterization for the optimal strategies that makes a connection to potential theory. Finally, we provide a sufficient condition for the existence of a saddle-point equilibrium for the underlying zero-sum game.

preprint2015arXiv

Stability of Epidemic Models over Directed Graphs: A Positive Systems Approach

We study the stability properties of a susceptible-infected-susceptible (SIS) diffusion model, the so-called $n$-intertwined Markov model, over arbitrary directed network topologies. As in the majority of the work on infection spread dynamics, this model exhibits a threshold phenomenon. When the curing rates in the network are high, the disease-free state is the unique equilibrium over the network. Otherwise, an endemic equilibrium state emerges, where some infection remains within the network. Using notions from positive systems theory, we provide novel proofs for the global asymptotic stability of the equilibrium points in both cases over strongly connected networks based on the value of the basic reproduction number, a fundamental quantity in the study of epidemics. When the network topology is weakly connected, we provide conditions for the existence, uniqueness, and global asymptotic stability of an endemic state, and we study the stability of the disease-free state. Finally, we demonstrate that the $n$-intertwined Markov model can be viewed as a best-response dynamical system of a concave game among the nodes. This characterization allows us to cast new infection spread dynamics; additionally, we provide a sufficient condition for the global convergence to the disease-free state, which can be checked in a distributed fashion. Several simulations demonstrate our results.
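
The threshold behavior described in this abstract can be illustrated with a minimal sketch. It assumes the commonly used form of the $n$-intertwined model, dx_i/dt = (1 - x_i) * sum_j beta_ij * x_j - delta_i * x_i, where x_i is the infection probability of node i; the abstract itself does not state the equations. The function name `simulate_sis`, the directed 3-cycle, the rates, and the forward-Euler discretization are all illustrative assumptions, not taken from the paper.

```python
def simulate_sis(beta, delta, x0, dt=0.01, steps=4000):
    """Forward-Euler integration of the assumed dynamics
    dx_i/dt = (1 - x_i) * sum_j beta[i][j] * x_j - delta[i] * x_i."""
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        # Infection pressure on node i from its in-neighbors j.
        pressure = [sum(beta[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] + dt * ((1.0 - x[i]) * pressure[i] - delta[i] * x[i])
             for i in range(n)]
    return x


# Hypothetical strongly connected directed 3-cycle with unit infection rates.
beta = [[0.0, 0.0, 1.0],
        [1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0]]
x0 = [0.2, 0.1, 0.05]

high_cure = simulate_sis(beta, [2.0] * 3, x0)  # curing rate above infection rate
low_cure = simulate_sis(beta, [0.5] * 3, x0)   # curing rate below infection rate

print("high curing rates:", high_cure)
print("low curing rates: ", low_cure)
```

In this toy setting the infection dies out when the curing rate is 2.0, while with a curing rate of 0.5 every node settles at the positive value solving (1 - x) * x = 0.5 * x, that is x = 0.5.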

preprint2014arXiv

Design and Analysis of Distributed Averaging with Quantized Communication

Consider a network whose nodes have some initial values, and it is desired to design an algorithm that builds on neighbor-to-neighbor interactions with the ultimate goal of convergence to the average of all initial node values or to some value close to that average. Such an algorithm is called generically "distributed averaging," and our goal in this paper is to study the performance of a subclass of deterministic distributed averaging algorithms where the information exchange between neighboring nodes (agents) is subject to uniform quantization. With such quantization, convergence to the precise average cannot be achieved in general, but the convergence would be to some value close to it, called quantized consensus. Using Lyapunov stability analysis, we characterize the convergence properties of the resulting nonlinear quantized system. We show that in finite time and depending on initial conditions, the algorithm will either cause all agents to reach a quantized consensus where the consensus value is the largest quantized value not greater than the average of their initial values, or will lead all variables to cycle in a small neighborhood around the average. In the latter case, we identify tight bounds for the size of the neighborhood and we further show that the error can be made arbitrarily small by adjusting the algorithm's parameters in a distributed manner.

preprint2013arXiv

A Game-Theoretic Approach to Energy Trading in the Smart Grid

Electric storage units constitute a key element in the emerging smart grid system. In this paper, the interactions and energy trading decisions of a number of geographically distributed storage units are studied using a novel framework based on game theory. In particular, a noncooperative game is formulated between storage units, such as PHEVs, or an array of batteries that are trading their stored energy. Here, each storage unit's owner can decide on the maximum amount of energy to sell in a local market so as to maximize a utility that reflects the tradeoff between the revenues from energy trading and the accompanying costs. Then in this energy exchange market between the storage units and the smart grid elements, the price at which energy is traded is determined via an auction mechanism. The game is shown to admit at least one Nash equilibrium, and a novel algorithm that is guaranteed to reach such an equilibrium point is proposed. Simulation results show that the proposed approach yields significant performance improvements, in terms of the average utility per storage unit, reaching up to 130.2% compared to a conventional greedy approach.

preprint2013arXiv

Robust Distributed Averaging in Networks

In this work, we consider two types of adversarial attacks on a network of nodes seeking to reach consensus. The first type involves an adversary that is capable of breaking a specific number of links at each time instant. In the second attack, the adversary is capable of corrupting the values of the nodes by adding a noise signal. In this latter case, we assume that the adversary is constrained by a power budget. We consider the optimization problem of the adversary and fully characterize its optimum strategy for each scenario.

preprint2013arXiv

Robust Distributed Averaging on Networks with Adversarial Intervention

We study the interaction between a network designer and an adversary over a dynamical network. The network consists of nodes performing continuous-time distributed averaging. The goal of the network designer is to assist the nodes in reaching consensus by changing the weights of a limited number of links in the network. Meanwhile, an adversary strategically disconnects a set of links to prevent the nodes from converging. We formulate two problems to describe this competition where the order in which the players act is reversed in the two problems. We utilize Pontryagin's Maximum Principle (MP) to tackle both problems and derive the optimal strategies. Although the canonical equations provided by the MP are intractable, we provide an alternative characterization for the optimal strategies that highlights a connection with potential theory. Finally, we provide a sufficient condition for the existence of a saddle-point equilibrium (SPE) for this zero-sum game.
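
The effect of a disconnected link on continuous-time distributed averaging can be seen in a small sketch. It assumes the standard averaging dynamics dx_i/dt = sum_j w_ij * (x_j - x_i) on an undirected weighted graph and a static link removal; the strategic, time-varying choices of the designer and the adversary studied in the paper are not modeled. The function name `averaging`, the 4-node path graph, and the initial values are hypothetical.

```python
def averaging(edges, x0, dt=0.01, steps=5000):
    """Forward-Euler integration of dx_i/dt = sum_j w_ij * (x_j - x_i).

    `edges` maps an undirected link (i, j) to its weight w_ij.
    """
    x = list(x0)
    for _ in range(steps):
        dx = [0.0] * len(x)
        for (i, j), w in edges.items():
            # Each link pulls its two endpoints toward each other.
            dx[i] += w * (x[j] - x[i])
            dx[j] += w * (x[i] - x[j])
        x = [xi + dt * di for xi, di in zip(x, dx)]
    return x


# Hypothetical 4-node path graph 0 - 1 - 2 - 3 with unit weights.
path = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0}
x0 = [0.0, 2.0, 4.0, 10.0]

connected = averaging(path, x0)

# Static stand-in for the adversary: remove the middle link.
cut = {link: w for link, w in path.items() if link != (1, 2)}
disconnected = averaging(cut, x0)

print("all links present:", connected)
print("link (1, 2) removed:", disconnected)
```

With all links present, every node approaches the global average 4.0. After the middle link is removed, the two components agree only internally, at 1.0 and 7.0, so consensus on the global average is lost.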

preprint2012arXiv

A Cooperative Bayesian Nonparametric Framework for Primary User Activity Monitoring in Cognitive Radio Network

This paper introduces a novel approach that enables a number of cognitive radio devices that are observing the availability pattern of a number of primary users (PUs), to cooperate and use Bayesian nonparametric techniques to estimate the distributions of the PUs' activity pattern, assumed to be completely unknown. In the proposed model, each cognitive node may have its own individual view on each PU's distribution, and, hence, seeks to find partners having a correlated perception. To address this problem, a coalitional game is formulated between the cognitive devices and an algorithm for cooperative coalition formation is proposed. It is shown that the proposed coalition formation algorithm allows the cognitive nodes that are experiencing a similar behavior from some PUs to self-organize into disjoint, independent coalitions. Inside each coalition, the cooperative cognitive nodes use a combination of Bayesian nonparametric models such as the Dirichlet process and statistical goodness of fit techniques in order to improve the accuracy of the estimated PUs' activity distributions. Simulation results show that the proposed algorithm significantly improves the estimates of the PUs' distributions and yields a performance advantage, in terms of reduction of the average achieved Kullback-Leibler distance between the real and the estimated distributions, reaching up to 36.5% relative to the non-cooperative estimates. The results also show that the proposed algorithm enables the cognitive nodes to adapt their cooperative decisions when the actual PUs' distributions change due to, for example, PU mobility.

preprint2012arXiv

Coalitional Games in Partition Form for Joint Spectrum Sensing and Access in Cognitive Radio Networks

Unlicensed secondary users (SUs) in cognitive radio networks are subject to an inherent tradeoff between spectrum sensing and spectrum access. Although each SU has an incentive to sense the primary user (PU) channels for locating spectrum holes, this exploration of the spectrum can come at the expense of a shorter transmission time, and, hence, a possibly smaller capacity for data transmission. This paper investigates the impact of this tradeoff on the cooperative strategies of a network of SUs that seek to cooperate in order to improve their view of the spectrum (sensing), reduce the possibility of interference among each other, and improve their transmission capacity (access). The problem is modeled as a coalitional game in partition form and an algorithm for coalition formation is proposed. Using the proposed algorithm, the SUs can make individual distributed decisions to join or leave a coalition while maximizing their utilities which capture the average time spent for sensing as well as the capacity achieved while accessing the spectrum. It is shown that, by using the proposed algorithm, the SUs can self-organize into a network partition composed of disjoint coalitions, with the members of each coalition cooperating to jointly optimize their sensing and access performance. Simulation results show the performance improvement that the proposed algorithm yields with respect to the non-cooperative case. The results also show how the algorithm allows the SUs to self-adapt to changes in the environment such as the change in the traffic of the PUs, or slow mobility.

preprint2012arXiv

Game Theoretic Methods for the Smart Grid

The future smart grid is envisioned as a large-scale cyber-physical system encompassing advanced power, communications, control, and computing technologies. In order to accommodate these technologies, it will have to build on solid mathematical tools that can ensure an efficient and robust operation of such heterogeneous and large-scale cyber-physical systems. In this context, this paper is an overview of the potential of applying game theory for addressing relevant and timely open problems in three emerging areas that pertain to the smart grid: micro-grid systems, demand-side management, and communications. In each area, the state-of-the-art contributions are gathered and a systematic treatment, using game theory, of some of the most relevant problems for future power systems is provided. Future opportunities for adopting game theoretic methodologies in the transition from legacy systems toward smart and intelligent grids are also discussed. In a nutshell, this article provides a comprehensive account of the application of game theory in smart grid systems tailored to the interdisciplinary characteristics of these systems that integrate components from power systems, networking, communications, and control.

preprint2012arXiv

Nash Equilibria for Stochastic Games with Asymmetric Information-Part 1: Finite Games

A model of stochastic games where multiple controllers jointly control the evolution of the state of a dynamic system but have access to different information about the state and action processes is considered. The asymmetry of information among the controllers makes it difficult to compute or characterize Nash equilibria. Using common information among the controllers, the game with asymmetric information is shown to be equivalent to another game with symmetric information. Further, under certain conditions, a Markov state is identified for the equivalent symmetric information game and its Markov perfect equilibria are characterized. This characterization provides a backward induction algorithm to find Nash equilibria of the original game with asymmetric information in pure or behavioral strategies. Each step of this algorithm involves finding Bayesian Nash equilibria of a one-stage Bayesian game. The class of Nash equilibria of the original game that can be characterized in this backward manner is named common information based Markov perfect equilibria.

preprint2012arXiv

Network Formation Games Among Relay Stations in Next Generation Wireless Networks

The introduction of relay station (RS) nodes is a key feature in next generation wireless networks such as 3GPP's long term evolution advanced (LTE-Advanced), or the forthcoming IEEE 802.16j WiMAX standard. This paper presents, using game theory, a novel approach for the formation of the tree architecture that connects the RSs and their serving base station in the uplink of the next generation wireless multi-hop systems. Unlike existing literature which mainly focused on performance analysis, we propose a distributed algorithm for studying the structure and dynamics of the network. We formulate a network formation game among the RSs whereby each RS aims to maximize a cross-layer utility function that takes into account the benefit from cooperative transmission, in terms of reduced bit error rate, and the costs in terms of the delay due to multi-hop transmission. For forming the tree structure, a distributed myopic algorithm is devised. Using the proposed algorithm, each RS can individually select the path that connects it to the BS through other RSs while optimizing its utility. We show the convergence of the algorithm into a Nash tree network, and we study how the RSs can adapt the network's topology to environmental changes such as mobility or the deployment of new mobile stations. Simulation results show that the proposed algorithm presents significant gains in terms of average utility per mobile station which is at least 17.1% better relative to the case with no RSs and reaches up to 40.3% improvement compared to a nearest neighbor algorithm (for a network with 10 RSs). The results also show that the average number of hops does not exceed 3 even for a network with up to 25 RSs.

preprint2011arXiv

Adaptive Resource Allocation in Jamming Teams Using Game Theory

In this work, we study the problem of power allocation and adaptive modulation in teams of decision makers. We consider the special case of two teams with each team consisting of two mobile agents. Agents belonging to the same team communicate over wireless ad hoc networks, and they try to split their available power between the tasks of communication and jamming the nodes of the other team. The agents have constraints on their total energy and instantaneous power usage. The cost function adopted is the difference between the rates of erroneously transmitted bits of each team. We model the adaptive modulation problem as a zero-sum matrix game which in turn gives rise to a continuous-kernel game to handle power control. Based on the communications model, we present sufficient conditions on the physical parameters of the agents for the existence of a pure strategy saddle-point equilibrium (PSSPE).

preprint2011arXiv

Power Allocation in Team Jamming Games in Wireless Ad Hoc Networks

In this work, we study the problem of power allocation in teams. Each team consists of two agents who try to split their available power between the tasks of communication and jamming the nodes of the other team. The agents have constraints on their total energy and instantaneous power usage. The cost function is the difference between the rates of erroneously transmitted bits of each team. We model the problem as a zero-sum differential game between the two teams and use Isaacs' approach to obtain the necessary conditions for the optimal trajectories. This leads to a continuous-kernel power allocation game among the players. Based on the communications model, we present sufficient conditions on the physical parameters of the agents for the existence of a pure strategy Nash equilibrium (PSNE). Finally, we present simulation results for the case when the agents are holonomic.

preprint2010arXiv

Fictitious Play with Time-Invariant Frequency Update for Network Security

We study two-player security games which can be viewed as sequences of nonzero-sum matrix games played by an Attacker and a Defender. The evolution of the game is based on a stochastic fictitious play process, where players do not have access to each other's payoff matrix. Each has to observe the other's actions up to present and plays the action generated based on the best response to these observations. In a regular fictitious play process, each player makes a maximum likelihood estimate of her opponent's mixed strategy, which results in a time-varying update based on the previous estimate and current action. In this paper, we explore an alternative scheme for frequency update, whose mean dynamic is instead time-invariant. We examine convergence properties of the mean dynamic of the fictitious play process with such an update scheme, and establish local stability of the equilibrium point when both players are restricted to two actions. We also propose an adaptive algorithm based on this time-invariant frequency update.
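
The contrast between the two frequency updates can be sketched as follows. The running-average rule below is the standard empirical-frequency (maximum likelihood) estimate, whose step size 1/k changes with time. For the alternative, this sketch assumes a constant step size `eta`, which is one common time-invariant choice; the abstract does not spell out the exact rule, so treat it as an assumption. The function names and the observation sequence are hypothetical.

```python
def empirical_update(q, action, k):
    """Running-average update after the k-th observation (k = 1, 2, ...).

    The step size 1/k shrinks over time, so the update is time-varying.
    """
    return [qi + (1.0 / k) * ((1.0 if a == action else 0.0) - qi)
            for a, qi in enumerate(q)]


def constant_step_update(q, action, eta):
    """Assumed time-invariant alternative: the same step size eta every round."""
    return [qi + eta * ((1.0 if a == action else 0.0) - qi)
            for a, qi in enumerate(q)]


# Hypothetical sequence of observed opponent actions (two actions: 0 and 1).
observed = [1, 0, 1, 1]

q_emp = [0.5, 0.5]    # overwritten by the first observation, since 1/k = 1
q_const = [0.5, 0.5]
for k, action in enumerate(observed, start=1):
    q_emp = empirical_update(q_emp, action, k)
    q_const = constant_step_update(q_const, action, eta=0.5)

print("empirical frequency:  ", q_emp)
print("constant-step estimate:", q_const)
```

The running average reproduces the plain action counts (one 0 and three 1s), whereas the constant-step estimate weights recent observations more heavily and keeps some dependence on its initial value.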

preprint2009arXiv

Coalitional Games for Distributed Collaborative Spectrum Sensing in Cognitive Radio Networks

Collaborative spectrum sensing among secondary users (SUs) in cognitive networks is shown to yield a significant performance improvement. However, there exists an inherent tradeoff between the gains in terms of probability of detection of the primary user (PU) and the costs in terms of false alarm probability. In this paper, we study the impact of this tradeoff on the topology and the dynamics of a network of SUs seeking to reduce the interference on the PU through collaborative sensing. Moreover, while existing literature mainly focused on centralized solutions for collaborative sensing, we propose distributed collaboration strategies through game theory. We model the problem as a non-transferable coalitional game, and propose a distributed algorithm for coalition formation through simple merge and split rules. Through the proposed algorithm, SUs can autonomously collaborate and self-organize into disjoint independent coalitions, while maximizing their detection probability taking into account the cooperation costs (in terms of false alarm). We study the stability of the resulting network structure, and show that a maximum number of SUs per formed coalition exists for the proposed utility model. Simulation results show that the proposed algorithm allows a reduction of up to 86.6% of the average missing probability per SU (probability of missing the detection of the PU) relative to the non-cooperative case, while maintaining a certain false alarm level. In addition, through simulations, we compare the performance of the proposed distributed solution with respect to an optimal centralized solution that minimizes the average missing probability per SU. Finally, the results also show how the proposed algorithm autonomously adapts the network topology to environmental changes such as mobility.

42 published item(s)

Distributed Adaptive Newton Methods with Global Superlinear Convergence

How does a Rational Agent Act in an Epidemic?

Linear Quadratic Mean-Field Games with Communication Constraints

Model-Free Non-Stationary RL: Near-Optimal Regret and Applications in Multi-Agent RL and Inventory Control

On Improving Model-Free Algorithms for Decentralized Multi-Agent Reinforcement Learning

Provably Efficient Reinforcement Learning in Decentralized General-Sum Markov Games

Asynchronous Networked Aggregative Games

Derivative-Free Policy Optimization for Linear Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity

Fully Asynchronous Policy Evaluation in Distributed Reinforcement Learning over Networks

Partial Observability Approach for the Optimal Transparency Problem in Multi-agent Systems

Policy Optimization for $\mathcal{H}_2$ Linear Control with $\mathcal{H}_\infty$ Robustness Guarantee: Implicit Regularization and Global Convergence

Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games

A Game of Drones: Cyber-Physical Security of Time-Critical UAV Applications with Cumulative Prospect Theory Perceptions and Valuations

A Game-Theoretic Framework for Multi-Period-Multi-Company Demand Response Management in the Smart Grid

Approximate Equilibrium Computation for Discrete-Time Linear-Quadratic Mean-Field Games

Controlling a Networked SIS Model via a Single Input over Undirected Graphs

Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies

Graph-Theoretic Framework for Unified Analysis of Observability and Data Injection Attacks in the Smart Grid

Information State Embedding in Partially Observable Cooperative Multi-Agent Reinforcement Learning

Non-Cooperative Inverse Reinforcement Learning

On Spectral Properties of Signed Laplacians with Connections to Eventual Positivity

POLY-HOOT: Monte-Carlo Planning in Continuous Space MDPs with Non-Asymptotic Analysis

Quantifying Market Efficiency Impacts of Aggregated Distributed Energy Resources

Global Stabilization of Triangulated Formations

On the Analysis of a Continuous-Time Bi-Virus Model

A Stackelberg Game for Multi-Period Demand Response Management in the Smart Grid

Context-Aware Wireless Small Cell Networks: How to Exploit User Information for Resource Allocation

Robust Distributed Averaging: When are Potential-Theoretic Strategies Optimal?

Stability of Epidemic Models over Directed Graphs: A Positive Systems Approach

Design and Analysis of Distributed Averaging with Quantized Communication

A Game-Theoretic Approach to Energy Trading in the Smart Grid

Robust Distributed Averaging in Networks

Robust Distributed Averaging on Networks with Adversarial Intervention

A Cooperative Bayesian Nonparametric Framework for Primary User Activity Monitoring in Cognitive Radio Network

Coalitional Games in Partition Form for Joint Spectrum Sensing and Access in Cognitive Radio Networks

Game Theoretic Methods for the Smart Grid

Nash Equilibria for Stochastic Games with Asymmetric Information-Part 1: Finite Games

Network Formation Games Among Relay Stations in Next Generation Wireless Networks

Adaptive Resource Allocation in Jamming Teams Using Game Theory

Power Allocation in Team Jamming Games in Wireless Ad Hoc Networks

Fictitious Play with Time-Invariant Frequency Update for Network Security

Coalitional Games for Distributed Collaborative Spectrum Sensing in Cognitive Radio Networks