Researcher profile

Tamer Başar

Tamer Başar contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
23works
0followers
14topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

23 published item(s)

preprint2022arXiv

Distributed Adaptive Newton Methods with Global Superlinear Convergence

This paper considers the distributed optimization problem where each node of a peer-to-peer network minimizes a finite sum of objective functions by communicating with its neighboring nodes. In sharp contrast to the existing literature where the fastest distributed algorithms converge either with a global linear or a local superlinear rate, we propose a distributed adaptive Newton (DAN) algorithm with a global quadratic convergence rate. Our key idea lies in the design of a finite-time set-consensus method with Polyak's adaptive stepsize. Moreover, we introduce a low-rank matrix approximation (LA) technique to compress the innovation of Hessian matrix so that each node only needs to transmit message of dimension $\mathcal{O}(p)$ (where $p$ is the dimension of decision vectors) per iteration, which is essentially the same as that of first-order methods. Nevertheless, the resulting DAN-LA converges to an optimal solution with a global superlinear rate. Numerical experiments on logistic regression problems are conducted to validate their advantages over existing methods.

preprint2022arXiv

How does a Rational Agent Act in an Epidemic?

Evolution of disease in a large population is a function of the top-down policy measures from a centralized planner, as well as the self-interested decisions (to be socially active) of individual agents in a large heterogeneous population. This paper is concerned with understanding the latter based on a mean-field type optimal control model. Specifically, the model is used to investigate the role of partial information on an agent's decision-making, and study the impact of such decisions by a large number of agents on the spread of the virus in the population. The motivation comes from the presymptomatic and asymptomatic spread of the COVID-19 virus where an agent unwittingly spreads the virus. We show that even in a setting with fully rational agents, limited information on the viral state can result in an epidemic growth.

preprint2022arXiv

Linear Quadratic Mean-Field Games with Communication Constraints

In this paper, we study a large population game with heterogeneous dynamics and cost functions solving a consensus problem. Moreover, the agents have communication constraints which appear as: (1) an Additive-White Gaussian Noise (AWGN) channel, and (2) asynchronous data transmission via a fixed scheduling policy. Since the complexity of solving the game increases with the number of agents, we use the Mean-Field Game paradigm to solve it. Under standard assumptions on the information structure of the agents, we prove that the control of the agent in the MFG setting is free of the dual effect. This allows us to obtain an equilibrium control policy for the generic agent, which is a function of only the local observation of the agent. Furthermore, the equilibrium mean-field trajectory is shown to follow linear dynamics, hence making it computable. We show that in the finite population game, the equilibrium control policy prescribed by the MFG analysis constitutes an $ε$-Nash equilibrium, where $ε$ tends to zero as the number of agents goes to infinity. The paper is concluded with simulations demonstrating the performance of the equilibrium control policy.

preprint2022arXiv

Model-Free Non-Stationary RL: Near-Optimal Regret and Applications in Multi-Agent RL and Inventory Control

We consider model-free reinforcement learning (RL) in non-stationary Markov decision processes. Both the reward functions and the state transition functions are allowed to vary arbitrarily over time as long as their cumulative variations do not exceed certain variation budgets. We propose Restarted Q-Learning with Upper Confidence Bounds (RestartQ-UCB), the first model-free algorithm for non-stationary RL, and show that it outperforms existing solutions in terms of dynamic regret. Specifically, RestartQ-UCB with Freedman-type bonus terms achieves a dynamic regret bound of $\widetilde{O}(S^{\frac{1}{3}} A^{\frac{1}{3}} Δ^{\frac{1}{3}} H T^{\frac{2}{3}})$, where $S$ and $A$ are the numbers of states and actions, respectively, $Δ>0$ is the variation budget, $H$ is the number of time steps per episode, and $T$ is the total number of time steps. We further present a parameter-free algorithm named Double-Restart Q-UCB that does not require prior knowledge of the variation budget. We show that our algorithms are \emph{nearly optimal} by establishing an information-theoretical lower bound of $Ω(S^{\frac{1}{3}} A^{\frac{1}{3}} Δ^{\frac{1}{3}} H^{\frac{2}{3}} T^{\frac{2}{3}})$, the first lower bound in non-stationary RL. Numerical experiments validate the advantages of RestartQ-UCB in terms of both cumulative rewards and computational efficiency. We demonstrate the power of our results in examples of multi-agent RL and inventory control across related products.

preprint2022arXiv

On Improving Model-Free Algorithms for Decentralized Multi-Agent Reinforcement Learning

Multi-agent reinforcement learning (MARL) algorithms often suffer from an exponential sample complexity dependence on the number of agents, a phenomenon known as \emph{the curse of multiagents}. In this paper, we address this challenge by investigating sample-efficient model-free algorithms in \emph{decentralized} MARL, and aim to improve existing algorithms along this line. For learning (coarse) correlated equilibria in general-sum Markov games, we propose \emph{stage-based} V-learning algorithms that significantly simplify the algorithmic design and analysis of recent works, and circumvent a rather complicated no-\emph{weighted}-regret bandit subroutine. For learning Nash equilibria in Markov potential games, we propose an independent policy gradient algorithm with a decentralized momentum-based variance reduction technique. All our algorithms are decentralized in that each agent can make decisions based on only its local information. Neither communication nor centralized coordination is required during learning, leading to a natural generalization to a large number of agents. We also provide numerical simulations to corroborate our theoretical findings.

preprint2022arXiv

Provably Efficient Reinforcement Learning in Decentralized General-Sum Markov Games

This paper addresses the problem of learning an equilibrium efficiently in general-sum Markov games through decentralized multi-agent reinforcement learning. Given the fundamental difficulty of calculating a Nash equilibrium (NE), we instead aim at finding a coarse correlated equilibrium (CCE), a solution concept that generalizes NE by allowing possible correlations among the agents' strategies. We propose an algorithm in which each agent independently runs optimistic V-learning (a variant of Q-learning) to efficiently explore the unknown environment, while using a stabilized online mirror descent (OMD) subroutine for policy updates. We show that the agents can find an $ε$-approximate CCE in at most $\widetilde{O}( H^6S A /ε^2)$ episodes, where $S$ is the number of states, $A$ is the size of the largest individual action space, and $H$ is the length of an episode. This appears to be the first sample complexity result for learning in generic general-sum Markov games. Our results rely on a novel investigation of an anytime high-probability regret bound for OMD with a dynamic learning rate and weighted regret, which would be of independent interest. One key feature of our algorithm is that it is fully \emph{decentralized}, in the sense that each agent has access to only its local information, and is completely oblivious to the presence of others. This way, our algorithm can readily scale up to an arbitrary number of agents, without suffering from the exponential dependence on the number of agents.

preprint2021arXiv

Asynchronous Networked Aggregative Games

We propose a fully asynchronous networked aggregative game (Asy-NAG) where each player minimizes a cost function that depends on its local action and the aggregate of all players' actions. In sharp contrast to the existing NAGs, each player in our Asy-NAG can compute an estimate of the aggregate action at any wall-clock time by only using (possibly stale) information from nearby players of a directed network. Such an asynchronous update does not require any coordination among players. Moreover, we design a novel distributed algorithm with an aggressive mechanism for each player to adaptively adjust the optimization stepsize per update. Particularly, the slow players in terms of updating their estimates smartly increase their stepsizes to catch up with the fast ones. Then, we develop an augmented system approach to address the asynchronicity and the information delays between players, and rigorously show the convergence to a Nash equilibrium of the Asy-NAG via a perturbed coordinate algorithm which is also of independent interest. Finally, we evaluate the performance of the distributed algorithm through numerical simulations.

preprint2021arXiv

Derivative-Free Policy Optimization for Linear Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity

Direct policy search serves as one of the workhorses in modern reinforcement learning (RL), and its applications in continuous control tasks have recently attracted increasing attention. In this work, we investigate the convergence theory of policy gradient (PG) methods for learning the linear risk-sensitive and robust controller. In particular, we develop PG methods that can be implemented in a derivative-free fashion by sampling system trajectories, and establish both global convergence and sample complexity results in the solutions of two fundamental settings in risk-sensitive and robust control: the finite-horizon linear exponential quadratic Gaussian, and the finite-horizon linear-quadratic disturbance attenuation problems. As a by-product, our results also provide the first sample complexity for the global convergence of PG methods on solving zero-sum linear-quadratic dynamic games, a nonconvex-nonconcave minimax optimization problem that serves as a baseline setting in multi-agent reinforcement learning (MARL) with continuous spaces. One feature of our algorithms is that during the learning phase, a certain level of robustness/risk-sensitivity of the controller is preserved, which we termed as the implicit regularization property, and is an essential requirement in safety-critical control systems.

preprint2021arXiv

Fully Asynchronous Policy Evaluation in Distributed Reinforcement Learning over Networks

This paper proposes a \emph{fully asynchronous} scheme for the policy evaluation problem of distributed reinforcement learning (DisRL) over directed peer-to-peer networks. Without waiting for any other node of the network, each node can locally update its value function at any time by using (possibly delayed) information from its neighbors. This is in sharp contrast to the gossip-based scheme where a pair of nodes concurrently update. Though the fully asynchronous setting involves a difficult multi-timescale decision problem, we design a novel stochastic average gradient (SAG) based distributed algorithm and develop a push-pull augmented graph approach to prove its exact convergence at a linear rate of $\mathcal{O}(c^k)$ where $c\in(0,1)$ and $k$ increases by one no matter on which node updates. Finally, numerical experiments validate that our method speeds up linearly with respect to the number of nodes, and is robust to straggler nodes.

preprint2021arXiv

Partial Observability Approach for the Optimal Transparency Problem in Multi-agent Systems

This paper considers a network of agents, where each agent is assumed to take actions optimally with respect to a predefined payoff function involving the latest actions of the agent's neighbors. Neighborhood relationships stem from payoff functions rather than actual communication channels between the agents. A principal is tasked to optimize the network's performance by controlling the information available to each agent with regard to other agents' latest actions. The information control by the principal is done via a partial observability approach, which comprises a static partitioning of agents into blocks and making the mean of agents' latest actions within each block publicly available. While the problem setup is general in terms of the payoff functions and the network's performance metric, this paper has a narrower focus to illuminate the problem and how it can be addressed in practice. In particular, the performance metric is assumed to be a function of the steady-state behavior of the agents. After conducting a comprehensive steady-state analysis of the network, an efficient algorithm finding optimal partitions with respect to various performance metrics is presented and validated via numerical examples.

preprint2021arXiv

Policy Optimization for $\mathcal{H}_2$ Linear Control with $\mathcal{H}_\infty$ Robustness Guarantee: Implicit Regularization and Global Convergence

Policy optimization (PO) is a key ingredient for reinforcement learning (RL). For control design, certain constraints are usually enforced on the policies to optimize, accounting for either the stability, robustness, or safety concerns on the system. Hence, PO is by nature a constrained (nonconvex) optimization in most cases, whose global convergence is challenging to analyze in general. More importantly, some constraints that are safety-critical, e.g., the $\mathcal{H}_\infty$-norm constraint that guarantees the system robustness, are difficult to enforce as the PO methods proceed. Recently, policy gradient methods have been shown to converge to the global optimum of linear quadratic regulator (LQR), a classical optimal control problem, without regularizing/projecting the control iterates onto the stabilizing set, its (implicit) feasible set. This striking result is built upon the coercive property of the cost, ensuring that the iterates remain feasible as the cost decreases. In this paper, we study the convergence theory of PO for $\mathcal{H}_2$ linear control with $\mathcal{H}_\infty$-norm robustness guarantee. One significant new feature of this problem is the lack of coercivity, i.e., the cost may have finite value around the feasible set boundary, breaking the existing analysis for LQR. Interestingly, we show that two PO methods enjoy the implicit regularization property, i.e., the iterates preserve the $\mathcal{H}_\infty$ robustness constraint as if they are regularized by the algorithms. Furthermore, despite the nonconvexity of the problem, we show that these algorithms converge to the globally optimal policies with globally sublinear rates, avoiding all suboptimal stationary points/local minima, and with locally (super-)linear rates under certain conditions.

preprint2021arXiv

Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games

We study the global convergence of policy optimization for finding the Nash equilibria (NE) in zero-sum linear quadratic (LQ) games. To this end, we first investigate the landscape of LQ games, viewing it as a nonconvex-nonconcave saddle-point problem in the policy space. Specifically, we show that despite its nonconvexity and nonconcavity, zero-sum LQ games have the property that the stationary point of the objective function with respect to the linear feedback control policies constitutes the NE of the game. Building upon this, we develop three projected nested-gradient methods that are guaranteed to converge to the NE of the game. Moreover, we show that all of these algorithms enjoy both globally sublinear and locally linear convergence rates. Simulation results are also provided to illustrate the satisfactory convergence properties of the algorithms. To the best of our knowledge, this work appears to be the first one to investigate the optimization landscape of LQ games, and provably show the convergence of policy optimization methods to the Nash equilibria. Our work serves as an initial step toward understanding the theoretical aspects of policy-based reinforcement learning algorithms for zero-sum Markov games in general.

preprint2020arXiv

A Game of Drones: Cyber-Physical Security of Time-Critical UAV Applications with Cumulative Prospect Theory Perceptions and Valuations

In this paper, a novel mathematical framework is introduced for modeling and analyzing the cyber-physical security of time-critical UAV applications. A general UAV security network interdiction game is formulated to model interactions between a UAV operator and an interdictor, each of which can be benign or malicious. In this game, the interdictor chooses the optimal location(s) from which to target the drone system by interdicting the potential paths of the UAVs. Meanwhile, the UAV operator responds by finding an optimal path selection policy that enables its UAVs to evade attacks and minimize their mission completion time. New notions from cumulative prospect theory (PT) are incorporated into the game to capture the operator's and interdictor's subjective valuations of mission completion times and perceptions of the risk levels facing the UAVs. The equilibrium of the game, with and without PT, is then analytically characterized and studied. Novel algorithms are then proposed to reach the game's equilibria under both PT and classical game theory. Simulation results show the properties of the equilibrium for both the rational and PT cases. The results show that the operator's and interdictor's bounded rationality is more likely to be disadvantageous to the UAV operator.

preprint2020arXiv

A Game-Theoretic Framework for Multi-Period-Multi-Company Demand Response Management in the Smart Grid

By utilizing tools from game theory, we develop a novel multi-period-multi-company demand response framework considering the interactions between companies (sellers of energy) and their consumers (buyers of energy). We model the interactions in terms of a Stackelberg game, where companies set their prices and consumers respond by choosing their demands. We show that the underlying game has a unique equilibrium at which the companies maximize their revenues while the consumers maximize their utilities subject to their local constraints. Closed-form expressions are provided for the optimal strategies of all players. Based on these solutions, a power allocation game has been formulated, which is shown to admit a unique pure-strategy Nash equilibrium, for which closed-form expressions are also provided. This equilibrium is found under the assumption that companies can freely allocate their power across the time horizon, but we also demonstrate that it is possible to relax this assumption. We further provide a fast distributed algorithm for the computation of all optimal strategies using only local information. We also study the effect of variations in the number of periods (subdivisions of the time horizon) and the number of consumers. As a consequence, we are able to find an appropriate company-to-consumer ratio when the number of consumers participating in demand response exceeds some threshold. Furthermore, we show, both analytically and numerically, that the multi-period scheme provides incentives for energy consumers to participate in demand response, compared to the single-period framework studied in the literature. In our framework, we provide a condition for the minimum budgets consumers need, and carry out case studies using real life data to demonstrate the benefits of the approach, which show potential savings of up to $30\%$ and equilibrium prices that have low volatility.

preprint2020arXiv

Approximate Equilibrium Computation for Discrete-Time Linear-Quadratic Mean-Field Games

While the topic of mean-field games (MFGs) has a relatively long history, heretofore there has been limited work concerning algorithms for the computation of equilibrium control policies. In this paper, we develop a computable policy iteration algorithm for approximating the mean-field equilibrium in linear-quadratic MFGs with discounted cost. Given the mean-field, each agent faces a linear-quadratic tracking problem, the solution of which involves a dynamical system evolving in retrograde time. This makes the development of forward-in-time algorithm updates challenging. By identifying a structural property of the mean-field update operator, namely that it preserves sequences of a particular form, we develop a forward-in-time equilibrium computation algorithm. Bounds that quantify the accuracy of the computed mean-field equilibrium as a function of the algorithm's stopping condition are provided. The optimality of the computed equilibrium is validated numerically. In contrast to the most recent/concurrent results, our algorithm appears to be the first to study infinite-horizon MFGs with non-stationary mean-field equilibria, though with focus on the linear quadratic setting.

preprint2020arXiv

Controlling a Networked SIS Model via a Single Input over Undirected Graphs

This paper formulates and studies the problem of controlling a networked SIS model using a single input in which the network structure is described by a connected undirected graph. A necessary and sufficient condition on the values of curing and infection rates for the healthy state to be exponentially stable is obtained via the analysis of signed Laplacians when the control input is the curing budget of a single agent. In the case when the healthy state is stabilizable, an explicit expression for the minimum curing budget is provided. The utility of the algorithm is demonstrated using a simulation over a network of cities in the northeastern United States.

preprint2020arXiv

Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies

Policy gradient (PG) methods are a widely used reinforcement learning methodology in many applications such as video games, autonomous driving, and robotics. In spite of its empirical success, a rigorous understanding of the global convergence of PG methods is lacking in the literature. In this work, we close the gap by viewing PG methods from a nonconvex optimization perspective. In particular, we propose a new variant of PG methods for infinite-horizon problems that uses a random rollout horizon for the Monte-Carlo estimation of the policy gradient. This method then yields an unbiased estimate of the policy gradient with bounded variance, which enables the tools from nonconvex optimization to be applied to establish global convergence. Employing this perspective, we first recover the convergence results with rates to the stationary-point policies in the literature. More interestingly, motivated by advances in nonconvex optimization, we modify the proposed PG method by introducing periodically enlarged stepsizes. The modified algorithm is shown to escape saddle points under mild assumptions on the reward and the policy parameterization. Under a further strict saddle points assumption, this result establishes convergence to essentially locally-optimal policies of the underlying problem, and thus bridges the gap in existing literature on the convergence of PG methods. Results from experiments on the inverted pendulum are then provided to corroborate our theory, namely, by slightly reshaping the reward function to satisfy our assumption, unfavorable saddle points can be avoided and better limit points can be attained. Intriguingly, this empirical finding justifies the benefit of reward-reshaping from a nonconvex optimization perspective.

preprint2020arXiv

Graph-Theoretic Framework for Unified Analysis of Observability and Data Injection Attacks in the Smart Grid

In this paper, a novel graph-theoretic framework is proposed to generalize the analysis of a broad set of security attacks, including observability and data injection attacks, that target the state estimator of a smart grid. First, the notion of observability attacks is defined based on a proposed graph-theoretic construct. In this respect, a structured approach is proposed to characterize critical sets, whose removal renders the system unobservable. It is then shown that, for the system to be observable, these critical sets must be part of a maximum matching over a proposed bipartite graph. In addition, it is shown that stealthy data injection attacks (SDIAs) constitute a special case of these observability attacks. Then, various attack strategies and defense policies, for observability and data injection attacks, are shown to be amenable to analysis using the introduced graph-theoretic framework. The proposed framework is then shown to provide a unified basis for analysis of four key security problems (among others), pertaining to the characterization of: 1) The sparsest SDIA; 2) the sparsest SDIA including a certain measurement; 3) a set of measurements which must be defended to thwart all potential SDIAs; and 4) the set of measurements, which when protected, can thwart any SDIA whose cardinality is below a certain threshold. A case study using the IEEE 14-bus system with a set of 17 measurements is used to support the theoretical findings.

preprint2020arXiv

Information State Embedding in Partially Observable Cooperative Multi-Agent Reinforcement Learning

Multi-agent reinforcement learning (MARL) under partial observability has long been considered challenging, primarily due to the requirement for each agent to maintain a belief over all other agents' local histories -- a domain that generally grows exponentially over time. In this work, we investigate a partially observable MARL problem in which agents are cooperative. To enable the development of tractable algorithms, we introduce the concept of an information state embedding that serves to compress agents' histories. We quantify how the compression error influences the resulting value functions for decentralized control. Furthermore, we propose an instance of the embedding based on recurrent neural networks (RNNs). The embedding is then used as an approximate information state, and can be fed into any MARL algorithm. The proposed embed-then-learn pipeline opens the black-box of existing (partially observable) MARL algorithms, allowing us to establish some theoretical guarantees (error bounds of value functions) while still achieving competitive performance with many end-to-end approaches.

preprint2020arXiv

Non-Cooperative Inverse Reinforcement Learning

Making decisions in the presence of a strategic opponent requires one to take into account the opponent's ability to actively mask its intended objective. To describe such strategic situations, we introduce the non-cooperative inverse reinforcement learning (N-CIRL) formalism. The N-CIRL formalism consists of two agents with completely misaligned objectives, where only one of the agents knows the true objective function. Formally, we model the N-CIRL formalism as a zero-sum Markov game with one-sided incomplete information. Through interacting with the more informed player, the less informed player attempts to both infer, and act according to, the true objective function. As a result of the one-sided incomplete information, the multi-stage game can be decomposed into a sequence of single-stage games expressed by a recursive formula. Solving this recursive formula yields the value of the N-CIRL game and the more informed player's equilibrium strategy. Another recursive formula, constructed by forming an auxiliary game, termed the dual game, yields the less informed player's strategy. Building upon these two recursive formulas, we develop a computationally tractable algorithm to approximately solve for the equilibrium strategies. Finally, we demonstrate the benefits of our N-CIRL formalism over the existing multi-agent IRL formalism via extensive numerical simulation in a novel cyber security setting.

preprint2020arXiv

On Spectral Properties of Signed Laplacians with Connections to Eventual Positivity

Signed graphs have appeared in a broad variety of applications, ranging from social networks to biological networks, from distributed control and computation to power systems. In this paper, we investigate spectral properties of signed Laplacians for undirected signed graphs. We find conditions on the negative weights under which a signed Laplacian is positive semidefinite via the Kron reduction and multiport network theory. For signed Laplacians that are indefinite, we characterize their inertias with the same framework. Furthermore, we build connections between signed Laplacians, generalized M-matrices, and eventually exponentially positive matrices.

preprint2020arXiv

POLY-HOOT: Monte-Carlo Planning in Continuous Space MDPs with Non-Asymptotic Analysis

Monte-Carlo planning, as exemplified by Monte-Carlo Tree Search (MCTS), has demonstrated remarkable performance in applications with finite spaces. In this paper, we consider Monte-Carlo planning in an environment with continuous state-action spaces, a much less understood problem with important applications in control and robotics. We introduce POLY-HOOT, an algorithm that augments MCTS with a continuous armed bandit strategy named Hierarchical Optimistic Optimization (HOO) (Bubeck et al., 2011). Specifically, we enhance HOO by using an appropriate polynomial, rather than logarithmic, bonus term in the upper confidence bounds. Such a polynomial bonus is motivated by its empirical successes in AlphaGo Zero (Silver et al., 2017b), as well as its significant role in achieving theoretical guarantees of finite space MCTS (Shah et al., 2019). We investigate, for the first time, the regret of the enhanced HOO algorithm in non-stationary bandit problems. Using this result as a building block, we establish non-asymptotic convergence guarantees for POLY-HOOT: the value estimate converges to an arbitrarily small neighborhood of the optimal value function at a polynomial rate. We further provide experimental results that corroborate our theoretical findings.

preprint2020arXiv

Quantifying Market Efficiency Impacts of Aggregated Distributed Energy Resources

We focus on the aggregation of distributed energy resources (DERs) through a profit-maximizing intermediary that enables participation of DERs in wholesale electricity markets. Particularly, we study the market efficiency brought in by the large-scale deployment of DERs and explore to what extent such benefits are offset by the profit-maximizing nature of the aggregator. We deploy a game-theoretic framework to study the strategic interactions between an aggregator and DER owners. The proposed model takes into account the stochastic nature of the DER supply. We explicitly characterize the equilibrium of the game and provide illustrative examples to quantify the efficiency loss due to the strategic incentives of the aggregator. Our numerical experiments illustrate the impact of uncertainty and amount of DER integration on the overall market efficiency.