Source author record

Panayotis Mertikopoulos

Panayotis Mertikopoulos appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Science and Game Theory; math.OC; Machine Learning; Information Theory; math.IT; math.PR; Networking and Internet Architecture; math.DS; Multiagent Systems; Computational Complexity; cond-mat.stat-mech; Distributed, Parallel, and Cluster Computing; Social and Information Networks

Catalog footprint

What is connected

38 works

13 topics

4 close collaborators

preprint · 2026 · arXiv

Explicit Second-Order Min-Max Optimization: Practical Algorithms and Complexity Analysis

We propose and analyze several inexact regularized Newton-type methods for finding a global saddle point of convex-concave unconstrained min-max optimization problems. Compared to first-order methods, our understanding of second-order methods for min-max optimization is relatively limited, as obtaining global rates of convergence with second-order information can be much more involved. In this paper, we examine how second-order information is used to speed up extra-gradient methods, even under inexactness. In particular, we show that the proposed methods generate iterates that remain within a bounded set and that the averaged iterates converge to an $ε$-saddle point within $O(ε^{-2/3})$ iterations in terms of a restricted gap function. We also provide a simple routine for solving the subproblem at each iteration, requiring a single Schur decomposition and $O(\log\log(1/ε))$ calls to a linear system solver in a quasi-upper-triangular system. Thus, our method improves the existing line-search-based second-order min-max optimization methods by shaving off an $O(\log\log(1/ε))$ factor in the required number of Schur decompositions. Finally, we conduct experiments on synthetic and real data to demonstrate the efficiency of the proposed methods.

preprint · 2022 · arXiv

A universal black-box optimization method with almost dimension-free convergence rate guarantees

Universal methods for optimization are designed to achieve theoretically optimal convergence rates without any prior knowledge of the problem's regularity parameters or the accuracy of the gradient oracle employed by the optimizer. In this regard, existing state-of-the-art algorithms achieve an $\mathcal{O}(1/T^2)$ value convergence rate in Lipschitz smooth problems with a perfect gradient oracle, and an $\mathcal{O}(1/\sqrt{T})$ convergence rate when the underlying problem is non-smooth and/or the gradient oracle is stochastic. On the downside, these methods do not take into account the problem's dimensionality, and this can have a catastrophic impact on the achieved convergence rate, in both theory and practice. Our paper aims to bridge this gap by providing a scalable universal gradient method - dubbed UnderGrad - whose oracle complexity is almost dimension-free in problems with a favorable geometry (like the simplex, linearly constrained semidefinite programs and combinatorial bandits), while retaining the order-optimal dependence on $T$ described above. These "best-of-both-worlds" results are achieved via a primal-dual update scheme inspired by the dual exploration method for variational inequalities.

preprint · 2022 · arXiv

Asymptotic Degradation of Linear Regression Estimates With Strategic Data Sources

We consider the problem of linear regression from strategic data sources with a public good component, i.e., when data is provided by strategic agents who seek to minimize an individual provision cost for increasing their data's precision while benefiting from the model's overall precision. In contrast to previous works, our model tackles the case where there is uncertainty on the attributes characterizing the agents' data -- a critical aspect of the problem when the number of agents is large. We provide a characterization of the game's equilibrium, which reveals an interesting connection with optimal design. Subsequently, we focus on the asymptotic behavior of the covariance of the linear regression parameters estimated via generalized least squares as the number of data sources becomes large. We provide upper and lower bounds for this covariance matrix and we show that, when the agents' provision costs are superlinear, the model's covariance converges to zero but at a slower rate relative to virtually all learning problems with exogenous data. On the other hand, if the agents' provision costs are linear, this covariance fails to converge. This shows that even the basic property of consistency of generalized least squares estimators is compromised when the data sources are strategic.

preprint · 2022 · arXiv

Learning in Games with Quantized Payoff Observations

This paper investigates the impact of feedback quantization on multi-agent learning. In particular, we analyze the equilibrium convergence properties of the well-known "follow the regularized leader" (FTRL) class of algorithms when players can only observe a quantized (and possibly noisy) version of their payoffs. In this information-constrained setting, we show that coarser quantization triggers a qualitative shift in the convergence behavior of FTRL schemes. Specifically, if the quantization error lies below a threshold value (which depends only on the underlying game and not on the level of uncertainty entering the process or the specific FTRL variant under study), then (i) FTRL is attracted to the game's strict Nash equilibria with arbitrarily high probability; and (ii) the algorithm's asymptotic rate of convergence remains the same as in the non-quantized case. Otherwise, for larger quantization levels, these convergence properties are lost altogether: players may fail to learn anything beyond their initial state, even with full information on their payoff vectors. This is in contrast to the impact of quantization in continuous optimization problems, where the quality of the obtained solution degrades smoothly with the quantization level.

preprint · 2022 · arXiv

Multi-Agent Online Optimization with Delays: Asynchronicity, Adaptivity, and Optimism

In this paper, we provide a general framework for studying multi-agent online learning problems in the presence of delays and asynchronicities. Specifically, we propose and analyze a class of adaptive dual averaging schemes in which agents only need to accumulate gradient feedback received from the whole system, without requiring any between-agent coordination. In the single-agent case, the adaptivity of the proposed method allows us to extend a range of existing results to problems with potentially unbounded delays between playing an action and receiving the corresponding feedback. In the multi-agent case, the situation is significantly more complicated because agents may not have access to a global clock to use as a reference point; to overcome this, we focus on the information that is available for producing each prediction rather than the actual delay associated with each feedback. This allows us to derive adaptive learning strategies with optimal regret bounds, even in a fully decentralized, asynchronous environment. Finally, we also analyze an "optimistic" variant of the proposed algorithm which is capable of exploiting the predictability of problems with a slower variation and leads to improved regret bounds.

preprint · 2022 · arXiv

Nested bandits

In many online decision processes, the optimizing agent is called to choose between large numbers of alternatives with many inherent similarities; in turn, these similarities imply closely correlated losses that may confound standard discrete choice models and bandit algorithms. We study this question in the context of nested bandits, a class of adversarial multi-armed bandit problems where the learner seeks to minimize their regret in the presence of a large number of distinct alternatives with a hierarchy of embedded (non-combinatorial) similarities. In this setting, optimal algorithms based on the exponential weights blueprint (like Hedge, EXP3, and their variants) may incur significant regret because they tend to spend excessive amounts of time exploring irrelevant alternatives with similar, suboptimal costs. To account for this, we propose a nested exponential weights (NEW) algorithm that performs a layered exploration of the learner's set of alternatives based on a nested, step-by-step selection method. In so doing, we obtain a series of tight bounds for the learner's regret showing that online learning problems with a high degree of similarity between alternatives can be resolved efficiently, without a red bus / blue bus paradox occurring.

preprint · 2022 · arXiv

Pick your Neighbor: Local Gauss-Southwell Rule for Fast Asynchronous Decentralized Optimization

In decentralized optimization environments, each agent $i$ in a network of $n$ nodes has its own private function $f_i$, and nodes communicate with their neighbors to cooperatively minimize the aggregate objective $\sum_{i=1}^n f_i$. In this setting, synchronizing the nodes' updates incurs significant communication overhead and computational costs, so much of the recent literature has focused on the analysis and design of asynchronous optimization algorithms, where agents activate and communicate at arbitrary times without needing a global synchronization enforcer. However, most works assume that when a node activates, it selects the neighbor to contact based on a fixed probability (e.g., uniformly at random), a choice that ignores the optimization landscape at the moment of activation. Instead, in this work we introduce an optimization-aware selection rule that chooses the neighbor providing the highest dual cost improvement (a quantity related to a dualization of the problem based on consensus). This scheme is related to the coordinate descent (CD) method with the Gauss-Southwell (GS) rule for coordinate updates; in our setting however, only a subset of coordinates is accessible at each iteration (because each node can communicate only with its neighbors), so the existing literature on GS methods does not apply. To overcome this difficulty, we develop a new analytical framework for smooth and strongly convex $f_i$ that covers the class of set-wise CD algorithms -- a class that directly applies to decentralized scenarios, but is not limited to them -- and we show that the proposed set-wise GS rule achieves a speedup factor of up to the maximum degree in the network (which is in the order of $Θ(n)$ for highly connected graphs). The speedup predicted by our analysis is validated in numerical experiments with synthetic data.
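As a rough illustration of a Gauss-Southwell rule restricted to an accessible subset of coordinates, the sketch below runs coordinate descent on a small strongly convex quadratic. It is not the paper's decentralized dual scheme: the quadratic objective, the function names `setwise_gauss_southwell` and `residual`, and the cyclic schedule of subsets are all assumptions made for this example.

```python
def setwise_gauss_southwell(A, b, subsets, iterations):
    """Minimize 0.5 * x^T A x - b^T x by coordinate descent.

    At each step only the coordinates in the current subset are accessible.
    Among them, the one with the largest absolute partial derivative is
    updated (the Gauss-Southwell rule restricted to that subset).
    A is assumed symmetric positive definite (illustrative assumption).
    """
    n = len(b)
    x = [0.0] * n
    for k in range(iterations):
        accessible = subsets[k % len(subsets)]  # hypothetical cyclic schedule
        grad = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
        i = max(accessible, key=lambda c: abs(grad[c]))
        x[i] -= grad[i] / A[i][i]  # exact minimization along coordinate i
    return x


def residual(A, b, x):
    """Largest absolute entry of the gradient A x - b."""
    n = len(b)
    return max(abs(sum(A[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))
```

Calling `setwise_gauss_southwell(A, b, [(0, 1), (1, 2), (0, 2)], 500)` on a 3-by-3 positive definite `A` drives `residual(A, b, x)` toward zero, even though no single step can see all three coordinates.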

preprint · 2022 · arXiv

Routing in an Uncertain World: Adaptivity, Efficiency, and Equilibrium

We consider the traffic assignment problem in nonatomic routing games where the players' cost functions may be subject to random fluctuations (e.g., weather disturbances, perturbations in the underlying network, etc.). We tackle this problem from the viewpoint of a control interface that makes routing recommendations based solely on observed costs and without any further knowledge of the system's governing dynamics -- such as the network's cost functions, the distribution of any random events affecting the network, etc. In this online setting, learning methods based on the popular exponential weights algorithm converge to equilibrium at an $\mathcal{O}(1/\sqrt{T})$ rate: this rate is known to be order-optimal in stochastic networks, but it is otherwise suboptimal in static networks. In the latter case, it is possible to achieve an $\mathcal{O}(1/T^{2})$ equilibrium convergence rate via the use of finely tuned accelerated algorithms; on the other hand, these accelerated algorithms fail to converge altogether in the presence of persistent randomness, so it is not clear how to achieve the "best of both worlds" in terms of convergence speed. Our paper seeks to fill this gap by proposing an adaptive routing algorithm with the following desirable properties: $(i)$ it seamlessly interpolates between the $\mathcal{O}(1/T^{2})$ and $\mathcal{O}(1/\sqrt{T})$ rates for static and stochastic environments respectively; $(ii)$ its convergence speed is polylogarithmic in the number of paths in the network; $(iii)$ the method's per-iteration complexity and memory requirements are both linear in the number of nodes and edges in the network; and $(iv)$ it does not require any prior knowledge of the problem's parameters.

preprint · 2021 · arXiv

Multi-agent online learning in time-varying games

We examine the long-run behavior of multi-agent online learning in games that evolve over time. Specifically, we focus on a wide class of policies based on mirror descent, and we show that the induced sequence of play (a) converges to Nash equilibrium in time-varying games that stabilize in the long run to a strictly monotone limit; and (b) it stays asymptotically close to the evolving equilibrium of the sequence of stage games (assuming they are strongly monotone). Our results apply to both gradient-based and payoff-based feedback - i.e., the "bandit feedback" case where players only get to observe the payoffs of their chosen actions.

preprint · 2021 · arXiv

Survival of the strictest: Stable and unstable equilibria under regularized learning with partial information

In this paper, we examine the Nash equilibrium convergence properties of no-regret learning in general N-player games. For concreteness, we focus on the archetypal follow the regularized leader (FTRL) family of algorithms, and we consider the full spectrum of uncertainty that the players may encounter - from noisy, oracle-based feedback, to bandit, payoff-based information. In this general context, we establish a comprehensive equivalence between the stability of a Nash equilibrium and its support: a Nash equilibrium is stable and attracting with arbitrarily high probability if and only if it is strict (i.e., each equilibrium strategy has a unique best response). This equivalence extends existing continuous-time versions of the folk theorem of evolutionary game theory to a bona fide algorithmic learning setting, and it provides a clear refinement criterion for the prediction of the day-to-day behavior of no-regret learning in games.

preprint · 2021 · arXiv

The limits of min-max optimization algorithms: convergence to spurious non-critical sets

Compared to ordinary function minimization problems, min-max optimization algorithms encounter far greater challenges because of the existence of periodic cycles and similar phenomena. Even though some of these behaviors can be overcome in the convex-concave regime, the general case is considerably more difficult. On that account, we take an in-depth look at a comprehensive class of state-of-the-art algorithms and prevalent heuristics in non-convex / non-concave problems, and we establish the following general results: a) generically, the algorithms' limit points are contained in the ICT sets of a common, mean-field system; b) the attractors of this system also attract the algorithms in question with arbitrarily high probability; and c) all algorithms avoid the system's unstable sets with probability 1. On the surface, this provides a highly optimistic outlook for min-max algorithms; however, we show that there exist spurious attractors that do not contain any stationary points of the problem under study. In this regard, our work suggests that existing min-max algorithms may be subject to inescapable convergence failures. We complement our theoretical analysis by illustrating such attractors in simple, two-dimensional, almost bilinear problems.

preprint · 2020 · arXiv

A new regret analysis for Adam-type algorithms

In this paper, we focus on a theory-practice gap for Adam and its variants (AMSgrad, AdamNC, etc.). In practice, these algorithms are used with a constant first-order moment parameter $β_{1}$ (typically between $0.9$ and $0.99$). In theory, regret guarantees for online convex optimization require a rapidly decaying $β_{1}\to0$ schedule. We show that this is an artifact of the standard analysis and propose a novel framework that allows us to derive optimal, data-dependent regret bounds with a constant $β_{1}$, without further assumptions. We also demonstrate the flexibility of our analysis on a wide range of different algorithms and settings.
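To make the role of the constant $β_{1}$ concrete, here is a minimal one-dimensional sketch of an AMSGrad-style update in which the first-moment parameter stays fixed while only the step size decays as $1/\sqrt{t}$. It is an illustration rather than the analysis or exact variant studied in the paper: the function name `amsgrad_minimize`, the default hyperparameters, and the omission of bias correction are simplifying assumptions.

```python
import math


def amsgrad_minimize(grad, x0, steps, lr=0.5, beta1=0.9, beta2=0.999, eps=1e-8):
    """Scalar AMSGrad-style iteration with a constant first-moment parameter.

    grad  : callable returning the derivative at x
    beta1 : kept constant for the whole run (no decay schedule)
    The step size lr / sqrt(t) is the only quantity that decays.
    Bias correction is omitted to keep the sketch short (an assumption).
    """
    x, m, v, v_hat = x0, 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g        # first-moment average
        v = beta2 * v + (1.0 - beta2) * g * g    # second-moment average
        v_hat = max(v_hat, v)                    # AMSGrad: non-decreasing scaling
        x -= (lr / math.sqrt(t)) * m / (math.sqrt(v_hat) + eps)
    return x
```

For example, `amsgrad_minimize(lambda x: 2.0 * (x - 3.0), 0.0, 5000)` minimizes $(x-3)^2$ starting from $0$ and ends close to $3$.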

preprint · 2020 · arXiv

Gradient-free Online Learning in Games with Delayed Rewards

Motivated by applications to online advertising and recommender systems, we consider a game-theoretic model with delayed rewards and asynchronous, payoff-based feedback. In contrast to previous work on delayed multi-armed bandits, we focus on multi-player games with continuous action spaces, and we examine the long-run behavior of strategic agents that follow a no-regret learning policy (but are otherwise oblivious to the game being played, the objectives of their opponents, etc.). To account for the lack of a consistent stream of information (for instance, rewards can arrive out of order, with an a priori unbounded delay, etc.), we introduce a gradient-free learning policy where payoff information is placed in a priority queue as it arrives. In this general context, we derive new bounds for the agents' regret; furthermore, under a standard diagonal concavity assumption, we show that the induced sequence of play converges to Nash equilibrium with probability $1$, even if the delay between choosing an action and receiving the corresponding reward is unbounded.

preprint · 2020 · arXiv

On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems

This paper analyzes the trajectories of stochastic gradient descent (SGD) to help understand the algorithm's convergence properties in non-convex problems. We first show that the sequence of iterates generated by SGD remains bounded and converges with probability $1$ under a very broad range of step-size schedules. Subsequently, going beyond existing positive probability guarantees, we show that SGD avoids strict saddle points/manifolds with probability $1$ for the entire spectrum of step-size policies considered. Finally, we prove that the algorithm's rate of convergence to Hurwicz minimizers is $\mathcal{O}(1/n^{p})$ if the method is employed with a $Θ(1/n^p)$ step-size schedule. This provides an important guideline for tuning the algorithm's step-size as it suggests that a cool-down phase with a vanishing step-size could lead to faster convergence; we demonstrate this heuristic using ResNet architectures on CIFAR.

preprint · 2020 · arXiv

On the convergence of single-call stochastic extra-gradient methods

Variational inequalities have recently attracted considerable interest in machine learning as a flexible paradigm for models that go beyond ordinary loss function minimization (such as generative adversarial networks and related deep learning systems). In this setting, the optimal $\mathcal{O}(1/t)$ convergence rate for solving smooth monotone variational inequalities is achieved by the Extra-Gradient (EG) algorithm and its variants. Aiming to alleviate the cost of an extra gradient step per iteration (which can become quite substantial in deep learning applications), several algorithms have been proposed as surrogates to Extra-Gradient with a single oracle call per iteration. In this paper, we develop a synthetic view of such algorithms, and we complement the existing literature by showing that they retain a $\mathcal{O}(1/t)$ ergodic convergence rate in smooth, deterministic problems. Subsequently, beyond the monotone deterministic case, we also show that the last iterate of single-call, stochastic extra-gradient methods still enjoys a $\mathcal{O}(1/t)$ local convergence rate to solutions of non-monotone variational inequalities that satisfy a second-order sufficient condition.
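The difference between two oracle calls and one call per iteration can be sketched on the toy bilinear saddle problem $f(x, y) = xy$, whose unique solution is the origin. The code below compares plain gradient descent-ascent, Extra-Gradient, and one single-call variant (commonly called past extra-gradient, which reuses the previous leading-state gradient). The test problem, step size, and function names are assumptions for illustration and are not taken from the paper.

```python
import math


def operator(z):
    """V(x, y) = (df/dx, -df/dy) for the bilinear problem f(x, y) = x * y."""
    x, y = z
    return (y, -x)


def step(z, direction, gamma):
    """Move from z against the given direction with step size gamma."""
    return (z[0] - gamma * direction[0], z[1] - gamma * direction[1])


def gradient_descent_ascent(z, gamma, iterations):
    for _ in range(iterations):
        z = step(z, operator(z), gamma)
    return z


def extra_gradient(z, gamma, iterations):
    for _ in range(iterations):
        leading = step(z, operator(z), gamma)   # first oracle call
        z = step(z, operator(leading), gamma)   # second oracle call
    return z


def past_extra_gradient(z, gamma, iterations):
    previous = operator(z)  # initialization; one oracle call per iteration after this
    for _ in range(iterations):
        leading = step(z, previous, gamma)      # reuse the stored gradient
        previous = operator(leading)            # the single oracle call
        z = step(z, previous, gamma)
    return z


def norm(z):
    return math.hypot(z[0], z[1])
```

Starting from `(1.0, 1.0)` with `gamma = 0.1`, both extra-gradient variants move toward the origin, whereas gradient descent-ascent spirals outward on this problem.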

preprint · 2020 · arXiv

Quick or cheap? Breaking points in dynamic markets

We examine two-sided markets where players arrive stochastically over time and are drawn from a continuum of types. The cost of matching a client and provider varies, so a social planner is faced with two contending objectives: a) to reduce players' waiting time before getting matched; and b) to form efficient pairs in order to reduce matching costs. We show that such markets are characterized by a quick-or-cheap dilemma: Under a large class of distributional assumptions, there is no 'free lunch', i.e., there exists no clearing schedule that is simultaneously optimal along both objectives. We further identify a unique breaking point signifying a stark reduction in matching cost contrasted by an increase in waiting time. Generalizing this model, we identify two regimes: one, where no free lunch exists; the other, where a window of opportunity opens to achieve a free lunch. Remarkably, greedy scheduling is never optimal in this setting.

preprint · 2016 · arXiv

Boltzmann meets Nash: Energy-efficient routing in optical networks under uncertainty

Motivated by the massive deployment of power-hungry data centers for service provisioning, we examine the problem of routing in optical networks with the aim of minimizing traffic-driven power consumption. To tackle this issue, routing must take into account energy efficiency as well as capacity considerations; moreover, in rapidly-varying network environments, this must be accomplished in a real-time, distributed manner that remains robust in the presence of random disturbances and noise. In view of this, we derive a pricing scheme whose Nash equilibria coincide with the network's socially optimum states, and we propose a distributed learning method based on the Boltzmann distribution of statistical mechanics. Using tools from stochastic calculus, we show that the resulting Boltzmann routing scheme exhibits remarkable convergence properties under uncertainty: specifically, the long-term average of the network's power consumption converges within $\varepsilon$ of its minimum value in time which is at most $\tilde O(1/\varepsilon^2)$, irrespective of the fluctuations' magnitude; additionally, if the network admits a strict, non-mixing optimum state, the algorithm converges to it - again, no matter the noise level. Our analysis is supplemented by extensive numerical simulations which show that Boltzmann routing can lead to a significant decrease in power consumption over basic, shortest-path routing schemes in realistic network conditions.

preprint · 2016 · arXiv

Exponentially fast convergence to (strict) equilibrium via hedging

Motivated by applications to data networks where fast convergence is essential, we analyze the problem of learning in generic N-person games that admit a Nash equilibrium in pure strategies. Specifically, we consider a scenario where players interact repeatedly and try to learn from past experience by small adjustments based on local - and possibly imperfect - payoff information. For concreteness, we focus on the so-called "hedge" variant of the exponential weights algorithm where players select an action with probability proportional to the exponential of the action's cumulative payoff over time. When players have perfect information on their mixed payoffs, the algorithm converges locally to a strict equilibrium and the rate of convergence is exponentially fast - of the order of $\mathcal{O}(\exp(-a\sum_{j=1}^{t}γ_{j}))$ where $a>0$ is a constant and $γ_{j}$ is the algorithm's step-size. In the presence of uncertainty, convergence requires a more conservative step-size policy, but with high probability, the algorithm remains locally convergent and achieves an exponential convergence rate.
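The hedge rule described here, where an action is chosen with probability proportional to the exponential of its cumulative payoff, can be sketched in a few lines. This is a single-agent toy with a fixed payoff vector and a constant step size; the function names and the payoff values are assumptions made for illustration.

```python
import math


def hedge_strategy(scores):
    """Mixed strategy with probabilities proportional to exp(score)."""
    top = max(scores)  # subtracting the maximum avoids overflow
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def run_hedge(payoffs, rounds, step_size=0.5):
    """Accumulate step-size-weighted payoffs, then return the final strategy.

    payoffs is a fixed payoff vector observed in full at every round
    (a perfect-information toy setting, assumed for this example).
    """
    scores = [0.0] * len(payoffs)
    for _ in range(rounds):
        scores = [s + step_size * p for s, p in zip(scores, payoffs)]
    return hedge_strategy(scores)
```

With payoffs `[1.0, 0.5, 0.0]`, the probability of each suboptimal action shrinks exponentially in the accumulated step size, in line with the $\exp(-a\sum_{j}γ_{j})$ form of the rate quoted in the abstract.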

preprint · 2016 · arXiv

Learning in games via reinforcement and regularization

We investigate a class of reinforcement learning dynamics where players adjust their strategies based on their actions' cumulative payoffs over time - specifically, by playing mixed strategies that maximize their expected cumulative payoff minus a regularization term. A widely studied example is exponential reinforcement learning, a process induced by an entropic regularization term which leads mixed strategies to evolve according to the replicator dynamics. However, in contrast to the class of regularization functions used to define smooth best responses in models of stochastic fictitious play, the functions used in this paper need not be infinitely steep at the boundary of the simplex; in fact, dropping this requirement gives rise to an important dichotomy between steep and nonsteep cases. In this general framework, we extend several properties of exponential learning, including the elimination of dominated strategies, the asymptotic stability of strict Nash equilibria, and the convergence of time-averaged trajectories in zero-sum games with an interior Nash equilibrium.
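The steep versus nonsteep dichotomy can be illustrated by comparing two regularized choice maps: the entropic regularizer yields the logit (softmax) map, which always assigns positive probability to every action, while the quadratic regularizer $\tfrac{1}{2}\|x\|^2$ yields the Euclidean projection onto the simplex, which can reach the boundary. Both maps are standard; the function names and the sample score vector are assumptions for this sketch.

```python
import math


def logit_choice(scores):
    """Maximizer of <x, scores> minus the negative entropy (steep case)."""
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def projection_choice(scores):
    """Maximizer of <x, scores> - 0.5 * ||x||^2 over the simplex (nonsteep case).

    This equals the Euclidean projection of scores onto the simplex, computed
    with the usual sort-and-threshold procedure.
    """
    ordered = sorted(scores, reverse=True)
    running, threshold = 0.0, 0.0
    for j, value in enumerate(ordered, start=1):
        running += value
        candidate = (running - 1.0) / j
        if value - candidate > 0:
            threshold = candidate  # keep the threshold of the largest valid j
    return [max(s - threshold, 0.0) for s in scores]
```

For the score vector `[2.0, 1.0, 0.0]`, the projection map returns the pure strategy `[1.0, 0.0, 0.0]`, while the logit map keeps every action in play with positive probability.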

preprint · 2016 · arXiv

On the robustness of learning in games with stochastically perturbed payoff observations

Motivated by the scarcity of accurate payoff feedback in practical applications of game theory, we examine a class of learning dynamics where players adjust their choices based on past payoff observations that are subject to noise and random disturbances. First, in the single-player case (corresponding to an agent trying to adapt to an arbitrarily changing environment), we show that the stochastic dynamics under study lead to no regret almost surely, irrespective of the noise level in the player's observations. In the multi-player case, we find that dominated strategies become extinct and we show that strict Nash equilibria are stochastically stable and attracting; conversely, if a state is stable or attracting with positive probability, then it is a Nash equilibrium. Finally, we provide an averaging principle for 2-player games, and we show that in zero-sum games with an interior equilibrium, time averages converge to Nash equilibrium for any noise level.

preprint · 2015 · arXiv

A stochastic approximation algorithm for stochastic semidefinite programming

Motivated by applications to multi-antenna wireless networks, we propose a distributed and asynchronous algorithm for stochastic semidefinite programming. This algorithm is a stochastic approximation of a continuous-time matrix exponential scheme regularized by the addition of an entropy-like term to the problem's objective function. We show that the resulting algorithm converges almost surely to an $\varepsilon$-approximation of the optimal solution requiring only an unbiased estimate of the gradient of the problem's stochastic objective. When applied to throughput maximization in wireless multiple-input and multiple-output (MIMO) systems, the proposed algorithm retains its convergence properties under a wide array of mobility impediments such as user update asynchronicities, random delays and/or ergodically changing channels. Our theoretical analysis is complemented by extensive numerical simulations which illustrate the robustness and scalability of the proposed method in realistic network conditions.

preprint · 2015 · arXiv

Adaptive Power Allocation and Control in Time-Varying Multi-Carrier MIMO Networks

In this paper, we examine the fundamental trade-off between radiated power and achieved throughput in wireless multi-carrier, multiple-input and multiple-output (MIMO) systems that vary with time in an unpredictable fashion (e.g. due to changes in the wireless medium or the users' QoS requirements). Contrary to the static/stationary channel regime, there is no optimal power allocation profile to target (either static or in the mean), so the system's users must adapt to changes in the environment "on the fly", without being able to predict the system's evolution ahead of time. In this dynamic context, we formulate the users' power/throughput trade-off as an online optimization problem and we provide a matrix exponential learning algorithm that leads to no regret - i.e. the proposed transmit policy is asymptotically optimal in hindsight, irrespective of how the system evolves over time. Furthermore, we also examine the robustness of the proposed algorithm under imperfect channel state information (CSI) and we show that it retains its regret minimization properties under very mild conditions on the measurement noise statistics. As a result, users are able to track the evolution of their individually optimum transmit profiles remarkably well, even under rapidly changing network conditions and high uncertainty. Our theoretical analysis is validated by extensive numerical simulations corresponding to a realistic network deployment and providing further insights in the practical implementation aspects of the proposed algorithm.

preprint · 2015 · arXiv

Cost-Efficient Throughput Maximization in Multi-Carrier Cognitive Radio Systems

Cognitive radio (CR) systems allow opportunistic, secondary users (SUs) to access portions of the spectrum that are unused by the network's licensed primary users (PUs), provided that the induced interference does not compromise the primary users' performance guarantees. To account for interference constraints of this type, we consider a flexible spectrum access pricing scheme that charges secondary users based on the interference that they cause to the system's primary users (individually, globally, or both), and we examine how secondary users can maximize their achievable transmission rate in this setting. We show that the resulting non-cooperative game admits a unique Nash equilibrium under very mild assumptions on the pricing mechanism employed by the network operator, and under both static and ergodic (fast-fading) channel conditions. In addition, we derive a dynamic power allocation policy that converges to equilibrium within a few iterations (even for large numbers of users), and which relies only on local signal-to-interference-and-noise measurements; importantly, the proposed algorithm retains its convergence properties even in the ergodic channel regime, despite the inherent stochasticity thereof. Our theoretical analysis is complemented by extensive numerical simulations which illustrate the performance and scalability properties of the proposed pricing scheme under realistic network conditions.

preprint · 2015 · arXiv

Energy-Aware Competitive Power Allocation for Heterogeneous Networks Under QoS Constraints

This work proposes a distributed power allocation scheme for maximizing energy efficiency in the uplink of orthogonal frequency-division multiple access (OFDMA)-based heterogeneous networks (HetNets). The user equipment (UEs) in the network are modeled as rational agents that engage in a non-cooperative game where each UE allocates its available transmit power over the set of assigned subcarriers so as to maximize its individual utility (defined as the user's throughput per Watt of transmit power) subject to minimum-rate constraints. In this framework, the relevant solution concept is that of Debreu equilibrium, a generalization of Nash equilibrium which accounts for the case where an agent's set of possible actions depends on the actions of its opponents. Since the problem at hand might not be feasible, Debreu equilibria do not always exist. However, using techniques from fractional programming, we provide a characterization of equilibrial power allocation profiles when they do exist. In particular, Debreu equilibria are found to be the fixed points of a water-filling best response operator whose water level is a function of minimum rate constraints and circuit power. Moreover, we also describe a set of sufficient conditions for the existence and uniqueness of Debreu equilibria exploiting the contraction properties of the best response operator. This analysis provides the necessary tools to derive a power allocation scheme that steers the network to equilibrium in an iterative and distributed manner without the need for any centralized processing. Numerical simulations are then used to validate the analysis and assess the performance of the proposed algorithm as a function of the system parameters.
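For readers unfamiliar with water-filling, the sketch below implements the classical rate-maximizing version: it allocates a power budget across subcarriers so that $p_k = \max(0, \mu - 1/g_k)$ with a common water level $\mu$. This is not the energy-efficiency best-response operator of the paper, whose water level also depends on the minimum-rate constraints and circuit power; the function name `water_filling` and the sample gains are assumptions for illustration.

```python
def water_filling(gains, total_power):
    """Classical water-filling over parallel channels.

    Maximizes sum(log(1 + g_k * p_k)) subject to sum(p_k) = total_power and
    p_k >= 0. The solution is p_k = max(0, level - 1 / g_k), where the water
    level is chosen so that the power budget is met exactly.
    Assumes positive gains and a positive power budget.
    """
    inverse = sorted(1.0 / g for g in gains)
    level = 0.0
    # Try using the k best channels, from all of them down to a single one.
    for k in range(len(inverse), 0, -1):
        level = (total_power + sum(inverse[:k])) / k
        if level > inverse[k - 1]:  # every one of the k channels gets positive power
            break
    return [max(0.0, level - 1.0 / g) for g in gains]
```

With gains `[2.0, 1.0, 0.1]` and a unit power budget, the weakest channel receives no power and the remaining budget is split between the other two.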

preprint, 2015, arXiv

In an Uncertain World: Distributed Optimization in MIMO Systems with Imperfect Information

In this paper, we introduce a distributed algorithm that optimizes the Gaussian signal covariance matrices of multi-antenna users transmitting to a common multi-antenna receiver under imperfect and possibly delayed channel state information. The algorithm is based on an extension of exponential learning techniques to a semidefinite setting and it requires the same information as distributed water-filling methods. Unlike water-filling however, the proposed matrix exponential learning (MXL) algorithm converges to the system's optimum signal covariance profile under very mild conditions on the channel uncertainty statistics; moreover, the algorithm retains its convergence properties even in the presence of user update asynchronicities, random delays and/or ergodically changing channel conditions. In particular, by properly tuning the algorithm's learning rate (or step size), the algorithm converges within a few iterations, even for large numbers of users and/or antennas per user. Our theoretical analysis is complemented by numerical simulations which illustrate the algorithm's robustness and scalability in realistic network conditions.

preprint, 2015, arXiv

Inertial game dynamics and applications to constrained optimization

Aiming to provide a new class of game dynamics with good long-term rationality properties, we derive a second-order inertial system that builds on the widely studied "heavy ball with friction" optimization method. By exploiting a well-known link between the replicator dynamics and the Shahshahani geometry on the space of mixed strategies, the dynamics are stated in a Riemannian geometric framework where trajectories are accelerated by the players' unilateral payoff gradients and they slow down near Nash equilibria. Surprisingly (and in stark contrast to another second-order variant of the replicator dynamics), the inertial replicator dynamics are not well-posed; on the other hand, it is possible to obtain a well-posed system by endowing the mixed strategy space with a different Hessian-Riemannian (HR) metric structure, and we characterize those HR geometries that do so. In the single-agent version of the dynamics (corresponding to constrained optimization over simplex-like objects), we show that regular maximum points of smooth functions attract all nearby solution orbits with low initial speed. More generally, we establish an inertial variant of the so-called "folk theorem" of evolutionary game theory and we show that strict equilibria are attracting in asymmetric (multi-population) games - provided of course that the dynamics are well-posed. A similar asymptotic stability result is obtained for evolutionarily stable strategies in symmetric (single-population) games.

preprint, 2015, arXiv

Learning to be green: robust energy efficiency maximization in dynamic MIMO-OFDM systems

In this paper, we examine the maximization of energy efficiency (EE) in next-generation multi-user MIMO-OFDM networks that evolve dynamically over time - e.g. due to user mobility, fluctuations in the wireless medium, modulations in the users' load, etc. Contrary to the static/stationary regime, the system may evolve in an arbitrary manner, so targeting a fixed optimum state (either static or in the mean) becomes obsolete; instead, users must adjust to changes in the system "on the fly", without being able to predict the state of the system in advance. To tackle these issues, we propose a simple and distributed online optimization policy that leads to no regret, i.e. it allows users to match (and typically outperform) even the best fixed transmit policy in hindsight, irrespective of how the system varies with time. Moreover, to account for the scarcity of perfect channel state information (CSI) in massive MIMO systems, we also study the algorithm's robustness in the presence of measurement errors and observation noise. Importantly, the proposed policy retains its no-regret properties under very mild assumptions on the error statistics and, on average, it enjoys the same performance guarantees as in the noiseless, deterministic case. Our analysis is supplemented by extensive numerical simulations which show that, in realistic network environments, users track their individually optimum transmit profile even under rapidly changing channel conditions, achieving gains of up to 600% in energy efficiency over uniform power allocation policies.

preprint, 2014, arXiv

A continuous-time approach to online optimization

We consider a family of learning strategies for online optimization problems that evolve in continuous time and we show that they lead to no regret. From a more traditional, discrete-time viewpoint, this continuous-time approach allows us to derive the no-regret properties of a large class of discrete-time algorithms including as special cases the exponential weight algorithm, online mirror descent, smooth fictitious play and vanishingly smooth fictitious play. In so doing, we obtain a unified view of many classical regret bounds, and we show that they can be decomposed into a term stemming from continuous-time considerations and a term which measures the disparity between discrete and continuous time. As a result, we obtain a general class of infinite horizon learning strategies that guarantee an $\mathcal{O}(n^{-1/2})$ regret bound without having to resort to a doubling trick.
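
As a rough illustration of an infinite-horizon strategy that needs no doubling trick, the sketch below runs the exponential weight algorithm with a step size that decays with the round index. The step size sqrt(log(K) / t) is a common textbook choice and an assumption here, not a schedule taken from the paper; the function name `average_regret` is hypothetical, and payoffs are assumed to lie in [0, 1].

```python
import math


def average_regret(payoff_rounds, num_actions):
    """Run exponential weights with a decaying step size and return
    the average regret against the best fixed action in hindsight.

    payoff_rounds: list of per-round payoff vectors with entries in [0, 1].
    """
    scores = [0.0] * num_actions   # cumulative payoff of each action
    earned = 0.0                   # learner's cumulative expected payoff
    for t, payoffs in enumerate(payoff_rounds, start=1):
        eta = math.sqrt(math.log(num_actions) / t)   # assumed schedule
        top = max(scores)
        # Subtract the maximum score for numerical stability.
        weights = [math.exp(eta * (s - top)) for s in scores]
        total = sum(weights)
        probs = [w / total for w in weights]
        earned += sum(p * u for p, u in zip(probs, payoffs))
        scores = [s + u for s, u in zip(scores, payoffs)]
    return (max(scores) - earned) / len(payoff_rounds)


if __name__ == "__main__":
    # Two alternating actions and one steady action (illustrative data).
    rounds = [[float(t % 2), float((t + 1) % 2), 0.6] for t in range(10000)]
    print(average_regret(rounds, 3))
```

Because the step size depends only on the current round, the horizon never has to be known or guessed in advance.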

preprint, 2014, arXiv

Game-theoretical control with continuous action sets

Motivated by the recent applications of game-theoretical learning techniques to the design of distributed control systems, we study a class of control problems that can be formulated as potential games with continuous action sets, and we propose an actor-critic reinforcement learning algorithm that provably converges to equilibrium in this class of problems. The method employed is to analyse the learning process under study through a mean-field dynamical system that evolves in an infinite-dimensional function space (the space of probability distributions over the players' continuous controls). To do so, we extend the theory of finite-dimensional two-timescale stochastic approximation to an infinite-dimensional, Banach space setting, and we prove that the continuous dynamics of the process converge to equilibrium in the case of potential games. These results combine to give a provably-convergent learning algorithm in which players do not need to keep track of the controls selected by the other agents.

preprint, 2014, arXiv

Imitation Dynamics with Payoff Shocks

We investigate the impact of payoff shocks on the evolution of large populations of myopic players that employ simple strategy revision protocols such as the "imitation of success". In the noiseless case, this process is governed by the standard (deterministic) replicator dynamics; in the presence of noise however, the induced stochastic dynamics are different from previous versions of the stochastic replicator dynamics (such as the aggregate-shocks model of Fudenberg and Harris, 1992). In this context, we show that strict equilibria are always stochastically asymptotically stable, irrespective of the magnitude of the shocks; on the other hand, in the high-noise regime, non-equilibrium states may also become stochastically asymptotically stable and dominated strategies may survive in perpetuity (they become extinct if the noise is low). Such behavior is eliminated if players are less myopic and revise their strategies based on their cumulative payoffs. In this case, we obtain a second order stochastic dynamical system whose attracting states coincide with the game's strict equilibria and where dominated strategies become extinct (a.s.), no matter the noise level.

preprint, 2014, arXiv

Penalty-regulated dynamics and robust learning procedures in games

Starting from a heuristic learning scheme for N-person games, we derive a new class of continuous-time learning dynamics consisting of a replicator-like drift adjusted by a penalty term that renders the boundary of the game's strategy space repelling. These penalty-regulated dynamics are equivalent to players keeping an exponentially discounted aggregate of their on-going payoffs and then using a smooth best response to pick an action based on these performance scores. Owing to this inherent duality, the proposed dynamics satisfy a variant of the folk theorem of evolutionary game theory and they converge to (arbitrarily precise) approximations of Nash equilibria in potential games. Motivated by applications to traffic engineering, we exploit this duality further to design a discrete-time, payoff-based learning algorithm which retains these convergence properties and only requires players to observe their in-game payoffs: moreover, the algorithm remains robust in the presence of stochastic perturbations and observation errors, and it does not require any synchronization between players.

preprint, 2014, arXiv

Transmit without regrets: Online optimization in MIMO-OFDM cognitive radio systems

In this paper, we examine cognitive radio systems that evolve dynamically over time due to changing user and environmental conditions. To combine the advantages of orthogonal frequency division multiplexing (OFDM) and multiple-input, multiple-output (MIMO) technologies, we consider a MIMO-OFDM cognitive radio network where wireless users with multiple antennas communicate over several non-interfering frequency bands. As the network's primary users (PUs) come and go in the system, the communication environment changes constantly (and, in many cases, randomly). Accordingly, the network's unlicensed, secondary users (SUs) must adapt their transmit profiles "on the fly" in order to maximize their data rate in a rapidly evolving environment over which they have no control. In this dynamic setting, static solution concepts (such as Nash equilibrium) are no longer relevant, so we focus on dynamic transmit policies that lead to no regret: specifically, we consider policies that perform at least as well as (and typically outperform) even the best fixed transmit profile in hindsight. Drawing on the method of matrix exponential learning and online mirror descent techniques, we derive a no-regret transmit policy for the system's SUs which relies only on local channel state information (CSI). Using this method, the system's SUs are able to track their individually evolving optimum transmit profiles remarkably well, even under rapidly (and randomly) changing conditions. Importantly, the proposed augmented exponential learning (AXL) policy leads to no regret even if the SUs' channel measurements are subject to arbitrarily large observation errors (the imperfect CSI case), thus ensuring the method's robustness in the presence of uncertainties.

preprint, 2013, arXiv

Higher Order Game Dynamics

Continuous-time game dynamics are typically first order systems where payoffs determine the growth rate of the players' strategy shares. In this paper, we investigate what happens beyond first order by viewing payoffs as higher order forces of change, specifying e.g. the acceleration of the players' evolution instead of its velocity (a viewpoint which emerges naturally when it comes to aggregating empirical data of past instances of play). To that end, we derive a wide class of higher order game dynamics, generalizing first order imitative dynamics, and, in particular, the replicator dynamics. We show that strictly dominated strategies become extinct in n-th order payoff-monotonic dynamics n orders as fast as in the corresponding first order dynamics; furthermore, in stark contrast to first order, weakly dominated strategies also become extinct for n>1. All in all, higher order payoff-monotonic dynamics lead to the elimination of weakly dominated strategies, followed by the iterated deletion of strictly dominated strategies, thus providing a dynamic justification of the well-known epistemic rationalizability process of Dekel and Fudenberg (1990). Finally, we also establish a higher order analogue of the folk theorem of evolutionary game theory, and we show that convergence to strict equilibria in n-th order dynamics is n orders as fast as in first order.

preprint, 2013, arXiv

Power Optimization in Random Wireless Networks

Consider a wireless network of transmitter-receiver pairs where the transmitters adjust their powers to maintain a target SINR level in the presence of interference. In this paper, we analyze the optimal power vector that achieves this target in large, random networks obtained by "erasing" a finite fraction of nodes from a regular lattice of transmitter-receiver pairs. We show that this problem is equivalent to the so-called Anderson model of electron motion in dirty metals which has been used extensively in the analysis of diffusion in random environments. A standard approximation to this model is the so-called coherent potential approximation (CPA) method which we apply to evaluate the first and second order intra-sample statistics of the optimal power vector in one- and two-dimensional systems. This approach is equivalent to traditional techniques from random matrix theory and free probability, but while generally accurate (and in agreement with numerical simulations), it fails to fully describe the system: in particular, results obtained in this way fail to predict when power control becomes infeasible. In this regard, we find that the infinite system is always unstable beyond a certain value of the target SINR, but any finite system only has a small probability of becoming unstable. This instability probability is proportional to the tails of the eigenvalue distribution of the system which are calculated to exponential accuracy using methodologies developed within the Anderson model and its ties with random walks in random media. Finally, using these techniques, we also calculate the tails of the system's power distribution under power control and the rate of convergence of the Foschini-Miljanic power control algorithm in the presence of random erasures. Overall, in the paper we try to strike a balance between intuitive arguments and formal proofs.
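
For readers unfamiliar with the Foschini-Miljanic algorithm named in the abstract, a minimal sketch on a small deterministic network follows. Each transmitter rescales its power by the ratio of the target SINR to its currently measured SINR. The gain matrix, noise level and target used in the example are illustrative assumptions; the random lattice erasures studied in the paper are not modeled.

```python
def sinr(gains, noise, powers):
    """SINR at each receiver; gains[i][j] is the gain from transmitter j
    to receiver i, so gains[i][i] is the direct link."""
    n = len(powers)
    values = []
    for i in range(n):
        interference = sum(gains[i][j] * powers[j] for j in range(n) if j != i)
        values.append(gains[i][i] * powers[i] / (noise + interference))
    return values


def foschini_miljanic(gains, noise, target, iterations=200):
    """Iterate p_i <- (target / SINR_i) * p_i from unit initial powers."""
    powers = [1.0] * len(gains)
    for _ in range(iterations):
        current = sinr(gains, noise, powers)
        powers = [target / s * p for s, p in zip(current, powers)]
    return powers


if __name__ == "__main__":
    # Symmetric three-link network with weak cross-gains (assumed values).
    gains = [[1.0, 0.1, 0.1], [0.1, 1.0, 0.1], [0.1, 0.1, 1.0]]
    print(foschini_miljanic(gains, noise=0.1, target=2.0))
```

In this example the target is feasible, so the iteration settles on the power vector that meets it exactly. If the target is raised far enough, no finite power vector can meet it and the powers grow without bound, which is the infeasibility phenomenon the abstract refers to.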

preprint, 2011, arXiv

Distributed Learning Policies for Power Allocation in Multiple Access Channels

We analyze the problem of distributed power allocation for orthogonal multiple access channels by considering a continuous non-cooperative game whose strategy space represents the users' distribution of transmission power over the network's channels. When the channels are static, we find that this game admits an exact potential function and this allows us to show that it has a unique equilibrium almost surely. Furthermore, using the game's potential property, we derive a modified version of the replicator dynamics of evolutionary game theory which applies to this continuous game, and we show that if the network's users employ a distributed learning scheme based on these dynamics, then they converge to equilibrium exponentially quickly. On the other hand, a major challenge occurs if the channels do not remain static but fluctuate stochastically over time, following a stationary ergodic process. In that case, the associated ergodic game still admits a unique equilibrium, but the learning analysis becomes much more complicated because the replicator dynamics are no longer deterministic. Nonetheless, by employing results from the theory of stochastic approximation, we show that users still converge to the game's unique equilibrium. Our analysis hinges on a game-theoretical result which is of independent interest: in finite player games which admit a (possibly nonlinear) convex potential function, the replicator dynamics (suitably modified to account for nonlinear payoffs) converge to an $\varepsilon$-neighborhood of an equilibrium at time of order $\mathcal{O}(\log(1/\varepsilon))$.

preprint, 2010, arXiv

Balancing Traffic in Networks: Redundancy, Learning and the Effect of Stochastic Fluctuations

We study the distribution of traffic in networks whose users try to minimise their delays by adhering to a simple learning scheme inspired by the replicator dynamics of evolutionary game theory. The stable steady states of these dynamics coincide with the network's Wardrop equilibria and form a convex polytope whose dimension is determined by the network's redundancy (an important concept which measures the "linear dependence" of the users' paths). Despite this abundance of stationary points, the long-term behaviour of the replicator dynamics turns out to be remarkably simple: every solution orbit converges to a Wardrop equilibrium. On the other hand, a major challenge occurs when the users' delays fluctuate unpredictably due to random external factors. In that case, interior equilibria are no longer stationary, but strict equilibria remain stochastically stable irrespective of the fluctuations' magnitude. In fact, if the network has no redundancy and the users are patient enough, we show that the long-term averages of the users' traffic flows converge to the vicinity of an equilibrium, and we also estimate the corresponding invariant measure.

preprint, 2010, arXiv

Dynamic Power Allocation Games in Parallel Multiple Access Channels

We analyze the distributed power allocation problem in parallel multiple access channels (MAC) by studying an associated non-cooperative game which admits an exact potential. Even though games of this type have been the subject of considerable study in the literature, we find that the sufficient conditions which ensure uniqueness of Nash equilibrium points typically do not hold in this context. Nonetheless, we show that the parallel MAC game admits a unique equilibrium almost surely, thus establishing an important class of counterexamples where these sufficient conditions are not necessary. Furthermore, if the network's users employ a distributed learning scheme based on the replicator dynamics, we show that they converge to equilibrium from almost any initial condition, even though users only have local information at their disposal.

preprint, 2010, arXiv

The emergence of rational behavior in the presence of stochastic perturbations

We study repeated games where players use an exponential learning scheme in order to adapt to an ever-changing environment. If the game's payoffs are subject to random perturbations, this scheme leads to a new stochastic version of the replicator dynamics that is quite different from the "aggregate shocks" approach of evolutionary game theory. Irrespective of the perturbations' magnitude, we find that strategies which are dominated (even iteratively) eventually become extinct and that the game's strict Nash equilibria are stochastically asymptotically stable. We complement our analysis by illustrating these results in the case of congestion games.
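
The extinction of dominated strategies under noisy exponential learning can be illustrated with a small discrete-time simulation. This is a sketch under assumptions: the paper works with a continuous-time stochastic model, whereas here two players in a Prisoner's Dilemma add Gaussian-perturbed expected payoffs to per-action scores once per round and play the logit (softmax) of those scores. The payoff values, noise level and function names are hypothetical.

```python
import math
import random

# Row player's payoffs in a symmetric Prisoner's Dilemma (assumed values).
# Action 0 = cooperate (strictly dominated), action 1 = defect.
PAYOFF = [[3.0, 0.0],
          [5.0, 1.0]]


def logit(scores):
    """Map scores to a mixed strategy, with max-subtraction for stability."""
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def noisy_exponential_learning(steps=2000, noise_std=1.0, seed=0):
    """Return both players' mixed strategies after `steps` noisy updates."""
    rng = random.Random(seed)
    scores = [[0.0, 0.0], [0.0, 0.0]]   # one score vector per player
    for _ in range(steps):
        strategies = [logit(s) for s in scores]
        for player in (0, 1):
            opponent = strategies[1 - player]
            for action in (0, 1):
                expected = sum(PAYOFF[action][b] * opponent[b] for b in (0, 1))
                # Each payoff observation is perturbed by independent noise.
                scores[player][action] += expected + rng.gauss(0.0, noise_std)
    return [logit(s) for s in scores]


if __name__ == "__main__":
    print(noisy_exponential_learning())
```

Defecting earns at least one unit more than cooperating in every round, so the score gap grows linearly while the accumulated noise grows only like the square root of the number of rounds. The weight on the dominated action therefore vanishes, and play concentrates on the strict equilibrium in which both players defect.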

38 published item(s)

Explicit Second-Order Min-Max Optimization: Practical Algorithms and Complexity Analysis

A universal black-box optimization method with almost dimension-free convergence rate guarantees

Asymptotic Degradation of Linear Regression Estimates With Strategic Data Sources

Learning in Games with Quantized Payoff Observations

Multi-Agent Online Optimization with Delays: Asynchronicity, Adaptivity, and Optimism

Nested bandits

Pick your Neighbor: Local Gauss-Southwell Rule for Fast Asynchronous Decentralized Optimization

Routing in an Uncertain World: Adaptivity, Efficiency, and Equilibrium

Multi-agent online learning in time-varying games

Survival of the strictest: Stable and unstable equilibria under regularized learning with partial information

The limits of min-max optimization algorithms: convergence to spurious non-critical sets

A new regret analysis for Adam-type algorithms

Gradient-free Online Learning in Games with Delayed Rewards

On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems

On the convergence of single-call stochastic extra-gradient methods

Quick or cheap? Breaking points in dynamic markets

Boltzmann meets Nash: Energy-efficient routing in optical networks under uncertainty

Exponentially fast convergence to (strict) equilibrium via hedging

Learning in games via reinforcement and regularization

On the robustness of learning in games with stochastically perturbed payoff observations

A stochastic approximation algorithm for stochastic semidefinite programming

Adaptive Power Allocation and Control in Time-Varying Multi-Carrier MIMO Networks

Cost-Efficient Throughput Maximization in Multi-Carrier Cognitive Radio Systems

Energy-Aware Competitive Power Allocation for Heterogeneous Networks Under QoS Constraints

In an Uncertain World: Distributed Optimization in MIMO Systems with Imperfect Information

Inertial game dynamics and applications to constrained optimization

Learning to be green: robust energy efficiency maximization in dynamic MIMO-OFDM systems

A continuous-time approach to online optimization

Game-theoretical control with continuous action sets

Imitation Dynamics with Payoff Shocks

Penalty-regulated dynamics and robust learning procedures in games

Transmit without regrets: Online optimization in MIMO-OFDM cognitive radio systems

Higher Order Game Dynamics

Power Optimization in Random Wireless Networks

Distributed Learning Policies for Power Allocation in Multiple Access Channels

Balancing Traffic in Networks: Redundancy, Learning and the Effect of Stochastic Fluctuations

Dynamic Power Allocation Games in Parallel Multiple Access Channels

The emergence of rational behavior in the presence of stochastic perturbations