Source author record

Ioannis Panageas

Ioannis Panageas appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning; math.DS; Computer Science and Game Theory; math.OC; Multiagent Systems; Populations and Evolution; Computation; Computational Complexity; Computational Engineering, Finance, and Science; Data Structures and Algorithms; Discrete Mathematics; math.PR; math.SP; math.ST; Quantitative Methods; Statistics Theory

Catalog footprint: 13 works, 16 topics, 4 close collaborators


Preprint · 2022 · arXiv

Accelerated Multiplicative Weights Update Avoids Saddle Points almost always

We consider non-convex optimization problems whose constraint set is a product of simplices. A commonly used algorithm for this type of problem is the Multiplicative Weights Update (MWU) method, which is widely used in game theory, machine learning and multi-agent systems. Although it is known that MWU avoids saddle points, one question has remained unaddressed: "Is there an accelerated version of MWU that provably avoids saddle points?" In this paper we answer this question in the affirmative. We provide an accelerated MWU based on Riemannian Accelerated Gradient Descent, and prove that Riemannian Accelerated Gradient Descent, and thus the accelerated MWU, almost always avoids saddle points.

Preprint · 2022 · arXiv

Efficiently Computing Nash Equilibria in Adversarial Team Markov Games

Computing Nash equilibrium policies is a central problem in multi-agent reinforcement learning that has received extensive attention both in theory and in practice. However, provable guarantees have thus far either been limited to fully competitive or fully cooperative scenarios, or have imposed strong assumptions that are difficult to meet in most practical applications. In this work, we depart from those prior results by investigating infinite-horizon adversarial team Markov games, a natural and well-motivated class of games in which a team of identically-interested players, in the absence of any explicit coordination or communication, competes against an adversarial player. This setting allows for a unifying treatment of zero-sum Markov games and Markov potential games, and serves as a step toward modeling more realistic strategic interactions that feature both competing and cooperative interests. Our main contribution is the first algorithm for computing stationary $ε$-approximate Nash equilibria in adversarial team Markov games with computational complexity that is polynomial in all the natural parameters of the game, as well as in $1/ε$. The proposed algorithm is particularly natural and practical: it performs independent policy gradient steps for each player in the team, in tandem with best responses from the adversary; the policy for the adversary is then obtained by solving a carefully constructed linear program. Our analysis leverages non-standard techniques to establish the KKT optimality conditions for a nonlinear program with nonconvex constraints, which leads to a natural interpretation of the induced Lagrange multipliers. Along the way, we significantly extend an important characterization of optimal policies in adversarial (normal-form) team games due to von Stengel and Koller (GEB '97).

Preprint · 2020 · arXiv

Better Depth-Width Trade-offs for Neural Networks through the lens of Dynamical Systems

The expressivity of neural networks as a function of their depth, width and type of activation units has been an important question in deep learning theory. Recently, depth separation results for ReLU networks were obtained via a new connection with dynamical systems, using a generalized notion of fixed points of a continuous map $f$, called periodic points. In this work, we strengthen the connection with dynamical systems and improve the existing width lower bounds along several aspects. Our first main result consists of period-specific width lower bounds that hold under the stronger notion of $L^1$-approximation error, rather than the weaker classification error. Our second contribution is sharper width lower bounds that still yield meaningful exponential depth-width separations in regimes where previous results would not apply. A byproduct of our results is that there exists a universal constant characterizing the depth-width trade-offs, as long as $f$ has odd periods. Technically, our results follow from unveiling a tighter connection between three quantities of a given function: its period, its Lipschitz constant, and the growth rate of the number of oscillations arising under compositions of $f$ with itself.
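
A minimal sketch of the quantities this abstract relates, using the tent map f(x) = 1 - |2x - 1| as an assumed running example (it is the standard example in depth-separation arguments, not necessarily the map used in the paper). The tent map has the period-3 orbit 2/7 -> 4/7 -> 6/7, and the number of monotone pieces of its t-fold composition, used here as a simple stand-in for the paper's oscillation count, doubles with every composition. On [0, 1] the tent map equals 2*ReLU(x) - 4*ReLU(x - 1/2), so f^t is computed by a depth-t ReLU network of width 2. The helper names are hypothetical.

```python
from fractions import Fraction


def tent(x):
    """Tent map f(x) = 1 - |2x - 1| on [0, 1]."""
    return 1 - abs(2 * x - 1)


def iterate(f, t, x):
    """Evaluate the t-fold composition f^t at x."""
    for _ in range(t):
        x = f(x)
    return x


def count_monotone_pieces(f, t, grid_size=4096):
    """Count maximal monotone pieces of f^t sampled on a uniform grid of [0, 1].

    A new piece starts whenever consecutive differences change sign. The grid
    must resolve every breakpoint; for the tent map with t <= 12 the breakpoints
    k / 2^t lie exactly on this dyadic grid, so the count is exact.
    """
    ys = [iterate(f, t, k / grid_size) for k in range(grid_size + 1)]
    signs = [1 if b > a else -1 for a, b in zip(ys, ys[1:])]
    return 1 + sum(1 for s, r in zip(signs, signs[1:]) if s != r)


# Exact rational check that 2/7 -> 4/7 -> 6/7 -> 2/7 is a period-3 orbit.
orbit = [Fraction(2, 7)]
for _ in range(3):
    orbit.append(tent(orbit[-1]))
print(orbit)

# The number of monotone pieces doubles with each composition: 2, 4, 8, ..., 64.
pieces = [count_monotone_pieces(tent, t) for t in range(1, 7)]
print(pieces)
```

Counting pieces on a grid is only exact when the grid resolves every breakpoint; a dyadic grid works here because all breakpoints of the tent map's compositions are dyadic rationals, which floating point represents exactly.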

Preprint · 2020 · arXiv

Convergence to Second-Order Stationarity for Non-negative Matrix Factorization: Provably and Concurrently

Non-negative matrix factorization (NMF) is a fundamental non-convex optimization problem with numerous applications in machine learning (music analysis, document clustering, speech-source separation, etc.). Despite extensive study, it is poorly understood whether natural algorithms exist that provably converge to a local minimum. Part of the reason is that the objective is highly symmetric and its gradient is not Lipschitz. In this paper we define a multiplicative-weights-update-type dynamics (a modification of the seminal Lee-Seung algorithm) that runs concurrently and provably avoids saddle points (first-order stationary points that are not second-order stationary). Our techniques combine tools from dynamical systems, such as stability analysis, and exploit the geometry of the NMF objective by reducing the standard NMF formulation over the non-negative orthant to a new formulation over a (scaled) simplex. An important advantage of our method is its use of concurrent updates, which permits implementation in parallel computing environments.
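
For reference, a sketch of the classic Lee-Seung multiplicative updates that the abstract's dynamics modify. This is the standard alternating version for the squared Frobenius loss, not the paper's concurrent dynamics over a scaled simplex, and the helper names and toy matrix are hypothetical. It shows the property the abstract builds on: multiplicative updates keep every factor entry non-negative without any projection step.

```python
def matmul(A, B):
    """Multiply matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frobenius_loss(V, W, H):
    """Squared Frobenius error ||V - W H||_F^2."""
    WH = matmul(W, H)
    return sum((v - p) ** 2 for rv, rp in zip(V, WH) for v, p in zip(rv, rp))


def scale(M, num, den, eps=1e-12):
    """Entrywise M * num / (den + eps); eps guards against division by zero."""
    return [[m * n / (d + eps) for m, n, d in zip(rm, rn, rd)]
            for rm, rn, rd in zip(M, num, den)]


def lee_seung_step(V, W, H):
    """One round of the classic alternating Lee-Seung updates for ||V - W H||_F^2."""
    Wt = transpose(W)
    H = scale(H, matmul(Wt, V), matmul(matmul(Wt, W), H))  # H <- H * (W^T V) / (W^T W H)
    Ht = transpose(H)
    W = scale(W, matmul(V, Ht), matmul(W, matmul(H, Ht)))  # W <- W * (V H^T) / (W H H^T)
    return W, H


# Small non-negative matrix with an exact rank-2 non-negative factorization.
V = matmul([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]])
W = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.6]]  # arbitrary, deliberately non-symmetric start
H = [[0.5, 0.4, 0.5], [0.5, 0.5, 0.4]]
start_loss = frobenius_loss(V, W, H)
for _ in range(500):
    W, H = lee_seung_step(V, W, H)
print(start_loss, frobenius_loss(V, W, H))
```

Because the loss is unchanged when the columns of W and the rows of H are permuted together, a perfectly symmetric initialization would stay symmetric forever; the start above is deliberately perturbed.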

Preprint · 2020 · arXiv

Efficient Statistics for Sparse Graphical Models from Truncated Samples

In this paper, we study high-dimensional estimation from truncated samples. We focus on two fundamental and classical problems: (i) inference of sparse Gaussian graphical models and (ii) support recovery of sparse linear models. (i) For Gaussian graphical models, suppose $d$-dimensional samples ${\bf x}$ are generated from a Gaussian $\mathcal{N}(μ,Σ)$ and observed only if they belong to a subset $S \subseteq \mathbb{R}^d$. We show that $μ$ and $Σ$ can be estimated with error $ε$ in the Frobenius norm using $\tilde{O}\left(\frac{\textrm{nz}(Σ^{-1})}{ε^2}\right)$ samples from a truncated $\mathcal{N}(μ,Σ)$, given access to a membership oracle for $S$. The set $S$ is assumed to have non-trivial measure under the unknown distribution but is otherwise arbitrary. (ii) For sparse linear regression, suppose samples $({\bf x},y)$ are generated where $y = {\bf x}^\top{Ω^*} + \mathcal{N}(0,1)$ and $({\bf x}, y)$ is seen only if $y$ belongs to a truncation set $S \subseteq \mathbb{R}$. We consider the case where $Ω^*$ is sparse with a support set of size $k$. Our main result establishes precise conditions on the problem dimension $d$, the support size $k$, the number of observations $n$, and properties of the samples and the truncation that are sufficient to recover the support of $Ω^*$. Specifically, we show that under some mild assumptions only $O(k^2 \log d)$ samples are needed to estimate $Ω^*$ in the $\ell_\infty$-norm up to a bounded error. For both problems, our estimator minimizes the sum of the finite-population negative log-likelihood and an $\ell_1$-regularization term.
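
A minimal sketch of the observation model in part (i), in one dimension, with a hypothetical truncation set S = [0, inf) accessed only through a membership oracle. It is not the paper's estimator (which minimizes a truncated negative log-likelihood plus an l1 penalty); it only illustrates why truncation must be modeled: the naive sample mean of the observed points is biased.

```python
import random


def in_S(x):
    """Hypothetical membership oracle for the truncation set S = [0, inf)."""
    return x >= 0.0


def truncated_samples(mu, sigma, oracle, n, rng):
    """Draw n samples of N(mu, sigma^2) that the oracle reports as lying in S.

    Rejection sampling mirrors the observation model: draws outside S are never
    seen. It is efficient only when S has non-trivial mass under N(mu, sigma^2).
    """
    samples = []
    while len(samples) < n:
        x = rng.gauss(mu, sigma)
        if oracle(x):
            samples.append(x)
    return samples


rng = random.Random(0)
xs = truncated_samples(0.0, 1.0, in_S, 20000, rng)
naive_mean = sum(xs) / len(xs)
# The true mean is 0, but the naive estimate concentrates near sqrt(2 / pi) ~ 0.798,
# the mean of the half-normal distribution.
print(naive_mean)
```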

Preprint · 2020 · arXiv

Logistic-Regression with peer-group effects via inference in higher order Ising models

Spin glass models, such as the Sherrington-Kirkpatrick, Hopfield and Ising models, are all well-studied members of the exponential family of discrete distributions, and have been influential in a number of application domains where they are used to model correlation phenomena on networks. Conventionally these models have quadratic sufficient statistics and consequently capture correlations arising from pairwise interactions. In this work we study extensions of these models with higher-order sufficient statistics, modeling behavior on a social network with peer-group effects. In particular, we model binary outcomes on a network as a higher-order spin glass, where the behavior of an individual depends on a linear function of their own vector of covariates and some polynomial function of the behavior of others, capturing peer-group effects. Using a single high-dimensional sample from such a model, our goal is to recover the coefficients of the linear function as well as the strength of the peer-group effects. The heart of our result is a novel approach for showing strong concavity of the log pseudo-likelihood of the model, implying a statistical error rate of $\sqrt{d/n}$ for the Maximum Pseudo-Likelihood Estimator (MPLE), where $d$ is the dimensionality of the covariate vectors and $n$ is the size of the network (number of nodes). Our model generalizes vanilla logistic regression as well as the peer-effect models studied in recent works, and our results extend these results to accommodate higher-order interactions.
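
A sketch of the log pseudo-likelihood that the MPLE maximizes, under assumptions of our own: spins x_i in {-1, +1}, a local field h_i = theta . y_i + beta * m_i(x), and a hypothetical order-3 peer statistic m_i(x) that sums x_j * x_k over the 3-node peer groups containing node i. For such a model each conditional is P(x_i | x_-i) = sigmoid(2 * x_i * h_i), so the log pseudo-likelihood is a sum of logistic log-likelihoods and is concave in (theta, beta). Fitting uses plain gradient ascent on a single toy sample; the paper's exact parameterization and its strong-concavity argument are not reproduced here.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))


def peer_term(i, x, hyperedges):
    """Hypothetical order-3 peer statistic: sum of x_j * x_k over hyperedges {i, j, k}."""
    total = 0.0
    for e in hyperedges:
        if i in e:
            j, k = [v for v in e if v != i]
            total += x[j] * x[k]
    return total


def local_field(i, theta, beta, x, Y, hyperedges):
    """h_i = theta . y_i + beta * m_i(x)."""
    return sum(t * y for t, y in zip(theta, Y[i])) + beta * peer_term(i, x, hyperedges)


def log_pseudo_likelihood(theta, beta, x, Y, hyperedges):
    """Sum over nodes of log P(x_i | x_-i) = log sigmoid(2 * x_i * h_i)."""
    return sum(log_sigmoid(2 * x[i] * local_field(i, theta, beta, x, Y, hyperedges))
               for i in range(len(x)))


def mple(x, Y, hyperedges, steps=200, lr=0.05):
    """Gradient ascent on the concave log pseudo-likelihood over (theta, beta)."""
    theta, beta = [0.0] * len(Y[0]), 0.0
    for _ in range(steps):
        g_theta, g_beta = [0.0] * len(theta), 0.0
        for i in range(len(x)):
            h = local_field(i, theta, beta, x, Y, hyperedges)
            # d/dh log sigmoid(2 x_i h) = 2 x_i (1 - sigmoid(2 x_i h))
            w = 2 * x[i] * (1 - math.exp(log_sigmoid(2 * x[i] * h)))
            g_theta = [g + w * y for g, y in zip(g_theta, Y[i])]
            g_beta += w * peer_term(i, x, hyperedges)
        theta = [t + lr * g for t, g in zip(theta, g_theta)]
        beta += lr * g_beta
    return theta, beta


# Toy single sample: 6 nodes, 2 covariates each, four 3-node peer groups.
# Covariates are chosen so that no direction separates the data (the MPLE is finite).
x = [1, -1, 1, 1, -1, 1]
Y = [[0.5, 0.0], [0.0, -0.5], [0.3, 0.2], [-0.1, 0.4], [0.5, 0.0], [0.0, -0.5]]
E = [(0, 1, 2), (1, 3, 4), (2, 4, 5), (0, 3, 5)]
lpl_at_zero = log_pseudo_likelihood([0.0, 0.0], 0.0, x, Y, E)  # equals -6 log 2
theta_hat, beta_hat = mple(x, Y, E)
print(theta_hat, beta_hat, log_pseudo_likelihood(theta_hat, beta_hat, x, Y, E))
```

The step size 0.05 is below 2/L for this toy data, where L <= sum_i ||(y_i, m_i)||^2 = 9.3, so every ascent step increases the objective.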

Preprint · 2020 · arXiv

Multiplicative Weights Update as a Distributed Constrained Optimization Algorithm: Convergence to Second-order Stationary Points Almost Always

Non-concave maximization has been the subject of much recent study in the optimization and machine learning communities, particularly in deep learning. Recent papers by Ge et al. and Lee et al. (and references therein) indicate that first-order methods work well and avoid saddle points. Results such as those of Lee et al., however, are limited to the unconstrained case, or to cases where the critical points lie in the interior of the feasible set, which fails to capture some of the most interesting applications. In this paper we focus on constrained non-concave maximization. We analyze a variant of a well-established algorithm in machine learning called Multiplicative Weights Update (MWU) for the maximization problem $\max_{\mathbf{x} \in D} P(\mathbf{x})$, where $P$ is non-concave and twice continuously differentiable and $D$ is a product of simplices. We show that, for small enough step sizes, MWU almost always converges to critical points that satisfy the second-order KKT conditions. We combine techniques from dynamical systems with a recent connection between the Baum-Eagon inequality and MWU (Palaiopanos et al.).
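
A sketch of an MWU update of this kind, in the Baum-Eagon style associated with Palaiopanos et al.: each block of the product of simplices is reweighted by (1 + eps * dP/dx) and renormalized, which requires eps small enough that these factors stay positive. The objective P(x, y) = |x|^2 + |y|^2 + x . y over two 3-simplices is a hypothetical non-concave (in fact convex) example: its maximizers are vertices, and the barycenter of each block is a critical point that is a minimum, which MWU moves away from.

```python
def mwu_step(blocks, grad, eps):
    """One linear MWU step for max P over a product of simplices.

    blocks: list of probability vectors, one per simplex.
    grad: function returning dP/dx for every block (same shape as blocks).
    """
    new_blocks = []
    for x, g in zip(blocks, grad(blocks)):
        w = [xi * (1 + eps * gi) for xi, gi in zip(x, g)]
        if any(wi < 0 for wi in w):
            raise ValueError("eps too large: 1 + eps * dP/dx must stay non-negative")
        z = sum(w)
        new_blocks.append([wi / z for wi in w])
    return new_blocks


def grad_P(blocks):
    """Gradient of the hypothetical P(x, y) = |x|^2 + |y|^2 + x . y."""
    x, y = blocks
    return [[2 * xi + yi for xi, yi in zip(x, y)],
            [2 * yi + xi for xi, yi in zip(x, y)]]


blocks = [[0.2, 0.5, 0.3], [0.3, 0.5, 0.2]]
for _ in range(500):
    blocks = mwu_step(blocks, grad_P, eps=0.1)
print(blocks)  # both blocks approach the vertex (0, 1, 0), where P attains its maximum 3
```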

Preprint · 2020 · arXiv

On the Analysis of EM for truncated mixtures of two Gaussians

Motivated by a recent result of Daskalakis et al. 2018, we analyze the population version of the Expectation-Maximization (EM) algorithm for the case of truncated mixtures of two Gaussians. Sampling from a truncated $d$-dimensional mixture of two Gaussians $\frac{1}{2} \mathcal{N}(\vec{μ}, \vec{Σ}) + \frac{1}{2} \mathcal{N}(-\vec{μ}, \vec{Σ})$ means that a sample is revealed only if it falls in some subset $S \subset \mathbb{R}^d$ of positive (Lebesgue) measure. We show that for $d=1$, EM converges almost surely (under random initialization) to the true mean (with known variance $σ^2$) for any measurable set $S$. Moreover, for $d>1$ we show that EM almost surely converges to the true mean for any measurable set $S$ when the EM map has only three fixed points, namely $-\vec{μ}, \vec{0}, \vec{μ}$ (with known covariance matrix $\vec{Σ}$), and we prove local convergence if there are more than three fixed points. We also provide convergence rates for our findings. Our techniques deviate from those of Daskalakis et al. 2017, which depend heavily on the symmetry that the untruncated problem exhibits. For example, for an arbitrary measurable set $S$, the EM update rule cannot be computed in closed form. Moreover, arbitrarily truncating the mixture induces further correlations among the variables. We circumvent these challenges by using techniques from dynamical systems, probability and statistics: the implicit function theorem, stability analysis around the fixed points of the EM update rule, and correlation inequalities (FKG).
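
To make the EM map concrete, a sketch of sample-based EM for the untruncated one-dimensional case with known variance: for 0.5*N(mu, sigma^2) + 0.5*N(-mu, sigma^2) the E- and M-steps collapse to mu' = mean(x * tanh(mu * x / sigma^2)). This is a deliberate simplification; with truncation to a general S the abstract notes that the update has no closed form, which is exactly what this sketch sidesteps. It does exhibit the three-fixed-point structure: the map sends 0 to 0, and in this example a positive start approaches the true mean.

```python
import math
import random


def em_step(mu, xs, sigma2=1.0):
    """One sample EM step for 0.5*N(mu, sigma2) + 0.5*N(-mu, sigma2), sigma2 known.

    E-step: the posterior weight of the +mu component is (1 + tanh(mu * x / sigma2)) / 2.
    M-step: the weighted mean then simplifies to mean(x * tanh(mu * x / sigma2)).
    """
    return sum(x * math.tanh(mu * x / sigma2) for x in xs) / len(xs)


rng = random.Random(1)
true_mu = 2.0
xs = [rng.choice((-1.0, 1.0)) * true_mu + rng.gauss(0.0, 1.0) for _ in range(5000)]

mu = 0.5  # positive start; by symmetry a negative start would approach -true_mu
for _ in range(100):
    mu = em_step(mu, xs)
print(mu)                # close to 2.0
print(em_step(0.0, xs))  # 0 is always a fixed point of this map
```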

Preprint · 2016 · arXiv

Average Case Performance of Replicator Dynamics in Potential Games via Computing Regions of Attraction

What does it mean to fully understand the behavior of a network of adaptive agents? The gold standard is typically the behavior of learning dynamics in potential games, where many evolutionary dynamics, e.g., replicator dynamics, are known to converge to sets of equilibria. Even in such classic settings, many critical questions remain unanswered. We examine issues such as: Point-wise convergence: does the system actually equilibrate even in the presence of continua of equilibria? Computing regions of attraction: given point-wise convergence, can we compute the region of asymptotic stability of each equilibrium (e.g., estimate its volume and geometry)? System invariants: invariant functions remain constant along every system trajectory. This notion is orthogonal to the game-theoretic concept of a potential function, which always strictly increases or decreases along system trajectories. Do dynamics in potential games exhibit invariant functions? If so, how many, and what do they look like? Based on these geometric characterizations, we propose a novel quantitative framework for analyzing the efficiency of potential games with many equilibria. The predictions of different equilibria are weighted by their probability of arising under evolutionary dynamics given uniformly random initial conditions. This average-case analysis is shown to offer novel insights into classic game-theoretic challenges, including quantifying risk dominance in stag-hunt games and allowing for more nuanced performance analysis in networked coordination and congestion games with large gaps between the price of stability and the price of anarchy.
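
A Monte Carlo sketch of the average-case idea on a hypothetical stag hunt (Stag pays 4 against Stag and 0 against Hare; Hare always pays 3): two-population replicator dynamics are integrated with Euler steps from uniformly random starts, and the share of runs ending at (Stag, Stag) estimates the volume of that equilibrium's region of attraction. For this particular game, separating variables gives the invariant H(p, q) = phi(p) - phi(q) with phi(z) = -3 ln z - ln(1 - z), an instance of the system invariants the abstract asks about: one can check that dH/dt = 0, and the discretized trajectory approximately preserves it. Sampling is only an illustrative shortcut here.

```python
import math
import random


def replicator_stag_hunt(p, q, dt=0.1, steps=1000):
    """Euler-discretized two-population replicator dynamics in a stag hunt.

    p, q: probabilities that players 1 and 2 play Stag. With the hypothetical
    payoffs above, the mixed equilibrium is p = q = 3/4.
    """
    for _ in range(steps):
        dp = p * (1 - p) * (4 * q - 3)
        dq = q * (1 - q) * (4 * p - 3)
        p, q = p + dt * dp, q + dt * dq
    return p, q


def invariant(p, q):
    """H(p, q) = phi(p) - phi(q); constant along exact trajectories of the dynamics."""
    def phi(z):
        return -3 * math.log(z) - math.log(1 - z)
    return phi(p) - phi(q)


def stag_basin_share(samples=1000, seed=0):
    """Monte Carlo estimate of the volume of the (Stag, Stag) region of attraction."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        p, q = replicator_stag_hunt(rng.random(), rng.random())
        if p > 0.5 and q > 0.5:
            hits += 1
    return hits / samples


share = stag_basin_share()
p1, q1 = replicator_stag_hunt(0.3, 0.6, dt=1e-4, steps=10000)  # short, fine-grained run
print(share, invariant(0.3, 0.6), invariant(p1, q1))
```

Although (Stag, Stag) is payoff-dominant, its estimated basin share comes out well below one half, which is the kind of quantitative risk-dominance statement the abstract describes.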

Preprint · 2016 · arXiv

Gradient Descent Only Converges to Minimizers: Non-Isolated Critical Points and Invariant Regions

Given a non-convex, twice-differentiable cost function $f$, we prove that the set of initial conditions for which gradient descent converges to saddle points where $\nabla^2 f$ has at least one strictly negative eigenvalue has (Lebesgue) measure zero, even for cost functions $f$ with non-isolated critical points, answering an open question in [Lee, Simchowitz, Jordan, Recht, COLT 2016]. Moreover, this result extends to forward-invariant convex subspaces, allowing for weak (non-globally Lipschitz) smoothness assumptions. Finally, we produce an upper bound on the allowable step size.
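
A minimal illustration with a hypothetical cost f(x, y) = x^2/2 + y^4/4 - y^2/2, which has a strict saddle at the origin (Hessian eigenvalue -1 along y) and minimizers at (0, 1) and (0, -1). Gradient descent started on the measure-zero line y = 0 converges to the saddle, while an arbitrarily small perturbation escapes to a minimizer. The gradient is not globally Lipschitz (because of the y^4 term), but for step size 0.1 the square |x| <= 1, |y| <= 1 is forward-invariant and the gradient is 2-Lipschitz on it, so the step stays below 1/L there. This example has an isolated saddle; the paper's result also covers non-isolated critical points.

```python
def grad_f(x, y):
    """Gradient of the hypothetical f(x, y) = x^2 / 2 + y^4 / 4 - y^2 / 2."""
    return x, y ** 3 - y


def gradient_descent(x, y, step=0.1, iters=500):
    """Plain gradient descent with a fixed step size."""
    for _ in range(iters):
        gx, gy = grad_f(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y


print(gradient_descent(1.0, 0.0))   # stays on y = 0 and converges to the saddle (0, 0)
print(gradient_descent(1.0, 1e-3))  # a tiny perturbation escapes to the minimizer (0, 1)
```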

Preprint · 2016 · arXiv

Mutation, Sexual Reproduction and Survival in Dynamic Environments

A new approach to understanding evolution [Val09], namely viewing it through the lens of computation, has already started yielding new insights; e.g., natural selection under sexual reproduction can be interpreted as the Multiplicative Weight Update (MWU) algorithm in coordination games played among genes [CLPV14]. Using this machinery, we study the role of mutation in changing environments in the presence of sexual reproduction. Following [WVA05], we model changing environments via a Markov chain whose states represent environments, each with its own fitness matrix. In this setting, we show that in the absence of mutation the population goes extinct, but in the presence of mutation the population survives with positive probability. On the way to proving this theorem, we establish some facts about dynamics in games. We provide the first polynomial convergence bound, to our knowledge, for noisy MWU in a coordination game. Finally, we also show that in static environments, sexual evolution with mutation converges for any level of mutation.

Preprint · 2015 · arXiv

The Complexity of Genetic Diversity

A key question in biological systems is whether genetic diversity persists in the long run under evolutionary competition or whether a single dominant genotype emerges. Classic work by Kalmus in 1945 established that even in simple diploid species (species with two sets of chromosomes), diversity can be guaranteed as long as heterozygote individuals enjoy a selective advantage. Despite the classic nature of the problem, as we move towards increasingly polymorphic traits (e.g., human blood types), predicting diversity and its implications remains poorly understood. Our key contribution is to establish complexity-theoretic hardness results implying that even in the textbook case of single-locus diploid models, predicting whether diversity survives, given the fitness landscape, is algorithmically intractable. We complement our results by establishing that under randomly chosen fitness landscapes diversity survives with significant probability. Our results are structurally robust along several dimensions (e.g., choice of parameter distribution, different definitions of stability/persistence, restriction to typical subclasses of fitness landscapes). Technically, our results exploit connections between game theory, nonlinear dynamical systems, complexity theory and biology, and establish hardness results for predicting the evolution of a deterministic variant of the well-known multiplicative weights update algorithm in symmetric coordination games, which could be of independent interest.
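
The Kalmus-style case can be sketched with the standard discrete-time single-locus diploid selection equation x_i' = x_i (Wx)_i / (x^T W x), where W is the symmetric genotype-fitness matrix; this update is a deterministic, replicator-type relative of MWU in a symmetric coordination game. With two alleles and hypothetical fitnesses 1 - s, 1 and 1 - t for the two homozygotes and the heterozygote (heterozygote advantage), diversity persists at the polymorphic equilibrium t / (s + t). The paper's hardness results concern many alleles, where the abstract shows that predicting survival is intractable in general; the sketch covers only the easy two-allele case.

```python
def selection_step(x, W):
    """Discrete-time single-locus diploid selection: x_i' = x_i (W x)_i / (x^T W x)."""
    Wx = [sum(w * xj for w, xj in zip(row, x)) for row in W]
    mean_fitness = sum(xi * v for xi, v in zip(x, Wx))
    return [xi * v / mean_fitness for xi, v in zip(x, Wx)]


s, t = 0.2, 0.3  # hypothetical selection coefficients against the two homozygotes
W = [[1 - s, 1.0], [1.0, 1 - t]]  # heterozygote advantage
x = [0.1, 0.9]
for _ in range(2000):
    x = selection_step(x, W)
print(x)  # approaches the polymorphic equilibrium (t / (s + t), s / (s + t)) = (0.6, 0.4)
```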

Preprint · 2014 · arXiv

Natural Selection as an Inhibitor of Genetic Diversity: Multiplicative Weights Updates Algorithm and a Conjecture of Haploid Genetics

In a recent series of papers, a surprisingly strong connection was discovered between standard models of evolution in mathematical biology and the Multiplicative Weights Updates Algorithm (MWUA), a ubiquitous model of online learning and optimization. These papers establish that mathematical models of biological evolution are tantamount to applying discrete replicator dynamics, a close variant of MWUA, on coordination games. This connection allows insights from the study of game-theoretic dynamics to be introduced into the field of mathematical biology. Using these results as a stepping stone, we show that mathematical models of haploid evolution imply the extinction of genetic diversity in the long-term limit, a widely believed conjecture in genetics. In game-theoretic terms, we show that in the case of coordination games, under minimal genericity assumptions, discrete MWUA converges to pure Nash equilibria for all but a measure-zero set of initial conditions. This result holds even though there can be exponentially (or even uncountably) many mixed Nash equilibria, completely dominating the pure Nash equilibria in number. Thus, in haploid organisms the long-term preservation of genetic diversity needs to be safeguarded by other evolutionary mechanisms, such as mutation and speciation.
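
A sketch of discrete MWUA in a two-player coordination game, read in the haploid setting as two loci (players) choosing alleles (strategies) with a shared fitness table A; the table and step size are hypothetical. Each allele's frequency is multiplied by (1 + eps * expected fitness) and renormalized. Here the pure equilibria are (allele 1, allele 1) and (allele 2, allele 2), and the mixed one puts probability 1/3 on allele 1 at both loci; starting off that equilibrium's measure-zero stable set, the dynamics end at a pure equilibrium, so diversity is lost.

```python
def mwua_step(x, y, A, eps):
    """One discrete MWUA step for both players of a coordination game with common payoff A.

    Each strategy's weight is multiplied by (1 + eps * expected payoff), then renormalized.
    """
    ux = [sum(a * yj for a, yj in zip(row, y)) for row in A]                  # row player
    uy = [sum(A[i][j] * x[i] for i in range(len(x))) for j in range(len(y))]  # column player
    wx = [xi * (1 + eps * u) for xi, u in zip(x, ux)]
    wy = [yj * (1 + eps * u) for yj, u in zip(y, uy)]
    zx, zy = sum(wx), sum(wy)
    return [w / zx for w in wx], [w / zy for w in wy]


# Hypothetical two-locus, two-allele fitness table (rows: alleles at locus 1,
# columns: alleles at locus 2). Both loci receive the same payoff: a coordination game.
A = [[1.2, 1.0], [1.0, 1.1]]
x, y = [0.5, 0.5], [0.5, 0.5]
for _ in range(1000):
    x, y = mwua_step(x, y, A, eps=0.5)
print(x, y)  # both converge to the pure equilibrium (allele 1, allele 1)
```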
