Source author record

Constantinos Daskalakis

Constantinos Daskalakis appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Catalog footprint

What is connected

51works

22topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

The Conflict Graph Design: Estimating Causal Effects under Arbitrary Neighborhood Interference

A fundamental problem in network experiments is selecting an appropriate experimental design in order to precisely estimate a given causal effect of interest. In this work, we propose the Conflict Graph Design, a general approach for constructing experiment designs under network interference with the goal of precisely estimating a pre-specified causal effect. A central aspect of our approach is the notion of a conflict graph, which captures the fundamental unobservability associated with the causal effect and the underlying network. In order to estimate effects, we propose a modified Horvitz--Thompson estimator. We show that its variance under the Conflict Graph Design is bounded as $O(λ(H) / n )$, where $λ(H)$ is the largest eigenvalue of the adjacency matrix of the conflict graph. These rates depend on both the underlying network and the particular causal effect under investigation. Not only does this yield the best known rates of estimation for several well-studied causal effects (e.g. the global and direct effects) but it also provides new methods for effects which have received less attention from the perspective of experiment design (e.g. spill-over effects). Finally, we construct conservative variance estimators which facilitate asymptotically valid confidence intervals for the causal effect of interest.

preprint2022arXiv

Efficient Truncated Linear Regression with Unknown Noise Variance

Truncated linear regression is a classical challenge in Statistics, wherein a label, $y = w^T x + \varepsilon$, and its corresponding feature vector, $x \in \mathbb{R}^k$, are only observed if the label falls in some subset $S \subseteq \mathbb{R}$; otherwise the existence of the pair $(x, y)$ is hidden from observation. Linear regression with truncated observations has remained a challenge, in its general form, since the early works of~\citet{tobin1958estimation,amemiya1973regression}. When the distribution of the error is normal with known variance, recent work of~\citet{daskalakis2019truncatedregression} provides computationally and statistically efficient estimators of the linear model, $w$. In this paper, we provide the first computationally and statistically efficient estimators for truncated linear regression when the noise variance is unknown, estimating both the linear model and the variance of the noise. Our estimator is based on an efficient implementation of Projected Stochastic Gradient Descent on the negative log-likelihood of the truncated sample. Importantly, we show that the error of our estimates is asymptotically normal, and we use this to provide explicit confidence regions for our estimates.

preprint2022arXiv

Estimation of Standard Auction Models

We provide efficient estimation methods for first- and second-price auctions under independent (asymmetric) private values and partial observability. Given a finite set of observations, each comprising the identity of the winner and the price they paid in a sequence of identical auctions, we provide algorithms for non-parametrically estimating the bid distribution of each bidder, as well as their value distributions under equilibrium assumptions. We provide finite-sample estimation bounds which are uniform in that their error rates do not depend on the bid/value distributions being estimated. Our estimation guarantees advance a body of work in Econometrics wherein only identification results have been obtained, unless the setting is symmetric, parametric, or all bids are observable. Our guarantees also provide computationally and statistically effective alternatives to classical techniques from reliability theory. Finally, our results are immediately applicable to Dutch and English auctions.

preprint2022arXiv

Fast Rates for Nonparametric Online Learning: From Realizability to Learning in Games

We study fast rates of convergence in the setting of nonparametric online regression, namely where regret is defined with respect to an arbitrary function class which has bounded complexity. Our contributions are two-fold: - In the realizable setting of nonparametric online regression with the absolute loss, we propose a randomized proper learning algorithm which gets a near-optimal cumulative loss in terms of the sequential fat-shattering dimension of the hypothesis class. In the setting of online classification with a class of Littlestone dimension $d$, our bound reduces to $d \cdot {\rm poly} \log T$. This result answers a question as to whether proper learners could achieve near-optimal cumulative loss; previously, even for online classification, the best known cumulative loss was $\tilde O( \sqrt{dT})$. Further, for the real-valued (regression) setting, a cumulative loss bound with near-optimal scaling on sequential fat-shattering dimension was not even known for improper learners, prior to this work. - Using the above result, we exhibit an independent learning algorithm for general-sum binary games of Littlestone dimension $d$, for which each player achieves regret $\tilde O(d^{3/4} \cdot T^{1/4})$. This result generalizes analogous results of Syrgkanis et al. (2015) who showed that in finite games the optimal regret can be accelerated from $O(\sqrt{T})$ in the adversarial setting to $O(T^{1/4})$ in the game setting. To establish the above results, we introduce several new techniques, including: a hierarchical aggregation rule to achieve the optimal cumulative loss for real-valued classes, a multi-scale extension of the proper online realizable learner of Hanneke et al. (2021), an approach to show that the output of such nonparametric learning algorithms is stable, and a proof that the minimax theorem holds in all online learnable games.

preprint2022arXiv

Recommender Systems meet Mechanism Design

Machine learning has developed a variety of tools for learning and representing high-dimensional distributions with structure. Recent years have also seen big advances in designing multi-item mechanisms. Akin to overfitting, however, these mechanisms can be extremely sensitive to the Bayesian prior that they target, which becomes problematic when that prior is only approximately known. At the same time, even if access to the exact Bayesian prior is given, it is known that optimal or even approximately optimal multi-item mechanisms run into sample, computational, representation and communication intractability barriers. We consider a natural class of multi-item mechanism design problems with very large numbers of items, but where the bidders' value distributions can be well-approximated by a topic model akin to those used in recommendation systems with very large numbers of possible recommendations. We propose a mechanism design framework for this setting, building on a recent robustification framework by Brustle et al., which disentangles the statistical challenge of estimating a multi-dimensional prior from the task of designing a good mechanism for it, and robustifies the performance of the latter against the estimation error of the former. We provide an extension of this framework appropriate for our setting, which allows us to exploit the expressive power of topic models to reduce the effective dimensionality of the mechanism design problem and remove the dependence of its computational, communication and representation complexity on the number of items.

preprint2022arXiv

Score-Guided Intermediate Layer Optimization: Fast Langevin Mixing for Inverse Problems

We prove fast mixing and characterize the stationary distribution of the Langevin Algorithm for inverting random weighted DNN generators. This result extends the work of Hand and Voroninski from efficient inversion to efficient posterior sampling. In practice, to allow for increased expressivity, we propose to do posterior sampling in the latent space of a pre-trained generative model. To achieve that, we train a score-based model in the latent space of a StyleGAN-2 and we use it to solve inverse problems. Our framework, Score-Guided Intermediate Layer Optimization (SGILO), extends prior work by replacing the sparsity regularization with a generative prior in the intermediate layer. Experimentally, we obtain significant improvements over the previous state-of-the-art, especially in the low measurement regime.

preprint2022arXiv

The Complexity of Markov Equilibrium in Stochastic Games

We show that computing approximate stationary Markov coarse correlated equilibria (CCE) in general-sum stochastic games is computationally intractable, even when there are two players, the game is turn-based, the discount factor is an absolute constant, and the approximation is an absolute constant. Our intractability results stand in sharp contrast to normal-form games where exact CCEs are efficiently computable. A fortiori, our results imply that there are no efficient algorithms for learning stationary Markov CCE policies in multi-agent reinforcement learning (MARL), even when the interaction is two-player and turn-based, and both the discount factor and the desired approximation of the learned policies is an absolute constant. In turn, these results stand in sharp contrast to single-agent reinforcement learning (RL) where near-optimal stationary Markov policies can be efficiently learned. Complementing our intractability results for stationary Markov CCEs, we provide a decentralized algorithm (assuming shared randomness among players) for learning a nonstationary Markov CCE policy with polynomial time and sample complexity in all problem parameters. Previous work for learning Markov CCE policies all required exponential time and sample complexity in the number of players.

preprint2021arXiv

Efficient Methods for Structured Nonconvex-Nonconcave Min-Max Optimization

The use of min-max optimization in adversarial training of deep neural network classifiers and training of generative adversarial networks has motivated the study of nonconvex-nonconcave optimization objectives, which frequently arise in these applications. Unfortunately, recent results have established that even approximate first-order stationary points of such objectives are intractable, even under smoothness conditions, motivating the study of min-max objectives with additional structure. We introduce a new class of structured nonconvex-nonconcave min-max optimization problems, proposing a generalization of the extragradient algorithm which provably converges to a stationary point. The algorithm applies not only to Euclidean spaces, but also to general $\ell_p$-normed finite-dimensional real vector spaces. We also discuss its stability under stochastic oracles and provide bounds on its sample complexity. Our iteration complexity and sample complexity bounds either match or improve the best known bounds for the same or less general nonconvex-nonconcave settings, such as those that satisfy variational coherence or in which a weak solution to the associated variational inequality problem is assumed to exist.

preprint2021arXiv

GANs with Conditional Independence Graphs: On Subadditivity of Probability Divergences

Generative Adversarial Networks (GANs) are modern methods to learn the underlying distribution of a data set. GANs have been widely used in sample synthesis, de-noising, domain transfer, etc. GANs, however, are designed in a model-free fashion where no additional information about the underlying distribution is available. In many applications, however, practitioners have access to the underlying independence graph of the variables, either as a Bayesian network or a Markov Random Field (MRF). We ask: how can one use this additional information in designing model-based GANs? In this paper, we provide theoretical foundations to answer this question by studying subadditivity properties of probability divergences, which establish upper bounds on the distance between two high-dimensional distributions by the sum of distances between their marginals over (local) neighborhoods of the graphical structure of the Bayes-net or the MRF. We prove that several popular probability divergences satisfy some notion of subadditivity under mild conditions. These results lead to a principled design of a model-based GAN that uses a set of simple discriminators on the neighborhoods of the Bayes-net/MRF, rather than a giant discriminator on the entire network, providing significant statistical and computational benefits. Our experiments on synthetic and real-world datasets demonstrate the benefits of our principled design of model-based GANs.

preprint2021arXiv

Independent Policy Gradient Methods for Competitive Reinforcement Learning

We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents (i.e., zero-sum stochastic games). We consider an episodic setting where in each episode, each player independently selects a policy and observes only their own actions and rewards, along with the state. We show that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule (which is necessary). To the best of our knowledge, this constitutes the first finite-sample convergence result for independent policy gradient methods in competitive RL; prior work has largely focused on centralized, coordinated procedures for equilibrium computation.

preprint2020arXiv

Constant-Expansion Suffices for Compressed Sensing with Generative Priors

Generative neural networks have been empirically found very promising in providing effective structural priors for compressed sensing, since they can be trained to span low-dimensional data manifolds in high-dimensional signal spaces. Despite the non-convexity of the resulting optimization problem, it has also been shown theoretically that, for neural networks with random Gaussian weights, a signal in the range of the network can be efficiently, approximately recovered from a few noisy measurements. However, a major bottleneck of these theoretical guarantees is a network expansivity condition: that each layer of the neural network must be larger than the previous by a logarithmic factor. Our main contribution is to break this strong expansivity assumption, showing that constant expansivity suffices to get efficient recovery algorithms, besides it also being information-theoretically necessary. To overcome the theoretical bottleneck in existing approaches we prove a novel uniform concentration theorem for random functions that might not be Lipschitz but satisfy a relaxed notion which we call "pseudo-Lipschitzness." Using this theorem we can show that a matrix concentration inequality known as the Weight Distribution Condition (WDC), which was previously only known to hold for Gaussian matrices with logarithmic aspect ratio, in fact holds for constant aspect ratios too. Since the WDC is a fundamental matrix concentration inequality in the heart of all existing theoretical guarantees on this problem, our tighter bound immediately yields improvements in all known results in the literature on compressed sensing with deep generative priors, including one-bit recovery, phase retrieval, low-rank matrix recovery, and more.

preprint2020arXiv

Last Iterate is Slower than Averaged Iterate in Smooth Convex-Concave Saddle Point Problems

In this paper we study the smooth convex-concave saddle point problem. Specifically, we analyze the last iterate convergence properties of the Extragradient (EG) algorithm. It is well known that the ergodic (averaged) iterates of EG converge at a rate of $O(1/T)$ (Nemirovski, 2004). In this paper, we show that the last iterate of EG converges at a rate of $O(1/\sqrt{T})$. To the best of our knowledge, this is the first paper to provide a convergence rate guarantee for the last iterate of EG for the smooth convex-concave saddle point problem. Moreover, we show that this rate is tight by proving a lower bound of $Ω(1/\sqrt{T})$ for the last iterate. This lower bound therefore shows a quadratic separation of the convergence rates of ergodic and last iterates in smooth convex-concave saddle point problems.

preprint2020arXiv

Logistic-Regression with peer-group effects via inference in higher order Ising models

Spin glass models, such as the Sherrington-Kirkpatrick, Hopfield and Ising models, are all well-studied members of the exponential family of discrete distributions, and have been influential in a number of application domains where they are used to model correlation phenomena on networks. Conventionally these models have quadratic sufficient statistics and consequently capture correlations arising from pairwise interactions. In this work we study extensions of these to models with higher-order sufficient statistics, modeling behavior on a social network with peer-group effects. In particular, we model binary outcomes on a network as a higher-order spin glass, where the behavior of an individual depends on a linear function of their own vector of covariates and some polynomial function of the behavior of others, capturing peer-group effects. Using a {\em single}, high-dimensional sample from such model our goal is to recover the coefficients of the linear function as well as the strength of the peer-group effects. The heart of our result is a novel approach for showing strong concavity of the log pseudo-likelihood of the model, implying statistical error rate of $\sqrt{d/n}$ for the Maximum Pseudo-Likelihood Estimator (MPLE), where $d$ is the dimensionality of the covariate vectors and $n$ is the size of the network (number of nodes). Our model generalizes vanilla logistic regression as well as the peer-effect models studied in recent works, and our results extend these results to accommodate higher-order interactions.

preprint2020arXiv

Multi-Item Mechanisms without Item-Independence: Learnability via Robustness

We study the sample complexity of learning revenue-optimal multi-item auctions. We obtain the first set of positive results that go beyond the standard but unrealistic setting of item-independence. In particular, we consider settings where bidders' valuations are drawn from correlated distributions that can be captured by Markov Random Fields or Bayesian Networks -- two of the most prominent graphical models. We establish parametrized sample complexity bounds for learning an up-to-$\varepsilon$ optimal mechanism in both models, which scale polynomially in the size of the model, i.e.~the number of items and bidders, and only exponential in the natural complexity measure of the model, namely either the largest in-degree (for Bayesian Networks) or the size of the largest hyper-edge (for Markov Random Fields). We obtain our learnability results through a novel and modular framework that involves first proving a robustness theorem. We show that, given only ``approximate distributions'' for bidder valuations, we can learn a mechanism whose revenue is nearly optimal simultaneously for all ``true distributions'' that are close to the ones we were given in Prokhorov distance. Thus, to learn a good mechanism, it suffices to learn approximate distributions. When item values are independent, learning in Prokhorov distance is immediate, hence our framework directly implies the main result of Gonczarowski and Weinberg. When item values are sampled from more general graphical models, we combine our robustness theorem with novel sample complexity results for learning Markov Random Fields or Bayesian Networks in Prokhorov distance, which may be of independent interest. Finally, in the single-item case, our robustness result can be strengthened to hold under an even weaker distribution distance, the Lévy distance.

preprint2020arXiv

SGD Learns One-Layer Networks in WGANs

Generative adversarial networks (GANs) are a widely used framework for learning generative models. Wasserstein GANs (WGANs), one of the most successful variants of GANs, require solving a minmax optimization problem to global optimality, but are in practice successfully trained using stochastic gradient descent-ascent. In this paper, we show that, when the generator is a one-layer network, stochastic gradient descent-ascent converges to a global solution with polynomial time and sample complexity.

preprint2020arXiv

The Complexity of Constrained Min-Max Optimization

Despite its important applications in Machine Learning, min-max optimization of nonconvex-nonconcave objectives remains elusive. Not only are there no known first-order methods converging even to approximate local min-max points, but the computational complexity of identifying them is also poorly understood. In this paper, we provide a characterization of the computational complexity of the problem, as well as of the limitations of first-order methods in constrained min-max optimization problems with nonconvex-nonconcave objectives and linear constraints. As a warm-up, we show that, even when the objective is a Lipschitz and smooth differentiable function, deciding whether a min-max point exists, in fact even deciding whether an approximate min-max point exists, is NP-hard. More importantly, we show that an approximate local min-max point of large enough approximation is guaranteed to exist, but finding one such point is PPAD-complete. The same is true of computing an approximate fixed point of Gradient Descent/Ascent. An important byproduct of our proof is to establish an unconditional hardness result in the Nemirovsky-Yudin model. We show that, given oracle access to some function $f : P \to [-1, 1]$ and its gradient $\nabla f$, where $P \subseteq [0, 1]^d$ is a known convex polytope, every algorithm that finds a $\varepsilon$-approximate local min-max point needs to make a number of queries that is exponential in at least one of $1/\varepsilon$, $L$, $G$, or $d$, where $L$ and $G$ are respectively the smoothness and Lipschitzness of $f$ and $d$ is the dimension. This comes in sharp contrast to minimization problems, where finding approximate local minima in the same setting can be done with Projected Gradient Descent using $O(L/\varepsilon)$ many queries. Our result is the first to show an exponential separation between these two fundamental optimization problems.

preprint2020arXiv

Truncated Linear Regression in High Dimensions

As in standard linear regression, in truncated linear regression, we are given access to observations $(A_i, y_i)_i$ whose dependent variable equals $y_i= A_i^{\rm T} \cdot x^* + η_i$, where $x^*$ is some fixed unknown vector of interest and $η_i$ is independent noise; except we are only given an observation if its dependent variable $y_i$ lies in some "truncation set" $S \subset \mathbb{R}$. The goal is to recover $x^*$ under some favorable conditions on the $A_i$'s and the noise distribution. We prove that there exists a computationally and statistically efficient method for recovering $k$-sparse $n$-dimensional vectors $x^*$ from $m$ truncated samples, which attains an optimal $\ell_2$ reconstruction error of $O(\sqrt{(k \log n)/m})$. As a corollary, our guarantees imply a computationally efficient and information-theoretically optimal algorithm for compressed sensing with truncation, which may arise from measurement saturation effects. Our result follows from a statistical and computational analysis of the Stochastic Gradient Descent (SGD) algorithm for solving a natural adaptation of the LASSO optimization problem that accommodates truncation. This generalizes the works of both: (1) [Daskalakis et al. 2018], where no regularization is needed due to the low-dimensionality of the data, and (2) [Wainright 2009], where the objective function is simple due to the absence of truncation. In order to deal with both truncation and high-dimensionality at the same time, we develop new techniques that not only generalize the existing ones but we believe are of independent interest.

preprint2016arXiv

A Size-Free CLT for Poisson Multinomials and its Applications

An $(n,k)$-Poisson Multinomial Distribution (PMD) is the distribution of the sum of $n$ independent random vectors supported on the set ${\cal B}_k=\{e_1,\ldots,e_k\}$ of standard basis vectors in $\mathbb{R}^k$. We show that any $(n,k)$-PMD is ${\rm poly}\left({k\over σ}\right)$-close in total variation distance to the (appropriately discretized) multi-dimensional Gaussian with the same first two moments, removing the dependence on $n$ from the Central Limit Theorem of Valiant and Valiant. Interestingly, our CLT is obtained by bootstrapping the Valiant-Valiant CLT itself through the structural characterization of PMDs shown in recent work by Daskalakis, Kamath, and Tzamos. In turn, our stronger CLT can be leveraged to obtain an efficient PTAS for approximate Nash equilibria in anonymous games, significantly improving the state of the art, and matching qualitatively the running time dependence on $n$ and $1/\varepsilon$ of the best known algorithm for two-strategy anonymous games. Our new CLT also enables the construction of covers for the set of $(n,k)$-PMDs, which are proper and whose size is shown to be essentially optimal. Our cover construction combines our CLT with the Shapley-Folkman theorem and recent sparsification results for Laplacian matrices by Batson, Spielman, and Srivastava. Our cover size lower bound is based on an algebraic geometric construction. Finally, leveraging the structural properties of the Fourier spectrum of PMDs we show that these distributions can be learned from $O_k(1/\varepsilon^2)$ samples in ${\rm poly}_k(1/\varepsilon)$-time, removing the quasi-polynomial dependence of the running time on $1/\varepsilon$ from the algorithm of Daskalakis, Kamath, and Tzamos.

preprint2016arXiv

Learning in Auctions: Regret is Hard, Envy is Easy

A line of recent work provides welfare guarantees of simple combinatorial auction formats, such as selling m items via simultaneous second price auctions (SiSPAs) (Christodoulou et al. 2008, Bhawalkar and Roughgarden 2011, Feldman et al. 2013). These guarantees hold even when the auctions are repeatedly executed and players use no-regret learning algorithms. Unfortunately, off-the-shelf no-regret algorithms for these auctions are computationally inefficient as the number of actions is exponential. We show that this obstacle is insurmountable: there are no polynomial-time no-regret algorithms for SiSPAs, unless RP$\supseteq$ NP, even when the bidders are unit-demand. Our lower bound raises the question of how good outcomes polynomially-bounded bidders may discover in such auctions. To answer this question, we propose a novel concept of learning in auctions, termed "no-envy learning." This notion is founded upon Walrasian equilibrium, and we show that it is both efficiently implementable and results in approximately optimal welfare, even when the bidders have fractionally subadditive (XOS) valuations (assuming demand oracles) or coverage valuations (without demand oracles). No-envy learning outcomes are a relaxation of no-regret outcomes, which maintain their approximate welfare optimality while endowing them with computational tractability. Our results extend to other auction formats that have been studied in the literature via the smoothness paradigm. Our results for XOS valuations are enabled by a novel Follow-The-Perturbed-Leader algorithm for settings where the number of experts is infinite, and the payoff function of the learner is non-linear. This algorithm has applications outside of auction settings, such as in security games. Our result for coverage valuations is based on a novel use of convex rounding schemes and a reduction to online convex optimization.

preprint2016arXiv

Revenue Maximization and Ex-Post Budget Constraints

We consider the problem of a revenue-maximizing seller with m items for sale to n additive bidders with hard budget constraints, assuming that the seller has some prior distribution over bidder values and budgets. The prior may be correlated across items and budgets of the same bidder, but is assumed independent across bidders. We target mechanisms that are Bayesian Incentive Compatible, but that are ex-post Individually Rational and ex-post budget respecting. Virtually no such mechanisms are known that satisfy all these conditions and guarantee any revenue approximation, even with just a single item. We provide a computationally efficient mechanism that is a $3$-approximation with respect to all BIC, ex-post IR, and ex-post budget respecting mechanisms. Note that the problem is NP-hard to approximate better than a factor of 16/15, even in the case where the prior is a point mass \cite{ChakrabartyGoel}. We further characterize the optimal mechanism in this setting, showing that it can be interpreted as a distribution over virtual welfare maximizers. We prove our results by making use of a black-box reduction from mechanism to algorithm design developed by \cite{CaiDW13b}. Our main technical contribution is a computationally efficient $3$-approximation algorithm for the algorithmic problem that results by an application of their framework to this problem. The algorithmic problem has a mixed-sign objective and is NP-hard to optimize exactly, so it is surprising that a computationally efficient approximation is possible at all. In the case of a single item ($m=1$), the algorithmic problem can be solved exactly via exhaustive search, leading to a computationally efficient exact algorithm and a stronger characterization of the optimal mechanism as a distribution over virtual value maximizers.

preprint2016arXiv

Sequential Mechanisms with ex-post Participation Guarantees

We provide a characterization of revenue-optimal dynamic mechanisms in settings where a monopolist sells k items over k periods to a buyer who realizes his value for item i in the beginning of period i. We require that the mechanism satisfies a strong individual rationality constraint, requiring that the stage utility of each agent be positive during each period. We show that the optimum mechanism can be computed by solving a nested sequence of static (single-period) mechanisms that optimize a tradeoff between the surplus of the allocation and the buyer's utility. We also provide a simple dynamic mechanism that obtains at least half of the optimal revenue. The mechanism either ignores history and posts the optimal monopoly price in each period, or allocates with a probability that is independent of the current report of the agent and is based only on previous reports. Our characterization extends to multi-agent auctions. We also formulate a discounted infinite horizon version of the problem, where we study the performance of "Markov mechanisms."

preprint2015arXiv

Learning Poisson Binomial Distributions

We consider a basic problem in unsupervised learning: learning an unknown \emph{Poisson Binomial Distribution}. A Poisson Binomial Distribution (PBD) over $\{0,1,\dots,n\}$ is the distribution of a sum of $n$ independent Bernoulli random variables which may have arbitrary, potentially non-equal, expectations. These distributions were first studied by S. Poisson in 1837 \cite{Poisson:37} and are a natural $n$-parameter generalization of the familiar Binomial Distribution. Surprisingly, prior to our work this basic learning problem was poorly understood, and known results for it were far from optimal. We essentially settle the complexity of the learning problem for this basic class of distributions. As our first main result we give a highly efficient algorithm which learns to $\eps$-accuracy (with respect to the total variation distance) using $\tilde{O}(1/\eps^3)$ samples \emph{independent of $n$}. The running time of the algorithm is \emph{quasilinear} in the size of its input data, i.e., $\tilde{O}(\log(n)/\eps^3)$ bit-operations. (Observe that each draw from the distribution is a $\log(n)$-bit string.) Our second main result is a {\em proper} learning algorithm that learns to $\eps$-accuracy using $\tilde{O}(1/\eps^2)$ samples, and runs in time $(1/\eps)^{\poly (\log (1/\eps))} \cdot \log n$. This is nearly optimal, since any algorithm {for this problem} must use $Ω(1/\eps^2)$ samples. We also give positive and negative results for some extensions of this learning problem to weighted sums of independent Bernoulli random variables.

preprint2015arXiv

Mechanism Design via Optimal Transport

Optimal mechanisms have been provided in quite general multi-item settings, as long as each bidder's type distribution is given explicitly by listing every type in the support along with its associated probability. In the implicit setting, e.g. when the bidders have additive valuations with independent and/or continuous values for the items, these results do not apply, and it was recently shown that exact revenue optimization is intractable, even when there is only one bidder. Even for item distributions with special structure, optimal mechanisms have been surprisingly rare and the problem is challenging even in the two-item case. In this paper, we provide a framework for designing optimal mechanisms using optimal transport theory and duality theory. We instantiate our framework to obtain conditions under which only pricing the grand bundle is optimal in multi-item settings (complementing the work of [Manelli and Vincent 2006], as well as to characterize optimal two-item mechanisms. We use our results to derive closed-form descriptions of the optimal mechanism in several two-item settings, exhibiting also a setting where a continuum of lotteries is necessary for revenue optimization but a closed-form representation of the mechanism can still be found efficiently using our framework.

preprint2015arXiv

On the Structure, Covering, and Learning of Poisson Multinomial Distributions

An $(n,k)$-Poisson Multinomial Distribution (PMD) is the distribution of the sum of $n$ independent random vectors supported on the set ${\cal B}_k=\{e_1,\ldots,e_k\}$ of standard basis vectors in $\mathbb{R}^k$. We prove a structural characterization of these distributions, showing that, for all $\varepsilon >0$, any $(n, k)$-Poisson multinomial random vector is $\varepsilon$-close, in total variation distance, to the sum of a discretized multidimensional Gaussian and an independent $(\text{poly}(k/\varepsilon), k)$-Poisson multinomial random vector. Our structural characterization extends the multi-dimensional CLT of Valiant and Valiant, by simultaneously applying to all approximation requirements $\varepsilon$. In particular, it overcomes factors depending on $\log n$ and, importantly, the minimum eigenvalue of the PMD's covariance matrix from the distance to a multidimensional Gaussian random variable. We use our structural characterization to obtain an $\varepsilon$-cover, in total variation distance, of the set of all $(n, k)$-PMDs, significantly improving the cover size of Daskalakis and Papadimitriou, and obtaining the same qualitative dependence of the cover size on $n$ and $\varepsilon$ as the $k=2$ cover of Daskalakis and Papadimitriou. We further exploit this structure to show that $(n,k)$-PMDs can be learned to within $\varepsilon$ in total variation distance from $\tilde{O}_k(1/\varepsilon^2)$ samples, which is near-optimal in terms of dependence on $\varepsilon$ and independent of $n$. In particular, our result generalizes the single-dimensional result of Daskalakis, Diakonikolas, and Servedio for Poisson Binomials to arbitrary dimension.

preprint2015arXiv

Optimal Pricing is Hard

We show that computing the revenue-optimal deterministic auction in unit-demand single-buyer Bayesian settings, i.e. the optimal item-pricing, is computationally hard even in single-item settings where the buyer's value distribution is a sum of independently distributed attributes, or multi-item settings where the buyer's values for the items are independent. We also show that it is intractable to optimally price the grand bundle of multiple items for an additive bidder whose values for the items are independent. These difficulties stem from implicit definitions of a value distribution. We provide three instances of how different properties of implicit distributions can lead to intractability: the first is a #P-hardness proof, while the remaining two are reductions from the SQRT-SUM problem of Garey, Graham, and Johnson. While simple pricing schemes can oftentimes approximate the best scheme in revenue, they can have drastically different underlying structure. We argue therefore that either the specification of the input distribution must be highly restricted in format, or it is necessary for the goal to be mere approximation to the optimal scheme's revenue instead of computing properties of the scheme itself.

preprint2015arXiv

Optimal Testing for Properties of Distributions

Given samples from an unknown distribution $p$, is it possible to distinguish whether $p$ belongs to some class of distributions $\mathcal{C}$ versus $p$ being far from every distribution in $\mathcal{C}$? This fundamental question has received tremendous attention in statistics, focusing primarily on asymptotic analysis, and more recently in information theory and theoretical computer science, where the emphasis has been on small sample size and computational complexity. Nevertheless, even for basic properties of distributions such as monotonicity, log-concavity, unimodality, independence, and monotone-hazard rate, the optimal sample complexity is unknown. We provide a general approach via which we obtain sample-optimal and computationally efficient testers for all these distribution families. At the core of our approach is an algorithm which solves the following problem: Given samples from an unknown distribution $p$, and a known distribution $q$, are $p$ and $q$ close in $χ^2$-distance, or far in total variation distance? The optimality of our testers is established by providing matching lower bounds with respect to both $n$ and $\varepsilon$. Finally, a necessary building block for our testers and an important byproduct of our work are the first known computationally efficient proper learners for discrete log-concave and monotone hazard rate distributions.

preprint2015arXiv

Optimum Statistical Estimation with Strategic Data Sources

We propose an optimum mechanism for providing monetary incentives to the data sources of a statistical estimator such as linear regression, so that high quality data is provided at low cost, in the sense that the sum of payments and estimation error is minimized. The mechanism applies to a broad range of estimators, including linear and polynomial regression, kernel regression, and, under some additional assumptions, ridge regression. It also generalizes to several objectives, including minimizing estimation error subject to budget constraints. Besides our concrete results for regression problems, we contribute a mechanism design framework through which to design and analyze statistical estimators whose examples are supplied by workers with cost for labeling said examples.

preprint2014arXiv

A Counter-Example to Karlin's Strong Conjecture for Fictitious Play

Fictitious play is a natural dynamic for equilibrium play in zero-sum games, proposed by [Brown 1949], and shown to converge by [Robinson 1951]. Samuel Karlin conjectured in 1959 that fictitious play converges at rate $O(1/\sqrt{t})$ with the number of steps $t$. We disprove this conjecture showing that, when the payoff matrix of the row player is the $n \times n$ identity matrix, fictitious play may converge with rate as slow as $Ω(t^{-1/n})$.

preprint2014arXiv

Bayesian Truthful Mechanisms for Job Scheduling from Bi-criterion Approximation Algorithms

We provide polynomial-time approximately optimal Bayesian mechanisms for makespan minimization on unrelated machines as well as for max-min fair allocations of indivisible goods, with approximation factors of $2$ and $\min\{m-k+1, \tilde{O}(\sqrt{k})\}$ respectively, matching the approximation ratios of best known polynomial-time \emph{algorithms} (for max-min fairness, the latter claim is true for certain ratios of the number of goods $m$ to people $k$). Our mechanisms are obtained by establishing a polynomial-time approximation-sensitive reduction from the problem of designing approximately optimal {\em mechanisms} for some arbitrary objective ${\cal O}$ to that of designing bi-criterion approximation {\em algorithms} for the same objective ${\cal O}$ plus a linear allocation cost term. Our reduction is itself enabled by extending the celebrated "equivalence of separation and optimization"[GLSS81,KP80] to also accommodate bi-criterion approximations. Moreover, to apply the reduction to the specific problems of makespan and max-min fairness we develop polynomial-time bi-criterion approximation algorithms for makespan minimization with costs and max-min fairness with costs, adapting the algorithms of [ST93], [BD05] and [AS07] to the type of bi-criterion approximation that is required by the reduction.

preprint2014arXiv

Extreme-Value Theorems for Optimal Multidimensional Pricing

We provide a near-optimal, computationally efficient algorithm for the unit-demand pricing problem, where a seller wants to price n items to optimize revenue against a unit-demand buyer whose values for the items are independently drawn from known distributions. For any chosen accuracy eps>0 and item values bounded in [0,1], our algorithm achieves revenue that is optimal up to an additive error of at most eps, in polynomial time. For values sampled from Monotone Hazard Rate (MHR) distributions, we achieve a (1-eps)-fraction of the optimal revenue in polynomial time, while for values sampled from regular distributions the same revenue guarantees are achieved in quasi-polynomial time. Our algorithm for bounded distributions applies probabilistic techniques to understand the statistical properties of revenue distributions, obtaining a reduction in the search space of the algorithm via dynamic programming. Adapting this approach to MHR and regular distributions requires the proof of novel extreme value theorems for such distributions. As a byproduct, our techniques establish structural properties of approximately-optimal and near-optimal solutions. We show that, for values independently distributed according to MHR distributions, pricing all items at the same price achieves a constant fraction of the optimal revenue. Moreover, for all eps >0, g(1/eps) distinct prices suffice to obtain a (1-eps)-fraction of the optimal revenue, where g(1/eps) is quadratic in 1/eps and independent of n. Similarly, for all eps>0 and n>0, at most g(1/(eps log n)) distinct prices suffice if the values are independently distributed according to regular distributions, where g() is a polynomial function. Finally, when the values are i.i.d. from some MHR distribution, we show that, if n is a sufficiently large function of 1/eps, a single price suffices to achieve a (1-eps)-fraction of the optimal revenue.

preprint2014arXiv

Faster and Sample Near-Optimal Algorithms for Proper Learning Mixtures of Gaussians

We provide an algorithm for properly learning mixtures of two single-dimensional Gaussians without any separability assumptions. Given $\tilde{O}(1/\varepsilon^2)$ samples from an unknown mixture, our algorithm outputs a mixture that is $\varepsilon$-close in total variation distance, in time $\tilde{O}(1/\varepsilon^5)$. Our sample complexity is optimal up to logarithmic factors, and significantly improves upon both Kalai et al., whose algorithm has a prohibitive dependence on $1/\varepsilon$, and Feldman et al., whose algorithm requires bounds on the mixture parameters and depends pseudo-polynomially in these parameters. One of our main contributions is an improved and generalized algorithm for selecting a good candidate distribution from among competing hypotheses. Namely, given a collection of $N$ hypotheses containing at least one candidate that is $\varepsilon$-close to an unknown distribution, our algorithm outputs a candidate which is $O(\varepsilon)$-close to the distribution. The algorithm requires ${O}(\log{N}/\varepsilon^2)$ samples from the unknown distribution and ${O}(N \log N/\varepsilon^2)$ time, which improves previous such results (such as the Scheffé estimator) from a quadratic dependence of the running time on $N$ to quasilinear. Given the wide use of such results for the purpose of hypothesis selection, our improved algorithm implies immediate improvements to any such use.

preprint2014arXiv

Learning $k$-Modal Distributions via Testing

A $k$-modal probability distribution over the discrete domain $\{1,...,n\}$ is one whose histogram has at most $k$ "peaks" and "valleys." Such distributions are natural generalizations of monotone ($k=0$) and unimodal ($k=1$) probability distributions, which have been intensively studied in probability theory and statistics. In this paper we consider the problem of \emph{learning} (i.e., performing density estimation of) an unknown $k$-modal distribution with respect to the $L_1$ distance. The learning algorithm is given access to independent samples drawn from an unknown $k$-modal distribution $p$, and it must output a hypothesis distribution $\widehat{p}$ such that with high probability the total variation distance between $p$ and $\widehat{p}$ is at most $ε.$ Our main goal is to obtain \emph{computationally efficient} algorithms for this problem that use (close to) an information-theoretically optimal number of samples. We give an efficient algorithm for this problem that runs in time $\mathrm{poly}(k,\log(n),1/ε)$. For $k \leq \tilde{O}(\log n)$, the number of samples used by our algorithm is very close (within an $\tilde{O}(\log(1/ε))$ factor) to being information-theoretically optimal. Prior to this work computationally efficient algorithms were known only for the cases $k=0,1$ \cite{Birge:87b,Birge:97}. A novel feature of our approach is that our learning algorithm crucially uses a new algorithm for \emph{property testing of probability distributions} as a key subroutine. The learning algorithm uses the property tester to efficiently decompose the $k$-modal distribution into $k$ (near-)monotone distributions, which are easier to learn.

preprint2014arXiv

Sparse Covers for Sums of Indicators

For all $n, ε>0$, we show that the set of Poisson Binomial distributions on $n$ variables admits a proper $ε$-cover in total variation distance of size $n^2+n \cdot (1/ε)^{O(\log^2 (1/ε))}$, which can also be computed in polynomial time. We discuss the implications of our construction for approximation algorithms and the computation of approximate Nash equilibria in anonymous games.

preprint2014arXiv

Testing Poisson Binomial Distributions

A Poisson Binomial distribution over $n$ variables is the distribution of the sum of $n$ independent Bernoullis. We provide a sample near-optimal algorithm for testing whether a distribution $P$ supported on $\{0,...,n\}$ to which we have sample access is a Poisson Binomial distribution, or far from all Poisson Binomial distributions. The sample complexity of our algorithm is $O(n^{1/4})$ to which we provide a matching lower bound. We note that our sample complexity improves quadratically upon that of the naive "learn followed by tolerant-test" approach, while instance optimal identity testing [VV14] is not applicable since we are looking to simultaneously test against a whole family of distributions.

preprint2013arXiv

A Polynomial-time Approximation Scheme for Fault-tolerant Distributed Storage

We consider a problem which has received considerable attention in systems literature because of its applications to routing in delay tolerant networks and replica placement in distributed storage systems. In abstract terms the problem can be stated as follows: Given a random variable $X$ generated by a known product distribution over $\{0,1\}^n$ and a target value $0 \leq θ\leq 1$, output a non-negative vector $w$, with $\|w\|_1 \le 1$, which maximizes the probability of the event $w \cdot X \ge θ$. This is a challenging non-convex optimization problem for which even computing the value $\Pr[w \cdot X \ge θ]$ of a proposed solution vector $w$ is #P-hard. We provide an additive EPTAS for this problem which, for constant-bounded product distributions, runs in $ \poly(n) \cdot 2^{\poly(1/\eps)}$ time and outputs an $\eps$-approximately optimal solution vector $w$ for this problem. Our approach is inspired by, and extends, recent structural results from the complexity-theoretic study of linear threshold functions. Furthermore, in spite of the objective function being non-smooth, we give a \emph{unicriterion} PTAS while previous work for such objective functions has typically led to a \emph{bicriterion} PTAS. We believe our techniques may be applicable to get unicriterion PTAS for other non-smooth objective functions.

preprint2013arXiv

Alignment-free phylogenetic reconstruction: Sample complexity via a branching process analysis

We present an efficient phylogenetic reconstruction algorithm allowing insertions and deletions which provably achieves a sequence-length requirement (or sample complexity) growing polynomially in the number of taxa. Our algorithm is distance-based, that is, it relies on pairwise sequence comparisons. More importantly, our approach largely bypasses the difficult problem of multiple sequence alignment.

preprint2013arXiv

How good is the Chord algorithm?

The Chord algorithm is a popular, simple method for the succinct approximation of curves, which is widely used, under different names, in a variety of areas, such as, multiobjective and parametric optimization, computational geometry, and graphics. We analyze the performance of the Chord algorithm, as compared to the optimal approximation that achieves a desired accuracy with the minimum number of points. We prove sharp upper and lower bounds, both in the worst case and average case setting.

preprint2013arXiv

Reducing Revenue to Welfare Maximization: Approximation Algorithms and other Generalizations

It was recently shown in [http://arxiv.org/abs/1207.5518] that revenue optimization can be computationally efficiently reduced to welfare optimization in all multi-dimensional Bayesian auction problems with arbitrary (possibly combinatorial) feasibility constraints and independent additive bidders with arbitrary (possibly combinatorial) demand constraints. This reduction provides a poly-time solution to the optimal mechanism design problem in all auction settings where welfare optimization can be solved efficiently, but it is fragile to approximation and cannot provide solutions to settings where welfare maximization can only be tractably approximated. In this paper, we extend the reduction to accommodate approximation algorithms, providing an approximation preserving reduction from (truthful) revenue maximization to (not necessarily truthful) welfare maximization. The mechanisms output by our reduction choose allocations via black-box calls to welfare approximation on randomly selected inputs, thereby generalizing also our earlier structural results on optimal multi-dimensional mechanisms to approximately optimal mechanisms. Unlike [http://arxiv.org/abs/1207.5518], our results here are obtained through novel uses of the Ellipsoid algorithm and other optimization techniques over {\em non-convex regions}.

preprint2013arXiv

The Complexity of Optimal Mechanism Design

Myerson's seminal work provides a computationally efficient revenue-optimal auction for selling one item to multiple bidders. Generalizing this work to selling multiple items at once has been a central question in economics and algorithmic game theory, but its complexity has remained poorly understood. We answer this question by showing that a revenue-optimal auction in multi-item settings cannot be found and implemented computationally efficiently, unless ZPP contains P^#P. This is true even for a single additive bidder whose values for the items are independently distributed on two rational numbers with rational probabilities. Our result is very general: we show that it is hard to compute any encoding of an optimal auction of any format (direct or indirect, truthful or non-truthful) that can be implemented in expected polynomial time. In particular, under well-believed complexity-theoretic assumptions, revenue-optimization in very simple multi-item settings can only be tractably approximated. We note that our hardness result applies to randomized mechanisms in a very simple setting, and is not an artifact of introducing combinatorial structure to the problem by allowing correlation among item values, introducing combinatorial valuations, or requiring the mechanism to be deterministic (whose structure is readily combinatorial). Our proof is enabled by a flow-interpretation of the solutions of an exponential-size linear program for revenue maximization with an additional supermodularity constraint.

preprint2013arXiv

Understanding Incentives: Mechanism Design becomes Algorithm Design

We provide a computationally efficient black-box reduction from mechanism design to algorithm design in very general settings. Specifically, we give an approximation-preserving reduction from truthfully maximizing \emph{any} objective under \emph{arbitrary} feasibility constraints with \emph{arbitrary} bidder types to (not necessarily truthfully) maximizing the same objective plus virtual welfare (under the same feasibility constraints). Our reduction is based on a fundamentally new approach: we describe a mechanism's behavior indirectly only in terms of the expected value it awards bidders for certain behavior, and never directly access the allocation rule at all. Applying our new approach to revenue, we exhibit settings where our reduction holds \emph{both ways}. That is, we also provide an approximation-sensitive reduction from (non-truthfully) maximizing virtual welfare to (truthfully) maximizing revenue, and therefore the two problems are computationally equivalent. With this equivalence in hand, we show that both problems are NP-hard to approximate within any polynomial factor, even for a single monotone submodular bidder. We further demonstrate the applicability of our reduction by providing a truthful mechanism maximizing fractional max-min fairness. This is the first instance of a truthful mechanism that optimizes a non-linear objective.

preprint2012arXiv

Connectivity and equilibrium in random games

We study how the structure of the interaction graph of a game affects the existence of pure Nash equilibria. In particular, for a fixed interaction graph, we are interested in whether there are pure Nash equilibria arising when random utility tables are assigned to the players. We provide conditions for the structure of the graph under which equilibria are likely to exist and complementary conditions which make the existence of equilibria highly unlikely. Our results have immediate implications for many deterministic graphs and generalize known results for random games on the complete graph. In particular, our results imply that the probability that bounded degree graphs have pure Nash equilibria is exponentially small in the size of the graph and yield a simple algorithm that finds small nonexistence certificates for a large family of graphs. Then we show that in any strongly connected graph of n vertices with expansion $(1+Ω(1))\log_2(n)$ the distribution of the number of equilibria approaches the Poisson distribution with parameter 1, asymptotically as $n \to +\infty$.

preprint2012arXiv

Optimal Multi-Dimensional Mechanism Design: Reducing Revenue to Welfare Maximization

We provide a reduction from revenue maximization to welfare maximization in multi-dimensional Bayesian auctions with arbitrary (possibly combinatorial) feasibility constraints and independent bidders with arbitrary (possibly combinatorial) demand constraints, appropriately extending Myerson's result to this setting. We also show that every feasible Bayesian auction can be implemented as a distribution over virtual VCG allocation rules. A virtual VCG allocation rule has the following simple form: Every bidder's type t_i is transformed into a virtual type f_i(t_i), via a bidder-specific function. Then, the allocation maximizing virtual welfare is chosen. Using this characterization, we show how to find and run the revenue-optimal auction given only black box access to an implementation of the VCG allocation rule. We generalize this result to arbitrarily correlated bidders, introducing the notion of a second-order VCG allocation rule. We obtain our reduction from revenue to welfare optimization via two algorithmic results on reduced forms in settings with arbitrary feasibility and demand constraints. First, we provide a separation oracle for determining feasibility of a reduced form. Second, we provide a geometric algorithm to decompose any feasible reduced form into a distribution over virtual VCG allocation rules. In addition, we show how to execute both algorithms given only black box access to an implementation of the VCG allocation rule. Our results are computationally efficient for all multi-dimensional settings where the bidders are additive. In this case, our mechanisms run in time polynomial in the total number of bidder types, but not type profiles. For generic correlated distributions, this is the natural description complexity of the problem. The runtime can be further improved to poly(#items, #bidders) in item-symmetric settings by making use of recent techniques.

preprint2011arXiv

Learning transformed product distributions

We consider the problem of learning an unknown product distribution $X$ over $\{0,1\}^n$ using samples $f(X)$ where $f$ is a \emph{known} transformation function. Each choice of a transformation function $f$ specifies a learning problem in this framework. Information-theoretic arguments show that for every transformation function $f$ the corresponding learning problem can be solved to accuracy $\eps$, using $\tilde{O}(n/\eps^2)$ examples, by a generic algorithm whose running time may be exponential in $n.$ We show that this learning problem can be computationally intractable even for constant $\eps$ and rather simple transformation functions. Moreover, the above sample complexity bound is nearly optimal for the general problem, as we give a simple explicit linear transformation function $f(x)=w \cdot x$ with integer weights $w_i \leq n$ and prove that the corresponding learning problem requires $Ω(n)$ samples. As our main positive result we give a highly efficient algorithm for learning a sum of independent unknown Bernoulli random variables, corresponding to the transformation function $f(x)= \sum_{i=1}^n x_i$. Our algorithm learns to $\eps$-accuracy in poly$(n)$ time, using a surprising poly$(1/\eps)$ number of samples that is independent of $n.$ We also give an efficient algorithm that uses $\log n \cdot \poly(1/\eps)$ samples but has running time that is only $\poly(\log n, 1/\eps).$

preprint2011arXiv

On Optimal Multi-Dimensional Mechanism Design

We efficiently solve the optimal multi-dimensional mechanism design problem for independent bidders with arbitrary demand constraints when either the number of bidders is a constant or the number of items is a constant. In the first setting, we need that each bidder's values for the items are sampled from a possibly correlated, item-symmetric distribution, allowing different distributions for each bidder. In the second setting, we allow the values of each bidder for the items to be arbitrarily correlated, but assume that the distribution of bidder types is bidder-symmetric. For all eps>0, we obtain an additive eps-approximation, when the value distributions are bounded, or a multiplicative (1-eps)-approximation when the value distributions are unbounded, but satisfy the Monotone Hazard Rate condition, covering a widely studied class of distributions in Economics. Our runtime is polynomial in max{#items,#bidders}, and not the size of the support of the joint distribution of all bidders' values for all items, which is typically exponential in both the number of items and the number of bidders. Our mechanisms are randomized, explicitly price bundles, and can sometimes accommodate budget constraints. Our results are enabled by establishing several new tools and structural properties of Bayesian mechanisms. We provide a symmetrization technique turning any truthful mechanism into one that has the same revenue and respects all symmetries in the underlying value distributions. We also prove that item-symmetric mechanisms satisfy a natural monotonicity property which, unlike cyclic-monotonicity, can be harnessed algorithmically. Finally, we provide a technique that turns any given eps-BIC mechanism (i.e. one where incentive constraints are violated by eps) into a truly-BIC mechanism at the cost of O(sqrt{eps}) revenue. We expect our tools to be used beyond the settings we consider here.

preprint2011arXiv

Testing $k$-Modal Distributions: Optimal Algorithms via Reductions

We give highly efficient algorithms, and almost matching lower bounds, for a range of basic statistical problems that involve testing and estimating the L_1 distance between two k-modal distributions $p$ and $q$ over the discrete domain $\{1,\dots,n\}$. More precisely, we consider the following four problems: given sample access to an unknown k-modal distribution $p$, Testing identity to a known or unknown distribution: 1. Determine whether $p = q$ (for an explicitly given k-modal distribution $q$) versus $p$ is $\eps$-far from $q$; 2. Determine whether $p=q$ (where $q$ is available via sample access) versus $p$ is $\eps$-far from $q$; Estimating $L_1$ distance ("tolerant testing'') against a known or unknown distribution: 3. Approximate $d_{TV}(p,q)$ to within additive $\eps$ where $q$ is an explicitly given k-modal distribution $q$; 4. Approximate $d_{TV}(p,q)$ to within additive $\eps$ where $q$ is available via sample access. For each of these four problems we give sub-logarithmic sample algorithms, that we show are tight up to additive $\poly(k)$ and multiplicative $\polylog\log n+\polylog k$ factors. Thus our bounds significantly improve the previous results of \cite{BKR:04}, which were for testing identity of distributions (items (1) and (2) above) in the special cases k=0 (monotone distributions) and k=1 (unimodal distributions) and required $O((\log n)^3)$ samples. As our main conceptual contribution, we introduce a new reduction-based approach for distribution-testing problems that lets us obtain all the above results in a unified way. Roughly speaking, this approach enables us to transform various distribution testing problems for k-modal distributions over $\{1,\dots,n\}$ to the corresponding distribution testing problems for unrestricted distributions over a much smaller domain $\{1,\dots,\ell\}$ where $\ell = O(k \log n).$

preprint2009arXiv

Evolutionary Trees and the Ising Model on the Bethe Lattice: a Proof of Steel's Conjecture

A major task of evolutionary biology is the reconstruction of phylogenetic trees from molecular data. The evolutionary model is given by a Markov chain on a tree. Given samples from the leaves of the Markov chain, the goal is to reconstruct the leaf-labelled tree. It is well known that in order to reconstruct a tree on $n$ leaves, sample sequences of length $Ω(\log n)$ are needed. It was conjectured by M. Steel that for the CFN/Ising evolutionary model, if the mutation probability on all edges of the tree is less than $p^{\ast} = (\sqrt{2}-1)/2^{3/2}$, then the tree can be recovered from sequences of length $O(\log n)$. The value $p^{\ast}$ is given by the transition point for the extremality of the free Gibbs measure for the Ising model on the binary tree. Steel's conjecture was proven by the second author in the special case where the tree is "balanced." The second author also proved that if all edges have mutation probability larger than $p^{\ast}$ then the length needed is $n^{Ω(1)}$. Here we show that Steel's conjecture holds true for general trees by giving a reconstruction algorithm that recovers the tree from $O(\log n)$-length sequences when the mutation probabilities are discretized and less than $p^\ast$. Our proof and results demonstrate that extremality of the free Gibbs measure on the infinite binary tree, which has been studied before in probability, statistical physics and computer science, determines how distinguishable are Gibbs measures on finite binary trees.

preprint2009arXiv

Global Alignment of Molecular Sequences via Ancestral State Reconstruction

Molecular phylogenetic techniques do not generally account for such common evolutionary events as site insertions and deletions (known as indels). Instead tree building algorithms and ancestral state inference procedures typically rely on substitution-only models of sequence evolution. In practice these methods are extended beyond this simplified setting with the use of heuristics that produce global alignments of the input sequences--an important problem which has no rigorous model-based solution. In this paper we consider a new version of the multiple sequence alignment in the context of stochastic indel models. More precisely, we introduce the following {\em trace reconstruction problem on a tree} (TRPT): a binary sequence is broadcast through a tree channel where we allow substitutions, deletions, and insertions; we seek to reconstruct the original sequence from the sequences received at the leaves of the tree. We give a recursive procedure for this problem with strong reconstruction guarantees at low mutation rates, providing also an alignment of the sequences at the leaves of the tree. The TRPT problem without indels has been studied in previous work (Mossel 2004, Daskalakis et al. 2006) as a bootstrapping step towards obtaining optimal phylogenetic reconstruction methods. The present work sets up a framework for extending these works to evolutionary models with indels.

preprint2009arXiv

Phylogenies without Branch Bounds: Contracting the Short, Pruning the Deep

We introduce a new phylogenetic reconstruction algorithm which, unlike most previous rigorous inference techniques, does not rely on assumptions regarding the branch lengths or the depth of the tree. The algorithm returns a forest which is guaranteed to contain all edges that are: 1) sufficiently long and 2) sufficiently close to the leaves. How much of the true tree is recovered depends on the sequence length provided. The algorithm is distance-based and runs in polynomial time.

preprint2008arXiv

Discretized Multinomial Distributions and Nash Equilibria in Anonymous Games

We show that there is a polynomial-time approximation scheme for computing Nash equilibria in anonymous games with any fixed number of strategies (a very broad and important class of games), extending the two-strategy result of Daskalakis and Papadimitriou 2007. The approximation guarantee follows from a probabilistic result of more general interest: The distribution of the sum of n independent unit vectors with values ranging over {e1, e2, ...,ek}, where ei is the unit vector along dimension i of the k-dimensional Euclidean space, can be approximated by the distribution of the sum of another set of independent unit vectors whose probabilities of obtaining each value are multiples of 1/z for some integer z, and so that the variational distance of the two distributions is at most eps, where eps is bounded by an inverse polynomial in z and a function of k, but with no dependence on n. Our probabilistic result specifies the construction of a surprisingly sparse eps-cover -- under the total variation distance -- of the set of distributions of sums of independent unit vectors, which is of interest on its own right.

preprint2008arXiv

Probabilistic Analysis of Linear Programming Decoding

We initiate the probabilistic analysis of linear programming (LP) decoding of low-density parity-check (LDPC) codes. Specifically, we show that for a random LDPC code ensemble, the linear programming decoder of Feldman et al. succeeds in correcting a constant fraction of errors with high probability. The fraction of correctable errors guaranteed by our analysis surpasses previous non-asymptotic results for LDPC codes, and in particular exceeds the best previous finite-length result on LP decoding by a factor greater than ten. This improvement stems in part from our analysis of probabilistic bit-flipping channels, as opposed to adversarial channels. At the core of our analysis is a novel combinatorial characterization of LP decoding success, based on the notion of a generalized matching. An interesting by-product of our analysis is to establish the existence of ``probabilistic expansion'' in random bipartite graphs, in which one requires only that almost every (as opposed to every) set of a certain size expands, for sets much larger than in the classical worst-case setting.

preprint2007arXiv

First to Market is not Everything: an Analysis of Preferential Attachment with Fitness

In this paper, we provide a rigorous analysis of preferential attachment with fitness, a random graph model introduced by Bianconi and Barabasi. Depending on the shape of the fitness distribution, we observe three distinct phases: a first-mover-advantage phase, a fit-get-richer phase and an innovation-pays-off phase.

Constantinos Daskalakis

What is connected

Connect this record

See the researcher in context

Building this map preview

51 published item(s)

The Conflict Graph Design: Estimating Causal Effects under Arbitrary Neighborhood Interference

Efficient Truncated Linear Regression with Unknown Noise Variance

Estimation of Standard Auction Models

Fast Rates for Nonparametric Online Learning: From Realizability to Learning in Games

Recommender Systems meet Mechanism Design

Score-Guided Intermediate Layer Optimization: Fast Langevin Mixing for Inverse Problems

The Complexity of Markov Equilibrium in Stochastic Games

Efficient Methods for Structured Nonconvex-Nonconcave Min-Max Optimization

GANs with Conditional Independence Graphs: On Subadditivity of Probability Divergences

Independent Policy Gradient Methods for Competitive Reinforcement Learning

Constant-Expansion Suffices for Compressed Sensing with Generative Priors

Last Iterate is Slower than Averaged Iterate in Smooth Convex-Concave Saddle Point Problems

Logistic-Regression with peer-group effects via inference in higher order Ising models

Multi-Item Mechanisms without Item-Independence: Learnability via Robustness

SGD Learns One-Layer Networks in WGANs

The Complexity of Constrained Min-Max Optimization

Truncated Linear Regression in High Dimensions

A Size-Free CLT for Poisson Multinomials and its Applications

Learning in Auctions: Regret is Hard, Envy is Easy

Revenue Maximization and Ex-Post Budget Constraints

Sequential Mechanisms with ex-post Participation Guarantees

Learning Poisson Binomial Distributions

Mechanism Design via Optimal Transport

On the Structure, Covering, and Learning of Poisson Multinomial Distributions

Optimal Pricing is Hard

Optimal Testing for Properties of Distributions

Optimum Statistical Estimation with Strategic Data Sources

A Counter-Example to Karlin's Strong Conjecture for Fictitious Play

Bayesian Truthful Mechanisms for Job Scheduling from Bi-criterion Approximation Algorithms

Extreme-Value Theorems for Optimal Multidimensional Pricing

Faster and Sample Near-Optimal Algorithms for Proper Learning Mixtures of Gaussians

Learning $k$-Modal Distributions via Testing

Sparse Covers for Sums of Indicators

Testing Poisson Binomial Distributions

A Polynomial-time Approximation Scheme for Fault-tolerant Distributed Storage

Alignment-free phylogenetic reconstruction: Sample complexity via a branching process analysis

How good is the Chord algorithm?

Reducing Revenue to Welfare Maximization: Approximation Algorithms and other Generalizations

The Complexity of Optimal Mechanism Design

Understanding Incentives: Mechanism Design becomes Algorithm Design

Connectivity and equilibrium in random games

Optimal Multi-Dimensional Mechanism Design: Reducing Revenue to Welfare Maximization

Learning transformed product distributions

On Optimal Multi-Dimensional Mechanism Design

Testing $k$-Modal Distributions: Optimal Algorithms via Reductions

Evolutionary Trees and the Ising Model on the Bethe Lattice: a Proof of Steel's Conjecture

Global Alignment of Molecular Sequences via Ancestral State Reconstruction

Phylogenies without Branch Bounds: Contracting the Short, Pruning the Deep

Discretized Multinomial Distributions and Nash Equilibria in Anonymous Games

Probabilistic Analysis of Linear Programming Decoding

First to Market is not Everything: an Analysis of Preferential Attachment with Fitness