Source author record

Elchanan Mossel

Elchanan Mossel appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Catalog footprint

92 works

33 topics

4 close collaborators

preprint · 2026 · arXiv

A Hierarchical Language Model with Predictable Scaling Laws and Provable Benefits of Reasoning

We introduce a family of synthetic languages with hierarchical structure -- generated by a broadcast process on trees -- for which the role of context length and reasoning in autoregressive generation can be analyzed precisely. At the heart of our analytic approach is an \emph{exact $k$-gram ansatz} in place of transformers with context length $k$, a substitution we then validate empirically. Using this ansatz we derive explicit asymptotic predictions for distributional statistics of the sequences produced by a trained model, instantiated in two settings. For the \emph{Ising broadcast process} (a soft-constrained language), we prove that the variance of the generated sum scales log-linearly in the context depth and its kurtosis converges to that of a Gaussian -- both deviating from the true language for any sublinear context. For the \emph{coloring broadcast process} (a hard-constrained language) in the freezing regime, bounded-context autoregression produces sequences that, with high probability, are inconsistent with \emph{any} valid coloring of the underlying tree. Together these results imply an $Ω(n)$ lower bound on the context length required to faithfully sample length-$n$ sequences. In contrast, we prove that an autoregressive \emph{reasoning} model with only $Θ(\log n)$ working memory can sample exactly from the true language -- an exponential improvement. We confirm both the lower-bound predictions and the reasoning-based upper bound empirically with transformers trained on the synthetic language; the trained models track our asymptotic predictions quantitatively across a wide range of context sizes.

preprint · 2026 · arXiv

A Theory of Online Learning with Autoregressive Chain-of-Thought Reasoning

Autoregressive generation lies at the heart of the mechanism of large language models. It can be viewed as the repeated application of a next-token generator: starting from an input string (prompt), the generator is applied for $M$ steps, and the last generated token is taken as the final output. [Joshi et al., 2025] proposed a PAC model for studying the learnability of the input-output maps arising from this process. We develop an online analogue of this framework, focusing on the mistake bound of learning the final output induced by an unknown next-token generator. We distinguish between two forms of feedback. In the End-to-End model, after each round the learner observes only the final token produced after $M$ autoregressive steps. In the Chain-of-Thought model, the learner is additionally shown the entire $M$-step trajectory. Our goal is to understand how the optimal mistake bound depends on the generation horizon $M$, and to what extent observing intermediate tokens can reduce this dependence. Our main results show that the online theory of autoregressive learning exhibits a qualitative picture analogous to the statistical one found by [Hanneke et al., 2026], but with a different scale of dependence on the generation horizon. In the End-to-End model, we prove a taxonomy of possible mistake-bound growth rates in the generation horizon $M$: essentially any rate between constant and logarithmic can arise. We further show that this logarithmic ceiling is unavoidable. In the Chain-of-Thought model, we show that access to the full generated trajectory eliminates the dependence on $M$ altogether. We also analyze autoregressive linear threshold classes, and prove optimal mistake bounds, as well as a new lower bound for the statistical setting. Along the way, our results resolve several questions left open by [Joshi et al., 2025].

preprint · 2026 · arXiv

Detecting Mutual Excitations in Non-Stationary Hawkes Processes

We consider the problem of learning the network of mutual excitations (i.e., the dependency graph) in a non-stationary, multivariate Hawkes process. We consider a general setting where baseline rates at each node are time-varying and delay kernels are not shift-invariant. Our main results show that if the dependency graph of an $n$-variate Hawkes process is sparse (i.e., it has a maximum degree that is bounded with respect to $n$), our algorithm accurately reconstructs it from data after observing the Hawkes process for $T = \mathrm{polylog}(n)$ time, with high probability. Our algorithm is computationally efficient, and provably succeeds in learning dependencies even if only a subset of time series are observed and event times are not precisely known.

preprint · 2026 · arXiv

The Benefits of Temporal Correlations: SGD Learns k-Juntas from Random Walks Efficiently

We study how temporal correlations in the data can make certain sparse learning problems efficiently learnable by gradient-based methods. Our focus is on Boolean k-juntas, a canonical sparse learning problem known to pose barriers for gradient-based methods under independent uniform samples. We show that this picture changes when the samples are generated by a lazy random walk on the hypercube. In this setting, the temporal dependencies can be exploited by a two-layer ReLU network trained using stylized-SGD with a temporal-difference loss, which compares target and predicted increments across consecutive samples. For every fixed k, the resulting sample complexity is essentially linear in the ambient dimension d. By contrast, we show that for large-batch gradient methods using standard convex pointwise losses, temporal correlations do not provide the same advantage.

preprint · 2024 · arXiv

Influence Maximization in Ising Models

Given a complex high-dimensional distribution over $\{\pm 1\}^n$, what is the best way to increase the expected number of $+1$'s by controlling the values of only a small number of variables? Such a problem is known as influence maximization and has been widely studied in social networks, biology, and computer science. In this paper, we consider influence maximization on the Ising model, which is a prototypical example of undirected graphical models and has wide applications in many real-world problems. We establish a sharp computational phase transition for influence maximization on sparse Ising models under a bounded budget: in the high-temperature regime, we give a linear-time algorithm for finding a small subset of variables and their values which achieve nearly optimal influence; in the low-temperature regime, we show that the influence maximization problem cannot be solved in polynomial time under a commonly believed complexity assumption. The critical temperature coincides with the tree uniqueness/non-uniqueness threshold for Ising models, which is also a critical point for other computational problems including approximate sampling and counting.

preprint · 2023 · arXiv

Stable Matchings with Correlated Preferences

The stable matching problem has been the subject of intense theoretical and empirical study since the seminal 1962 paper by Gale and Shapley. The number of stable matchings for different systems of preferences has been studied in many contexts, going back to Donald Knuth in the 1970s. In this paper, we consider a family of distributions defined by the Mallows permutations and show that with high probability the number of stable matchings for these preferences is exponential in the number of people.

preprint · 2022 · arXiv

Almost-Linear Planted Cliques Elude the Metropolis Process

A seminal work of Jerrum (1992) showed that large cliques elude the Metropolis process. More specifically, Jerrum showed that the Metropolis algorithm cannot find a clique of size $k=Θ(n^α), α\in (0,1/2)$, which is planted in the Erdős-Rényi random graph $G(n,1/2)$, in polynomial time. Information theoretically it is possible to find such planted cliques as soon as $k \ge (2+ε) \log n$. Since the work of Jerrum, the computational problem of finding a planted clique in $G(n,1/2)$ was studied extensively and many polynomial-time algorithms were shown to find the planted clique if it is of size $k = Ω(\sqrt{n})$, while no polynomial-time algorithm is known to work when $k=o(\sqrt{n})$. Notably, the first evidence of the problem's algorithmic hardness is commonly attributed to the result of Jerrum from 1992. In this paper we revisit the original Metropolis algorithm suggested by Jerrum. Interestingly, we find that the Metropolis algorithm actually fails to recover a planted clique of size $k=Θ(n^α)$ for any constant $0 \leq α< 1$. Moreover, we strengthen Jerrum's results in several other ways. Like many results in the MCMC literature, the result of Jerrum shows that there exists a starting state (which may depend on the instance) for which the Metropolis algorithm fails. For a wide range of temperatures, we show that the algorithm fails when started at the most natural initial state, which is the empty clique. This answers an open problem stated in Jerrum (1992). We also show that the simulated tempering version of the Metropolis algorithm, a more sophisticated temperature-exchange variant of it, fails in the same regime of parameters. Finally, our results confirm recent predictions by Gamarnik and Zadik (2019) and Angelini, Fachin, and de Feo (2021).
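
To make the dynamics in this abstract concrete, here is a minimal sketch of one common formulation of the Metropolis process on cliques, started from the empty clique. The update rule (add a vertex whenever the result is still a clique, remove a vertex with probability 1/lam) and all function names are assumptions made for illustration; the paper's exact chain and temperature range may differ.

```python
import random

def is_clique(adj, vertices):
    """Check that every pair of the given vertices is adjacent."""
    vs = list(vertices)
    return all(vs[j] in adj[vs[i]]
               for i in range(len(vs)) for j in range(i + 1, len(vs)))

def planted_clique_graph(n, k, rng):
    """Sample G(n, 1/2) and plant a clique on vertices 0..k-1."""
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if (i < k and j < k) or rng.random() < 0.5:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def metropolis_step(adj, clique, lam, rng):
    """One step of a chain whose stationary weight is lam**|K| (assumed lam >= 1)."""
    v = rng.randrange(len(adj))
    if v in clique:
        # Removing v lowers the weight by a factor lam: accept with probability 1/lam.
        return clique - {v} if rng.random() < 1.0 / lam else clique
    # Adding v raises the weight: accept whenever the result is still a clique.
    if all(u in adj[v] for u in clique):
        return clique | {v}
    return clique

def run_metropolis(adj, lam, steps, rng):
    """Run the chain from the empty clique, the natural initial state."""
    clique = frozenset()
    for _ in range(steps):
        clique = metropolis_step(adj, clique, lam, rng)
    return clique
```

The chain only ever visits cliques, so the interesting quantity is how much of the planted clique the final state contains; the sketch does not reproduce any of the hardness results.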

preprint · 2022 · arXiv

Inference in Opinion Dynamics under Social Pressure

We introduce a new opinion dynamics model where a group of agents holds two kinds of opinions: inherent and declared. Each agent's inherent opinion is fixed and unobservable by the other agents. At each time step, agents broadcast their declared opinions on a social network, which are governed by the agents' inherent opinions and social pressure. In particular, we assume that agents may declare opinions that are not aligned with their inherent opinions to conform with their neighbors. This raises the natural question: Can we estimate the agents' inherent opinions from observations of declared opinions? For example, agents' inherent opinions may represent their true political alliances (Democrat or Republican), while their declared opinions may model the political inclinations of tweets on social media. In this context, we may seek to predict the election results by observing voters' tweets, which do not necessarily reflect their political support due to social pressure. We analyze this question in the special case where the underlying social network is a complete graph. We prove that, as long as the population does not include large majorities, estimation of aggregate and individual inherent opinions is possible. On the other hand, large majorities force minorities to lie over time, which makes asymptotic estimation impossible.

preprint · 2022 · arXiv

On the Second Kahn--Kalai Conjecture

For any given graph $H$, we are interested in $p_\mathrm{crit}(H)$, the minimal $p$ such that the Erdős-Rényi graph $G(n,p)$ contains a copy of $H$ with probability at least $1/2$. Kahn and Kalai (2007) conjectured that $p_\mathrm{crit}(H)$ is given up to a logarithmic factor by a simpler "subgraph expectation threshold" $p_\mathrm{E}(H)$, which is the minimal $p$ such that for every subgraph $H'\subseteq H$, the Erdős-Rényi graph $G(n,p)$ contains \emph{in expectation} at least $1/2$ copies of $H'$. It is trivial that $p_\mathrm{E}(H) \le p_\mathrm{crit}(H)$, and the so-called "second Kahn-Kalai conjecture" states that $p_\mathrm{crit}(H) \lesssim p_\mathrm{E}(H) \log e(H)$ where $e(H)$ is the number of edges in $H$. In this article, we present a natural modification $p_\mathrm{E, new}(H)$ of the Kahn--Kalai subgraph expectation threshold, which we show is sandwiched between $p_\mathrm{E}(H)$ and $p_\mathrm{crit}(H)$. The new definition $p_\mathrm{E, new}(H)$ is based on the simple observation that if $G(n,p)$ contains a copy of $H$ and $H$ contains \emph{many} copies of $H'$, then $G(n,p)$ must also contain \emph{many} copies of $H'$. We then show that $p_\mathrm{crit}(H) \lesssim p_\mathrm{E, new}(H) \log e(H)$, thus proving a modification of the second Kahn--Kalai conjecture. The bound follows by a direct application of the set-theoretic "spread" property, which led to recent breakthroughs in the sunflower conjecture by Alweiss, Lovett, Wu and Zhang and the first fractional Kahn--Kalai conjecture by Frankston, Kahn, Narayanan and Park.

preprint · 2022 · arXiv

Seeding with Costly Network Information

We study the task of selecting $k$ nodes, in a social network of size $n$, to seed a diffusion with maximum expected spread size, under the independent cascade model with cascade probability $p$. Most of the previous work on this problem (known as influence maximization) focuses on efficient algorithms to approximate the optimal seed set with provable guarantees given knowledge of the entire network; however, obtaining full knowledge of the network is often very costly in practice. Here we develop algorithms and guarantees for approximating the optimal seed set while bounding how much network information is collected. First, we study the achievable guarantees using a sublinear influence sample size. We provide an almost tight approximation algorithm with an additive $εn$ loss and show that the squared dependence of sample size on $k$ is asymptotically optimal when $ε$ is small. We then propose a probing algorithm that queries edges from the graph and uses them to find a seed set with the same almost tight approximation guarantee. We also provide a matching (up to logarithmic factors) lower bound on the required number of edges. This algorithm is implementable in field surveys or in crawling online networks. Our probing takes $p$ as an input which may not be known in advance, and we show how to down-sample the probed edges to match the best estimate of $p$ if they are collected with a higher probability. Finally, we test our algorithms on an empirical network to quantify the tradeoff between the cost of obtaining more refined network information and the benefit of the added information for guiding improved seeding strategies.
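
As a rough illustration of the independent cascade model named above, the following sketch simulates one diffusion and averages the spread over repeated runs. It is not the paper's seeding or probing algorithm; the graph representation (a dict of out-neighbor lists) and the function names are assumptions.

```python
import random

def simulate_cascade(out_neighbors, seeds, p, rng):
    """Return the set of nodes activated by one independent-cascade run."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in out_neighbors.get(u, ()):
                # A newly active node gets a single chance, with probability p,
                # to activate each currently inactive neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

def estimate_spread(out_neighbors, seeds, p, runs, rng):
    """Monte Carlo estimate of the expected spread size of a seed set."""
    total = sum(len(simulate_cascade(out_neighbors, seeds, p, rng))
                for _ in range(runs))
    return total / runs
```

With `p = 1` the cascade reaches every node reachable from the seeds, and with `p = 0` it stays at the seed set, which gives two easy sanity checks.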

preprint · 2022 · arXiv

Shotgun Assembly of Erdős-Rényi Random Graphs

Graph shotgun assembly refers to the problem of reconstructing a graph from a collection of local neighborhoods. In this paper, we consider shotgun assembly of Erdős-Rényi random graphs $G(n, p_n)$, where $p_n = n^{-α}$ for $0 < α< 1$. We consider both reconstruction up to isomorphism and exact reconstruction (recovering the vertex labels as well as the structure). We show that given the collection of distance-$1$ neighborhoods, $G$ is exactly reconstructable for $0 < α< \frac{1}{3}$, but not reconstructable for $\frac{1}{2} < α< 1$. Given the collection of distance-$2$ neighborhoods, $G$ is exactly reconstructable for $α\in \left(0, \frac{1}{2}\right) \cup \left(\frac{1}{2}, \frac{3}{5}\right)$, but not reconstructable for $\frac{3}{4} < α< 1$.

preprint · 2022 · arXiv

Shotgun assembly of labeled graphs

We consider the problem of reconstructing graphs or labeled graphs from neighborhoods of a given radius $r$. Special instances of this problem include the well-known DNA shotgun assembly, the lesser-known neural network reconstruction, and a new problem: assembling random jigsaw puzzles. We provide some necessary and some sufficient conditions for correct recovery, both in combinatorial terms and for some generative models including random labelings of lattices, Erdős-Rényi random graphs, and the random jigsaw puzzle model. Many open problems and conjectures are provided.

preprint · 2022 · arXiv

Spoofing Generalization: When Can't You Trust Proprietary Models?

In this work, we study the computational complexity of determining whether a machine learning model that perfectly fits the training data will generalize to unseen data. In particular, we study the power of a malicious agent whose goal is to construct a model $g$ that fits its training data and nothing else, but is indistinguishable from an accurate model $f$. We say that $g$ strongly spoofs $f$ if no polynomial-time algorithm can tell them apart. If instead we restrict to algorithms that run in $n^c$ time for some fixed $c$, we say that $g$ $c$-weakly spoofs $f$. Our main results are: 1. under cryptographic assumptions, strong spoofing is possible; and 2. for any $c > 0$, $c$-weak spoofing is possible unconditionally. While the assumption of a malicious agent is an extreme scenario (hopefully companies training large models are not malicious), we believe that it sheds light on the inherent difficulties of blindly trusting large proprietary models or data.

preprint · 2021 · arXiv

Learning to Sample from Censored Markov Random Fields

We study learning Censored Markov Random Fields (abbreviated CMRFs). These are Markov Random Fields where some of the nodes are censored (not observed). We present an algorithm for learning high-temperature CMRFs within $o(n)$ transportation distance. Crucially, our algorithm makes no assumption about the structure of the graph or the number or location of the observed nodes. We obtain stronger results for high-girth, high-temperature CMRFs, as well as computational lower bounds indicating that our results cannot be qualitatively improved.

preprint · 2021 · arXiv

Robust testing of low-dimensional functions

A natural problem in high-dimensional inference is to decide if a classifier $f:\mathbb{R}^n \rightarrow \{-1,1\}$ depends on a small number of linear directions of its input data. Call a function $g: \mathbb{R}^n \rightarrow \{-1,1\}$, a linear $k$-junta if it is completely determined by some $k$-dimensional subspace of the input space. A recent work of the authors showed that linear $k$-juntas are testable. Thus there exists an algorithm to distinguish between: 1. $f: \mathbb{R}^n \rightarrow \{-1,1\}$ which is a linear $k$-junta with surface area $s$, 2. $f$ is $ε$-far from any linear $k$-junta with surface area $(1+ε)s$, where the query complexity of the algorithm is independent of the ambient dimension $n$. Following the surge of interest in noise-tolerant property testing, in this paper we prove a noise-tolerant (or robust) version of this result. Namely, we give an algorithm which given any $c>0$, $ε>0$, distinguishes between 1. $f: \mathbb{R}^n \rightarrow \{-1,1\}$ has correlation at least $c$ with some linear $k$-junta with surface area $s$. 2. $f$ has correlation at most $c-ε$ with any linear $k$-junta with surface area at most $s$. The query complexity of our tester is $k^{\mathsf{poly}(s/ε)}$. Using our techniques, we also obtain a fully noise tolerant tester with the same query complexity for any class $\mathcal{C}$ of linear $k$-juntas with surface area bounded by $s$. As a consequence, we obtain a fully noise tolerant tester with query complexity $k^{O(\mathsf{poly}(\log k/ε))}$ for the class of intersection of $k$-halfspaces (for constant $k$) over the Gaussian space. Our query complexity is independent of the ambient dimension $n$. Previously, no non-trivial noise tolerant testers were known even for a single halfspace.

preprint · 2020 · arXiv

Broadcasting on Random Directed Acyclic Graphs

We study a generalization of the well-known model of broadcasting on trees. Consider a directed acyclic graph (DAG) with a unique source vertex $X$, and suppose all other vertices have indegree $d\geq 2$. Let the vertices at distance $k$ from $X$ be called layer $k$. At layer $0$, $X$ is given a random bit. At layer $k\geq 1$, each vertex receives $d$ bits from its parents in layer $k-1$, which are transmitted along independent binary symmetric channel edges, and combines them using a $d$-ary Boolean processing function. The goal is to reconstruct $X$ with probability of error bounded away from $1/2$ using the values of all vertices at an arbitrarily deep layer. This question is closely related to models of reliable computation and storage, and information flow in biological networks. In this paper, we analyze randomly constructed DAGs, for which we show that broadcasting is only possible if the noise level is below a certain degree and function dependent critical threshold. For $d\geq 3$, and random DAGs with layer sizes $Ω(\log k)$ and majority processing functions, we identify the critical threshold. For $d=2$, we establish a similar result for NAND processing functions. We also prove a partial converse for odd $d\geq 3$ illustrating that the identified thresholds are impossible to improve by selecting different processing functions if the decoder is restricted to using a single vertex. Finally, for any noise level, we construct explicit DAGs (using expander graphs) with bounded degree and layer sizes $Θ(\log k)$ admitting reconstruction. In particular, we show that such DAGs can be generated in deterministic quasi-polynomial time or randomized polylogarithmic time in the depth. These results portray a doubly-exponential advantage for storing a bit in DAGs compared to trees, where $d=1$ but layer sizes must grow exponentially with depth in order to enable broadcasting.
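
The layer-by-layer process described above can be simulated in a few lines. This is only an illustrative sketch: it assumes each vertex picks its $d$ parents uniformly at random with replacement from the previous layer, uses majority processing with odd $d$, and decodes by a majority vote over the final layer. The function names and the `layer_sizes` parameter are hypothetical.

```python
import random

def broadcast_random_dag(source_bit, layer_sizes, d, delta, rng):
    """Propagate a bit through random layers; return the bits of the last layer.

    layer_sizes[k-1] is the number of vertices in layer k (layer 0 is the source).
    Each edge is a binary symmetric channel with crossover probability delta.
    """
    prev = [source_bit]
    for size in layer_sizes:
        cur = []
        for _ in range(size):
            received = []
            for _ in range(d):
                bit = rng.choice(prev)      # assumed: uniform parent, with replacement
                if rng.random() < delta:    # channel flips the bit with probability delta
                    bit ^= 1
                received.append(bit)
            cur.append(1 if 2 * sum(received) > d else 0)  # majority (d odd)
        prev = cur
    return prev

def majority_decode(bits):
    """Guess the source bit from the final layer by majority vote."""
    return 1 if 2 * sum(bits) > len(bits) else 0
```

With `delta = 0` every vertex copies the source bit exactly; sweeping `delta` upward in experiments is one way to see reconstruction degrade, though this sketch makes no claim about where the critical threshold lies.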

preprint · 2020 · arXiv

Consistency Thresholds for the Planted Bisection Model

The planted bisection model is a random graph model in which the nodes are divided into two equal-sized communities and then edges are added randomly in a way that depends on the community membership. We establish necessary and sufficient conditions for the asymptotic recoverability of the planted bisection in this model. When the bisection is asymptotically recoverable, we give an efficient algorithm that successfully recovers it. We also show that the planted bisection is recoverable asymptotically if and only if with high probability every node belongs to the same community as the majority of its neighbors. Our algorithm for finding the planted bisection runs in time almost linear in the number of edges. It has three stages: spectral clustering to compute an initial guess, a "replica" stage to get almost every vertex correct, and then some simple local moves to finish the job. An independent work by Abbe, Bandeira, and Hall establishes similar (slightly weaker) results but only in the case of logarithmic average degree.
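
The recoverability criterion in this abstract (every node lies in the same community as the majority of its neighbors) is easy to check on a given labeled graph. The sketch below is one possible reading, assuming a strict majority is required; the function name and the adjacency-dict representation are illustrative choices, not taken from the paper.

```python
def agrees_with_neighbor_majority(adj, labels):
    """Return True if every node has a strict majority of same-community neighbors.

    adj maps each node to an iterable of its neighbors; labels maps nodes to
    community ids. An isolated node has no majority and therefore fails the check.
    """
    for v, nbrs in adj.items():
        nbrs = list(nbrs)
        same = sum(1 for u in nbrs if labels[u] == labels[v])
        if 2 * same <= len(nbrs):
            return False
    return True
```

For example, two triangles joined by a single edge pass the check when each triangle is one community, and fail once a bridge endpoint is relabeled.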

preprint · 2020 · arXiv

Distributed Corruption Detection in Networks

We consider the problem of distributed corruption detection in networks. In this model, each vertex of a directed graph is either truthful or corrupt. Each vertex reports the type (truthful or corrupt) of each of its outneighbors. If it is truthful, it reports the truth, whereas if it is corrupt, it reports adversarially. This model, first considered by Preparata, Metze, and Chien in 1967, motivated by the desire to identify the faulty components of a digital system by having the other components check them, became known as the PMC model. The main known results for this model characterize networks in which \emph{all} corrupt (that is, faulty) vertices can be identified, when there is a known upper bound on their number. We are interested in networks in which the identity of a \emph{large fraction} of the vertices can be identified. It is known that in the PMC model, in order to identify all corrupt vertices when their number is $t$, all indegrees have to be at least $t$. In contrast, we show that in $d$-regular graphs with strong expansion properties, a $1-O(1/d)$ fraction of the corrupt vertices, and a $1-O(1/d)$ fraction of the truthful vertices can be identified, whenever there is a majority of truthful vertices. We also observe that if the graph is very far from being a good expander, namely, if the deletion of a small set of vertices splits the graph into small components, then no corruption detection is possible even if most of the vertices are truthful. Finally, we discuss the algorithmic aspects and the computational hardness of the problem.

preprint · 2020 · arXiv

Efficient Reconstruction of Stochastic Pedigrees

We introduce a new algorithm called Rec-Gen for reconstructing the genealogy or \emph{pedigree} of an extant population purely from its genetic data. We justify our approach by giving a mathematical proof of the effectiveness of Rec-Gen when applied to pedigrees from an idealized generative model that replicates some of the features of real-world pedigrees. Our algorithm is iterative and provides an accurate reconstruction of a large fraction of the pedigree while having relatively low \emph{sample complexity}, measured in terms of the length of the genetic sequences of the population. We propose our approach as a prototype for further investigation of the pedigree reconstruction problem toward the goal of applications to real-world examples. As such, our results have some conceptual bearing on the increasingly important issue of genomic privacy.

preprint · 2020 · arXiv

Rational Groupthink

We study how long-lived rational agents learn from repeatedly observing a private signal and each other's actions. With normal signals, a group of any size learns more slowly than just four agents who directly observe each other's private signals in each period. Similar results apply to general signal structures. We identify rational groupthink---in which agents ignore their private signals and choose the same action for long periods of time---as the cause of this failure of information aggregation.

preprint · 2019 · arXiv

Bayesian Decision Making in Groups is Hard

We study the computations that Bayesian agents undertake when exchanging opinions over a network. The agents act repeatedly on their private information and take myopic actions that maximize their expected utility according to a fully rational posterior belief. We show that such computations are NP-hard for two natural utility functions: one with binary actions, and another where agents reveal their posterior beliefs. In fact, we show that distinguishing between posteriors that are concentrated on different states of the world is NP-hard. Therefore, even approximating the Bayesian posterior beliefs is hard. We also describe a natural search algorithm to compute agents' actions, which we call elimination of impossible signals, and show that if the network is transitive, the algorithm can be modified to run in polynomial time.

preprint · 2019 · arXiv

How Many Subpopulations is Too Many? Exponential Lower Bounds for Inferring Population Histories

Reconstruction of population histories is a central problem in population genetics. Existing coalescent-based methods, like the seminal work of Li and Durbin (Nature, 2011), attempt to solve this problem using sequence data but have no rigorous guarantees. Determining the amount of data needed to correctly reconstruct population histories is a major challenge. Using a variety of tools from information theory, the theory of extremal polynomials, and approximation theory, we prove new sharp information-theoretic lower bounds on the problem of reconstructing population structure -- the history of multiple subpopulations that merge, split and change sizes over time. Our lower bounds are exponential in the number of subpopulations, even when reconstructing recent histories. We demonstrate the sharpness of our lower bounds by providing algorithms for distinguishing and learning population histories with matching dependence on the number of subpopulations. Along the way and of independent interest, we essentially determine the optimal number of samples needed to learn an exponential mixture distribution information-theoretically, proving the upper bound by analyzing natural (and efficient) algorithms for this problem.

preprint · 2019 · arXiv

Social learning equilibria

We consider a large class of social learning models in which a group of agents face uncertainty regarding a state of the world, share the same utility function, observe private signals, and interact in a general dynamic setting. We introduce Social Learning Equilibria, a static equilibrium concept that abstracts away from the details of the given extensive form, but nevertheless captures the corresponding asymptotic equilibrium behavior. We establish general conditions for agreement, herding, and information aggregation in equilibrium, highlighting a connection between agreement and information aggregation.

preprint · 2016 · arXiv

Belief propagation, robust reconstruction and optimal recovery of block models

We consider the problem of reconstructing sparse symmetric block models with two blocks and connection probabilities $a/n$ and $b/n$ for inter- and intra-block edge probabilities, respectively. It was recently shown that one can do better than a random guess if and only if $(a-b)^2>2(a+b)$. Using a variant of belief propagation, we give a reconstruction algorithm that is optimal in the sense that if $(a-b)^2>C(a+b)$ for some constant $C$ then our algorithm maximizes the fraction of the nodes labeled correctly. Ours is the only algorithm proven to achieve the optimal fraction of nodes labeled correctly. Along the way, we prove some results of independent interest regarding robust reconstruction for the Ising model on regular and Poisson trees.
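
The condition $(a-b)^2>2(a+b)$ quoted above is a one-line arithmetic check. The helper below is a small sketch with a hypothetical function name; it only evaluates the inequality and says nothing about the constant $C$ needed for the optimality guarantee.

```python
def above_detection_threshold(a, b):
    """Return True when (a - b)^2 > 2(a + b), the condition for beating a random guess."""
    return (a - b) ** 2 > 2 * (a + b)
```

For instance, $a=5$, $b=1$ gives $16 > 12$, so the condition holds, while $a=3$, $b=1$ gives $4 > 8$, which fails.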

preprint · 2016 · arXiv

Density Evolution in the Degree-correlated Stochastic Block Model

There has been a recent surge of interest in identifying the sharp recovery thresholds for cluster recovery under the stochastic block model. In this paper, we address the more refined question of how many vertices will be misclassified on average. We consider the binary form of the stochastic block model, where $n$ vertices are partitioned into two clusters with edge probability $a/n$ within the first cluster, $c/n$ within the second cluster, and $b/n$ across clusters. Suppose that as $n \to \infty$, $a= b+ μ\sqrt{ b} $, $c=b+ ν\sqrt{ b} $ for two fixed constants $μ, ν$, and $b \to \infty$ with $b=n^{o(1)}$. When the cluster sizes are balanced and $μ\neq ν$, we show that the minimum fraction of misclassified vertices on average is given by $Q(\sqrt{v^*})$, where $Q(x)$ is the Q-function for standard normal, $v^*$ is the unique fixed point of $v= \frac{(μ-ν)^2}{16} + \frac{ (μ+ν)^2 }{16} \mathbb{E}[ \tanh(v+ \sqrt{v} Z)],$ and $Z$ is standard normal. Moreover, the minimum misclassified fraction on average is attained by a local algorithm, namely belief propagation, in time linear in the number of edges. Our proof techniques are based on connecting the cluster recovery problem to tree reconstruction problems, and analyzing the density evolution of belief propagation on trees with Gaussian approximations.
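
The fixed-point equation for $v^*$ and the resulting error $Q(\sqrt{v^*})$ can be evaluated numerically. The sketch below is one simple way to do it, assuming plain fixed-point iteration started from $(μ-ν)^2/16$ and a trapezoid-rule approximation of the Gaussian expectation; the grid size, iteration count, and function names are arbitrary illustrative choices.

```python
import math

def expected_tanh(v, grid=2001, z_max=8.0):
    """Trapezoid-rule estimate of E[tanh(v + sqrt(v) Z)] for standard normal Z."""
    h = 2 * z_max / (grid - 1)
    total = 0.0
    for i in range(grid):
        z = -z_max + i * h
        w = 0.5 if i in (0, grid - 1) else 1.0
        total += w * math.tanh(v + math.sqrt(v) * z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2 * math.pi)

def fixed_point_v(mu, nu, iters=300):
    """Iterate v -> (mu-nu)^2/16 + (mu+nu)^2/16 * E[tanh(v + sqrt(v) Z)]."""
    base = (mu - nu) ** 2 / 16
    gain = (mu + nu) ** 2 / 16
    v = base
    for _ in range(iters):
        v = base + gain * expected_tanh(v)
    return v

def q_function(x):
    """Upper tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def min_misclassified_fraction(mu, nu):
    """Evaluate Q(sqrt(v*)) at the numerically computed fixed point."""
    return q_function(math.sqrt(fixed_point_v(mu, nu)))
```

A convenient sanity check is $μ = -ν$: the second term vanishes, so $v^* = (μ-ν)^2/16$ exactly. With $μ = 2$, $ν = -2$ this gives $v^* = 1$ and an error of $Q(1)$.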

preprint · 2016 · arXiv

Invariance principle on the slice

We prove an invariance principle for functions on a slice of the Boolean cube, which is the set of all vectors in $\{0,1\}^n$ with Hamming weight $k$. Our invariance principle shows that a low-degree, low-influence function has similar distributions on the slice, on the entire Boolean cube, and on Gaussian space. Our proof relies on a combination of ideas from analysis and probability, algebra and combinatorics. Our results imply a version of majority is stablest for functions on the slice, a version of Bourgain's tail bound, and a version of the Kindler-Safra theorem. As a corollary of the Kindler-Safra theorem, we prove a stability result for Wilson's theorem for $t$-intersecting families of sets, improving on a result of Friedgut.

preprint · 2016 · arXiv

Linear Sketching over $\mathbb F_2$

We initiate a systematic study of linear sketching over $\mathbb F_2$. For a given Boolean function $f \colon \{0,1\}^n \to \{0,1\}$ a randomized $\mathbb F_2$-sketch is a distribution $\mathcal M$ over $d \times n$ matrices with elements over $\mathbb F_2$ such that $\mathcal Mx$ suffices for computing $f(x)$ with high probability. We study a connection between $\mathbb F_2$-sketching and a two-player one-way communication game for the corresponding XOR-function. Our results show that this communication game characterizes $\mathbb F_2$-sketching under the uniform distribution (up to dependence on error). Implications of this result include: 1) a composition theorem for $\mathbb F_2$-sketching complexity of a recursive majority function, 2) a tight relationship between $\mathbb F_2$-sketching complexity and Fourier sparsity, 3) lower bounds for a certain subclass of symmetric functions. We also fully resolve a conjecture of Montanaro and Osborne regarding one-way communication complexity of linear threshold functions by designing an $\mathbb F_2$-sketch of optimal size. Furthermore, we show that (non-uniform) streaming algorithms that have to process random updates over $\mathbb F_2$ can be constructed as $\mathbb F_2$-sketches for the uniform distribution with only a minor loss. In contrast with the previous work of Li, Nguyen and Woodruff (STOC'14), who show an analogous result for linear sketches over integers in the adversarial setting, our result doesn't require the stream length to be triply exponential in $n$ and holds for streams of length $\tilde O(n)$ constructed through uniformly random updates. Finally, we state a conjecture that asks whether optimal one-way communication protocols for XOR-functions can be constructed as $\mathbb F_2$-sketches with only a small loss.
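
To illustrate the definition of an $\mathbb F_2$-sketch, the following minimal example computes $Mx$ over $\mathbb F_2$ and uses the simplest possible case: a parity function on a subset of coordinates, which a deterministic $1 \times n$ sketch (a point-mass distribution) computes exactly. The function names are hypothetical, and the example is not taken from the paper.

```python
def f2_sketch(matrix, x):
    """Compute Mx over F_2 for a d x n 0/1 matrix (list of rows) and a 0/1 vector x."""
    return [sum(m & xi for m, xi in zip(row, x)) % 2 for row in matrix]

def parity_on(subset, x):
    """Parity of the coordinates of x indexed by subset."""
    return sum(x[i] for i in subset) % 2

def xor_vectors(x, y):
    """Coordinate-wise addition over F_2."""
    return [xi ^ yi for xi, yi in zip(x, y)]
```

Because the sketch is linear, `f2_sketch(M, xor_vectors(x, y))` equals the XOR of the two individual sketches, which is the property that lets a sketch be maintained under a stream of updates.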

preprint2016arXiv

Noise Stability and Correlation with Half Spaces

Benjamini, Kalai and Schramm showed that a monotone function $f : \{-1,1\}^n \to \{-1,1\}$ is noise stable if and only if it is correlated with a half-space (a set of the form $\{x: \langle x, a\rangle \le b\}$). We study noise stability in terms of correlation with half-spaces for general (not necessarily monotone) functions. We show that a function $f: \{-1, 1\}^n \to \{-1, 1\}$ is noise stable if and only if it becomes correlated with a half-space when we modify $f$ by randomly restricting a constant fraction of its coordinates. Looking at random restrictions is necessary: we construct noise stable functions whose correlation with any half-space is $o(1)$. The examples further satisfy that different restrictions are correlated with different half-spaces: for any fixed half-space, the probability that a random restriction is correlated with it goes to zero. We also provide quantitative versions of the above statements, and versions that apply for the Gaussian measure on $\mathbb{R}^n$ instead of the discrete cube. Our work is motivated by questions in learning theory and a recent question of Khot and Moshkovitz.
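The abstract does not spell out the definition of noise stability. A common formalization, assumed here, is $\mathrm{Stab}_ρ(f) = \mathbb{E}[f(x)f(y)]$, where $x$ is uniform on $\{-1,1\}^n$ and each $y_i$ equals $x_i$ with probability $(1+ρ)/2$ independently. The following brute-force Python sketch evaluates this quantity exactly for small $n$; the function names are illustrative.

```python
from itertools import product

def noise_stability(f, n, rho):
    """Exact E[f(x) f(y)] for uniform x and a rho-correlated copy y (brute force)."""
    points = list(product((-1, 1), repeat=n))
    total = 0.0
    for x in points:
        for y in points:
            # Probability of y given x: each coordinate agrees w.p. (1 + rho) / 2.
            weight = 1.0
            for xi, yi in zip(x, y):
                weight *= (1 + rho) / 2 if xi == yi else (1 - rho) / 2
            total += f(x) * f(y) * weight
    return total / len(points)

def dictator(x):
    return x[0]

def majority(x):
    # Assumes an odd number of coordinates, so the sum is never zero.
    return 1 if sum(x) > 0 else -1
```

The cost is $4^n$ evaluations, so this is only meant for sanity checks on a handful of coordinates.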

preprint2016arXiv

Sequence assembly from corrupted shotgun reads

The prevalent technique for DNA sequencing consists of two main steps: shotgun sequencing, where many randomly located fragments, called reads, are extracted from the overall sequence, followed by an assembly algorithm that aims to reconstruct the original sequence. There are many different technologies that generate the reads: widely-used second-generation methods create short reads with low error rates, while emerging third-generation methods create long reads with high error rates. Both error rates and error profiles differ among methods, so reconstruction algorithms are often tailored to specific shotgun sequencing technologies. As these methods change over time, a fundamental question is whether there exist reconstruction algorithms which are robust, i.e., which perform well under a wide range of error distributions. Here we study this question of sequence assembly from corrupted reads. We make no assumption on the types of errors in the reads, but only assume a bound on their magnitude. More precisely, for each read we assume that instead of receiving the true read with no errors, we receive a corrupted read which has edit distance at most $ε$ times the length of the read from the true read. We show that if the reads are long enough and there are sufficiently many of them, then approximate reconstruction is possible: we construct a simple algorithm such that for almost all original sequences the output of the algorithm is a sequence whose edit distance from the original one is at most $O(ε)$ times the length of the original sequence.
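The corruption model above can be checked mechanically. The following Python sketch uses the standard dynamic program for edit (Levenshtein) distance and tests whether an observed read lies within $ε$ times the read length of the true read. It is not the reconstruction algorithm from the paper, and the function names are illustrative.

```python
def edit_distance(s, t):
    """Levenshtein distance between strings s and t (row-by-row dynamic program)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a
                curr[j - 1] + 1,         # insert b
                prev[j - 1] + (a != b),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]

def is_admissible_corruption(true_read, observed_read, eps):
    """True if the observed read is within eps * len(true_read) edits of the true read."""
    return edit_distance(true_read, observed_read) <= eps * len(true_read)
```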

preprint2016arXiv

Shotgun Assembly of Random Jigsaw Puzzles

In a recent work, Mossel and Ross considered the shotgun assembly problem for a random jigsaw puzzle. Their model consists of a puzzle - an $n\times n$ grid, where each vertex is viewed as a center of a piece. They assume that each of the four edges adjacent to a vertex is assigned one of $q$ colors (corresponding to "jigs", or cut shapes) uniformly at random. Mossel and Ross asked: how large should $q = q(n)$ be so that with high probability the puzzle can be assembled uniquely given the collection of individual tiles? They showed that if $q = ω(n^2)$, then the puzzle can be assembled uniquely with high probability, while if $q = o(n^{2/3})$, then with high probability the puzzle cannot be uniquely assembled. Here we improve the upper bound and show that for any $\varepsilon > 0$, the puzzle can be assembled uniquely with high probability if $q \geq n^{1+\varepsilon}$. The proof uses an algorithm of $n^{Θ(1/\varepsilon)}$ running time.
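The random puzzle model can be sampled in a few lines. This Python sketch assumes that border edges are also colored uniformly at random and that each piece is recorded as a `(top, right, bottom, left)` tuple; both conventions, and the function name, are illustrative assumptions rather than details fixed by the abstract.

```python
import random

def random_puzzle(n, q, seed=0):
    """Sample an n x n puzzle; return {(row, col): (top, right, bottom, left)}."""
    rng = random.Random(seed)
    # horiz[i][j] is the edge to the left of piece (i, j); column n is the right border.
    horiz = [[rng.randrange(q) for _ in range(n + 1)] for _ in range(n)]
    # vert[i][j] is the edge above piece (i, j); row n is the bottom border.
    vert = [[rng.randrange(q) for _ in range(n)] for _ in range(n + 1)]
    pieces = {}
    for i in range(n):
        for j in range(n):
            pieces[(i, j)] = (vert[i][j], horiz[i][j + 1], vert[i + 1][j], horiz[i][j])
    return pieces

def shuffled_tiles(pieces, seed=0):
    """What an assembler sees: the pieces with their positions forgotten."""
    tiles = list(pieces.values())
    random.Random(seed).shuffle(tiles)
    return tiles
```

Neighbouring pieces share the color of their common edge, which is the only information an assembly algorithm can exploit.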

preprint2015arXiv

A Proof Of The Block Model Threshold Conjecture

We study a random graph model named the "block model" in statistics and the "planted partition model" in theoretical computer science. In its simplest form, this is a random graph with two equal-sized clusters, with a between-class edge probability of $q$ and a within-class edge probability of $p$. A striking conjecture of Decelle, Krzakala, Moore and Zdeborová, based on deep, non-rigorous ideas from statistical physics, gave a precise prediction for the algorithmic threshold of clustering in the sparse planted partition model. In particular, if $p = a/n$ and $q = b/n$, $s=(a-b)/2$ and $d=(a+b)/2$, then Decelle et al.\ conjectured that it is possible to efficiently cluster in a way correlated with the true partition if $s^2 > d$ and impossible if $s^2 < d$. By comparison, the best-known rigorous result is that of Coja-Oghlan, who showed that clustering is possible if $s^2 > C d \ln d$ for some sufficiently large $C$. In a previous work, we proved that indeed it is information theoretically impossible to cluster if $s^2 < d$ and furthermore it is information theoretically impossible to even estimate the model parameters from the graph when $s^2 < d$. Here we complete the proof of the conjecture by providing an efficient algorithm for clustering in a way that is correlated with the true partition when $s^2 > d$. A different independent proof of the same result was recently obtained by Laurent Massoulie.
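The threshold condition compares $s^2$, with $s=(a-b)/2$, against the average-degree quantity $(a+b)/2$; equivalently, it asks whether $(a-b)^2 > 2(a+b)$. The Python sketch below evaluates the condition and samples the two-cluster sparse model. It does not implement the clustering algorithm from the paper, and the function names are hypothetical.

```python
import random

def above_threshold(a, b):
    """True if s^2 exceeds (a + b) / 2, where s = (a - b) / 2."""
    s = (a - b) / 2
    avg_degree = (a + b) / 2
    return s * s > avg_degree

def sample_block_model(n, a, b, seed=0):
    """Two equal clusters; edge probability a/n within a cluster, b/n across."""
    rng = random.Random(seed)
    labels = [1 if i < n // 2 else -1 for i in range(n)]
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            prob = a / n if labels[u] == labels[v] else b / n
            if rng.random() < prob:
                edges.append((u, v))
    return labels, edges
```

For example, $a=5$, $b=1$ gives $s^2 = 4 > 3$, which is above the threshold, while $a=3$, $b=1$ gives $s^2 = 1 < 2$, which is below it.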

preprint2015arXiv

Coexistence in preferential attachment networks

We introduce a new model of competition on growing networks. This extends the preferential attachment model, with the key property that node choices evolve simultaneously with the network. When a new node joins the network, it chooses neighbours by preferential attachment, and selects its type based on the number of initial neighbours of each type. The model is analysed in detail, and in particular, we determine the possible proportions of the various types in the limit of large networks. An important qualitative feature we find is that, in contrast to many current theoretical models, often several competitors will coexist. This matches empirical observations in many real-world networks.

preprint2015arXiv

Local Algorithms for Block Models with Side Information

There has been recent interest in understanding the power of local algorithms for optimization and inference problems on sparse graphs. Gamarnik and Sudan (2014) showed that local algorithms are weaker than global algorithms for finding large independent sets in sparse random regular graphs. Montanari (2015) showed that local algorithms are suboptimal for finding a community with high connectivity in the sparse Erdős-Rényi random graphs. For the symmetric planted partition problem (also named community detection for the block models) on sparse graphs, a simple observation is that local algorithms cannot have non-trivial performance. In this work we consider the effect of side information on local algorithms for community detection under the binary symmetric stochastic block model. In the block model with side information each of the $n$ vertices is labeled $+$ or $-$ independently and uniformly at random; each pair of vertices is connected independently with probability $a/n$ if both of them have the same label or $b/n$ otherwise. The goal is to estimate the underlying vertex labeling given 1) the graph structure and 2) side information in the form of a vertex labeling positively correlated with the true one. Assuming that the ratio between in and out degree $a/b$ is $Θ(1)$ and the average degree $(a+b)/2 = n^{o(1)}$, we characterize three different regimes under which a local algorithm, namely, belief propagation run on the local neighborhoods, maximizes the expected fraction of vertices labeled correctly. Thus, in contrast to the case of symmetric block models without side information, we show that local algorithms can achieve optimal performance for the block model with side information.

preprint2015arXiv

MCMC Learning

The theory of learning under the uniform distribution is rich and deep, with connections to cryptography, computational complexity, and the analysis of Boolean functions, to name a few areas. This theory, however, is very limited due to the fact that the uniform distribution and the corresponding Fourier basis are rarely encountered as a statistical model. A family of distributions that vastly generalizes the uniform distribution on the Boolean cube is that of distributions represented by Markov Random Fields (MRF). Markov Random Fields are one of the main tools for modeling high dimensional data in many areas of statistics and machine learning. In this paper we initiate the investigation of extending central ideas, methods and algorithms from the theory of learning under the uniform distribution to the setup of learning concepts given examples from MRF distributions. In particular, our results establish a novel connection between properties of MCMC sampling of MRFs and learning under the MRF distribution.

preprint2015arXiv

On the Correlation of Increasing Families

The classical correlation inequality of Harris asserts that any two monotone increasing families on the discrete cube are nonnegatively correlated. In 1996, Talagrand established a lower bound on the correlation in terms of how much the two families depend simultaneously on the same coordinates. Talagrand's method and results inspired a number of important works in combinatorics and probability theory. In this paper we present stronger correlation lower bounds that hold when the increasing families satisfy natural regularity or symmetry conditions. In addition, we present several new classes of examples for which Talagrand's bound is tight. A central tool in the paper is a simple lemma asserting that for monotone events noise decreases correlation. This lemma also gives a very simple derivation of the classical FKG inequality for product measures, and leads to a simplification of part of Talagrand's proof.

preprint2015arXiv

On the Impossibility of Learning the Missing Mass

This paper shows that one cannot learn the probability of rare events without imposing further structural assumptions. The event of interest is that of obtaining an outcome outside the coverage of an i.i.d. sample from a discrete distribution. The probability of this event is referred to as the "missing mass". The impossibility result can then be stated as: the missing mass is not distribution-free PAC-learnable in relative error. The proof is semi-constructive and relies on a coupling argument using a dithered geometric distribution. This result formalizes the folklore that in order to predict rare events, one necessarily needs distributions with "heavy tails".
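For concreteness, the missing mass of a sample can be computed directly when the underlying distribution is known. The Python sketch below does exactly that. The learner in the impossibility result does not have access to the distribution, so this only illustrates the target quantity, and the function name is hypothetical.

```python
def missing_mass(distribution, sample):
    """Total probability of the outcomes that never appear in the sample.

    distribution: dict mapping each outcome to its probability.
    sample: iterable of observed outcomes.
    """
    seen = set(sample)
    return sum(p for outcome, p in distribution.items() if outcome not in seen)
```

For a distribution with probabilities 0.5, 0.3 and 0.2 on outcomes `a`, `b` and `c`, a sample that contains only `a` and `c` has missing mass 0.3.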

preprint2015arXiv

Quickest Online Selection of an Increasing Subsequence of Specified Size

Given a sequence of independent random variables with a common continuous distribution, we consider the online decision problem where one seeks to minimize the expected value of the time that is needed to complete the selection of a monotone increasing subsequence of a prespecified length $n$. This problem is dual to some online decision problems that have been considered earlier, and this dual problem has some notable advantages. In particular, the recursions and equations of optimality lead with relative ease to asymptotic formulas for mean and variance of the minimal selection time.
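The decision problem can be simulated with a simple acceptance rule. The Python sketch below uses i.i.d. uniform observations and a hypothetical fixed-fraction window policy: an observation is accepted if it exceeds the last selected value by at most a fixed fraction of the remaining room. This policy is an illustrative assumption and is not claimed to be the optimal policy analysed in the paper.

```python
import random

def selection_time(n, window=0.5, seed=0):
    """Number of observations needed to select an increasing subsequence of length n.

    Hypothetical policy: accept x when last < x <= last + window * (1 - last).
    Returns (time, selected_values).
    """
    rng = random.Random(seed)
    last = 0.0
    chosen = []
    t = 0
    while len(chosen) < n:
        t += 1
        x = rng.random()  # next observation, uniform on [0, 1)
        if last < x <= last + window * (1.0 - last):
            chosen.append(x)
            last = x
    return t, chosen
```

Averaging the returned time over many seeds gives a Monte Carlo estimate of the expected selection time under this particular policy.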

preprint2015arXiv

Robust dimension free isoperimetry in Gaussian space

We prove the first robust dimension free isoperimetric result for the standard Gaussian measure $γ_n$ and the corresponding boundary measure $γ_n^+$ in $\mathbb {R}^n$. The main result in the theory of Gaussian isoperimetry (proven in the 1970s by Sudakov and Tsirelson, and independently by Borell) states that if $γ_n(A)=1/2$ then the surface area of $A$ is bounded below by the surface area of a half-space with the same measure, $γ_n^+(A)\geq(2π)^{-1/2}$. Our results imply in particular that if $A\subset \mathbb {R}^n$ satisfies $γ_n(A)=1/2$ and $γ_n^+(A)\leq(2π)^{-1/2}+δ$ then there exists a half-space $B\subset \mathbb {R}^n$ such that $γ_n(AΔB)\leq C\log^{-1/2}(1/δ)$ for an absolute constant $C$. A robust version of the Gaussian isoperimetric result was obtained only recently, by Cianchi et al., who showed that $γ_n(AΔB)\le C(n)\sqrt{δ}$ for some function $C(n)$ with no effective bounds. Compared to the results of Cianchi et al., our results have optimal (i.e., no) dependence on the dimension, but worse dependence on $δ$.

preprint2015arXiv

Strong Contraction and Influences in Tail Spaces

We study contraction under a Markov semi-group and influence bounds for functions in $L^2$ tail spaces, i.e. functions all of whose low level Fourier coefficients vanish. It is natural to expect that certain analytic inequalities are stronger for such functions than for general functions in $L^2$. In the positive direction we prove an $L^{p}$ Poincaré inequality and moment decay estimates for mean $0$ functions and for all $1<p<\infty$, proving the degree one case of a conjecture of Mendel and Naor as well as the general degree case of the conjecture when restricted to Boolean functions. In the negative direction, we answer negatively two questions of Hatami and Kalai concerning extensions of the Kahn-Kalai-Linial and Harper Theorems to tail spaces. That is, we construct a function $f\colon\{-1,1\}^{n}\to\{-1,1\}$ whose Fourier coefficients vanish up to level $c \log n$, with all influences bounded by $C \log n/n$ for some constants $0<c,C< \infty$. We also construct a function $f\colon\{-1,1\}^{n}\to\{0,1\}$ with nonzero mean whose remaining Fourier coefficients vanish up to level $c' \log n$, with the sum of the influences bounded by $C'(\mathbb{E}f)\log(1/\mathbb{E}f)$ for some constants $0<c',C'<\infty$.

preprint2014arXiv

A Statistical Test for Clades in Phylogenies

We investigated testing the likelihood of a phylogenetic tree by comparison to its subtree pruning and regrafting (SPR) neighbors, with or without re-optimizing branch lengths. This is inspired by aspects of Bayesian significance tests, and the use of SPRs for heuristically finding maximum likelihood trees. Through a number of simulations with the Jukes-Cantor model on various topologies, it is observed that the SPR tests are informative, and reasonably fast compared to searching for the maximum likelihood tree. This suggests that the SPR tests would be a useful addition to the suite of existing statistical tests, for identifying potential inaccuracies of inferred topologies.

preprint2014arXiv

Can one hear the shape of a population history?

Reconstructing past population size from present day genetic data is a major goal of population genetics. Recent empirical studies infer population size history using coalescent-based models applied to a small number of individuals. Here we provide tight bounds on the amount of exact coalescence time data needed to recover the population size history of a single, panmictic population at a certain level of accuracy. In practice, coalescence times are estimated from sequence data and so our lower bounds should be taken as rather conservative.

preprint2014arXiv

Competing first passage percolation on random regular graphs

We consider two competing first passage percolation processes started from uniformly chosen subsets of a random regular graph on $N$ vertices. The processes are allowed to spread with different rates, start from vertex subsets of different sizes or at different times. We obtain tight results regarding the sizes of the vertex sets occupied by each process, showing that in the generic situation one process will occupy $Θ(1)N^α$ vertices, for some $0 < α < 1$. The value of $α$ is calculated in terms of the relative rates of the processes, as well as the sizes of the initial vertex sets and the possible time advantage of one process. The motivation for this work comes from the study of viral marketing on social networks. The described processes can be viewed as two competing products spreading through a social network (random regular graph). Considering processes which grow at different rates (corresponding to different attraction levels of the two products) or start at different times (the first-to-market advantage) allows one to model aspects of real competition. The results obtained can be interpreted as one of the two products taking the lion's share of the market. We compare these results to the same process run on $d$-dimensional grids, where we show that in the generic situation the two products will have a linear fraction of the market each.

preprint2014arXiv

From trees to seeds: on the inference of the seed from large trees in the uniform attachment model

We study the influence of the seed in random trees grown according to the uniform attachment model, also known as uniform random recursive trees. We show that different seeds lead to different distributions of limiting trees from a total variation point of view. To do this, we construct statistics that measure, in a certain well-defined sense, global "balancedness" properties of such trees. Our paper follows recent results on the same question for the preferential attachment model.

preprint2014arXiv

Global and Local Information in Clustering Labeled Block Models

The stochastic block model is a classical cluster-exhibiting random graph model that has been widely studied in statistics, physics and computer science. In its simplest form, the model is a random graph with two equal-sized clusters, with intra-cluster edge probability p, and inter-cluster edge probability q. We focus on the sparse case, i.e., p, q = O(1/n), which is practically more relevant and also mathematically more challenging. A conjecture of Decelle, Krzakala, Moore and Zdeborova, based on ideas from statistical physics, predicted a specific threshold for clustering. The negative direction of the conjecture was proved by Mossel, Neeman and Sly (2012), and more recently the positive direction was proven independently by Massoulie and Mossel, Neeman, and Sly. In many real network clustering problems, nodes contain information as well. We study the interplay between node and network information in clustering by studying a labeled block model, where in addition to the edge information, the true cluster labels of a small fraction of the nodes are revealed. In the case of two clusters, we show that below the threshold, a small amount of node information does not affect recovery. On the other hand, we show that for any small amount of information efficient local clustering is achievable as long as the number of clusters is sufficiently large (as a function of the amount of revealed information).

preprint2014arXiv

Majority rule has transition ratio 4 on Yule trees under a 2-state symmetric model

Inferring the ancestral state at the root of a phylogenetic tree from states observed at the leaves is a problem arising in evolutionary biology. The simplest technique -- majority rule -- estimates the root state by the most frequently occurring state at the leaves. Alternative methods -- such as maximum parsimony -- explicitly take the tree structure into account. Since either method can outperform the other on particular trees, it is useful to consider the accuracy of the methods on trees generated under some evolutionary null model, such as a Yule pure-birth model. In this short note, we answer a recently posed question concerning the performance of majority rule on Yule trees under a symmetric 2-state Markovian substitution model of character state change. We show that majority rule is accurate precisely when the ratio of the birth (speciation) rate of the Yule process to the substitution rate exceeds the value $4$. By contrast, maximum parsimony has been shown to be accurate only when this ratio is at least 6. Our proof relies on a second moment calculation, coupling, and a novel application of a reflection principle.

preprint2014arXiv

On the influence of the seed graph in the preferential attachment model

We study the influence of the seed graph in the preferential attachment model, focusing on the case of trees. We first show that the seed has no effect from a weak local limit point of view. On the other hand, we conjecture that different seeds lead to different distributions of limiting trees from a total variation point of view. We take a first step in proving this conjecture by showing that seeds with different degree profiles lead to different limiting distributions for the (appropriately normalized) maximum degree, implying that such seeds lead to different (in total variation) limiting trees.
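The role of the seed can be explored numerically. The Python sketch below grows a preferential attachment tree from a given seed tree, attaching each new vertex to an existing vertex chosen with probability proportional to its degree, and reports the maximum degree. It assumes the seed vertices are labelled $0,\dots,k-1$; the function names are illustrative, and the normalization used in the paper is not reproduced.

```python
import random
from collections import Counter

def grow_pa_tree(seed_edges, n, seed=0):
    """Grow a preferential attachment tree on n vertices from a seed tree.

    seed_edges: list of (u, v) pairs forming a tree on vertices 0..k-1 (assumed).
    """
    rng = random.Random(seed)
    edges = list(seed_edges)
    # Each vertex appears in this list once per incident edge, so a uniform
    # draw from it selects a vertex with probability proportional to its degree.
    endpoints = [v for e in edges for v in e]
    next_vertex = max(endpoints) + 1
    while next_vertex < n:
        target = rng.choice(endpoints)
        edges.append((next_vertex, target))
        endpoints.extend((next_vertex, target))
        next_vertex += 1
    return edges

def max_degree(edges):
    """Largest vertex degree in the graph given by the edge list."""
    return max(Counter(v for e in edges for v in e).values())
```

Comparing `max_degree` across many runs started from, say, a star seed and a path seed on the same number of vertices gives an empirical view of how the seed's degree profile persists.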

preprint2014arXiv

Standard Simplices and Pluralities are Not the Most Noise Stable

The Standard Simplex Conjecture and the Plurality is Stablest Conjecture are two conjectures stating that certain partitions are optimal with respect to Gaussian and discrete noise stability respectively. These two conjectures are natural generalizations of the Gaussian noise stability result by Borell (1985) and the Majority is Stablest Theorem (2004). Here we show that the standard simplex is not the most stable partition in Gaussian space and that Plurality is not the most stable low influence partition in discrete space for every number of parts $k \geq 3$, for every value $ρ \neq 0$ of the noise and for all prescribed measures for the different parts as long as they are not all equal to $1/k$. Our results do not contradict the original statements of the Plurality is Stablest and Standard Simplex Conjectures, which concern partitions into sets of equal measure. However, they indicate that if these conjectures are true, their veracity and their proofs will crucially rely on assuming that the sets are of equal measures, in stark contrast to Borell's result, the Majority is Stablest Theorem and many other results in isoperimetric theory. Given our results it is natural to ask for (conjectured) partitions achieving the optimum noise stability.

preprint2013arXiv

A Smooth Transition from Powerlessness to Absolute Power

We study the phase transition of the coalitional manipulation problem for generalized scoring rules. Previously it has been shown that, under some conditions on the distribution of votes, if the number of manipulators is $o(\sqrt{n})$, where $n$ is the number of voters, then the probability that a random profile is manipulable by the coalition goes to zero as the number of voters goes to infinity, whereas if the number of manipulators is $ω(\sqrt{n})$, then the probability that a random profile is manipulable goes to one. Here we consider the critical window, where a coalition has size $c\sqrt{n}$, and we show that as $c$ goes from zero to infinity, the limiting probability that a random profile is manipulable goes from zero to one in a smooth fashion, i.e., there is a smooth phase transition between the two regimes. This result analytically validates recent empirical results, and suggests that deciding the coalitional manipulation problem may be of limited computational hardness in practice.

preprint2013arXiv

Computation in anonymous networks

We identify and investigate a computational model arising in molecular computing, social computing and sensor networks. The model is made of multiple agents who are computationally limited and possess no global information. The agents may represent nodes in a social network, sensors, or molecules in a molecular computer. Assuming that each agent is in one of $k$ states, we say that {\em the system computes} $f:[k]^{n} \to [k]$ if all agents eventually converge to the correct value of $f$. We present a number of general results characterizing the computational power of the model. We further present protocols for computing the plurality function with $O(\log k)$ memory and for approximately counting the number of nodes of a given color with $O(\log \log n)$ memory, where $n$ is the number of agents in the network. These results are tight.

preprint2013arXiv

Exact thresholds for Ising-Gibbs samplers on general graphs

We establish tight results for rapid mixing of Gibbs samplers for the ferromagnetic Ising model on general graphs. We show that if \[(d-1)\tanh β<1,\] then there exists a constant $C$ such that the discrete-time mixing time of Gibbs samplers for the ferromagnetic Ising model on any graph of $n$ vertices and maximal degree $d$, where all interactions are bounded by $β$ and the external fields are arbitrary, is bounded by $Cn\log n$. Moreover, the spectral gap is uniformly bounded away from 0 for all such graphs, as well as for infinite graphs of maximal degree $d$. We further show that when $d\tanh β<1$, with high probability over the Erdős-Rényi random graph $G(n,d/n)$, it holds that the mixing time of Gibbs samplers is \[n^{1+Θ({1}/{\log\log n})}.\] Both results are tight, as it is known that the mixing time for random regular and Erdős-Rényi random graphs is, with high probability, exponential in $n$ when $(d-1)\tanh β>1$, and $d\tanh β>1$, respectively. To our knowledge our results give the first tight sufficient conditions for rapid mixing of spin systems on general graphs. Moreover, our results are the first rigorous results establishing exact thresholds for dynamics on random graphs in terms of spatial thresholds on trees.
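The condition $(d-1)\tanh β<1$ and a single update of the sampler are both easy to write down. The Python sketch below assumes the simplest setting, with every interaction equal to $β$ and no external field, and uses the standard heat-bath (Glauber) update. The function names and the adjacency-list representation are illustrative choices.

```python
import math
import random

def satisfies_mixing_condition(d, beta):
    """True if (d - 1) * tanh(beta) < 1 for maximal degree d."""
    return (d - 1) * math.tanh(beta) < 1

def gibbs_step(adj, spins, beta, rng):
    """One heat-bath update: resample a uniformly chosen spin given its neighbours.

    adj: list of neighbour lists; spins: list of +1/-1 values, modified in place.
    Assumes uniform interaction strength beta and zero external field.
    """
    v = rng.randrange(len(spins))
    local_field = beta * sum(spins[u] for u in adj[v])
    # Conditional probability that spin v is +1 given its neighbours.
    p_plus = 1.0 / (1.0 + math.exp(-2.0 * local_field))
    spins[v] = 1 if rng.random() < p_plus else -1

# Usage on a 4-cycle (maximal degree 2), starting from all spins equal to +1.
cycle = [[1, 3], [0, 2], [1, 3], [2, 0]]
state = [1, 1, 1, 1]
rng = random.Random(0)
for _ in range(200):
    gibbs_step(cycle, state, 0.3, rng)
```

For maximal degree 3, $β=0.5$ satisfies the condition since $2\tanh(0.5)\approx 0.92$, while $β=0.6$ does not since $2\tanh(0.6)\approx 1.07$.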

preprint2013arXiv

Explicit Optimal Hardness via Gaussian stability results

The results of Raghavendra (2008) show that assuming Khot's Unique Games Conjecture (2002), for every constraint satisfaction problem there exists a generic semi-definite program that achieves the optimal approximation factor. This result is existential as it does not provide an explicit optimal rounding procedure nor does it allow one to calculate exactly the Unique Games hardness of the problem. Obtaining an explicit optimal approximation scheme and the corresponding approximation factor is a difficult challenge for each specific approximation problem. An approach for determining the exact approximation factor and the corresponding optimal rounding was established in the analysis of MAX-CUT (KKMO 2004) and the use of the Invariance Principle (MOO 2005). However, this approach crucially relies on results explicitly proving optimal partitions in Gaussian space. Until recently, Borell's result (Borell 1985) was the only non-trivial Gaussian partition result known. In this paper we derive the first explicit optimal approximation algorithm and the corresponding approximation factor using a new result on Gaussian partitions due to Isaksson and Mossel (2012). This Gaussian result allows us to determine exactly the Unique Games Hardness of MAX-3-EQUAL. In particular, our results show that Zwick's algorithm for this problem achieves the optimal approximation factor and prove that the approximation achieved by the algorithm is $\approx 0.796$ as conjectured by Zwick. We further use the previously known optimal Gaussian partitions results to obtain a new Unique Games Hardness factor for MAX-k-CSP: using the well known fact that jointly normal pairwise independent random variables are fully independent, we show that the UGC hardness of Max-k-CSP is $\frac{\lceil (k+1)/2 \rceil}{2^{k-1}}$, improving on results of Austrin and Mossel (2009).
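As a worked instance of the closing formula, the Python sketch below evaluates $\lceil (k+1)/2 \rceil / 2^{k-1}$ exactly for a given $k$. The function name is hypothetical, and the code only evaluates the stated expression.

```python
from fractions import Fraction

def max_k_csp_factor(k):
    """Evaluate ceil((k + 1) / 2) / 2^(k - 1) exactly for an integer k >= 1."""
    numerator = (k + 2) // 2  # equals ceil((k + 1) / 2) for integer k
    return Fraction(numerator, 2 ** (k - 1))
```

For $k=3$ the expression equals $1/2$, and for $k=4$ it equals $3/8$.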

preprint2013arXiv

Mixing under monotone censoring

We initiate the study of mixing times of Markov chains under monotone censoring. Suppose we have some Markov chain $M$ on a state space $Ω$ with stationary distribution $π$ and a monotone set $A \subset Ω$. We consider the chain $M'$ which is the same as the chain $M$ started at some $x \in A$ except that moves of $M$ of the form $x \to y$ where $x \in A$ and $y \notin A$ are {\em censored} and replaced by the move $x \to x$. If $M$ is ergodic and $A$ is connected, the new chain converges to $π$ conditional on $A$. In this paper we are interested in the mixing time of the chain $M'$ in terms of properties of $M$ and $A$. Our results are based on new connections with the field of property testing. A number of open problems are presented.
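The censoring construction translates directly into code. The Python sketch below wraps an arbitrary transition function so that any proposed move leaving $A$ is replaced by staying in place. The base chain (a lazy walk on $\{0,1\}^n$) and the monotone set (at least half the coordinates equal to 1) are hypothetical examples chosen for illustration.

```python
import random

def lazy_hypercube_step(x, rng):
    """Base chain M: pick a coordinate uniformly and resample it as a fair bit."""
    i = rng.randrange(len(x))
    return x[:i] + (rng.randrange(2),) + x[i + 1:]

def censored_step(x, step, in_A, rng):
    """Chain M': propose a move of M, but stay at x if the proposal leaves A."""
    y = step(x, rng)
    return y if in_A(y) else x

def in_upper_half(x):
    """Example monotone (upward-closed) set: at least half the bits equal 1."""
    return 2 * sum(x) >= len(x)

def run_censored(x0, steps, seed=0):
    """Trajectory of the censored chain started from x0, which must lie in A."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(steps):
        path.append(censored_step(path[-1], lazy_hypercube_step, in_upper_half, rng))
    return path
```

Every state on the trajectory stays inside $A$ by construction, so an empirical distribution collected from a long run can be compared with $π$ conditioned on $A$.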

preprint2013arXiv

Robust Optimality of Gaussian Noise Stability

We prove that under the Gaussian measure, half-spaces are uniquely the most noise stable sets. We also prove a quantitative version of uniqueness, showing that a set which is almost optimally noise stable must be close to a half-space. This extends a theorem of Borell, who proved the same result but without uniqueness, and it also answers a question of Ledoux, who asked whether it was possible to prove Borell's theorem using a direct semigroup argument. Our quantitative uniqueness result has various applications in diverse fields.

preprint2013arXiv

Spectral redemption: clustering sparse networks

Spectral algorithms are classic approaches to clustering and community detection in networks. However, for sparse networks the standard versions of these algorithms are suboptimal, in some cases completely failing to detect communities even when other algorithms such as belief propagation can do so. Here we introduce a new class of spectral algorithms based on a non-backtracking walk on the directed edges of the graph. The spectrum of this operator is much better-behaved than that of the adjacency matrix or other commonly used matrices, maintaining a strong separation between the bulk eigenvalues and the eigenvalues relevant to community structure even in the sparse case. We show that our algorithm is optimal for graphs generated by the stochastic block model, detecting communities all the way down to the theoretical limit. We also show the spectrum of the non-backtracking operator for some real-world networks, illustrating its advantages over traditional spectral clustering.
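The non-backtracking operator acts on directed edges. The Python sketch below builds its matrix for a small undirected graph using the common convention that the entry for $(u\to v),(w\to x)$ is 1 exactly when $v=w$ and $x\neq u$. Some authors use the transpose, so the convention is an assumption here, and computing the spectrum is left to a numerical library.

```python
def non_backtracking_matrix(edges):
    """Return (directed_edges, B) for an undirected simple graph given as an edge list.

    B[i][j] = 1 if directed edge i = (u -> v) can be followed by directed
    edge j = (v -> x) with x != u, and 0 otherwise.
    """
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    index = {e: i for i, e in enumerate(directed)}
    B = [[0] * len(directed) for _ in directed]
    for (u, v) in directed:
        for (w, x) in directed:
            if v == w and x != u:
                B[index[(u, v)]][index[(w, x)]] = 1
    return directed, B
```

Row $(u\to v)$ has exactly $\deg(v)-1$ ones, so the matrix has size $2m\times 2m$ for a graph with $m$ edges and stays sparse whenever the graph is sparse.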

preprint2012arXiv

A quantitative Gibbard-Satterthwaite theorem without neutrality

Recently, quantitative versions of the Gibbard-Satterthwaite theorem were proven for $k=3$ alternatives by Friedgut, Kalai, Keller and Nisan and for neutral functions on $k \geq 4$ alternatives by Isaksson, Kindler and Mossel. We prove a quantitative version of the Gibbard-Satterthwaite theorem for general social choice functions for any number $k \geq 3$ of alternatives. In particular we show that for a social choice function $f$ on $k \geq 3$ alternatives and $n$ voters, which is $ε$-far from the family of nonmanipulable functions, a uniformly chosen voter profile is manipulable with probability at least inverse polynomial in $n$, $k$, and $ε^{-1}$. Removing the neutrality assumption of previous theorems is important for multiple reasons. For one, it is known that there is a conflict between anonymity and neutrality, and since most common voting rules are anonymous, they cannot always be neutral. Second, virtual elections are used in many applications in artificial intelligence, where there are often restrictions on the outcome of the election, and so neutrality is not a natural assumption in these situations. Ours is a unified proof which in particular covers all previously established cases. The proof crucially uses reverse hypercontractivity in addition to several ideas from the two previous proofs. Much of the work is devoted to understanding functions of a single voter, and in particular we also prove a quantitative Gibbard-Satterthwaite theorem for one voter.

preprint2012arXiv

Asymptotic Learning on Bayesian Social Networks

Understanding information exchange and aggregation on networks is a central problem in theoretical economics, probability and statistics. We study a standard model of economic agents on the nodes of a social network graph who learn a binary "state of the world" S, from initial signals, by repeatedly observing each other's best guesses. Asymptotic learning is said to occur on a family of graphs G_n = (V_n, E_n), with |V_n| tending to infinity, if with probability tending to 1 as n tends to infinity all agents in G_n eventually estimate S correctly. We identify sufficient conditions for asymptotic learning and construct examples where learning does not occur when the conditions do not hold.

preprint2012arXiv

Bundling Customers: How to Exploit Trust Among Customers to Maximize Seller Profit

We consider an auction of identical digital goods to customers whose valuations are drawn independently from known distributions. Myerson's classic result identifies the truthful mechanism that maximizes the seller's expected profit. Under the assumption that in small groups customers can learn each other's valuations, we show how Myerson's result can be improved to yield a higher payoff to the seller using a mechanism that offers groups of customers to buy bundles of items.

preprint2012arXiv

Connectivity and equilibrium in random games

We study how the structure of the interaction graph of a game affects the existence of pure Nash equilibria. In particular, for a fixed interaction graph, we are interested in whether there are pure Nash equilibria arising when random utility tables are assigned to the players. We provide conditions for the structure of the graph under which equilibria are likely to exist and complementary conditions which make the existence of equilibria highly unlikely. Our results have immediate implications for many deterministic graphs and generalize known results for random games on the complete graph. In particular, our results imply that the probability that bounded degree graphs have pure Nash equilibria is exponentially small in the size of the graph and yield a simple algorithm that finds small nonexistence certificates for a large family of graphs. Then we show that in any strongly connected graph of n vertices with expansion $(1+Ω(1))\log_2(n)$ the distribution of the number of equilibria approaches the Poisson distribution with parameter 1, asymptotically as $n \to +\infty$.

preprint2012arXiv

Exit time tails from pairwise decorrelation in hidden Markov chains, with applications to dynamical percolation

Consider a Markov process ω_t at equilibrium and some event C (a subset of the state-space of the process). A natural measure of correlations in the process is the pairwise correlation \Pr[ω_0,ω_t \in C] - \Pr[ω_0 \in C]^2. A second natural measure is the probability of the continual occurrence event \{ω_s \in C, \forall s\in [0,t]\}. We show that for reversible Markov chains, and any event C, pairwise decorrelation of the event C implies a decay of the probability of the continual occurrence event \{ω_s \in C, \forall s \in [0,t]\} as t\to\infty. We provide examples showing that our results are often sharp. Our main applications are to dynamical critical percolation. Let C be the left-right crossing event of a large box, and let us scale time so that the expected number of changes to C is order 1 in unit time. We show that the continual connection event has superpolynomial decay. Furthermore, on the infinite lattice without any time scaling, the first exceptional time with an infinite cluster appears with an exponential tail.

preprint2012arXiv

From Agreement to Asymptotic Learning

We consider a group of Bayesian agents who are each given an independent signal about an unknown state of the world, and proceed to communicate with each other. We study the question of asymptotic learning: do agents learn the state of the world with probability that approaches one as the number of agents tends to infinity? We show that under general conditions asymptotic learning follows from agreement on posterior actions or posterior beliefs, regardless of the communication dynamics. In particular, we prove that asymptotic learning holds for the Gale-Kariv model on undirected networks and non-atomic private beliefs.

preprint2012arXiv

Geometric influences

We present a new definition of influences in product spaces of continuous distributions. Our definition is geometric, and for monotone sets it is identical with the measure of the boundary with respect to uniform enlargement. We prove analogs of the Kahn-Kalai-Linial (KKL) and Talagrand's influence sum bounds for the new definition. We further prove an analog of a result of Friedgut showing that sets with small "influence sum" are essentially determined by a small number of coordinates. In particular, we establish the following tight analog of the KKL bound: for any set in $\mathbb{R}^n$ of Gaussian measure $t$, there exists a coordinate $i$ such that the $i$th geometric influence of the set is at least $ct(1-t)\sqrt{\log n}/n$, where $c$ is a universal constant. This result is then used to obtain an isoperimetric inequality for the Gaussian measure on $\mathbb{R}^n$ and the class of sets invariant under a transitive permutation group of the coordinates.

preprint2012arXiv

Geometric Influences II: Correlation Inequalities and Noise Sensitivity

In a recent paper, we presented a new definition of influences in product spaces of continuous distributions, and showed that analogues of the most fundamental results on discrete influences, such as the KKL theorem, hold for the new definition in Gaussian space. In this paper we prove Gaussian analogues of two of the central applications of influences: Talagrand's lower bound on the correlation of increasing subsets of the discrete cube, and the Benjamini-Kalai-Schramm (BKS) noise sensitivity theorem. We then use the Gaussian results to obtain analogues of Talagrand's bound for all discrete probability spaces and to reestablish analogues of the BKS theorem for biased two-point product spaces.

preprint2012arXiv

Majority Dynamics and Aggregation of Information in Social Networks

Consider n individuals who, by popular vote, choose among q >= 2 alternatives, one of which is "better" than the others. Assume that each individual votes independently at random, and that the probability of voting for the better alternative is larger than the probability of voting for any other. It follows from the law of large numbers that a plurality vote among the n individuals would result in the correct outcome, with probability approaching one exponentially quickly as n tends to infinity. Our interest in this paper is in a variant of the process above where, after forming their initial opinions, the voters update their decisions based on some interaction with their neighbors in a social network. Our main example is "majority dynamics", in which each voter adopts the most popular opinion among its friends. The interaction repeats for some number of rounds and is then followed by a population-wide plurality vote. The question we tackle is that of "efficient aggregation of information": in which cases is the better alternative chosen with probability approaching one as n tends to infinity? Conversely, for which sequences of growing graphs does aggregation fail, so that the wrong alternative gets chosen with probability bounded away from zero? We construct a family of examples in which interaction prevents efficient aggregation of information, and give a condition on the social network which ensures that aggregation occurs. For the case of majority dynamics we also investigate the question of unanimity in the limit. In particular, if the voters' social network is an expander graph, we show that if the initial population is sufficiently biased towards a particular alternative then that alternative will eventually become the unanimous preference of the entire population.
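The majority dynamics described above can be sketched in a few lines. This is a minimal illustration, not the paper's construction: the adjacency-list representation, the function names, and the tie-breaking rule (a voter keeps its own opinion when that opinion is among the most popular ones, and otherwise takes the smallest label) are assumptions made here for concreteness.

```python
from collections import Counter


def majority_dynamics_step(adj, opinions):
    """One synchronous round: each voter adopts the most popular
    opinion among its neighbors (tie-breaking rule is an assumption)."""
    updated = []
    for v, neighbors in enumerate(adj):
        counts = Counter(opinions[w] for w in neighbors)
        if not counts:
            # Isolated voter: nothing to observe, keep current opinion.
            updated.append(opinions[v])
            continue
        top = max(counts.values())
        winners = sorted(o for o, c in counts.items() if c == top)
        # Assumed tie rule: stay put if own opinion is tied for first.
        updated.append(opinions[v] if opinions[v] in winners else winners[0])
    return updated


def run_majority_dynamics(adj, opinions, rounds):
    """Repeat the interaction for a fixed number of rounds."""
    for _ in range(rounds):
        opinions = majority_dynamics_step(adj, opinions)
    return opinions


def plurality_winner(opinions):
    """Population-wide plurality vote (smallest label wins ties)."""
    counts = Counter(opinions)
    top = max(counts.values())
    return min(o for o, c in counts.items() if c == top)


# Complete graph on 5 voters; opinion 1 starts with a 3-to-2 lead.
complete_graph = [[w for w in range(5) if w != v] for v in range(5)]
final = run_majority_dynamics(complete_graph, [1, 1, 1, 0, 0], rounds=1)
```

On the complete graph the initial bias spreads to everyone after one round; the abstract's question is which graph families preserve or destroy this kind of aggregation.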

preprint2012arXiv

Majority is Stablest : Discrete and SoS

The Majority is Stablest Theorem has numerous applications in hardness of approximation and social choice theory. We give a new proof of the Majority is Stablest Theorem by induction on the dimension of the discrete cube. Unlike the previous proof, it uses neither the "invariance principle" nor Borell's result in Gaussian space. The new proof is general enough to include all previous variants of majority is stablest such as "it ain't over until it's over" and "Majority is most predictable". Moreover, the new proof allows us to derive a proof of Majority is Stablest in a constant level of the Sum of Squares hierarchy. This implies in particular that the Khot-Vishnoi instance of Max-Cut does not provide a gap instance for the Lasserre hierarchy.

preprint2012arXiv

On extracting common random bits from correlated sources on large alphabets

Suppose Alice and Bob receive strings $X=(X_1,...,X_n)$ and $Y=(Y_1,...,Y_n)$ each uniformly random in $[s]^n$ but so that $X$ and $Y$ are correlated. For each symbol $i$, we have that $Y_i = X_i$ with probability $1-\eps$ and otherwise $Y_i$ is chosen independently and uniformly from $[s]$. Alice and Bob wish to use their respective strings to extract a uniformly chosen common sequence from $[s]^k$ but without communicating. How well can they do? The trivial strategy of outputting the first $k$ symbols yields an agreement probability of $(1 - \eps + \eps/s)^k$. In a recent work by Bogdanov and Mossel it was shown that in the binary case where $s=2$ and $k = k(\eps)$ is large enough then it is possible to extract $k$ bits with a better agreement probability rate. In particular, it is possible to achieve agreement probability $(k\eps)^{-1/2} \cdot 2^{-k\eps/(2(1 - \eps/2))}$ using a random construction based on Hamming balls, and this is optimal up to lower order terms. In the current paper we consider the same problem over larger alphabet sizes $s$ and we show that the agreement probability rate changes dramatically as the alphabet grows. In particular we show that no strategy can achieve agreement probability better than $(1-\eps)^k (1+δ(s))^k$ where $δ(s) \to 0$ as $s \to \infty$. We also show that Hamming ball based constructions have {\em much lower} agreement probability rate than the trivial algorithm as $s \to \infty$. Our proofs and results are intimately related to subtle properties of hypercontractive inequalities.
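The agreement probability of the trivial strategy quoted above follows from a one-line calculation. The sketch below writes $\epsilon$ for the abstract's `\eps` and assumes, as the abstract's formula suggests, that the $n$ coordinates are independent.

```latex
% Per symbol: either Y_i is copied from X_i, or it is resampled
% uniformly from [s] and happens to land on X_i.
\Pr[Y_i = X_i] = (1-\epsilon) + \epsilon \cdot \frac{1}{s}
               = 1 - \epsilon + \frac{\epsilon}{s}.

% Across the first k symbols, assuming independent coordinates:
\Pr[(Y_1,\dots,Y_k) = (X_1,\dots,X_k)]
  = \left(1 - \epsilon + \frac{\epsilon}{s}\right)^{k}.
```

As $s \to \infty$ this tends to $(1-\epsilon)^k$, which is the base rate appearing in the upper bound $(1-\epsilon)^k (1+δ(s))^k$.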

preprint2012arXiv

On reverse hypercontractivity

We study the notion of reverse hypercontractivity. We show that reverse hypercontractive inequalities are implied by standard hypercontractive inequalities as well as by the modified log-Sobolev inequality. Our proof is based on a new comparison lemma for Dirichlet forms and an extension of the Stroock-Varopoulos inequality. A consequence of our analysis is that {\em all} simple operators $L=Id-\E$ as well as their tensors satisfy uniform reverse hypercontractive inequalities. That is, for all $q<p<1$ and every positive valued function $f$ for $t \geq \log \frac{1-q}{1-p}$ we have $\| e^{-tL}f\|_{q} \geq \| f\|_{p}$. This should be contrasted with the case of hypercontractive inequalities for simple operators where $t$ is known to depend not only on $p$ and $q$ but also on the underlying space. The new reverse hypercontractive inequalities established here imply new mixing and isoperimetric results for short random walks in product spaces, for certain card-shufflings, for Glauber dynamics in high-temperature spin systems as well as for queueing processes. The inequalities further imply a quantitative Arrow impossibility theorem for general product distributions and inverse polynomial bounds in the number of players for the non-interactive correlation distillation problem with $m$-sided dice.

preprint2012arXiv

Phylogenetic mixtures: Concentration of measure in the large-tree limit

The reconstruction of phylogenies from DNA or protein sequences is a major task of computational evolutionary biology. Common phenomena, notably variations in mutation rates across genomes and incongruences between gene lineage histories, often make it necessary to model molecular data as originating from a mixture of phylogenies. Such mixed models play an increasingly important role in practice. Using concentration of measure techniques, we show that mixtures of large trees are typically identifiable. We also derive sequence-length requirements for high-probability reconstruction.

preprint2012arXiv

Stochastic Block Models and Reconstruction

The planted partition model (also known as the stochastic blockmodel) is a classical cluster-exhibiting random graph model that has been extensively studied in statistics, physics, and computer science. In its simplest form, the planted partition model is a model for random graphs on $n$ nodes with two equal-sized clusters, with a between-class edge probability of $q$ and a within-class edge probability of $p$. Although most of the literature on this model has focused on the case of increasing degrees (i.e. $pn, qn \to \infty$ as $n \to \infty$), the sparse case $p, q = O(1/n)$ is interesting both from a mathematical and an applied point of view. A striking conjecture of Decelle, Krzakala, Moore and Zdeborová based on deep, non-rigorous ideas from statistical physics gave a precise prediction for the algorithmic threshold of clustering in the sparse planted partition model. In particular, if $p = a/n$ and $q = b/n$, then Decelle et al. conjectured that it is possible to cluster in a way correlated with the true partition if $(a - b)^2 > 2(a + b)$, and impossible if $(a - b)^2 < 2(a + b)$. By comparison, the best-known rigorous result is that of Coja-Oghlan, who showed that clustering is possible if $(a - b)^2 > C (a + b)$ for some sufficiently large $C$. We prove half of their prediction, showing that it is indeed impossible to cluster if $(a - b)^2 < 2(a + b)$. Furthermore we show that it is impossible even to estimate the model parameters from the graph when $(a - b)^2 < 2(a + b)$; on the other hand, we provide a simple and efficient algorithm for estimating $a$ and $b$ when $(a - b)^2 > 2(a + b)$. Following Decelle et al., our work establishes a rigorous connection between the clustering problem, spin-glass models on the Bethe lattice and the so-called reconstruction problem. This connection points to fascinating applications and open problems.
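The threshold condition in this abstract is easy to evaluate for concrete parameters. The helper below is a hypothetical illustration (the function name and the returned labels are not from the paper); it only compares $(a-b)^2$ with $2(a+b)$ and does not perform any clustering.

```python
def sbm_regime(a: float, b: float) -> str:
    """Classify sparse planted-partition parameters p = a/n, q = b/n
    relative to the threshold (a - b)^2 = 2(a + b)."""
    if a < 0 or b < 0 or a + b == 0:
        raise ValueError("a and b must be non-negative and not both zero")
    lhs = (a - b) ** 2
    rhs = 2 * (a + b)
    if lhs > rhs:
        # Clustering conjectured possible; the abstract proves that
        # a and b can be estimated efficiently in this regime.
        return "above threshold"
    if lhs < rhs:
        # The abstract proves clustering and parameter estimation
        # are both impossible here.
        return "below threshold"
    return "at threshold"


print(sbm_regime(5, 1))  # (5-1)^2 = 16 versus 2*(5+1) = 12
print(sbm_regime(3, 1))  # (3-1)^2 = 4 versus 2*(3+1) = 8
```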

preprint2011arXiv

A Note on the Entropy/Influence Conjecture

The entropy/influence conjecture, raised by Friedgut and Kalai in 1996, seeks to relate two different measures of concentration of the Fourier coefficients of a Boolean function. Roughly speaking, it claims that if the Fourier spectrum is "smeared out", then the Fourier coefficients are concentrated on "high" levels. In this note we generalize the conjecture to biased product measures on the discrete cube, and prove a variant of the conjecture for functions with an extremely low Fourier weight on the "high" levels.

preprint2011arXiv

Identifiability and inference of non-parametric rates-across-sites models on large-scale phylogenies

Mutation rate variation across loci is well known to cause difficulties, notably identifiability issues, in the reconstruction of evolutionary trees from molecular sequences. Here we introduce a new approach for estimating general rates-across-sites models. Our results imply, in particular, that large phylogenies are typically identifiable under rate variation. We also derive sequence-length requirements for high-probability reconstruction. Our main contribution is a novel algorithm that clusters sites according to their mutation rate. Following this site clustering step, standard reconstruction techniques can be used to recover the phylogeny. Our results rely on a basic insight: that, for large trees, certain site statistics experience concentration-of-measure phenomena.

preprint2011arXiv

On extracting common random bits from correlated sources

Suppose Alice and Bob receive strings of unbiased independent but noisy bits from some random source. They wish to use their respective strings to extract a common sequence of random bits with high probability but without communicating. How many such bits can they extract? The trivial strategy of outputting the first $k$ bits yields an agreement probability of $(1 - \eps)^k < 2^{-1.44k\eps}$, where $\eps$ is the amount of noise. We show that no strategy can achieve agreement probability better than $2^{-k\eps/(1 - \eps)}$. On the other hand, we show that when $k \geq 10 + 2 (1 - \eps) / \eps$, there exists a strategy which achieves an agreement probability of $0.1 (k\eps)^{-1/2} \cdot 2^{-k\eps/(1 - \eps)}$.
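The three quantities in this abstract can be compared numerically. The following sketch simply evaluates the stated formulas; the function names are illustrative assumptions, and the code does not implement any extraction strategy.

```python
def trivial_agreement(k: int, eps: float) -> float:
    """Agreement probability when both parties output their first k bits."""
    return (1.0 - eps) ** k


def agreement_upper_bound(k: int, eps: float) -> float:
    """Upper bound 2^(-k*eps/(1-eps)) stated in the abstract."""
    return 2.0 ** (-k * eps / (1.0 - eps))


def achievable_agreement(k: int, eps: float) -> float:
    """Lower bound 0.1 * (k*eps)^(-1/2) * 2^(-k*eps/(1-eps)),
    stated for k >= 10 + 2(1-eps)/eps."""
    if k < 10 + 2 * (1.0 - eps) / eps:
        raise ValueError("k is below the range covered by the stated bound")
    return 0.1 * (k * eps) ** -0.5 * agreement_upper_bound(k, eps)


k, eps = 1000, 0.1
trivial = trivial_agreement(k, eps)
upper = agreement_upper_bound(k, eps)
achievable = achievable_agreement(k, eps)
print(trivial, achievable, upper)
```

For these parameters the achievable bound exceeds the trivial strategy by many orders of magnitude while staying below the upper bound; for small $k$ the constant factor $0.1 (k\eps)^{-1/2}$ can make the trivial strategy the better of the two.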

preprint2011arXiv

Robust estimation of latent tree graphical models: Inferring hidden states with inexact parameters

Latent tree graphical models are widely used in computational biology, signal and image processing, and network tomography. Here we design a new, efficient estimation procedure for latent tree models, including Gaussian and discrete, reversible models, that significantly improves on previous sample requirement bounds. Our techniques are based on a new hidden state estimator which is robust to inaccuracies in estimated parameters. More precisely, we prove that latent tree models can be estimated with high probability in the so-called Kesten-Stigum regime with $O(\log^2 n)$ samples where $n$ is the number of nodes.

preprint2011arXiv

VC bounds on the cardinality of nearly orthogonal function classes

We bound the number of nearly orthogonal vectors with fixed VC-dimension over $\{\pm 1\}^n$. Our bounds are of interest in machine learning and empirical process theory and improve previous bounds by Haussler. The bounds are based on a simple projection argument and they generalize to other product spaces. Along the way we derive tight bounds on the sum of binomial coefficients in terms of the entropy function.

preprint2010arXiv

Co-evolution is Incompatible with the Markov Assumption in Phylogenetics

Markov models are extensively used in the analysis of molecular evolution. A recent line of research suggests that pairs of proteins with functional and physical interactions co-evolve with each other. Here, by analyzing hundreds of orthologous sets of three fungi and their co-evolutionary relations, we demonstrate that the co-evolutionary assumption may violate the Markov assumption. Our results encourage developing alternative probabilistic models for the cases of extreme co-evolution.

preprint2010arXiv

Complete convergence of message passing algorithms for some satisfiability problems

In this paper we analyze the performance of Warning Propagation, a popular message passing algorithm. We show that for 3CNF formulas drawn from a certain distribution over random satisfiable 3CNF formulas, commonly referred to as the planted-assignment distribution, running Warning Propagation in the standard way (run message passing until convergence, simplify the formula according to the resulting assignment, and satisfy the remaining subformula, if necessary, using a simple "off the shelf" heuristic) results in a satisfying assignment when the clause-variable ratio is a sufficiently large constant.

preprint2010arXiv

Noise Correlation Bounds for Uniform Low Degree Functions

We study correlation bounds under pairwise independent distributions for functions with no large Fourier coefficients. Functions in which all Fourier coefficients are bounded by $δ$ are called $δ$-{\em uniform}. The search for such bounds is motivated by their potential applicability to hardness of approximation, derandomization, and additive combinatorics. In our main result we show that $\E[f_1(X_1^1,...,X_1^n) ... f_k(X_k^1,...,X_k^n)]$ is close to 0 under the following assumptions: 1. The vectors $\{(X_1^j,...,X_k^j) : 1 \leq j \leq n\}$ are i.i.d, and for each $j$ the vector $(X_1^j,...,X_k^j)$ has a pairwise independent distribution. 2. The functions $f_i$ are uniform. 3. The functions $f_i$ are of low degree. We compare our result with recent results by the second author for low influence functions and to recent results in additive combinatorics using the Gowers norm. Our proofs extend some techniques from the theory of hypercontractivity to a multilinear setup.

preprint2010arXiv

On the inference of large phylogenies with long branches: How long is too long?

Recent work has highlighted deep connections between sequence-length requirements for high-probability phylogeny reconstruction and the related problem of the estimation of ancestral sequences. In [Daskalakis et al.'09], building on the work of [Mossel'04], a tight sequence-length requirement was obtained for the CFN model. In particular the required sequence length for high-probability reconstruction was shown to undergo a sharp transition (from $O(\log n)$ to $\hbox{poly}(n)$, where $n$ is the number of leaves) at the "critical" branch length $\critmlq$ (if it exists) of the ancestral reconstruction problem. Here we consider the GTR model. For this model, recent results of [Roch'09] show that the tree can be accurately reconstructed with sequences of length $O(\log(n))$ when the branch lengths are below $\critksq$, known as the Kesten-Stigum (KS) bound. Although for the CFN model $\critmlq = \critksq$, it is known that for the more general GTR models one has $\critmlq \geq \critksq$ with a strict inequality in many cases. Here, we show that this phenomenon also holds for phylogenetic reconstruction by exhibiting a family of symmetric models $Q$ and a phylogenetic reconstruction algorithm which recovers the tree from $O(\log n)$-length sequences for some branch lengths in the range $(\critksq,\critmlq)$. Second we prove that phylogenetic reconstruction under GTR models requires a polynomial sequence-length for branch lengths above $\critmlq$.

preprint2010arXiv

Reconstruction of Markov Random Fields from Samples: Some Easy Observations and Algorithms

Markov random fields are used to model high dimensional distributions in a number of applied areas. Much recent interest has been devoted to the reconstruction of the dependency structure from independent samples from the Markov random fields. We analyze a simple algorithm for reconstructing the underlying graph defining a Markov random field on $n$ nodes and maximum degree $d$ given observations. We show that under mild non-degeneracy conditions it reconstructs the generating graph with high probability using $Θ(d ε^{-2}δ^{-4} \log n)$ samples where $ε,δ$ depend on the local interactions. For most local interactions $ε,δ$ are of order $\exp(-O(d))$. Our results are optimal as a function of $n$ up to a multiplicative constant depending on $d$ and the strength of the local interactions. Our results seem to be the first results for general models that guarantee that {\em the} generating model is reconstructed. Furthermore, we provide an explicit $O(n^{d+2} ε^{-2}δ^{-4} \log n)$ running time bound. In cases where the measure on the graph has correlation decay, the running time is $O(n^2 \log n)$ for all fixed $d$. We also discuss the effect of observing noisy samples and show that as long as the noise level is low, our algorithm is effective. On the other hand, we construct an example where large noise implies non-identifiability even for generic noise and interactions. Finally, we briefly show that in some simple cases, models with hidden nodes can also be recovered.

preprint2010arXiv

Scaling Limits for Width Two Partially Ordered Sets: The Incomparability Window

We study the structure of a uniformly randomly chosen partial order of width 2 on n elements. We show that under the appropriate scaling, the number of incomparable elements converges to the height of a one dimensional Brownian excursion at a uniformly chosen random time in the interval [0,1], which follows the Rayleigh distribution.

preprint2010arXiv

Sharp Thresholds for Monotone Non Boolean Functions and Social Choice Theory

A key fact in the theory of Boolean functions $f : \{0,1\}^n \to \{0,1\}$ is that they often undergo sharp thresholds. For example: if the function $f : \{0,1\}^n \to \{0,1\}$ is monotone and symmetric under a transitive action with $\E_p[f] = \eps$ and $\E_q[f] = 1-\eps$ then $q-p \to 0$ as $n \to \infty$. Here $\E_p$ denotes the product probability measure on $\{0,1\}^n$ where each coordinate takes the value $1$ independently with probability $p$. The fact that symmetric functions undergo sharp thresholds is important in the study of random graphs and constraint satisfaction problems as well as in social choice. In this paper we prove sharp thresholds for monotone functions taking values in arbitrary finite sets. We also provide examples of applications of the results to social choice and to random graph problems. Among the applications is an analog for Condorcet's jury theorem and an indeterminacy result for a large class of social choice functions.

preprint2010arXiv

The Computational Complexity of Estimating Convergence Time

An important problem in the implementation of Markov Chain Monte Carlo algorithms is to determine the convergence time, or the number of iterations before the chain is close to stationarity. For many Markov chains used in practice this time is not known. Even in cases where the convergence time is known to be polynomial, the theoretical bounds are often too crude to be practical. Thus, practitioners like to carry out some form of statistical analysis in order to assess convergence. This has led to the development of a number of methods known as convergence diagnostics which attempt to diagnose whether the Markov chain is far from stationarity. We study the problem of testing convergence in the following settings and prove that the problem is hard in a computational sense: Given a Markov chain that mixes rapidly, it is hard for Statistical Zero Knowledge (SZK-hard) to distinguish whether starting from a given state, the chain is close to stationarity by time t or far from stationarity at time ct for a constant c. We show the problem is in AM intersect coAM. Second, given a Markov chain that mixes rapidly it is coNP-hard to distinguish whether it is close to stationarity by time t or far from stationarity at time ct for a constant c. The problem is in coAM. Finally, it is PSPACE-complete to distinguish whether the Markov chain is close to stationarity by time t or far from being mixed at time ct for c at least 1.

preprint2010arXiv

The Geometry of Manipulation - a Quantitative Proof of the Gibbard Satterthwaite Theorem

We prove a quantitative version of the Gibbard-Satterthwaite theorem. We show that a uniformly chosen voter profile for a neutral social choice function f of $q \ge 4$ alternatives and n voters will be manipulable with probability at least $10^{-4} \eps^2 n^{-3} q^{-30}$, where $\eps$ is the minimal statistical distance between f and the family of dictator functions. Our results extend those of FrKaNi:08, which were obtained for the case of 3 alternatives, and imply that the approach of masking manipulations behind computational hardness (as considered in BarthOrline:91, ConitzerS03b, ElkindL05, ProcacciaR06 and ConitzerS06) cannot hide manipulations completely. Our proof is geometric. More specifically it extends the method of canonical paths to show that the measure of the profiles that lie on the interface of 3 or more outcomes is large. To the best of our knowledge our result is the first isoperimetric result to establish interface of more than two bodies.

preprint2010arXiv

Truthful Fair Division

We address the problem of fair division, or cake cutting, with the goal of finding truthful mechanisms. In the case of a general measure space ("cake") and non-atomic, additive individual preference measures - or utilities - we show that there exists a truthful "mechanism" which ensures that each of the k players gets at least 1/k of the cake. This mechanism also minimizes risk for truthful players. Furthermore, in the case where there exist at least two different measures we present a different truthful mechanism which ensures that each of the players gets more than 1/k of the cake. We then turn our attention to partitions of indivisible goods with bounded utilities and a large number of goods. Here we provide similar mechanisms, but with slightly weaker guarantees. These guarantees converge to those obtained in the non-atomic case as the number of goods goes to infinity.

preprint2009arXiv

Evolutionary Trees and the Ising Model on the Bethe Lattice: a Proof of Steel's Conjecture

A major task of evolutionary biology is the reconstruction of phylogenetic trees from molecular data. The evolutionary model is given by a Markov chain on a tree. Given samples from the leaves of the Markov chain, the goal is to reconstruct the leaf-labelled tree. It is well known that in order to reconstruct a tree on $n$ leaves, sample sequences of length $Ω(\log n)$ are needed. It was conjectured by M. Steel that for the CFN/Ising evolutionary model, if the mutation probability on all edges of the tree is less than $p^{\ast} = (\sqrt{2}-1)/2^{3/2}$, then the tree can be recovered from sequences of length $O(\log n)$. The value $p^{\ast}$ is given by the transition point for the extremality of the free Gibbs measure for the Ising model on the binary tree. Steel's conjecture was proven by the second author in the special case where the tree is "balanced." The second author also proved that if all edges have mutation probability larger than $p^{\ast}$ then the length needed is $n^{Ω(1)}$. Here we show that Steel's conjecture holds true for general trees by giving a reconstruction algorithm that recovers the tree from $O(\log n)$-length sequences when the mutation probabilities are discretized and less than $p^\ast$. Our proof and results demonstrate that extremality of the free Gibbs measure on the infinite binary tree, which has been studied before in probability, statistical physics and computer science, determines how distinguishable are Gibbs measures on finite binary trees.
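The constant $p^{\ast}$ can be rewritten in a form that makes its role as a transition point more visible. The short calculation below is an illustrative aside; the symbol $θ = 1 - 2p$ for the edge correlation of a symmetric binary channel with flip probability $p$ is notation introduced here, not in the abstract.

```latex
p^{\ast} = \frac{\sqrt{2}-1}{2^{3/2}}
         = \frac{\sqrt{2}-1}{2\sqrt{2}}
         = \frac{1}{2} - \frac{1}{2\sqrt{2}}
         \approx 0.1464.

% With \theta = 1 - 2p (assumed notation):
\theta^{\ast} = 1 - 2p^{\ast} = \frac{1}{\sqrt{2}},
\qquad\text{so}\qquad
2\,(\theta^{\ast})^{2} = 1.
```

So the condition $p < p^{\ast}$ is equivalent to $2θ^2 > 1$ on the binary tree, which is the form in which the Kesten-Stigum bound is usually stated.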

preprint2009arXiv

Iterative Maximum Likelihood on Networks

We consider n agents located on the vertices of a connected graph. Each agent v receives a signal X_v(0)~N(s, 1) where s is an unknown quantity. A natural iterative way of estimating s is to perform the following procedure. At iteration t + 1 let X_v(t + 1) be the average of X_v(t) and of X_w(t) among all the neighbors w of v. In this paper we consider a variant of simple iterative averaging, which models "greedy" behavior of the agents. At iteration t, each agent v declares the value of its estimator X_v(t) to all of its neighbors. Then, it updates X_v(t + 1) by taking the maximum likelihood (or minimum variance) estimator of s, given X_v(t) and X_w(t) for all neighbors w of v, and the structure of the graph. We give an explicit efficient procedure for calculating X_v(t), study the convergence of the process as t goes to infinity and show that if the limit exists then it is the same for all v and w. For graphs that are symmetric under actions of transitive groups, we show that the process is efficient. Finally, we show that the greedy process is in some cases more efficient than simple averaging, while in other cases the converse is true, so that, in this model, "greed" of the individual agents may or may not have an adverse effect on the outcome. The model discussed here may be viewed as the Maximum-Likelihood version of models studied in Bayesian Economics. The ML variant is more accessible and allows us, in particular, to show the significance of symmetry in the efficiency of estimators using networks of agents.
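The simple iterative averaging baseline described in this abstract can be sketched as follows. This covers only the averaging rule, not the paper's maximum-likelihood variant; the adjacency-list input and function names are assumptions made for illustration.

```python
def averaging_step(adj, x):
    """X_v(t+1) is the average of X_v(t) and X_w(t) over neighbors w of v."""
    return [
        (x[v] + sum(x[w] for w in adj[v])) / (1 + len(adj[v]))
        for v in range(len(adj))
    ]


def iterate_averaging(adj, x, rounds):
    """Apply the averaging rule for a fixed number of iterations."""
    for _ in range(rounds):
        x = averaging_step(adj, x)
    return x


# Path graph 0 - 1 - 2 with hypothetical initial signals.
path = [[1], [0, 2], [1]]
one_round = averaging_step(path, [0.0, 3.0, 6.0])
limit = iterate_averaging(path, [0.0, 3.0, 6.0], rounds=200)
```

On this small example all agents converge to a common value, but that value weights each initial signal by degree plus one rather than equally, which hints at why simple averaging need not be an efficient estimator of s on graphs that are not regular.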

preprint2009arXiv

On the Submodularity of Influence in Social Networks

We prove and extend a conjecture of Kempe, Kleinberg, and Tardos (KKT) on the spread of influence in social networks. A social network can be represented by a directed graph where the nodes are individuals and the edges indicate a form of social relationship. A simple way to model the diffusion of ideas, innovative behavior, or "word-of-mouth" effects on such a graph is to consider an increasing process of "infected" (or active) nodes: each node becomes infected once an activation function of the set of its infected neighbors crosses a certain threshold value. Such a model was introduced by KKT in \cite{KeKlTa:03,KeKlTa:05} where the authors also impose several natural assumptions: the threshold values are (uniformly) random; and the activation functions are monotone and submodular. For an initial set of active nodes $S$, let $σ(S)$ denote the expected number of active nodes at termination. Here we prove a conjecture of KKT: we show that the function $σ(S)$ is submodular under the assumptions above. We prove the same result for the expected value of any monotone, submodular function of the set of active nodes at termination.

preprint2009arXiv

Phylogenies without Branch Bounds: Contracting the Short, Pruning the Deep

We introduce a new phylogenetic reconstruction algorithm which, unlike most previous rigorous inference techniques, does not rely on assumptions regarding the branch lengths or the depth of the tree. The algorithm returns a forest which is guaranteed to contain all edges that are: 1) sufficiently long and 2) sufficiently close to the leaves. How much of the true tree is recovered depends on the sequence length provided. The algorithm is distance-based and runs in polynomial time.

preprint2007arXiv

Incomplete Lineage Sorting: Consistent Phylogeny Estimation From Multiple Loci

We introduce a simple algorithm for reconstructing phylogenies from multiple gene trees in the presence of incomplete lineage sorting, that is, when the topology of the gene trees may differ from that of the species tree. We show that our technique is statistically consistent under standard stochastic assumptions, that is, it returns the correct tree given sufficiently many unlinked loci. We also show that it can tolerate moderate estimation errors.

preprint2006arXiv

Learning nonsingular phylogenies and hidden Markov models

In this paper we study the problem of learning phylogenies and hidden Markov models. We call a Markov model nonsingular if all transition matrices have determinants bounded away from 0 (and 1). We highlight the role of the nonsingularity condition for the learning problem. Learning hidden Markov models without the nonsingularity condition is at least as hard as learning parity with noise, a well-known learning problem conjectured to be computationally hard. On the other hand, we give a polynomial-time algorithm for learning nonsingular phylogenies and hidden Markov models.

preprint2006arXiv

The Kesten-Stigum Reconstruction Bound Is Tight for Roughly Symmetric Binary Channels

We establish the exact threshold for the reconstruction problem for a binary asymmetric channel on the b-ary tree, provided that the asymmetry is sufficiently small. This is the first exact reconstruction threshold obtained in roughly a decade. We discuss the implications of our result for Glauber dynamics, phylogenetic reconstruction, and so-called "replica symmetry breaking" in spin glasses and random satisfiability problems.

preprint2004arXiv

Glauber Dynamics on Trees and Hyperbolic Graphs

We study continuous time Glauber dynamics for random configurations with local constraints (e.g. proper coloring, Ising and Potts models) on finite graphs with $n$ vertices and of bounded degree. We show that the relaxation time (defined as the reciprocal of the spectral gap $|λ_1-λ_2|$) for the dynamics on trees and on planar hyperbolic graphs, is polynomial in $n$. For these hyperbolic graphs, this yields a general polynomial sampling algorithm for random configurations. We then show that if the relaxation time $τ_2$ satisfies $τ_2=O(1)$, then the correlation coefficient, and the mutual information, between any local function (which depends only on the configuration in a fixed window) and the boundary conditions, decays exponentially in the distance between the window and the boundary. For the Ising model on a regular tree, this condition is sharp.
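For orientation, a single heat-bath Glauber update for the Ising model can be sketched as below. The abstract concerns continuous-time dynamics; this sketch uses the discrete-time version, in which one uniformly random vertex is resampled per step, and assumes zero external field. The function name `glauber_step` and the adjacency-dictionary format are assumptions for the example.

```python
import math
import random


def glauber_step(adj, spins, beta, rng):
    """Resample the spin of one uniformly chosen vertex, in place.

    adj   -- dict: vertex -> list of neighbors (assumed format)
    spins -- dict: vertex -> +1 or -1, modified in place
    beta  -- inverse temperature
    rng   -- a random.Random instance
    """
    v = rng.choice(list(adj))
    # Local field: sum of the neighboring spins.
    field = sum(spins[w] for w in adj[v])
    # Conditional probability of +1 given the neighbors:
    # exp(beta * field) / (exp(beta * field) + exp(-beta * field)).
    p_plus = 1.0 / (1.0 + math.exp(-2.0 * beta * field))
    spins[v] = 1 if rng.random() < p_plus else -1


if __name__ == "__main__":
    rng = random.Random(1)
    path = {0: [1], 1: [0, 2], 2: [1]}
    spins = {v: rng.choice([-1, 1]) for v in path}
    for _ in range(1000):
        glauber_step(path, spins, beta=0.5, rng=rng)
    print(spins)
```

The relaxation time discussed in the abstract measures how many such updates are needed before the configuration is close to a sample from the Gibbs distribution.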

preprint2004arXiv

How much can evolved characters tell us about the tree that generated them?

In this paper we review some recent results that shed light on a fundamental question in molecular systematics: how much phylogenetic "signal" can we expect from characters that have evolved under some Markov process? There are many sides to this question and we begin by describing some explicit bounds on the probability of correctly reconstructing an ancestral state from the states observed at the tips. We show how this bound sets upper limits on the probability of tree reconstruction from aligned sequences, and we provide some new extensions that allow site-to-site rate variation or a covarion mechanism. We then explore the relationship between the number of sites required for accurate tree reconstruction and other model parameters, such as the number of species and the substitution probabilities, and we describe a phase transition that occurs when substitution probabilities exceed a critical value. In the remainder of this paper we turn to models of character evolution where the state space is assumed to be either infinite or very large. These models have some relevance to certain types of genomic data (such as gene order) and here we again investigate how many characters are required for accurate tree reconstruction.

92 published item(s)

A Hierarchical Language Model with Predictable Scaling Laws and Provable Benefits of Reasoning

A Theory of Online Learning with Autoregressive Chain-of-Thought Reasoning

Detecting Mutual Excitations in Non-Stationary Hawkes Processes

The Benefits of Temporal Correlations: SGD Learns k-Juntas from Random Walks Efficiently

Influence Maximization in Ising Models

Stable matchings with correlated Preferences

Almost-Linear Planted Cliques Elude the Metropolis Process

Inference in Opinion Dynamics under Social Pressure

On the Second Kahn--Kalai Conjecture

Seeding with Costly Network Information

Shotgun Assembly of Erdos-Renyi Random Graphs

Shotgun assembly of labeled graphs

Spoofing Generalization: When Can't You Trust Proprietary Models?

Learning to Sample from Censored Markov Random Fields

Robust testing of low-dimensional functions

Broadcasting on Random Directed Acyclic Graphs

Consistency Thresholds for the Planted Bisection Model

Distributed Corruption Detection in Networks

Efficient Reconstruction of Stochastic Pedigrees

Rational Groupthink

Bayesian Decision Making in Groups is Hard

How Many Subpopulations is Too Many? Exponential Lower Bounds for Inferring Population Histories

Social learning equilibria

Belief propagation, robust reconstruction and optimal recovery of block models

Density Evolution in the Degree-correlated Stochastic Block Model

Invariance principle on the slice

Linear Sketching over $\mathbb F_2$

Noise Stability and Correlation with Half Spaces

Sequence assembly from corrupted shotgun reads

Shotgun Assembly of Random Jigsaw Puzzles

A Proof Of The Block Model Threshold Conjecture

Coexistence in preferential attachment networks

Local Algorithms for Block Models with Side Information

MCMC Learning

On the Correlation of Increasing Families

On the Impossibility of Learning the Missing Mass

Quickest Online Selection of an Increasing Subsequence of Specified Size

Robust dimension free isoperimetry in Gaussian space

Strong Contraction and Influences in Tail Spaces

A Statistical Test for Clades in Phylogenies

Can one hear the shape of a population history?

Competing first passage percolation on random regular graphs

From trees to seeds: on the inference of the seed from large trees in the uniform attachment model

Global and Local Information in Clustering Labeled Block Models

Majority rule has transition ratio 4 on Yule trees under a 2-state symmetric model

On the influence of the seed graph in the preferential attachment model

Standard Simplices and Pluralities are Not the Most Noise Stable

A Smooth Transition from Powerlessness to Absolute Power

Computation in anonymous networks

Exact thresholds for Ising-Gibbs samplers on general graphs

Explicit Optimal Hardness via Gaussian stability results

Mixing under monotone censoring

Robust Optimality of Gaussian Noise Stability

Spectral redemption: clustering sparse networks

A quantitative Gibbard-Satterthwaite theorem without neutrality

Asymptotic Learning on Bayesian Social Networks

Bundling Customers: How to Exploit Trust Among Customers to Maximize Seller Profit

Connectivity and equilibrium in random games

Exit time tails from pairwise decorrelation in hidden Markov chains, with applications to dynamical percolation

From Agreement to Asymptotic Learning

Geometric influences

Geometric Influences II: Correlation Inequalities and Noise Sensitivity

Majority Dynamics and Aggregation of Information in Social Networks

Majority is Stablest : Discrete and SoS

On extracting common random bits from correlated sources on large alphabets

On reverse hypercontractivity

Phylogenetic mixtures: Concentration of measure in the large-tree limit

Stochastic Block Models and Reconstruction

A Note on the Entropy/Influence Conjecture

Identifiability and inference of non-parametric rates-across-sites models on large-scale phylogenies

On extracting common random bits from correlated sources

Robust estimation of latent tree graphical models: Inferring hidden states with inexact parameters

VC bounds on the cardinality of nearly orthogonal function classes

Co-evolution is Incompatible with the Markov Assumption in Phylogenetics