Source author record

Cristopher Moore

Cristopher Moore appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

64 works

40 topics

4 close collaborators

preprint · 2022 · arXiv

Belief propagation for permutations, rankings, and partial orders

Many datasets give partial information about an ordering or ranking by indicating which team won a game, which item a user prefers, or who infected whom. We define a continuous spin system whose Gibbs distribution is the posterior distribution on permutations, given a probabilistic model of these interactions. Using the cavity method we derive a belief propagation algorithm that computes the marginal distribution of each node's position. In addition, the Bethe free energy lets us approximate the number of linear extensions of a partial order and perform model selection between competing probabilistic models, such as the Bradley-Terry-Luce model of noisy comparisons and its cousins.

preprint · 2022 · arXiv

Effective Resistance for Pandemics: Mobility Network Sparsification for High-Fidelity Epidemic Simulation

Network science has increasingly become central to the field of epidemiology and our ability to respond to infectious disease threats. However, many networks derived from modern datasets are not just large, but dense, with a high ratio of edges to nodes. This includes human mobility networks where most locations have a large number of links to many other locations. Simulating large-scale epidemics requires substantial computational resources and in many cases is practically infeasible. One way to reduce the computational cost of simulating epidemics on these networks is sparsification, where a representative subset of edges is selected based on some measure of their importance. We test several sparsification strategies, ranging from naive thresholding to random sampling of edges, on mobility data from the U.S. Following recent work in computer science, we find that the most accurate approach uses the effective resistances of edges, which prioritizes edges that are the only efficient way to travel between their endpoints. The resulting sparse network preserves many aspects of the behavior of an SIR model, including both global quantities, like the epidemic size, and local details of stochastic events, including the probability each node becomes infected and its distribution of arrival times. This holds even when the sparse network preserves fewer than $10\%$ of the edges of the original network. In addition to its practical utility, this method helps illuminate which links of a weighted, undirected network are most important to disease spread.
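As an illustration of the quantity this abstract relies on, here is a minimal sketch of computing edge effective resistances from the pseudoinverse of the graph Laplacian. The function name `effective_resistances` and the toy graph are hypothetical, the graph is assumed connected, and scoring edges by weight times resistance is one common choice from the spectral sparsification literature rather than a confirmed detail of the paper's procedure.

```python
import numpy as np

def effective_resistances(n, weighted_edges):
    """Effective resistance of each edge in a connected, weighted, undirected graph.

    weighted_edges is a list of (u, v, w) with nodes labeled 0..n-1 and w > 0.
    """
    # Build the weighted graph Laplacian L = D - A.
    L = np.zeros((n, n))
    for u, v, w in weighted_edges:
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    # R_uv = (e_u - e_v)^T L^+ (e_u - e_v), with L^+ the Moore-Penrose pseudoinverse.
    Lp = np.linalg.pinv(L)
    return {(u, v): Lp[u, u] + Lp[v, v] - 2.0 * Lp[u, v]
            for u, v, _ in weighted_edges}

# Toy graph (hypothetical): a triangle 0-1-2 plus a pendant edge 2-3.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0), (2, 3, 1.0)]
res = effective_resistances(4, edges)

# Assumed importance score: weight times effective resistance.
scores = {(u, v): w * res[(u, v)] for u, v, w in edges}
for edge, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(edge, round(score, 3))
```

In this toy graph the pendant edge (2, 3) is the only route to node 3 and has resistance 1, while each triangle edge has resistance 2/3 because a two-step detour exists. That matches the abstract's description of prioritizing edges that are the only efficient way to travel between their endpoints.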

preprint · 2022 · arXiv

Reconstruction of Random Geometric Graphs: Breaking the Omega(r) distortion barrier

Embedding graphs in a geographical or latent space, i.e., inferring locations for vertices in Euclidean space or on a smooth manifold or submanifold, is a common task in network analysis, statistical inference, and graph visualization. We consider the classic model of random geometric graphs where $n$ points are scattered uniformly in a square of area $n$, and two points have an edge between them if and only if their Euclidean distance is less than $r$. The reconstruction problem then consists of inferring the vertex positions, up to the symmetries of the square, given only the adjacency matrix of the resulting graph. We give an algorithm that, if $r = n^α$ for any $α > 0$, with high probability reconstructs the vertex positions with a maximum error of $O(n^β)$ where $β = 1/2 - (4/3)α$, until $α \ge 3/8$ where $β = 0$ and the error becomes $O(\sqrt{\log n})$. This improves over earlier results, which were unable to reconstruct with error less than $r$. Our method estimates Euclidean distances using a hybrid of graph distances and short-range estimates based on the number of common neighbors. We extend our results to the surface of the sphere in $\mathbb{R}^3$ and to hypercubes in any constant fixed dimension. Additionally we examine the extent to which reconstruction is still possible when the original adjacency lists have had a subset of the edges independently deleted at random.

preprint · 2022 · arXiv

The role of directionality, heterogeneity and correlations in epidemic risk and spread

Most models of epidemic spread, including many designed specifically for COVID-19, implicitly assume mass-action contact patterns and undirected contact networks, meaning that the individuals most likely to spread the disease are also the most at risk to receive it from others. Here, we review results from the theory of random directed graphs which show that many important quantities, including the reproduction number and the epidemic size, depend sensitively on the joint distribution of in- and out-degrees ("risk" and "spread"), including their heterogeneity and the correlation between them. By considering joint distributions of various kinds, we elucidate why some types of heterogeneity cause a deviation from the standard Kermack-McKendrick analysis of SIR models, i.e., so-called mass-action models where contacts are homogeneous and random, and some do not. We also show that some structured SIR models informed by realistic complex contact patterns among types of individuals (age or activity) are simply mixtures of Poisson processes and tend not to deviate significantly from the simplest mass-action model. Finally, we point out some possible policy implications of this directed structure, both for contact tracing strategy and for interventions designed to prevent superspreading events. In particular, directed graphs have a forward and backward version of the classic "friendship paradox" -- forward edges tend to lead to individuals with high risk, while backward edges lead to individuals with high spread -- such that a combination of both forward and backward contact tracing is necessary to find superspreading events and prevent future cascades of infection.

preprint · 2022 · arXiv

The spectrum of the Grigoriev-Laurent pseudomoments

Grigoriev (2001) and Laurent (2003) independently showed that the sum-of-squares hierarchy of semidefinite programs does not exactly represent the hypercube $\{\pm 1\}^n$ until degree at least $n$ of the hierarchy. Laurent also observed that the pseudomoment matrices her proof constructs appear to have surprisingly simple and recursively structured spectra as $n$ increases. While several new proofs of the Grigoriev-Laurent lower bound have since appeared, Laurent's observations have remained unproved. We give yet another, representation-theoretic proof of the lower bound, which also yields exact formulae for the eigenvalues of the Grigoriev-Laurent pseudomoments. Using these, we prove and elaborate on Laurent's observations. Our arguments have two features that may be of independent interest. First, we show that the Grigoriev-Laurent pseudomoments are a special case of a Gram matrix construction of pseudomoments proposed by Bandeira and Kunisky (2020). Second, we find a new realization of the irreducible representations of the symmetric group corresponding to Young diagrams with two rows, as spaces of multivariate polynomials that are multiharmonic with respect to an equilateral simplex.

preprint · 2020 · arXiv

Spectral Planting and the Hardness of Refuting Cuts, Colorability, and Communities in Random Graphs

We study the problem of efficiently refuting the k-colorability of a graph, or equivalently certifying a lower bound on its chromatic number. We give formal evidence of average-case computational hardness for this problem in sparse random regular graphs, showing optimality of a simple spectral certificate. This evidence takes the form of a computationally-quiet planting: we construct a distribution of d-regular graphs that has significantly smaller chromatic number than a typical regular graph drawn uniformly at random, while providing evidence that these two distributions are indistinguishable by a large class of algorithms. We generalize our results to the more general problem of certifying an upper bound on the maximum k-cut. This quiet planting is achieved by minimizing the effect of the planted structure (e.g. colorings or cuts) on the graph spectrum. Specifically, the planted structure corresponds exactly to eigenvectors of the adjacency matrix. This avoids the pushout effect of random matrix theory, and delays the point at which the planting becomes visible in the spectrum or local statistics. To illustrate this further, we give similar results for a Gaussian analogue of this problem: a quiet version of the spiked model, where we plant an eigenspace rather than adding a generic low-rank perturbation. Our evidence for computational hardness of distinguishing two distributions is based on three different heuristics: stability of belief propagation, the local statistics hierarchy, and the low-degree likelihood ratio. Of independent interest, our results include general-purpose bounds on the low-degree likelihood ratio for multi-spiked matrix models, and an improved low-degree analysis of the stochastic block model.

preprint · 2016 · arXiv

Codes, Lower Bounds, and Phase Transitions in the Symmetric Rendezvous Problem

In the rendezvous problem, two parties with different labelings of the vertices of a complete graph are trying to meet at some vertex at the same time. It is well-known that if the parties have predetermined roles, then the strategy where one of them waits at one vertex, while the other visits all $n$ vertices in random order is optimal, taking at most $n$ steps and averaging about $n/2$. Anderson and Weber considered the symmetric rendezvous problem, where both parties must use the same randomized strategy. They analyzed strategies where the parties repeatedly play the optimal asymmetric strategy, determining their role independently each time by a biased coin-flip. By tuning the bias, Anderson and Weber achieved an expected meeting time of about $0.829 n$, which they conjectured to be asymptotically optimal. We change perspective slightly: instead of minimizing the expected meeting time, we seek to maximize the probability of meeting within a specified time $T$. The Anderson-Weber strategy, which fails with constant probability when $T= Θ(n)$, is not asymptotically optimal for large $T$ in this setting. Specifically, we exhibit a symmetric strategy that succeeds with probability $1-o(1)$ in $T=4n$ steps. This is tight: for any $α< 4$, any symmetric strategy with $T = αn$ fails with constant probability. Our strategy uses a new combinatorial object that we dub a "rendezvous code," which may be of independent interest. When $T \le n$, we show that the probability of meeting within $T$ steps is indeed asymptotically maximized by the Anderson-Weber strategy. Our results imply new lower bounds, showing that the best symmetric strategy takes at least $0.638 n$ steps in expectation. We also present some partial results for the symmetric rendezvous problem on other vertex-transitive graphs.

preprint · 2016 · arXiv

Information-theoretic thresholds for community detection in sparse networks

We give upper and lower bounds on the information-theoretic threshold for community detection in the stochastic block model. Specifically, consider the symmetric stochastic block model with $q$ groups, average degree $d$, and connection probabilities $c_\text{in}/n$ and $c_\text{out}/n$ for within-group and between-group edges respectively; let $λ = (c_\text{in}-c_\text{out})/(qd)$. We show that, when $q$ is large, and $λ = O(1/q)$, the critical value of $d$ at which community detection becomes possible (in physical terms, the condensation threshold) is \[ d_\text{c} = Θ\!\left( \frac{\log q}{q λ^2} \right) \, , \] with tighter results in certain regimes. Above this threshold, we show that any partition of the nodes into $q$ groups which is as "good" as the planted one, in terms of the number of within- and between-group edges, is correlated with it. This gives an exponential-time algorithm that performs better than chance; specifically, community detection becomes possible below the Kesten-Stigum bound for $q \ge 5$ in the disassortative case $λ < 0$, and for $q \ge 11$ in the assortative case $λ > 0$ (similar upper bounds were obtained independently by Abbe and Sandon). Conversely, below this threshold, we show that no algorithm can label the vertices better than chance, or even distinguish the block model from an Erdős-Rényi random graph with high probability. Our lower bound on $d_\text{c}$ uses Robinson and Wormald's small subgraph conditioning method, and we also give (less explicit) results for non-symmetric stochastic block models. In the symmetric case, we obtain explicit results by using bounds on certain functions of doubly stochastic matrices due to Achlioptas and Naor; indeed, our lower bound on $d_\text{c}$ is their second moment lower bound on the $q$-colorability threshold for random graphs with a certain effective degree.
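To make the parameters in this abstract concrete, the following sketch computes $d$ and $λ$ from $c_\text{in}$, $c_\text{out}$ and $q$, along with the scaling quantity $\log q / (q λ^2)$. The function names are hypothetical. Two things are assumptions not stated in the abstract: that the groups have equal size, so that $d = (c_\text{in} + (q-1) c_\text{out})/q$, and that the Kesten-Stigum bound takes its commonly stated form $d λ^2 = 1$. The abstract gives $d_\text{c}$ only up to constant factors, so `threshold_scale` is a scale, not the threshold itself.

```python
import math

def sbm_parameters(c_in, c_out, q):
    """Average degree d and signal parameter lambda of a symmetric block model.

    Assumes q equal-sized groups, so a node has on average c_in/q within-group
    neighbors and (q-1)*c_out/q between-group neighbors.
    """
    d = (c_in + (q - 1) * c_out) / q
    lam = (c_in - c_out) / (q * d)
    return d, lam

def threshold_scale(q, lam):
    """The quantity log(q) / (q * lambda^2); d_c is Theta of this for large q."""
    return math.log(q) / (q * lam ** 2)

def above_kesten_stigum(d, lam):
    """Assumed form of the Kesten-Stigum condition: d * lambda^2 > 1."""
    return d * lam ** 2 > 1

d, lam = sbm_parameters(c_in=5.0, c_out=1.0, q=2)
print(d, lam, above_kesten_stigum(d, lam))
```

For two groups this reduces to $λ = (c_\text{in}-c_\text{out})/(c_\text{in}+c_\text{out})$, and the assumed Kesten-Stigum condition becomes $(c_\text{in}-c_\text{out})^2 > 2(c_\text{in}+c_\text{out})$.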

preprint · 2016 · arXiv

Information-theoretic thresholds for community detection in sparse networks

We give upper and lower bounds on the information-theoretic threshold for community detection in the stochastic block model. Specifically, let $k$ be the number of groups, $d$ be the average degree, the probability of edges between vertices within and between groups be $c_\mathrm{in}/n$ and $c_\mathrm{out}/n$ respectively, and let $λ= (c_\mathrm{in}-c_\mathrm{out})/(kd)$. We show that, when $k$ is large, and $λ= O(1/k)$, the critical value of $d$ at which community detection becomes possible -- in physical terms, the condensation threshold -- is \[ d_c = Θ\!\left( \frac{\log k}{k λ^2} \right) \, , \] with tighter results in certain regimes. Above this threshold, we show that the only partitions of the nodes into $k$ groups are correlated with the ground truth, giving an exponential-time algorithm that performs better than chance -- in particular, detection is possible for $k \ge 5$ in the disassortative case $λ< 0$ and for $k \ge 11$ in the assortative case $λ> 0$. (Similar upper bounds were obtained independently by Abbe and Sandon.) Below this threshold, we use recent results of Neeman and Netrapalli (who generalized arguments of Mossel, Neeman, and Sly) to show that no algorithm can label the vertices better than chance, or even distinguish the block model from an Erdős-Rényi random graph with high probability. We also rely on bounds on certain functions of doubly stochastic matrices due to Achlioptas and Naor; indeed, our lower bound on $d_c$ is the second moment lower bound on the $k$-colorability threshold for random graphs with a certain effective degree.

preprint · 2016 · arXiv

Matrix multiplication algorithms from group orbits

We show how to construct highly symmetric algorithms for matrix multiplication. In particular, we consider algorithms which decompose the matrix multiplication tensor into a sum of rank-1 tensors, where the decomposition itself consists of orbits under some finite group action. We show how to use the representation theory of the corresponding group to derive simple constraints on the decomposition, which we solve by hand for n=2,3,4,5, recovering Strassen's algorithm (in a particularly symmetric form) and new algorithms for larger n. While these new algorithms do not improve the known upper bounds on tensor rank or the matrix multiplication exponent, they are beautiful in their own right, and we point out modifications of this idea that could plausibly lead to further improvements. Our constructions also suggest further patterns that could be mined for new algorithms, including a tantalizing connection with lattices. In particular, using lattices we give the most transparent proof to date of Strassen's algorithm; the same proof works for all n, to yield a decomposition with $n^3 - n + 1$ terms.
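For reference, here is a sketch of the textbook form of Strassen's algorithm for $n = 2$, which uses 7 scalar multiplications, matching $n^3 - n + 1$ at $n = 2$. This is the standard presentation, not the symmetric group-orbit form the abstract describes, and the helper names `strassen_2x2` and `lattice_term_count` are hypothetical.

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 scalar multiplications (textbook Strassen)."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    # The seven products: each is one rank-1 term of the decomposition.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    # Recombine using only additions and subtractions.
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

def lattice_term_count(n):
    """Number of terms in the decomposition quoted in the abstract: n^3 - n + 1."""
    return n ** 3 - n + 1

print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
print([lattice_term_count(n) for n in (2, 3, 4, 5)])     # [7, 25, 61, 121]
```

The naive algorithm uses $n^3$ multiplications, so the quoted decomposition saves $n - 1$ of them.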

preprint · 2015 · arXiv

A message-passing approach for recurrent-state epidemic models on networks

Epidemic processes are common out-of-equilibrium phenomena of broad interdisciplinary interest. Recently, dynamic message-passing (DMP) has been proposed as an efficient algorithm for simulating epidemic models on networks, and in particular for estimating the probability that a given node will become infectious at a particular time. To date, DMP has been applied exclusively to models with one-way state changes, as opposed to models like SIS (susceptible-infectious-susceptible) and SIRS (susceptible-infectious-recovered-susceptible) where nodes can return to previously inhabited states. Because many real-world epidemics can exhibit such recurrent dynamics, we propose a DMP algorithm for complex, recurrent epidemic models on networks. Our approach takes correlations between neighboring nodes into account while preventing causal signals from backtracking to their immediate source, and thus avoids "echo chamber effects" where a pair of adjacent nodes each amplify the probability that the other is infectious. We demonstrate that this approach well approximates results obtained from Monte Carlo simulation and that its accuracy is often superior to the pair approximation (which also takes second-order correlations into account). Moreover, our approach is more computationally efficient than the pair approximation, especially for complex epidemic models: the number of variables in our DMP approach grows as $2mk$ where $m$ is the number of edges and $k$ is the number of states, as opposed to $mk^2$ for the pair approximation. We suspect that the resulting reduction in computational effort, as well as the conceptual simplicity of DMP, will make it a useful tool in epidemic modeling, especially for inference tasks where there is a large parameter space to explore.
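The variable counts quoted in this abstract can be compared directly. The sketch below only evaluates the two expressions $2mk$ and $mk^2$; the function names and the example edge count are hypothetical.

```python
def dmp_variable_count(m, k):
    """Variables in the DMP approach, as quoted in the abstract: 2 * m * k."""
    return 2 * m * k

def pair_approximation_variable_count(m, k):
    """Variables in the pair approximation, as quoted in the abstract: m * k^2."""
    return m * k ** 2

m = 1000  # hypothetical number of edges
for k in (2, 3, 4, 8):
    print(k, dmp_variable_count(m, k), pair_approximation_variable_count(m, k))
```

The two counts coincide at $k = 2$ and the ratio of pair-approximation variables to DMP variables is $k/2$, so the saving grows linearly with the number of states.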

preprint · 2015 · arXiv

Community detection in networks with unequal groups

Recently, a phase transition has been discovered in the network community detection problem below which no algorithm can tell which nodes belong to which communities with success any better than a random guess. This result has, however, so far been limited to the case where the communities have the same size or the same average degree. Here we consider the case where the sizes or average degrees are different. This asymmetry allows us to assign nodes to communities with better-than-random success by examining their local neighborhoods. Using the cavity method, we show that this removes the detectability transition completely for networks with four groups or fewer, while for more than four groups the transition persists up to a critical amount of asymmetry but not beyond. The critical point in the latter case coincides with the point at which local information percolates, causing a global transition from a less-accurate solution to a more-accurate one.

preprint · 2015 · arXiv

Detectability thresholds and optimal algorithms for community structure in dynamic networks

We study the fundamental limits on learning latent community structure in dynamic networks. Specifically, we study dynamic stochastic block models where nodes change their community membership over time, but where edges are generated independently at each time step. In this setting (which is a special case of several existing models), we are able to derive the detectability threshold exactly, as a function of the rate of change and the strength of the communities. Below this threshold, we claim that no algorithm can identify the communities better than chance. We then give two algorithms that are optimal in the sense that they succeed all the way down to this limit. The first uses belief propagation (BP), which gives asymptotically optimal accuracy, and the second is a fast spectral clustering algorithm, based on linearizing the BP equations. We verify our analytic and algorithmic results via numerical simulation, and close with a brief discussion of extensions and open questions.

preprint · 2015 · arXiv

On the universal structure of human lexical semantics

How universal is human conceptual structure? The way concepts are organized in the human brain may reflect distinct features of cultural, historical, and environmental background in addition to properties universal to human cognition. Semantics, or meaning expressed through language, provides direct access to the underlying conceptual structure, but meaning is notoriously difficult to measure, let alone parameterize. Here we provide an empirical measure of semantic proximity between concepts using cross-linguistic dictionaries. Across languages carefully selected from a phylogenetically and geographically stratified sample of genera, translations of words reveal cases where a particular language uses a single polysemous word to express concepts represented by distinct words in another. We use the frequency of polysemies linking two concepts as a measure of their semantic proximity, and represent the pattern of such linkages by a weighted network. This network is highly uneven and fragmented: certain concepts are far more prone to polysemy than others, and there emerge naturally interpretable clusters loosely connected to each other. Statistical analysis shows such structural properties are consistent across different language groups, largely independent of geography, environment, and literacy. It is therefore possible to conclude the conceptual structure connecting basic vocabulary studied is primarily due to universal features of human cognition and language use.

preprint · 2015 · arXiv

Spatial Mixing for Independent Sets in Poisson Random Trees

We consider correlation decay in the hard-core model with fugacity $λ$ on a rooted tree $T$ in which the arity of each vertex is independently Poisson distributed with mean $d$. Specifically, we investigate the question of which parameter settings $(d, λ)$ result in strong spatial mixing, weak spatial mixing, or neither. (In our context, weak spatial mixing is equivalent to Gibbs uniqueness.) For finite fugacity, a zero-one law implies that these spatial mixing properties hold either almost surely or almost never, once we have conditioned on whether $T$ is finite or infinite. We provide a partial answer to this question, which implies in particular that:

1. As $d \to \infty$, weak spatial mixing on the Poisson tree occurs whenever $λ < f(d) - o(1)$ but not when $λ$ is slightly above $f(d)$, where $f(d)$ is the threshold for WSM (and SSM) on the $d$-regular tree. This suggests that, in most cases, Poisson trees have similar spatial mixing behavior to regular trees.

2. When $1 < d \le 1.179$, there is weak spatial mixing on the Poisson($d$) tree for all values of $λ$. However, strong spatial mixing does not hold for sufficiently large $λ$. This is in contrast to regular trees, for which strong spatial mixing and weak spatial mixing always coincide.

For infinite fugacity SSM holds only when the tree is finite, and hence almost surely fails on the Poisson($d$) tree when $d > 1$. We show that WSM almost surely holds on the Poisson($d$) tree for $d < \mathbf{e}^{1/\sqrt{2}}/\sqrt{2} = 1.434...$, but that it fails with positive probability if $d > \mathbf{e}$.

preprint · 2015 · arXiv

The phase transition in random regular exact cover

A $k$-uniform, $d$-regular instance of Exact Cover is a family of $m$ sets $F_{n,d,k} = \{ S_j \subseteq \{1,...,n\} \}$, where each subset has size $k$ and each $1 \le i \le n$ is contained in $d$ of the $S_j$. It is satisfiable if there is a subset $T \subseteq \{1,...,n\}$ such that $|T \cap S_j|=1$ for all $j$. Alternately, we can consider it a $d$-regular instance of Positive 1-in-$k$ SAT, i.e., a Boolean formula with $m$ clauses and $n$ variables where each clause contains $k$ variables and demands that exactly one of them is true. We determine the satisfiability threshold for random instances of this type with $k > 2$. Letting $d^\star = \frac{\ln k}{(k-1)(- \ln (1-1/k))} + 1$, we show that $F_{n,d,k}$ is satisfiable with high probability if $d < d^\star$ and unsatisfiable with high probability if $d > d^\star$. We do this with a simple application of the first and second moment methods, boosting the probability of satisfiability below $d^\star$ to $1-o(1)$ using the small subgraph conditioning method.
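The threshold formula in this abstract is easy to evaluate numerically. The sketch below implements $d^\star = \frac{\ln k}{(k-1)(-\ln(1-1/k))} + 1$ exactly as written; the function name `exact_cover_threshold` is hypothetical, and the restriction to $k > 2$ follows the abstract.

```python
import math

def exact_cover_threshold(k):
    """Evaluate d* = ln k / ((k - 1) * (-ln(1 - 1/k))) + 1 for k > 2."""
    if k <= 2:
        raise ValueError("the abstract states the result for k > 2")
    return math.log(k) / ((k - 1) * (-math.log(1 - 1 / k))) + 1

for k in (3, 4, 5, 10):
    print(k, round(exact_cover_threshold(k), 4))
```

For $k = 3$ the value is approximately 2.355, so by the abstract's result a random 3-uniform instance is satisfiable with high probability at $d = 2$ and unsatisfiable with high probability at $d = 3$.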

preprint · 2015 · arXiv

Untangling the roles of parasites in food webs with generative network models

Food webs represent the set of consumer-resource interactions among a set of species that co-occur in a habitat, but most food web studies have omitted parasites and their interactions. Recent studies have provided conflicting evidence on whether including parasites changes food web structure, with some suggesting that parasitic interactions are structurally distinct from those among free-living species while others claim the opposite. Here, we describe a principled method for understanding food web structure that combines an efficient optimization algorithm from statistical physics called parallel tempering with a probabilistic generalization of the empirically well-supported food web niche model. This generative model approach allows us to rigorously estimate the degree to which interactions that involve parasites are statistically distinguishable from interactions among free-living species, whether parasite niches behave similarly to free-living niches, and the degree to which existing hypotheses about food web structure are naturally recovered. We apply this method to the well-studied Flensburg Fjord food web and show that while predation on parasites, concomitant predation of parasites, and parasitic intraguild trophic interactions are largely indistinguishable from free-living predation interactions, parasite-host interactions are different. These results provide a powerful new tool for evaluating the impact of classes of species and interactions on food web structure to shed new light on the roles of parasites in food webs.

preprint · 2014 · arXiv

Bounds on the quantum satisfiability threshold

Quantum k-SAT is the problem of deciding whether there is an n-qubit state which is perpendicular to a set of vectors, each of which lies in the Hilbert space of k qubits. Equivalently, the problem is to decide whether a particular type of local Hamiltonian has a ground state with zero energy. We consider random quantum k-SAT formulas with n variables and m = αn clauses, and ask at what value of α these formulas cease to be satisfiable. We show that the threshold for random quantum 3-SAT is at most 3.594. For comparison, convincing arguments from statistical physics suggest that the classical 3-SAT threshold is α ≈ 4.267. For larger k, we show that the quantum threshold is a constant factor smaller than the classical one. Our bounds work by determining the generic rank of the satisfying subspace for certain gadgets, and then using the technique of differential equations to analyze various algorithms that partition the hypergraph into a collection of these gadgets. Our use of differential equations to establish upper bounds on a satisfiability threshold appears to be novel, and our techniques may apply to various classical problems as well.

preprint · 2014 · arXiv

Computational Complexity, Phase Transitions, and Message-Passing for Community Detection

We take a whirlwind tour of problems and techniques at the boundary of computer science and statistical physics. We start with a brief description of P, NP, and NP-completeness. We then discuss random graphs, including the emergence of the giant component and the k-core, using techniques from branching processes and differential equations. Using these tools as well as the second moment method, we give upper and lower bounds on the critical clause density for random k-SAT. We end with community detection in networks, variational methods, the Bethe free energy, belief propagation, the detectability transition, and the non-backtracking matrix.

preprint · 2014 · arXiv

Group representations that resist random sampling

We show that there exists a family of groups $G_n$ and nontrivial irreducible representations $ρ_n$ such that, for any constant $t$, the average of $ρ_n$ over $t$ uniformly random elements $g_1, \ldots, g_t \in G_n$ has operator norm $1$ with probability approaching 1 as $n \rightarrow \infty$. More quantitatively, we show that there exist families of finite groups for which $Ω(\log \log |G|)$ random elements are required to bound the norm of a typical representation below $1$. This settles a conjecture of A. Wigderson.

preprint · 2014 · arXiv

Lower Bounds on the Critical Density in the Hard Disk Model via Optimized Metrics

We prove a new lower bound on the critical density $ρ_c$ of the hard disk model, i.e., the density below which it is possible to efficiently sample random configurations of $n$ non-overlapping disks in a unit torus. We use a classic Markov chain which moves one disk at a time, but with an improved path coupling analysis. Our main tool is an optimized metric on neighboring pairs of configurations, i.e., configurations that differ in the position of a single disk: we define a metric that depends on the difference in these positions, and which approaches zero continuously as they coincide. This improves the previous lower bound $ρ_c \ge 1/8$ to $ρ_c \ge 0.154$.

preprint · 2014 · arXiv

Phase transitions in semisupervised clustering of sparse networks

Predicting labels of nodes in a network, such as community memberships or demographic variables, is an important problem with applications in social and biological networks. A recently-discovered phase transition puts fundamental limits on the accuracy of these predictions if we have access only to the network topology. However, if we know the correct labels of some fraction $α$ of the nodes, we can do better. We study the phase diagram of this "semisupervised" learning problem for networks generated by the stochastic block model. We use the cavity method and the associated belief propagation algorithm to study what accuracy can be achieved as a function of $α$. For $k = 2$ groups, we find that the detectability transition disappears for any $α> 0$, in agreement with previous work. For larger $k$ where a hard but detectable regime exists, we find that the easy/hard transition (the point at which efficient algorithms can do better than chance) becomes a line of transitions where the accuracy jumps discontinuously at a critical value of $α$. This line ends in a critical point with a second-order transition, beyond which the accuracy is a continuous function of $α$. We demonstrate qualitatively similar transitions in two real-world networks.

preprint · 2014 · arXiv

Scalable detection of statistically significant communities and hierarchies, using message-passing for modularity

Modularity is a popular measure of community structure. However, maximizing the modularity can lead to many competing partitions, with almost the same modularity, that are poorly correlated with each other. It can also produce illusory "communities" in random graphs where none exist. We address this problem by using the modularity as a Hamiltonian at finite temperature, and using an efficient Belief Propagation algorithm to obtain the consensus of many partitions with high modularity, rather than looking for a single partition that maximizes it. We show analytically and numerically that the proposed algorithm works all the way down to the detectability transition in networks generated by the stochastic block model. It also performs well on real-world networks, revealing large communities in some networks where previous work has claimed no communities exist. Finally we show that by applying our algorithm recursively, subdividing communities until no statistically-significant subcommunities can be found, we can detect hierarchical structure in real-world networks more efficiently than previous methods.

preprint2013arXiv

A message-passing approach for threshold models of behavior in networks

We study a simple model of how social behaviors, like trends and opinions, propagate in networks where individuals adopt the trend when they are informed by a threshold number $T$ of neighbors who are adopters. Using a dynamic message-passing algorithm, we develop a tractable and computationally efficient method that provides the complete time evolution of each individual's probability of adopting the trend, or of the frequency of adopters and non-adopters, in arbitrary networks. We validate the method by comparing it with Monte Carlo agent-based simulation in real and synthetic networks and provide an exact analytic scheme for large random networks, where simulation results match well. Our approach is general enough to incorporate non-Markovian processes and to include heterogeneous thresholds and thus can be applied to explore rich sets of complex heterogeneous agent-based models.
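
To make the adoption rule concrete, here is a minimal sketch of a deterministic, synchronous threshold cascade. It is not the dynamic message-passing method of the paper, and it leaves out the stochastic informing step. It assumes that a node adopts as soon as at least `T` of its neighbors are adopters, and that adoption is permanent. The name `threshold_cascade` and the adjacency-dict input are hypothetical choices for this example.

```python
def threshold_cascade(adj, seeds, T):
    """Run a synchronous threshold cascade until no new node adopts.

    adj:   dict mapping each node to an iterable of its neighbors.
    seeds: initial adopters.
    T:     number of adopting neighbors needed to adopt (assumed 'at least T').
    Returns the final adopter set and the adopter count after each round.
    """
    adopters = set(seeds)
    history = [len(adopters)]
    while True:
        # Nodes that newly meet the threshold, judged on the current state.
        new = {
            v for v in adj
            if v not in adopters
            and sum(1 for u in adj[v] if u in adopters) >= T
        }
        if not new:
            return adopters, history
        adopters |= new
        history.append(len(adopters))

# A path 0 - 1 - 2 - 3: with T = 1 the trend spreads one step per round.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
final, counts = threshold_cascade(path, {0}, T=1)
print(sorted(final), counts)
```

With `T=2` and the same single seed, nothing spreads on the path, which shows how strongly the outcome depends on the threshold.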

preprint2013arXiv

Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications

In this paper we extend our previous work on the stochastic block model, a commonly used generative model for social and biological networks, and the problem of inferring functional groups or communities from the topology of the network. We use the cavity method of statistical physics to obtain an asymptotically exact analysis of the phase diagram. We describe in detail properties of the detectability/undetectability phase transition and the easy/hard phase transition for the community detection problem. Our analysis translates naturally into a belief propagation algorithm for inferring the group memberships of the nodes in an optimal way, i.e., that maximizes the overlap with the underlying group memberships, and learning the underlying parameters of the block model. Finally, we apply the algorithm to two examples of real-world networks and discuss its performance.

preprint2013arXiv

Model Selection for Degree-corrected Block Models

The proliferation of models for networks raises challenging problems of model selection: the data are sparse and globally dependent, and models are typically high-dimensional and have large numbers of latent variables. Together, these issues mean that the usual model-selection criteria do not work properly for networks. We illustrate these challenges, and show one way to resolve them, by considering the key network-analysis problem of dividing a graph into communities or blocks of nodes with homogeneous patterns of links to the rest of the network. The standard tool for doing this is the stochastic block model, under which the probability of a link between two nodes is a function solely of the blocks to which they belong. This imposes a homogeneous degree distribution within each block; this can be unrealistic, so degree-corrected block models add a parameter for each node, modulating its over-all degree. The choice between ordinary and degree-corrected block models matters because they make very different inferences about communities. We present the first principled and tractable approach to model selection between standard and degree-corrected block models, based on new large-graph asymptotics for the distribution of log-likelihood ratios under the stochastic block model, finding substantial departures from classical results for sparse graphs. We also develop linear-time approximations for log-likelihoods under both the stochastic block model and the degree-corrected model, using belief propagation. Applications to simulated and real networks show excellent agreement with our approximations. Our results thus both solve the practical problem of deciding on degree correction, and point to a general approach to model selection in network analysis.

preprint2013arXiv

Optimal epsilon-biased sets with just a little randomness

Subsets of F_2^n that are eps-biased, meaning that the parity of any set of bits is even or odd with probability eps close to 1/2, are powerful tools for derandomization. A simple randomized construction shows that such sets exist of size O(n/eps^2), and known deterministic constructions achieve sets of size O(n/eps^3), O(n^2/eps^2), and O((n/eps^2)^{5/4}). Rather than derandomizing these sets completely in exchange for making them larger, we attempt a partial derandomization while keeping them small, constructing sets of size O(n/eps^2) with as few random bits as possible. The naive randomized construction requires O(n^2/eps^2) random bits. We give two constructions. The first uses Nisan's space-bounded pseudorandom generator to partly derandomize a folklore probabilistic construction of an error-correcting code, and requires O(n log (1/eps)) bits. Our second construction requires O(n log (n/eps)) bits, but is more elementary; it adds randomness to a Legendre symbol construction of Alon, Goldreich, Håstad, and Peralta, and uses Weil sums to bound high moments of the bias.
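
The definition of bias can be checked by brute force on small examples. The sketch below measures the bias of a set S as the largest value of |E_{x in S} (-1)^{a.x}| over nonzero a in F_2^n, which is one common convention; other conventions differ by a factor of 2, so the normalization here is an assumption. The function name `max_bias` is hypothetical, and the running time is exponential in n, so this is only a checking tool, not one of the paper's constructions.

```python
from itertools import product

def max_bias(S, n):
    """Largest |average of (-1)^(a.x) over x in S| over nonzero a in F_2^n.

    S: non-empty list of length-n tuples of 0/1 bits.
    """
    worst = 0.0
    for a in product((0, 1), repeat=n):
        if not any(a):
            continue  # the all-zero test vector is excluded by definition
        total = 0
        for x in S:
            parity = sum(ai * xi for ai, xi in zip(a, x)) % 2
            total += -1 if parity else 1
        worst = max(worst, abs(total) / len(S))
    return worst

# The whole space is perfectly unbiased; a single point is maximally biased.
whole_space = list(product((0, 1), repeat=3))
print(max_bias(whole_space, 3), max_bias([(0, 0, 0)], 3))
```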

preprint2013arXiv

Phase Transitions in Community Detection: A Solvable Toy Model

Recently, it was shown that there is a phase transition in the community detection problem. This transition was first computed using the cavity method, and has been proved rigorously in the case of $q=2$ groups. However, analytic calculations using the cavity method are challenging since they require us to understand probability distributions of messages. We study analogous transitions in a so-called "zero-temperature inference" model, where this distribution is supported only on the most-likely messages. Furthermore, whenever several messages are equally likely, we break the tie by choosing among them with equal probability. While the resulting analysis does not give the correct values of the thresholds, it does reproduce some of the qualitative features of the system. It predicts a first-order detectability transition whenever $q > 2$, while the finite-temperature cavity method shows that this is the case only when $q > 4$. It also has a regime analogous to the "hard but detectable" phase, where the community structure can be partially recovered, but only when the initial messages are sufficiently accurate. Finally, we study a semisupervised setting where we are given the correct labels for a fraction $ρ$ of the nodes. For $q > 2$, we find a regime where the accuracy jumps discontinuously at a critical value of $ρ$.

preprint2013arXiv

Scalable Text and Link Analysis with Mixed-Topic Link Models

Many data sets contain rich information about objects, as well as pairwise relations between them. For instance, in networks of websites, scientific papers, and other documents, each node has content consisting of a collection of words, as well as hyperlinks or citations to other nodes. In order to perform inference on such data sets, and make predictions and recommendations, it is useful to have models that are able to capture the processes which generate the text at each node and the links between them. In this paper, we combine classic ideas in topic modeling with a variant of the mixed-membership block model recently developed in the statistical physics community. The resulting model has the advantage that its parameters, including the mixture of topics of each document and the resulting overlapping communities, can be inferred with a simple and scalable expectation-maximization algorithm. We test our model on three data sets, performing unsupervised topic classification and link prediction. For both tasks, our model outperforms several existing state-of-the-art methods, achieving higher accuracy with significantly less computation, analyzing a data set with 1.3 million words and 44 thousand links in a few minutes.

preprint2013arXiv

Small-Bias Sets for Nonabelian Groups: Derandomizing the Alon-Roichman Theorem

In analogy with epsilon-biased sets over Z_2^n, we construct explicit epsilon-biased sets over nonabelian finite groups G. That is, we find sets S subset G such that | Exp_{x in S} rho(x)| <= epsilon for any nontrivial irreducible representation rho. Equivalently, such sets make G's Cayley graph an expander with eigenvalue |lambda| <= epsilon. The Alon-Roichman theorem shows that random sets of size O(log |G| / epsilon^2) suffice. For groups of the form G = G_1 x ... x G_n, our construction has size poly(max_i |G_i|, n, epsilon^{-1}), and we show that a set S subset G^n considered by Meka and Zuckerman that fools read-once branching programs over G is also epsilon-biased in this sense. For solvable groups whose abelian quotients have constant exponent, we obtain epsilon-biased sets of size (log |G|)^{1+o(1)} poly(epsilon^{-1}). Our techniques include derandomized squaring (in both the matrix product and tensor product senses) and a Chernoff-like bound on the expected norm of the product of independently random operators that may be of independent interest.

preprint2013arXiv

Spectral redemption: clustering sparse networks

Spectral algorithms are classic approaches to clustering and community detection in networks. However, for sparse networks the standard versions of these algorithms are suboptimal, in some cases completely failing to detect communities even when other algorithms such as belief propagation can do so. Here we introduce a new class of spectral algorithms based on a non-backtracking walk on the directed edges of the graph. The spectrum of this operator is much better-behaved than that of the adjacency matrix or other commonly used matrices, maintaining a strong separation between the bulk eigenvalues and the eigenvalues relevant to community structure even in the sparse case. We show that our algorithm is optimal for graphs generated by the stochastic block model, detecting communities all the way down to the theoretical limit. We also show the spectrum of the non-backtracking operator for some real-world networks, illustrating its advantages over traditional spectral clustering.
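
The non-backtracking operator is easy to write down explicitly. The sketch below builds it as a 0/1 matrix indexed by directed edges, with an entry 1 from (u -> v) to (v -> x) whenever x differs from u. It assumes a simple undirected graph (no self-loops or repeated edges) given as an edge list; the function name `non_backtracking_matrix` is hypothetical, and the quadratic-time loop is chosen for clarity rather than efficiency. Computing the spectrum would be left to a linear algebra library.

```python
def non_backtracking_matrix(edges):
    """Build the non-backtracking matrix of a simple undirected graph.

    edges: list of undirected edges (u, v), each listed once.
    Returns (B, directed) where directed lists the 2m directed edges and
    B[i][j] = 1 if directed[i] = (u -> v), directed[j] = (v -> x), x != u.
    """
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    size = len(directed)
    B = [[0] * size for _ in range(size)]
    for i, (u, v) in enumerate(directed):
        for j, (w, x) in enumerate(directed):
            # The walk continues from v, but may not step straight back to u.
            if v == w and x != u:
                B[i][j] = 1
    return B, directed

# In the complete graph K4 every directed edge has exactly 2 continuations.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
B, directed = non_backtracking_matrix(k4)
print(len(directed), {sum(row) for row in B})
```

In a d-regular graph every row sums to d - 1, so the all-ones vector is an eigenvector with eigenvalue d - 1. On a tree the matrix is nilpotent, since every non-backtracking walk eventually runs out of continuations.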

preprint2013arXiv

Transdisciplinary electric power grid science

The 20th-century engineering feat that most improved the quality of human life, the electric power system, now faces discipline-spanning challenges that threaten that distinction. So multilayered and complex that they resemble ecosystems, power grids face risks from their interdependent cyber, physical, social and economic layers. Only with a holistic understanding of the dynamics of electricity infrastructure and human operators, automatic controls, electricity markets, weather, climate and policy can we fortify worldwide access to electricity.

preprint2012arXiv

An Entropic Proof of Chang's Inequality

Chang's lemma is a useful tool in additive combinatorics and the analysis of Boolean functions. Here we give an elementary proof using entropy. The constant we obtain is tight, and we give a slight improvement in the case where the variables are highly biased.

preprint2012arXiv

Continuum Percolation Thresholds in Two Dimensions

A wide variety of methods have been used to compute percolation thresholds. In lattice percolation, the most powerful of these methods consists of microcanonical simulations using the union-find algorithm to efficiently determine the connected clusters, and (in two dimensions) using exact values from conformal field theory for the probability, at the phase transition, that various kinds of wrapping clusters exist on the torus. We apply this approach to percolation in continuum models, finding overlaps between objects with real-valued positions and orientations. In particular, we find precise values of the percolation transition for disks, squares, rotated squares, and rotated sticks in two dimensions, and confirm that these transitions behave as conformal field theory predicts. The running time and memory use of our algorithm are essentially linear as a function of the number of objects at criticality.
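
The role of union-find in cluster identification can be sketched for overlapping disks. This example is a simplification under stated assumptions: it tests all pairs of disks (quadratic time, rather than the essentially linear time reported above), uses open boundaries, and does not detect wrapping clusters on the torus. The names `UnionFind` and `disk_clusters` are hypothetical.

```python
import math

class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra          # attach the smaller tree to the larger
        self.size[ra] += self.size[rb]
        return ra

def disk_clusters(centers, radius):
    """Group equal-radius disks into connected clusters of overlapping disks."""
    uf = UnionFind(len(centers))
    for i, (xi, yi) in enumerate(centers):
        for j in range(i + 1, len(centers)):
            xj, yj = centers[j]
            # Two disks overlap when their centers are within one diameter.
            if math.hypot(xi - xj, yi - yj) <= 2 * radius:
                uf.union(i, j)
    return uf

uf = disk_clusters([(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)], radius=0.6)
print(len({uf.find(i) for i in range(3)}))
```

A grid of cells with side equal to the disk diameter would restrict the overlap test to neighboring cells, which is one standard way to move toward linear running time.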

preprint2012arXiv

From spin glasses to hard satisfiable formulas

We introduce a highly structured family of hard satisfiable 3-SAT formulas corresponding to an ordered spin-glass model from statistical physics. This model has provably "glassy" behavior; that is, it has many local optima with large energy barriers between them, so that local search algorithms get stuck and have difficulty finding the true ground state, i.e., the unique satisfying assignment. We test the hardness of our formulas with two Davis-Putnam solvers, Satz and zChaff, the recently introduced Survey Propagation (SP), and two local search algorithms, Walksat and Record-to-Record Travel (RRT). We compare our formulas to random 3-XOR-SAT formulas and to two other generators of hard satisfiable instances, the minimum disagreement parity formulas of Crawford et al., and Hirsch's hgen. For the complete solvers the running time of our formulas grows exponentially in sqrt(n), and exceeds that of random 3-XOR-SAT formulas for small problem sizes. SP is unable to solve our formulas with as few as 25 variables. For Walksat, our formulas appear to be harder than any other known generator of satisfiable instances. Finally, our formulas can be solved efficiently by RRT but only if the parameter d is tuned to the height of the barriers between local minima, and we use this parameter to measure the barrier heights in random 3-XOR-SAT formulas as well.

preprint2012arXiv

Oriented and Degree-generated Block Models: Generating and Inferring Communities with Inhomogeneous Degree Distributions

The stochastic block model is a powerful tool for inferring community structure from network topology. However, it predicts a Poisson degree distribution within each community, while most real-world networks have a heavy-tailed degree distribution. The degree-corrected block model can accommodate arbitrary degree distributions within communities. But since it takes the vertex degrees as parameters rather than generating them, it cannot use them to help it classify the vertices, and its natural generalization to directed graphs cannot even use the orientations of the edges. In this paper, we present variants of the block model with the best of both worlds: they can use vertex degrees and edge orientations in the classification process, while tolerating heavy-tailed degree distributions within communities. We show that for some networks, including synthetic networks and networks of word adjacencies in English text, these new block models achieve a higher accuracy than either standard or degree-corrected block models.

preprint2012arXiv

Stability analysis of financial contagion due to overlapping portfolios

Common asset holdings are widely believed to have been the primary vector of contagion in the recent financial crisis. We develop a network approach to the amplification of financial contagion due to the combination of overlapping portfolios and leverage, and we show how it can be understood in terms of a generalized branching process. By studying a stylized model we estimate the circumstances under which systemic instabilities are likely to occur as a function of parameters such as leverage, market crowding, diversification, and market impact. Although diversification may be good for individual institutions, it can create dangerous systemic effects, and as a result financial contagion gets worse with too much diversification. Under our model there is a critical threshold for leverage; below it financial networks are always stable, and above it the unstable region grows as leverage increases. The financial system exhibits "robust yet fragile" behavior, with regions of the parameter space where contagion is rare but catastrophic whenever it occurs. Our model and methods of analysis can be calibrated to real data and provide simple yet powerful tools for macroprudential stress testing.

preprint2012arXiv

The Power of Choice for Random Satisfiability

We consider Achlioptas processes for k-SAT formulas. We create a semi-random formula with n variables and m clauses, where each clause is a choice, made on-line, between two or more uniformly random clauses. Our goal is to delay the satisfiability/unsatisfiability transition, keeping the formula satisfiable up to densities m/n beyond the satisfiability threshold alpha_k for random k-SAT. We show that three choices suffice to raise the threshold for any k >= 3, and that two choices suffice for all 3 <= k <= 25. We also show that two choices suffice to lower the threshold for all k >= 3, making the formula unsatisfiable at a density below alpha_k.

preprint2012arXiv

Topological phase transition in a network model with preferential attachment and node removal

Preferential attachment is a popular model of growing networks. We consider a generalized model with random node removal, and a combination of preferential and random attachment. Using a high-degree expansion of the master equation, we identify a topological phase transition depending on the rate of node removal and the relative strength of preferential vs. random attachment, where the degree distribution goes from a power law to one with an exponential tail.

preprint2011arXiv

Active Learning for Node Classification in Assortative and Disassortative Networks

In many real-world networks, nodes have class labels, attributes, or variables that affect the network's topology. If the topology of the network is known but the labels of the nodes are hidden, we would like to select a small subset of nodes such that, if we knew their labels, we could accurately predict the labels of all the other nodes. We develop an active learning algorithm for this problem which uses information-theoretic techniques to choose which nodes to explore. We test our algorithm on networks from three different domains: a social network, a network of English words that appear adjacently in a novel, and a marine food web. Our algorithm makes no initial assumptions about how the groups connect, and performs well even when faced with quite general types of network structure. In particular, we do not assume that nodes of the same class are more likely to be connected to each other---only that they connect to the rest of the network in similar ways.

preprint2011arXiv

Frugal and Truthful Auctions for Vertex Covers, Flows, and Cuts

We study truthful mechanisms for hiring a team of agents in three classes of set systems: Vertex Cover auctions, k-flow auctions, and cut auctions. For Vertex Cover auctions, the vertices are owned by selfish and rational agents, and the auctioneer wants to purchase a vertex cover from them. For k-flow auctions, the edges are owned by the agents, and the auctioneer wants to purchase k edge-disjoint s-t paths, for given s and t. In the same setting, for cut auctions, the auctioneer wants to purchase an s-t cut. Only the agents know their costs, and the auctioneer needs to select a feasible set and payments based on bids made by the agents. We present constant-competitive truthful mechanisms for all three set systems. That is, the maximum overpayment of the mechanism is within a constant factor of the maximum overpayment of any truthful mechanism, for every set system in the class. The mechanism for Vertex Cover is based on scaling each bid by a multiplier derived from the dominant eigenvector of a certain matrix. The mechanism for k-flows prunes the graph to be minimally (k+1)-connected, and then applies the Vertex Cover mechanism. Similarly, the mechanism for cuts contracts the graph until all s-t paths have length exactly 2, and then applies the Vertex Cover mechanism.

preprint2011arXiv

Independent sets in random graphs from the weighted second moment method

We prove new lower bounds on the likely size of a maximum independent set in a random graph with a given average degree. Our method is a weighted version of the second moment method, where we give each independent set a weight based on the total degree of its vertices.

preprint2011arXiv

Parallel Complexity of Random Boolean Circuits

Random instances of feedforward Boolean circuits are studied both analytically and numerically. Evaluating these circuits is known to be a P-complete problem and thus, in the worst case, believed to be impossible to perform, even given a massively parallel computer, in time much less than the depth of the circuit. Nonetheless, it is found that for some ensembles of random circuits, saturation to a fixed truth value occurs rapidly so that evaluation of the circuit can be accomplished in much less parallel time than the depth of the circuit. For other ensembles saturation does not occur and circuit evaluation is apparently hard. In particular, for some random circuits composed of connectives with five or more inputs, the number of true outputs at each level is a chaotic sequence. Finally, while the average case complexity depends on the choice of ensemble, it is shown that for all ensembles it is possible to simultaneously construct a typical circuit together with its solution in polylogarithmic parallel time.

preprint2011arXiv

Phase transition in the detection of modules in sparse networks

We present an asymptotically exact analysis of the problem of detecting communities in sparse random networks. Our results are also applicable to detection of functional modules, partitions, and colorings in noisy planted models. Using a cavity method analysis, we unveil a phase transition from a region where the original group assignment is undetectable to one where detection is possible. In some cases, the detectable region splits into an algorithmically hard region and an easy one. Our approach naturally translates into a practical algorithm for detecting modules in sparse networks, and learning the parameters of the underlying model.

preprint2011arXiv

Quantum Fourier sampling, Code Equivalence, and the quantum security of the McEliece and Sidelnikov cryptosystems

The Code Equivalence problem is that of determining whether two given linear codes are equivalent to each other up to a permutation of the coordinates. This problem has a direct reduction to a nonabelian hidden subgroup problem (HSP), suggesting a possible quantum algorithm analogous to Shor's algorithms for factoring or discrete log. However, we recently showed that in many cases of interest---including Goppa codes---solving this case of the HSP requires rich, entangled measurements. Thus, solving these cases of Code Equivalence via Fourier sampling appears to be out of reach of current families of quantum algorithms. Code equivalence is directly related to the security of McEliece-type cryptosystems in the case where the private code is known to the adversary. However, for many codes the support splitting algorithm of Sendrier provides a classical attack in this case. We revisit the claims of our previous article in the light of these classical attacks, and discuss the particular case of the Sidelnikov cryptosystem, which is based on Reed-Muller codes.

preprint2011arXiv

The complexity of the fermionant, and immanants of constant width

In the context of statistical physics, Chandrasekharan and Wiese recently introduced the fermionant $\mathrm{Ferm}_k$, a determinant-like quantity where each permutation $π$ is weighted by $-k$ raised to the number of cycles in $π$. We show that computing $\mathrm{Ferm}_k$ is #P-hard under Turing reductions for any constant $k > 2$, and is $\oplus\mathrm{P}$-hard for $k=2$, even for the adjacency matrices of planar graphs. As a consequence, unless the polynomial hierarchy collapses, it is impossible to compute the immanant $\mathrm{Imm}_λ\,A$ as a function of the Young diagram $λ$ in polynomial time, even if the width of $λ$ is restricted to be at most 2. In particular, if $\mathrm{Ferm}_2$ is in P, or if $\mathrm{Imm}_λ$ is in P for all $λ$ of width 2, then $\mathrm{NP} \subseteq \mathrm{RP}$ and there are randomized polynomial-time algorithms for NP-complete problems.
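
The cycle-weighted sum described above can be evaluated by brute force for tiny matrices. The sketch below sums, over all permutations π, the weight (-k) raised to the number of cycles of π times the product of the entries A[i][π(i)]. Any overall sign or normalization factor in the formal definition of the fermionant is not reproduced here, so treat the result as the weighted sum only; the function names are hypothetical, and the running time is factorial in the matrix size.

```python
from itertools import permutations

def cycle_count(perm):
    """Number of cycles (including fixed points) of a permutation tuple."""
    seen = [False] * len(perm)
    count = 0
    for start in range(len(perm)):
        if not seen[start]:
            count += 1
            j = start
            while not seen[j]:
                seen[j] = True
                j = perm[j]
    return count

def cycle_weighted_sum(A, k):
    """Sum over permutations of (-k)^(number of cycles) * prod_i A[i][perm[i]]."""
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        term = (-k) ** cycle_count(perm)
        for i in range(n):
            term *= A[i][perm[i]]
        total += term
    return total

A = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]   # determinant 25
print(cycle_weighted_sum(A, 1), cycle_weighted_sum(A, 2))
```

For k = 1 the weight (-1)^c equals (-1)^n times the sign of the permutation, so the sum is (-1)^n times the determinant. This gives a convenient sanity check, and it is the sense in which the quantity is determinant-like.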

preprint2011arXiv

Tight bounds on the threshold for permuted k-colorability

If each edge (u,v) of a graph G=(V,E) is decorated with a permutation pi_{u,v} of k objects, we say that it has a permuted k-coloring if there is a coloring sigma from V to {1,...,k} such that sigma(v) is different from pi_{u,v}(sigma(u)) for all (u,v) in E. Based on arguments from statistical physics, we conjecture that the threshold d_k for permuted k-colorability in random graphs G(n,m=dn/2), where the permutations on the edges are uniformly random, is equal to the threshold for standard graph k-colorability. The additional symmetry provided by random permutations makes it easier to prove bounds on d_k. By applying the second moment method with these additional symmetries, and applying the first moment method to a random variable that depends on the number of available colors at each vertex, we bound the threshold within an additive constant. Specifically, we show that for any constant epsilon > 0, for sufficiently large k we have 2 k ln k - ln k - 2 - epsilon < d_k < 2 k ln k - ln k - 1 + epsilon. In contrast, the best known bounds on d_k for standard k-colorability leave an additive gap of about ln k between the upper and lower bounds.

preprint2010arXiv

Active Learning for Hidden Attributes in Networks

In many networks, vertices have hidden attributes, or types, that are correlated with the network's topology. If the topology is known but these attributes are not, and if learning the attributes is costly, we need a method for choosing which vertex to query in order to learn as much as possible about the attributes of the other vertices. We assume the network is generated by a stochastic block model, but we make no assumptions about its assortativity or disassortativity. We choose which vertex to query using two methods: 1) maximizing the mutual information between its attributes and those of the others (a well-known approach in active learning) and 2) maximizing the average agreement between two independent samples of the conditional Gibbs distribution. Experimental results show that both these methods do much better than simple heuristics. They also consistently identify certain vertices as important by querying them early on.

preprint2010arXiv

Approximate Representations and Approximate Homomorphisms

Approximate algebraic structures play a defining role in arithmetic combinatorics and have found remarkable applications to basic questions in number theory and pseudorandomness. Here we study approximate representations of finite groups: functions f:G -> U_d such that Pr[f(xy) = f(x) f(y)] is large, or more generally Exp_{x,y} ||f(xy) - f(x)f(y)||^2 is small, where x and y are uniformly random elements of the group G and U_d denotes the unitary group of degree d. We bound these quantities in terms of the ratio d / d_min where d_min is the dimension of the smallest nontrivial representation of G. As an application, we bound the extent to which a function f : G -> H can be an approximate homomorphism where H is another finite group. We show that if H's representations are significantly smaller than G's, no such f can be much more homomorphic than a random function. We interpret these results as showing that if G is quasirandom, that is, if d_min is large, then G cannot be embedded in a small number of dimensions, or in a less-quasirandom group, without significant distortion of G's multiplicative structure. We also prove that our bounds are tight by showing that minors of genuine representations and their polar decompositions are essentially optimal approximate representations.

preprint2010arXiv

Circuit partitions and #P-complete products of inner products

We present a simple, natural #P-complete problem. Let G be a directed graph, and let k be a positive integer. We define q(G;k) as follows. At each vertex v, we place a k-dimensional complex vector x_v. We take the product, over all edges (u,v), of the inner product <x_u,x_v>. Finally, q(G;k) is the expectation of this product, where the x_v are chosen uniformly and independently from all vectors of norm 1 (or, alternately, from the Gaussian distribution). We show that q(G;k) is proportional to G's cycle partition polynomial, and therefore that it is #P-complete for any k>1.

preprint2010arXiv

Finding conjugate stabilizer subgroups in PSL(2; q) and related groups

We reduce a case of the hidden subgroup problem (HSP) in SL(2; q), PSL(2; q), and PGL(2; q), three related families of finite groups of Lie type, to efficiently solvable HSPs in the affine group AGL(1; q). These groups act on projective space in an almost 3-transitive way, and we use this fact in each group to distinguish conjugates of its Borel (upper triangular) subgroup, which is also the stabilizer subgroup of an element of projective space. Our observation is mainly group-theoretic, and as such breaks little new ground in quantum algorithms. Nonetheless, these appear to be the first positive results on the HSP in finite simple groups such as PSL(2; q).

preprint2010arXiv

How close can we come to a parity function when there isn't one?

Consider a group G such that there is no homomorphism f: G -> {+1,-1}. In that case, how close can we come to such a homomorphism? We show that if f has zero expectation, then the probability that f(xy) = f(x) f(y), where x, y are chosen uniformly and independently from G, is at most (1/2)(1+1/sqrt(d)), where d is the dimension of G's smallest nontrivial irreducible representation. For the alternating group A_n, for instance, d=n-1. On the other hand, A_n contains a subgroup isomorphic to S_{n-2}, whose parity function we can extend to obtain an f for which this probability is (1/2)(1+1/binom(n,2)). Thus the extent to which f can be "more homomorphic" than a random function from A_n to {+1,-1} lies between O(n^{-1/2}) and Omega(n^{-2}).

preprint2010arXiv

Regarding a Representation-Theoretic Conjecture of Wigderson

We show that there exists a family of irreducible representations R_i (of finite groups G_i) such that, for any constant t, the average of R_i over t uniformly random elements g_1, ..., g_t of G_i has operator norm 1 with probability approaching 1 as i tends to infinity. This settles a conjecture of Wigderson in the negative.

preprint2010arXiv

The McEliece Cryptosystem Resists Quantum Fourier Sampling Attacks

Quantum computers can break the RSA and El Gamal public-key cryptosystems, since they can factor integers and extract discrete logarithms. If we believe that quantum computers will someday become a reality, we would like to have post-quantum cryptosystems which can be implemented today with classical computers, but which will remain secure even in the presence of quantum attacks. In this article we show that the McEliece cryptosystem over well-permuted, well-scrambled linear codes resists precisely the attacks to which the RSA and El Gamal cryptosystems are vulnerable, namely those based on generating and measuring coset states. This eliminates the approach of strong Fourier sampling on which almost all known exponential speedups by quantum algorithms are based. Specifically, we show that the natural case of the Hidden Subgroup Problem to which the McEliece cryptosystem reduces cannot be solved by strong Fourier sampling, or by any measurement of a coset state. We start with recent negative results on quantum algorithms for Graph Isomorphism, which are based on particular subgroups of size two, and extend them to subgroups of arbitrary structure, including the automorphism groups of linear codes. This allows us to obtain the first rigorous results on the security of the McEliece cryptosystem in the face of quantum adversaries, strengthening its candidacy for post-quantum cryptography.

preprint2010arXiv

The rigidity transition in random graphs

As we add rigid bars between points in the plane, at what point is there a giant (linear-sized) rigid component, which can be rotated and translated, but which has no internal flexibility? If the points are generic, this depends only on the combinatorics of the graph formed by the bars. We show that if this graph is an Erdős-Rényi random graph G(n,c/n), then there exists a sharp threshold for a giant rigid component to emerge. For c < c_2, w.h.p. all rigid components span one, two, or three vertices, and when c > c_2, w.h.p. there is a giant rigid component. The constant c_2 ≈ 3.588 is the threshold for 2-orientability, discovered independently by Fernholz and Ramachandran and by Cain, Sanders, and Wormald in SODA'07. We also give quantitative bounds on the size of the giant rigid component when it emerges, proving that it spans a (1-o(1))-fraction of the vertices in the (3+2)-core. Informally, the (3+2)-core is the maximal induced subgraph obtained by starting from the 3-core and then inductively adding vertices with 2 neighbors in the graph obtained so far.
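
The informal description of the (3+2)-core can be turned into a short procedure: peel the graph down to its 3-core, then repeatedly add back any vertex with 2 neighbors in the set built so far. The sketch below reads "2 neighbors" as "at least 2 neighbors", which is an assumption, and it expects a symmetric adjacency dict; the function names `k_core` and `three_plus_two_core` are hypothetical.

```python
def k_core(adj, k):
    """Vertices of the k-core: repeatedly delete vertices of degree < k."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            if sum(1 for u in adj[v] if u in alive) < k:
                alive.discard(v)
                changed = True
    return alive

def three_plus_two_core(adj):
    """3-core, grown by adding vertices with >= 2 neighbors already inside."""
    core = k_core(adj, 3)
    grown = bool(core)  # nothing can be added to an empty 3-core
    while grown:
        grown = False
        for v in adj:
            if v not in core and sum(1 for u in adj[v] if u in core) >= 2:
                core.add(v)
                grown = True
    return core

def adjacency(edges):
    """Symmetric adjacency dict from an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

# K4 on {0,1,2,3}; vertex 4 hangs off 0 and 1, vertex 5 off 4 and 2,
# and vertex 6 is a leaf attached to 5.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 0), (4, 1), (5, 4), (5, 2), (6, 5)]
adj = adjacency(edges)
print(sorted(k_core(adj, 3)), sorted(three_plus_two_core(adj)))
```

In this example the 3-core is just the K4, while the growth step recovers vertices 4 and 5 in turn and leaves out the leaf 6.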

preprint2006arXiv

Exact solutions for models of evolving networks with addition and deletion of nodes

There has been considerable recent interest in the properties of networks, such as citation networks and the worldwide web, that grow by the addition of vertices, and a number of simple solvable models of network growth have been studied. In the real world, however, many networks, including the web, not only add vertices but also lose them. Here we formulate models of the time evolution of such networks and give exact solutions for a number of cases of particular interest. For the case of net growth and so-called preferential attachment, in which newly appearing vertices attach to previously existing ones in proportion to vertex degree, we show that the resulting networks have power-law degree distributions, but with an exponent that diverges as the growth rate vanishes. We conjecture that the low exponent values observed in real-world networks are thus the result of vigorous growth in which the rate of addition of vertices far exceeds the rate of removal. Were growth to slow in the future, for instance in a more mature future version of the web, we would expect to see exponents increase, potentially without bound.

preprint2005arXiv

Automatic Filters for the Detection of Coherent Structure in Spatiotemporal Systems

Most current methods for identifying coherent structures in spatially-extended systems rely on prior information about the form which those structures take. Here we present two new approaches to automatically filter the changing configurations of spatial dynamical systems and extract coherent structures. One, local sensitivity filtering, is a modification of the local Lyapunov exponent approach suitable to cellular automata and other discrete spatial systems. The other, local statistical complexity filtering, calculates the amount of information needed for optimal prediction of the system's behavior in the vicinity of a given point. By examining the changing spatiotemporal distributions of these quantities, we can find the coherent structures in a variety of pattern-forming cellular automata, without needing to guess or postulate the form of that structure. We apply both filters to elementary and cyclical cellular automata (ECA and CCA) and find that they readily identify particles, domains and other more complicated structures. We compare the results from ECA with earlier ones based upon the theory of formal languages, and the results from CCA with a more traditional approach based on an order parameter and free energy. While sensitivity and statistical complexity are equally adept at uncovering structure, they are based on different system properties (dynamical and probabilistic, respectively), and provide complementary information.

preprint2005arXiv

Generating Hard Satisfiable Formulas by Hiding Solutions Deceptively

To test incomplete search algorithms for constraint satisfaction problems such as 3-SAT, we need a source of hard, but satisfiable, benchmark instances. A simple way to do this is to choose a random truth assignment A, and then choose clauses randomly from among those satisfied by A. However, this method tends to produce easy problems, since the majority of literals point toward the "hidden" assignment A. Last year, Achlioptas, Jia and Moore proposed a problem generator that cancels this effect by hiding both A and its complement. While the resulting formulas appear to be just as hard for DPLL algorithms as random 3-SAT formulas with no hidden assignment, they can be solved by WalkSAT in only polynomial time. Here we propose a new method to cancel the attraction to A, by choosing a clause with t > 0 literals satisfied by A with probability proportional to q^t for some q < 1. By varying q, we can generate formulas whose variables have no bias, i.e., which are equally likely to be true or false; we can even cause the formula to "deceptively" point away from A. We present theoretical and experimental results suggesting that these formulas are exponentially hard both for DPLL algorithms and for incomplete algorithms such as WalkSAT.
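The clause-selection rule described above can be illustrated with rejection sampling. The sketch below is one possible reading, not the authors' generator: the function name `q_hidden_3sat`, the signed-integer literal encoding, and the choice to draw candidate clauses uniformly over three distinct variables are all assumptions. A candidate with t > 0 literals satisfied by A is accepted with probability q^(t-1), which makes the accepted clauses distributed proportionally to q^t.

```python
import random


def q_hidden_3sat(n_vars, n_clauses, q, seed=None):
    """Sample a 3-SAT formula that hides a random assignment A.

    Literals are signed integers: +v means variable v, -v its negation.
    Requires 0 <= q <= 1 so that q**(t-1) is a valid acceptance probability.
    """
    rng = random.Random(seed)
    # A[v] is the hidden truth value of variable v (index 0 is unused).
    A = [None] + [rng.random() < 0.5 for _ in range(n_vars)]
    clauses = []
    while len(clauses) < n_clauses:
        variables = rng.sample(range(1, n_vars + 1), 3)
        clause = [v if rng.random() < 0.5 else -v for v in variables]
        # t = number of literals in the clause that A satisfies.
        t = sum((lit > 0) == A[abs(lit)] for lit in clause)
        if t == 0:
            continue  # clause violated by A: never allowed
        # Accept with probability q**(t-1), i.e. proportional to q**t.
        if rng.random() < q ** (t - 1):
            clauses.append(clause)
    return A, clauses


# Usage: a small instance with q = 0.5 (an arbitrary illustrative value).
A, clauses = q_hidden_3sat(n_vars=20, n_clauses=50, q=0.5, seed=1)
```

Setting q = 1 in this sketch recovers the simple method of choosing uniformly among clauses satisfied by A, while smaller q shifts weight toward clauses that A satisfies with fewer literals.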

preprint2005arXiv

Rapid Mixing for Lattice Colorings with Fewer Colors

We provide an optimally mixing Markov chain for 6-colorings of the square lattice on rectangular regions with free, fixed, or toroidal boundary conditions. This implies that the uniform distribution on the set of such colorings has strong spatial mixing, so that the 6-state Potts antiferromagnet has a finite correlation length and a unique Gibbs measure at zero temperature. Four and five are now the only remaining values of q for which it is not known whether there exists a rapidly mixing Markov chain for q-colorings of the square lattice.

preprint2005arXiv

The Symmetric Group Defies Strong Fourier Sampling: Part II

Part I of this paper showed that the hidden subgroup problem over the symmetric group, including the special case relevant to Graph Isomorphism, cannot be efficiently solved by strong Fourier sampling, even if one may perform an arbitrary POVM on the coset state. In this paper, we extend these results to entangled measurements. Specifically, we show that the hidden subgroup problem on the symmetric group cannot be solved by any POVM applied to pairs of coset states. In particular, these hidden subgroups cannot be determined by any polynomial number of one- or two-register experiments on coset states.

preprint2004arXiv

Traceroute sampling makes random graphs appear to have power law degree distributions

The topology of the Internet has typically been measured by sampling traceroutes, which are roughly shortest paths from sources to destinations. The resulting measurements have been used to infer that the Internet's degree distribution is scale-free; however, many of these measurements have relied on sampling traceroutes from a small number of sources. It was recently argued that sampling in this way can introduce a fundamental bias in the degree distribution, for instance, causing random (Erdős-Rényi) graphs to appear to have power law degree distributions. We explain this phenomenon analytically using differential equations to model the growth of a breadth-first tree in a random graph G(n,p=c/n) of average degree c, and show that sampling from a single source gives an apparent power law degree distribution P(k) ~ 1/k for k < c.
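The sampling bias described above can be explored numerically with a small experiment. The sketch below is an illustration under simplifying assumptions, not the paper's analysis: it builds G(n, p) by testing every pair of vertices (adequate only for small n), takes one breadth-first tree from a single source as a stand-in for traceroute sampling, and records the degree each reached vertex appears to have in that tree. The function names `gnp` and `bfs_tree_degrees` are hypothetical.

```python
import random
from collections import Counter, deque


def gnp(n, p, rng):
    """Adjacency lists of an Erdős-Rényi graph G(n, p); O(n^2) time."""
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj


def bfs_tree_degrees(adj, source=0):
    """Map each vertex reached from source to its degree in one BFS tree."""
    deg = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in deg:
                # Edge (u, v) is a tree edge: it is the only one "observed".
                deg[v] = 1
                deg[u] += 1
                queue.append(v)
    return deg


# Usage: compare observed tree degrees with true degrees.
rng = random.Random(0)
n, c = 300, 10
adj = gnp(n, c / n, rng)
observed = bfs_tree_degrees(adj, source=0)
observed_histogram = Counter(observed.values())
true_histogram = Counter(len(nb) for nb in adj)
```

Comparing `observed_histogram` with `true_histogram` shows how many edges a single-source tree misses; reproducing the 1/k behavior stated in the abstract would need much larger n and c than this toy setting.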

preprint2001arXiv

Counting, Fanout, and the Complexity of Quantum ACC

We propose definitions of QAC^0, the quantum analog of the classical class AC^0 of constant-depth circuits with AND and OR gates of arbitrary fan-in, and QACC[q], the analog of the class ACC[q] where MOD_q gates are also allowed. We prove that parity or fanout allows us to construct quantum MOD_q gates in constant depth for any q, so QACC[2] = QACC. More generally, we show that for any q, p > 1, MOD_q is equivalent to MOD_p (up to constant depth). This implies that QAC^0 with unbounded fanout gates, denoted QAC_wf^0, is the same as QACC[q] and QACC for all q. Since ACC[p] ≠ ACC[q] whenever p and q are distinct primes, QACC[q] is strictly more powerful than its classical counterpart, as is QAC^0 when fanout is allowed. This adds to the growing list of quantum complexity classes which are provably more powerful than their classical counterparts. We also develop techniques for proving upper bounds for QACC^0 in terms of related language classes. We define classes of languages EQACC, NQACC and BQACC_Q. We define a notion of log-planar QACC operators and show the appropriately restricted versions of EQACC and NQACC are contained in P/poly. We also define a notion of log-gate restricted QACC operators and show the appropriately restricted versions of EQACC and NQACC are contained in TC^0.

preprint1999arXiv

The physical limits of communication

It has been well-known since the pioneering work of Claude Shannon in the 1940s that a message transmitted with optimal efficiency over a channel of limited bandwidth is indistinguishable from random noise to a receiver who is unfamiliar with the language in which the message is written. In this letter we demonstrate an equivalent result about electromagnetic transmissions. We show that when electromagnetic radiation is used as the transmission medium, the most information-efficient format for a given message is indistinguishable from black-body radiation to a receiver who is unfamiliar with that format. The characteristic temperature of the radiation is set by the amount of energy used to make the transmission. If information is not encoded in the direction of the radiation, but only its timing, energy or polarization, then the most efficient format has the form of a one-dimensional black-body spectrum which is easily distinguished from the three-dimensional case.

preprint1998arXiv

The Computational Complexity of Sandpiles

Given an initial distribution of sand in an Abelian sandpile, what final state does it relax to after all possible avalanches have taken place? In d >= 3, we show that this problem is P-complete, so that explicit simulation of the system is almost certainly necessary. We also show that the problem of determining whether a sandpile state is recurrent is P-complete in d >= 3. In d=1, we give two algorithms for predicting the sandpile on a lattice of size n, both faster than explicit simulation: a serial one that runs in time O(n log n), and a parallel one that runs in time O(log^3 n), i.e. in the class NC^3. The latter is based on a more general problem we call Additive Ranked Generability. This leaves the two-dimensional case as an interesting open problem.
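For reference, the "explicit simulation" baseline mentioned above can be sketched for the one-dimensional case. This is a minimal sketch under assumptions that the abstract does not spell out: a site topples when it holds at least 2 grains, sending one grain to each neighbor, and grains pushed past either end of the lattice are lost. The function name `relax_1d` is hypothetical, and this is the naive simulation, not either of the faster algorithms the paper describes.

```python
def relax_1d(heights):
    """Relax a 1D Abelian sandpile by explicit toppling.

    Assumed rule: a site with >= 2 grains gives one grain to each neighbor;
    grains leaving the lattice at either end are discarded.
    """
    h = list(heights)
    n = len(h)
    stack = [i for i in range(n) if h[i] >= 2]
    while stack:
        i = stack.pop()
        if h[i] < 2:
            continue  # stale entry: site is already stable
        h[i] -= 2
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                h[j] += 1
                if h[j] >= 2:
                    stack.append(j)
        if h[i] >= 2:
            stack.append(i)  # still unstable, topple again later
    return h


# Usage: one unstable site between two empty ones.
final_state = relax_1d([0, 2, 0])  # → [1, 0, 1]
```

Because the sandpile is Abelian, the final state does not depend on the order in which unstable sites are toppled, so the stack order here is an implementation convenience only.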

64 published item(s)

Belief propagation for permutations, rankings, and partial orders

Effective Resistance for Pandemics: Mobility Network Sparsification for High-Fidelity Epidemic Simulation

Reconstruction of Random Geometric Graphs: Breaking the Omega(r) distortion barrier

The role of directionality, heterogeneity and correlations in epidemic risk and spread

The spectrum of the Grigoriev-Laurent pseudomoments

Spectral Planting and the Hardness of Refuting Cuts, Colorability, and Communities in Random Graphs

Codes, Lower Bounds, and Phase Transitions in the Symmetric Rendezvous Problem

Information-theoretic thresholds for community detection in sparse networks

Information-theoretic thresholds for community detection in sparse networks

Matrix multiplication algorithms from group orbits

A message-passing approach for recurrent-state epidemic models on networks

Community detection in networks with unequal groups

Detectability thresholds and optimal algorithms for community structure in dynamic networks

On the universal structure of human lexical semantics

Spatial Mixing for Independent Sets in Poisson Random Trees

The phase transition in random regular exact cover

Untangling the roles of parasites in food webs with generative network models

Bounds on the quantum satisfiability threshold

Computational Complexity, Phase Transitions, and Message-Passing for Community Detection

Group representations that resist random sampling

Lower Bounds on the Critical Density in the Hard Disk Model via Optimized Metrics

Phase transitions in semisupervised clustering of sparse networks

Scalable detection of statistically significant communities and hierarchies, using message-passing for modularity

A message-passing approach for threshold models of behavior in networks

Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications

Model Selection for Degree-corrected Block Models

Optimal epsilon-biased sets with just a little randomness

Phase Transitions in Community Detection: A Solvable Toy Model

Scalable Text and Link Analysis with Mixed-Topic Link Models

Small-Bias Sets for Nonabelian Groups: Derandomizing the Alon-Roichman Theorem

Spectral redemption: clustering sparse networks

Transdisciplinary electric power grid science

An Entropic Proof of Chang's Inequality

Continuum Percolation Thresholds in Two Dimensions

From spin glasses to hard satisfiable formulas

Oriented and Degree-generated Block Models: Generating and Inferring Communities with Inhomogeneous Degree Distributions

Stability analysis of financial contagion due to overlapping portfolios

The Power of Choice for Random Satisfiability

Topological phase transition in a network model with preferential attachment and node removal

Active Learning for Node Classification in Assortative and Disassortative Networks

Frugal and Truthful Auctions for Vertex Covers, Flows, and Cuts

Independent sets in random graphs from the weighted second moment method

Parallel Complexity of Random Boolean Circuits

Phase transition in the detection of modules in sparse networks

Quantum Fourier sampling, Code Equivalence, and the quantum security of the McEliece and Sidelnikov cryptosystems

The complexity of the fermionant, and immanants of constant width

Tight bounds on the threshold for permuted k-colorability

Active Learning for Hidden Attributes in Networks

Approximate Representations and Approximate Homomorphisms

Circuit partitions and #P-complete products of inner products

Finding conjugate stabilizer subgroups in PSL(2; q) and related groups

How close can we come to a parity function when there isn't one?

Regarding a Representation-Theoretic Conjecture of Wigderson

The McEliece Cryptosystem Resists Quantum Fourier Sampling Attacks

The rigidity transition in random graphs

Exact solutions for models of evolving networks with addition and deletion of nodes

Automatic Filters for the Detection of Coherent Structure in Spatiotemporal Systems

Generating Hard Satisfiable Formulas by Hiding Solutions Deceptively

Rapid Mixing for Lattice Colorings with Fewer Colors

The Symmetric Group Defies Strong Fourier Sampling: Part II

Traceroute sampling makes random graphs appear to have power law degree distributions

Counting, Fanout, and the Complexity of Quantum ACC

The physical limits of communication

The Computational Complexity of Sandpiles