Source author record

M. E. J. Newman

M. E. J. Newman appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

physics.soc-ph, Social and Information Networks, cond-mat.stat-mech, physics.data-an, cond-mat.dis-nn, Digital Libraries, Applications, Machine Learning, nlin.AO, Populations and Evolution, adap-org, cond-mat, cond-mat.soft, Data Structures and Algorithms, Discrete Mathematics, Networking and Internet Architecture, q-bio, quant-ph

Catalog footprint

What is connected

45 works

18 topics

4 close collaborators

preprint, 2022, arXiv

Clustering of heterogeneous populations of networks

Statistical methods for reconstructing networks from repeated measurements typically assume that all measurements are generated from the same underlying network structure. This need not be the case, however. People's social networks might be different on weekdays and weekends, for instance. Brain networks may differ between healthy patients and those with dementia or other conditions. Here we describe a Bayesian analysis framework for such data that allows for the fact that network measurements may be reflective of multiple possible structures. We define a finite mixture model of the measurement process and derive a fast Gibbs sampling procedure that samples exactly from the full posterior distribution of model parameters. The end result is a clustering of the measured networks into groups with similar structure. We demonstrate the method on both real and synthetic network populations.

preprint, 2022, arXiv

Cutting Through the Noise to Infer Autonomous System Topology

The Border Gateway Protocol (BGP) is a distributed protocol that manages interdomain routing without requiring a centralized record of which autonomous systems (ASes) connect to which others. Many methods have been devised to infer the AS topology from publicly available BGP data, but none provide a general way to handle the fact that the data are notoriously incomplete and subject to error. This paper describes a method for reliably inferring AS-level connectivity in the presence of measurement error using Bayesian statistical inference acting on BGP routing tables from multiple vantage points. We employ a novel approach for counting AS adjacency observations in the AS-PATH attribute data from public route collectors, along with a Bayesian algorithm to generate a statistical estimate of the AS-level network. Our approach also gives us a way to evaluate the accuracy of existing reconstruction methods and to identify advantageous locations for new route collectors or vantage points.

preprint, 2022, arXiv

Representative community divisions of networks

Methods for detecting community structure in networks typically aim to identify a single best partition of network nodes into communities, often by optimizing some objective function. In real-world applications, though, there may be many competitive partitions with objective scores close to the global optimum, and one can obtain a more informative picture of the community structure by examining a representative set of such high-scoring partitions than by looking at just the single optimum. However, such a set can be difficult to interpret since its size can easily run to hundreds or thousands of partitions. In this paper we present a method for analyzing large partition sets by dividing them into groups of similar partitions and then identifying an archetypal partition as a representative of each group. The resulting set of archetypal partitions provides a succinct, interpretable summary of the form and variety of community structure in any network. We demonstrate the method on a range of example networks.

preprint, 2021, arXiv

Bayesian inference of network structure from unreliable data

Most empirical studies of complex networks do not return direct, error-free measurements of network structure. Instead, they typically rely on indirect measurements that are often error-prone and unreliable. A fundamental problem in empirical network science is how to make the best possible estimates of network structure given such unreliable data. In this paper we describe a fully Bayesian method for reconstructing networks from observational data in any format, even when the data contain substantial measurement error and when the nature and magnitude of that error is unknown. The method is introduced through pedagogical case studies using real-world example networks, and specifically tailored to allow straightforward, computationally efficient implementation with a minimum of technical input. Computer code implementing the method is publicly available.

preprint, 2019, arXiv

Consistency of community structure in complex networks

The most widely used techniques for community detection in networks, including methods based on modularity, statistical inference, and information theoretic arguments, all work by optimizing objective functions that measure the quality of network partitions. There is a good case to be made, however, that one should not look solely at the single optimal community structure under such an objective function, but rather at a selection of high-scoring structures. If one does this one typically finds that the resulting structures show considerable variation, and this has been taken as evidence that these community detection methods are unreliable, since they do not appear to give consistent answers. Here we argue that, upon closer inspection, the structures found are in fact consistent in a certain way. Specifically, we show that they can all be assembled from a set of underlying "building blocks", groups of network nodes that are usually found together in the same community. Different community structures correspond to different arrangements of blocks, but the blocks themselves are largely invariant. We propose an information theoretic method for discovering the building blocks in specific networks and demonstrate it with several example applications. We conclude that traditional community detection is not the failure some have suggested it is, and that in fact it gives a significant amount of insight into network structure, although perhaps not in exactly the way previously imagined.

preprint, 2019, arXiv

Improved mutual information measure for classification and community detection

The information theoretic quantity known as mutual information finds wide use in classification and community detection analyses to compare two classifications of the same set of objects into groups. In the context of classification algorithms, for instance, it is often used to compare discovered classes to known ground truth and hence to quantify algorithm performance. Here we argue that the standard mutual information, as commonly defined, omits a crucial term which can become large under real-world conditions, producing results that can be substantially in error. We demonstrate how to correct this error and define a mutual information that works in all cases. We discuss practical implementation of the new measure and give some example applications.
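
As a point of reference for the abstract above, the following is a minimal sketch of the standard mutual information between two labelings of the same objects, computed from their contingency counts. It shows only the conventional definition that the paper takes as its starting point, not the corrected measure the paper proposes. The function name `mutual_information` and the choice of natural logarithms are assumptions made for illustration.

```python
from collections import Counter
from math import log


def mutual_information(labels_a, labels_b):
    """Standard mutual information (in nats) between two labelings.

    labels_a and labels_b assign a group label to each of the same n objects.
    This is the conventional definition only; it does not include the
    correction term discussed in the abstract.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same objects")
    n = len(labels_a)
    if n == 0:
        raise ValueError("labelings must be non-empty")

    count_a = Counter(labels_a)                 # group sizes in labeling A
    count_b = Counter(labels_b)                 # group sizes in labeling B
    count_ab = Counter(zip(labels_a, labels_b))  # contingency table entries

    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a,b) * log[ p(a,b) / (p(a) p(b)) ] with empirical probabilities
        mi += (n_ab / n) * log(n * n_ab / (count_a[a] * count_b[b]))
    return mi
```

Two labelings that agree up to a renaming of the groups give the entropy of either labeling, while statistically independent labelings give zero.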

preprint, 2016, arXiv

Community detection in networks: Modularity optimization and maximum likelihood are equivalent

We demonstrate an exact equivalence between two widely used methods of community detection in networks, the method of modularity maximization in its generalized form which incorporates a resolution parameter controlling the size of the communities discovered, and the method of maximum likelihood applied to the special case of the stochastic block model known as the planted partition model, in which all communities in a network are assumed to have statistically similar properties. Among other things, this equivalence provides a mathematically principled derivation of the modularity function, clarifies the conditions and assumptions of its use, and gives an explicit formula for the optimal value of the resolution parameter.
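
To make the quantity in this abstract concrete, here is a minimal sketch of the generalized modularity with a resolution parameter, in its commonly used form Q(gamma) = (1/2m) * sum_ij [A_ij - gamma * k_i * k_j / 2m] * delta(g_i, g_j). The function name `generalized_modularity` and the dense NumPy representation are assumptions for illustration, and the sketch does not reproduce the paper's formula for the optimal resolution parameter.

```python
import numpy as np


def generalized_modularity(adj, groups, gamma=1.0):
    """Generalized modularity of a partition of an undirected network.

    adj    : symmetric adjacency matrix (n x n)
    groups : length-n sequence of group labels
    gamma  : resolution parameter; gamma = 1 gives the ordinary modularity
    """
    adj = np.asarray(adj, dtype=float)
    groups = np.asarray(groups)
    degrees = adj.sum(axis=1)
    two_m = degrees.sum()  # twice the number of edges
    if two_m == 0:
        raise ValueError("network has no edges")

    same_group = groups[:, None] == groups[None, :]        # delta(g_i, g_j)
    null_model = gamma * np.outer(degrees, degrees) / two_m
    return float(((adj - null_model) * same_group).sum() / two_m)
```

Larger values of gamma penalize large groups more strongly, so maximizing Q at higher gamma tends to yield more, smaller communities.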

preprint, 2016, arXiv

Estimating the number of communities in a network

Community detection, the division of a network into dense subnetworks with only sparse connections between them, has been a topic of vigorous study in recent years. However, while there exist a range of powerful and flexible methods for dividing a network into a specified number of communities, it is an open question how to determine exactly how many communities one should use. Here we describe a mathematically principled approach for finding the number of communities in a network using a maximum-likelihood method. We demonstrate the approach on a range of real-world examples with known community structure, finding that it is able to determine the number of communities correctly in every case.

preprint, 2015, arXiv

Community detection in networks with unequal groups

Recently, a phase transition has been discovered in the network community detection problem below which no algorithm can tell which nodes belong to which communities with success any better than a random guess. This result has, however, so far been limited to the case where the communities have the same size or the same average degree. Here we consider the case where the sizes or average degrees are different. This asymmetry allows us to assign nodes to communities with better-than-random success by examining their local neighborhoods. Using the cavity method, we show that this removes the detectability transition completely for networks with four groups or fewer, while for more than four groups the transition persists up to a critical amount of asymmetry but not beyond. The critical point in the latter case coincides with the point at which local information percolates, causing a global transition from a less-accurate solution to a more-accurate one.

preprint, 2015, arXiv

Generalized communities in networks

A substantial volume of research has been devoted to studies of community structure in networks, but communities are not the only possible form of large-scale network structure. Here we describe a broad extension of community structure that encompasses traditional communities but includes a wide range of generalized structural patterns as well. We describe a principled method for detecting this generalized structure in empirical network data and demonstrate with real-world examples how it can be used to learn new things about the shape and meaning of networks.

preprint, 2015, arXiv

Localization and centrality in networks

Eigenvector centrality is a common measure of the importance of nodes in a network. Here we show that under common conditions the eigenvector centrality displays a localization transition that causes most of the weight of the centrality to concentrate on a small number of nodes in the network. In this regime the measure is no longer useful for distinguishing among the remaining nodes and its efficacy as a network metric is impaired. As a remedy, we propose an alternative centrality measure based on the nonbacktracking matrix, which gives results closely similar to the standard eigenvector centrality in dense networks where the latter is well behaved, but avoids localization and gives useful results in regimes where the standard centrality fails.
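
For readers unfamiliar with the baseline measure, the sketch below computes the standard eigenvector centrality by power iteration. It illustrates only the conventional measure whose localization the abstract describes, not the nonbacktracking alternative the paper proposes. The function name, the identity shift (added so the iteration also converges on bipartite graphs) and the normalization to unit sum are assumptions made for illustration, and the network is assumed to be connected and undirected.

```python
import numpy as np


def eigenvector_centrality(adj, max_iterations=10000, tol=1e-12):
    """Leading eigenvector of the adjacency matrix, normalized to sum to 1.

    Iterates with (A + I) rather than A: the eigenvectors are the same, but
    the shift avoids oscillation on bipartite graphs. Assumes a connected,
    undirected network with non-negative edge weights.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    x = np.full(n, 1.0 / n)  # uniform starting vector
    for _ in range(max_iterations):
        x_new = adj @ x + x      # multiply by (A + I)
        x_new /= x_new.sum()     # renormalize
        if np.abs(x_new - x).max() < tol:
            return x_new
        x = x_new
    return x
```

On a star graph the hub takes a much larger share of the total centrality than any single leaf, a small-scale hint of how weight can concentrate around high-degree nodes.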

preprint, 2015, arXiv

Multiway spectral community detection in networks

One of the most widely used methods for community detection in networks is the maximization of the quality function known as modularity. Of the many maximization techniques that have been used in this context, some of the most conceptually attractive are the spectral methods, which are based on the eigenvectors of the modularity matrix. Spectral algorithms have, however, been limited by and large to the division of networks into only two or three communities, with divisions into more than three being achieved by repeated two-way division. Here we present a spectral algorithm that can directly divide a network into any number of communities. The algorithm makes use of a mapping from modularity maximization to a vector partitioning problem, combined with a fast heuristic for vector partitioning. We compare the performance of this spectral algorithm with previous approaches and find it to give superior results, particularly in cases where community sizes are unbalanced. We also give demonstrative applications of the algorithm to two real-world networks and find that it produces results in good agreement with expectations for the networks studied.

preprint, 2015, arXiv

Structural inference for uncertain networks

In the study of networked systems such as biological, technological, and social networks the available data are often uncertain. Rather than knowing the structure of a network exactly, we know the connections between nodes only with a certain probability. In this paper we develop methods for the analysis of such uncertain data, focusing particularly on the problem of community detection. We give a principled maximum-likelihood method for inferring community structure and demonstrate how the results can be used to make improved estimates of the true structure of the network. Using computer-generated benchmark networks we demonstrate that our methods are able to reconstruct known communities more accurately than previous approaches based on data thresholding. We also give an example application to the detection of communities in a protein-protein interaction network.

preprint, 2015, arXiv

Structure and inference in annotated networks

For many networks of scientific interest we know both the connections of the network and information about the network nodes, such as the age or gender of individuals in a social network, geographic location of nodes in the Internet, or cellular function of nodes in a gene regulatory network. Here we demonstrate how this "metadata" can be used to improve our analysis and understanding of network structure. We focus in particular on the problem of community detection in networks and develop a mathematically principled approach that combines a network and its metadata to detect communities more accurately than can be done with either alone. Crucially, the method does not assume that the metadata are correlated with the communities we are trying to find. Instead the method learns whether a correlation exists and correctly uses or ignores the metadata depending on whether they contain useful information. The learned correlations are also of interest in their own right, allowing us to make predictions about the community membership of nodes whose network connections are unknown. We demonstrate our method on synthetic networks with known structure and on real-world networks, large and small, drawn from social, biological, and technological domains.

preprint, 2014, arXiv

Equitable random graphs

Random graph models have played a dominant role in the theoretical study of networked systems. The Poisson random graph of Erdős and Rényi, in particular, as well as the so-called configuration model, have served as the starting point for numerous calculations. In this paper we describe another large class of random graph models, which we call equitable random graphs and which are flexible enough to represent networks with diverse degree distributions and many nontrivial types of structure, including community structure, bipartite structure, degree correlations, stratification, and others, yet are exactly solvable for a wide range of properties in the limit of large graph size, including percolation properties, complete spectral density, and the behavior of homogeneous dynamical systems, such as coupled oscillators or epidemic models.

preprint, 2014, arXiv

Identification of core-periphery structure in networks

Many networks can be usefully decomposed into a dense core plus an outlying, loosely-connected periphery. Here we propose an algorithm for performing such a decomposition on empirical network data using methods of statistical inference. Our method fits a generative model of core-periphery structure to observed data using a combination of an expectation-maximization algorithm for calculating the parameters of the model and a belief propagation algorithm for calculating the decomposition itself. We find the method to be efficient, scaling easily to networks with a million or more nodes, and we test it on a range of networks, including real-world examples as well as computer-generated benchmarks, for which it successfully identifies known core-periphery structure with low error rate. We also demonstrate that the method is immune from the detectability transition observed in the related community detection problem, which prevents the detection of community structure when that structure is too weak. There is no such transition for core-periphery structure, which is detectable, albeit with some statistical error, no matter how weak it is.

preprint, 2014, arXiv

Percolation on sparse networks

We study percolation on networks, which is used as a model of the resilience of networked systems such as the Internet to attack or failure and as a simple model of the spread of disease over human contact networks. We reformulate percolation as a message passing process and demonstrate how the resulting equations can be used to calculate, among other things, the size of the percolating cluster and the average cluster size. The calculations are exact for sparse networks when the number of short loops in the network is small, but even on networks with many short loops we find them to be highly accurate when compared with direct numerical simulations. By considering the fixed points of the message passing process, we also show that the percolation threshold on a network with few loops is given by the inverse of the leading eigenvalue of the so-called non-backtracking matrix.
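
The final result of this abstract can be sketched numerically: build the non-backtracking matrix of a small network and take the inverse of its leading eigenvalue as an estimate of the percolation threshold. This is an illustrative sketch, not the paper's message passing code. The function names are hypothetical, the input is assumed to be a simple undirected graph given as a list of distinct edges, and the dense construction is only practical for small networks.

```python
import numpy as np


def nonbacktracking_matrix(edges):
    """Non-backtracking matrix of a simple undirected graph.

    Rows and columns are indexed by directed edges. The entry for the pair
    (u -> v, w -> x) is 1 when w == v and x != u, i.e. the second edge
    continues from the first without immediately returning.
    """
    directed = []
    for u, v in edges:
        directed.append((u, v))
        directed.append((v, u))
    size = len(directed)
    matrix = np.zeros((size, size))
    for i, (u, v) in enumerate(directed):
        for j, (w, x) in enumerate(directed):
            if w == v and x != u:
                matrix[i, j] = 1.0
    return matrix


def percolation_threshold_estimate(edges):
    """Inverse of the largest-magnitude eigenvalue of the non-backtracking matrix."""
    eigenvalues = np.linalg.eigvals(nonbacktracking_matrix(edges))
    leading = np.abs(eigenvalues).max()
    if leading == 0:
        raise ValueError("graph has no non-backtracking walks (e.g. a tree)")
    return 1.0 / leading
```

As the abstract notes, the estimate is tied to networks with few short loops; on small, dense graphs it should be read as an illustration of the calculation rather than an accurate threshold.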

preprint, 2013, arXiv

Coauthorship and citation in scientific publishing

A large number of published studies have examined the properties of either networks of citation among scientific papers or networks of coauthorship among scientists. Here, using an extensive data set covering more than a century of physics papers published in the Physical Review, we study a hybrid coauthorship/citation network that combines the two, which we analyze to gain insight into the correlations and interactions between authorship and citation. Among other things, we investigate the extent to which individuals tend to cite themselves or their collaborators more than others, the extent to which they cite themselves or their collaborators more quickly after publication, and the extent to which they tend to return the favor of a citation from another scientist.

preprint, 2013, arXiv

Community detection and graph partitioning

Many methods have been proposed for community detection in networks. Some of the most promising are methods based on statistical inference, which rest on solid mathematical foundations and return excellent results in practice. In this paper we show that two of the most widely used inference methods can be mapped directly onto versions of the standard minimum-cut graph partitioning problem, which allows us to apply any of the many well-understood partitioning algorithms to the solution of community detection problems. We illustrate the approach by adapting the Laplacian spectral partitioning method to perform community inference, testing the resulting algorithm on a range of examples, including computer-generated and real-world networks. Both the quality of the results and the running time rival the best previous methods.
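
The Laplacian spectral partitioning method mentioned above can be sketched in its classic two-way form: compute the eigenvector belonging to the second-smallest eigenvalue of the graph Laplacian (the Fiedler vector) and split the nodes by the sign of its elements. This is the textbook method only, not the adapted community inference algorithm described in the paper. The function name `spectral_bisection` is an assumption, and the network is assumed to be connected and undirected.

```python
import numpy as np


def spectral_bisection(adj):
    """Two-way partition of a connected undirected graph by Fiedler vector sign.

    Returns a boolean array: True for nodes on one side, False for the other.
    Which side is labeled True is arbitrary, because the sign of an
    eigenvector is arbitrary.
    """
    adj = np.asarray(adj, dtype=float)
    laplacian = np.diag(adj.sum(axis=1)) - adj  # L = D - A
    # eigh returns eigenvalues in ascending order for symmetric matrices
    _, eigenvectors = np.linalg.eigh(laplacian)
    fiedler = eigenvectors[:, 1]
    return fiedler >= 0
```

Splitting by sign does not fix the sizes of the two parts; when balanced parts are required, a common variant splits at the median element of the Fiedler vector instead.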

preprint, 2013, arXiv

First-principles multiway spectral partitioning of graphs

We consider the minimum-cut partitioning of a graph into more than two parts using spectral methods. While there exist well-established spectral algorithms for this problem that give good results, they have traditionally not been well motivated. Rather than being derived from first principles by minimizing graph cuts, they are typically presented without direct derivation and then proved after the fact to work. In this paper, we take a contrasting approach in which we start with a matrix formulation of the minimum cut problem and then show, via a relaxed optimization, how it can be mapped onto a spectral embedding defined by the leading eigenvectors of the graph Laplacian. The end result is an algorithm that is similar in spirit to, but different in detail from, previous spectral partitioning approaches. In tests of the algorithm we find that it outperforms previous approaches on certain particularly difficult partitioning problems.

preprint, 2013, arXiv

Interacting epidemics and coinfection on contact networks

The spread of certain diseases can be promoted, in some cases substantially, by prior infection with another disease. One example is that of HIV, whose immunosuppressant effects significantly increase the chances of infection with other pathogens. Such coinfection processes, when combined with nontrivial structure in the contact networks over which diseases spread, can lead to complex patterns of epidemiological behavior. Here we consider a mathematical model of two diseases spreading through a single population, where infection with one disease is dependent on prior infection with the other. We solve exactly for the sizes of the outbreaks of both diseases in the limit of large population size, along with the complete phase diagram of the system. Among other things, we use our model to demonstrate how diseases can be controlled not only by reducing the rate of their spread, but also by reducing the spread of other infections upon which they depend.

preprint, 2013, arXiv

Prediction of highly cited papers

In an article written five years ago [arXiv:0809.0522], we described a method for predicting which scientific papers will be highly cited in the future, even if they are currently not highly cited. Applying the method to real citation data we made predictions about papers we believed would end up being well cited. Here we revisit those predictions, five years on, to see how well we did. Among the over 2000 papers in our original data set, we examine the fifty that, by the measures of our previous study, were predicted to do best and we find that they have indeed received substantially more citations in the intervening years than other papers, even after controlling for the number of prior citations. On average these top fifty papers have received 23 times as many citations in the last five years as the average paper in the data set as a whole, and 15 times as many as the average paper in a randomly drawn control group that started out with the same number of citations. Applying our prediction technique to current data, we also make new predictions of papers that we believe will be well cited in the next few years.

preprint, 2013, arXiv

Spectra of random graphs with community structure and arbitrary degrees

Using methods from random matrix theory researchers have recently calculated the full spectra of random networks with arbitrary degrees and with community structure. Both reveal interesting spectral features, including deviations from the Wigner semicircle distribution and phase transitions in the spectra of community structured networks. In this paper we generalize both calculations, giving a prescription for calculating the spectrum of a network with both community structure and an arbitrary degree distribution. In general the spectrum has two parts, a continuous spectral band, which can depart strongly from the classic semicircle form, and a set of outlying eigenvalues that indicate the presence of communities.

preprint, 2013, arXiv

Spectral community detection in sparse networks

Spectral methods based on the eigenvectors of matrices are widely used in the analysis of network data, particularly for community detection and graph partitioning. Standard methods based on the adjacency matrix and related matrices, however, break down for very sparse networks, which includes many networks of practical interest. As a solution to this problem it has been recently proposed that we focus instead on the spectrum of the non-backtracking matrix, an alternative matrix representation of a network that shows better behavior in the sparse limit. Inspired by this suggestion, we here make use of a relaxation method to derive a spectral community detection algorithm that works well even in the sparse regime where other methods break down. Interestingly, however, the matrix at the heart of the method, it turns out, is not exactly the non-backtracking matrix, but a variant of it with a somewhat different definition. We study the behavior of this variant matrix for both artificial and real-world networks and find it to have desirable properties, especially in the common case of networks with broad degree distributions, for which it appears to have a better behaved spectrum and eigenvectors than the original non-backtracking matrix.

preprint, 2013, arXiv

Spectral methods for network community detection and graph partitioning

We consider three distinct and well studied problems concerning network structure: community detection by modularity maximization, community detection by statistical inference, and normalized-cut graph partitioning. Each of these problems can be tackled using spectral algorithms that make use of the eigenvectors of matrix representations of the network. We show that with certain choices of the free parameters appearing in these spectral algorithms the algorithms for all three problems are, in fact, identical, and hence that, at least within the spectral approximations used here, there is no difference between the modularity- and inference-based community detection methods, or between either and graph partitioning.

preprint, 2013, arXiv

The small-world effect is a modern phenomenon

The "small-world effect" is the observation that one can find a short chain of acquaintances, often of no more than a handful of individuals, connecting almost any two people on the planet. It is often expressed in the language of networks, where it is equivalent to the statement that most pairs of individuals are connected by a short path through the acquaintance network. Although the small-world effect is well-established empirically for contemporary social networks, we argue here that it is a relatively recent phenomenon, arising only in the last few hundred years: for most of mankind's tenure on Earth the social world was large, with most pairs of individuals connected by relatively long chains of acquaintances, if at all. Our conclusions are based on observations about the spread of diseases, which travel over contact networks between individuals and whose dynamics can give us clues to the structure of those networks even when direct network measurements are not available. As an example we consider the spread of the Black Death in 14th-century Europe, which is known to have traveled across the continent in well-defined waves of infection over the course of several years. Using established epidemiological models, we show that such wave-like behavior can occur only if contacts between individuals living far apart are exponentially rare. We further show that if long-distance contacts are exponentially rare, then the shortest chain of contacts between distant individuals is on average a long one. The observation of the wave-like spread of a disease like the Black Death thus implies a network without the small-world effect.

preprint, 2012, arXiv

Friendship networks and social status

In empirical studies of friendship networks participants are typically asked, in interviews or questionnaires, to identify some or all of their close friends, resulting in a directed network in which friendships can, and often do, run in only one direction between a pair of individuals. Here we analyze a large collection of such networks representing friendships among students at US high and junior-high schools and show that the pattern of unreciprocated friendships is far from random. In every network, without exception, we find that there exists a ranking of participants, from low to high, such that almost all unreciprocated friendships consist of a lower-ranked individual claiming friendship with a higher-ranked one. We present a maximum-likelihood method for deducing such rankings from observed network data and conjecture that the rankings produced reflect a measure of social status. We note in particular that reciprocated and unreciprocated friendships obey different statistics, suggesting different formation processes, and that rankings are correlated with other characteristics of the participants that are traditionally associated with status, such as age and overall popularity as measured by total number of friends.

preprint, 2012, arXiv

Graph spectra and the detectability of community structure in networks

We study networks that display community structure, that is, groups of nodes within which connections are unusually dense. Using methods from random matrix theory, we calculate the spectra of such networks in the limit of large size, and hence demonstrate the presence of a phase transition in matrix methods for community detection, such as the popular modularity maximization method. The transition separates a regime in which such methods successfully detect the community structure from one in which the structure is present but is not detected. By comparing these results with recent analyses of maximum-likelihood methods we are able to show that spectral modularity maximization is an optimal detection method in the sense that no other method will succeed in the regime where the modularity method fails.
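
The abstract does not quote the location of the transition. As a hedged illustration, the sketch below encodes the form in which the detectability threshold is commonly stated for the simplest case, a stochastic block model with two equal-sized groups: structure is detectable when c_in - c_out exceeds sqrt(2 * (c_in + c_out)). Here c_in and c_out are assumed to be n times the within-group and between-group connection probabilities, and the function name is hypothetical.

```python
from math import sqrt


def is_detectable(c_in, c_out):
    """Detectability check for a two-group stochastic block model.

    Assumes two equal-sized groups, with c_in and c_out equal to n times the
    within-group and between-group edge probabilities. Uses the commonly
    quoted threshold c_in - c_out > sqrt(2 * (c_in + c_out)); this formula is
    an assumption of the sketch and is not stated in the abstract itself.
    """
    if c_in < 0 or c_out < 0:
        raise ValueError("c_in and c_out must be non-negative")
    return (c_in - c_out) > sqrt(2.0 * (c_in + c_out))
```

With a fixed average degree, the check fails once the gap between c_in and c_out becomes small, even though the planted structure is still present in the model.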

preprint, 2012, arXiv

Spectra of random graphs with arbitrary expected degrees

We study random graphs with arbitrary distributions of expected degree and derive expressions for the spectra of their adjacency and modularity matrices. We give a complete prescription for calculating the spectra that is exact in the limit of large network size and large vertex degrees. We also study the effect on the spectra of hubs in the network, vertices of unusually high degree, and show that these produce isolated eigenvalues outside the main spectral band, akin to impurity states in condensed matter systems, with accompanying eigenvectors that are strongly localized around the hubs. We also give numerical results that confirm our analytic expressions.

preprint, 2011, arXiv

An efficient and principled method for detecting communities in networks

A fundamental problem in the analysis of network data is the detection of network communities, groups of densely interconnected nodes, which may be overlapping or disjoint. Here we describe a method for finding overlapping communities based on a principled statistical approach using generative network models. We show how the method can be implemented using a fast, closed-form expectation-maximization algorithm that allows us to analyze networks of millions of nodes in reasonable running times. We test the method both on real-world networks and on synthetic benchmarks and find that it gives results competitive with previous methods. We also show that the same approach can be used to extract nonoverlapping community divisions via a relaxation method, and demonstrate that the algorithm is competitively fast and accurate for the nonoverlapping problem.

preprint, 2011, arXiv

Competing epidemics on complex networks

Human diseases spread over networks of contacts between individuals and a substantial body of recent research has focused on the dynamics of the spreading process. Here we examine a model of two competing diseases spreading over the same network at the same time, where infection with either disease gives an individual subsequent immunity to both. Using a combination of analytic and numerical methods, we derive the phase diagram of the system and estimates of the expected final numbers of individuals infected with each disease. The system shows an unusual dynamical transition between dominance of one disease and dominance of the other as a function of their relative rates of growth. Close to this transition the final outcomes show strong dependence on stochastic fluctuations in the early stages of growth, dependence that decreases with increasing network size, but does so sufficiently slowly as still to be easily visible in systems with millions or billions of individuals. In most regions of the phase diagram we find that one disease eventually dominates while the other reaches only a vanishing fraction of the network, but the system also displays a significant coexistence regime in which both diseases reach epidemic proportions and infect an extensive fraction of the network.

preprint · 2011 · arXiv

Complex Systems: A Survey

A complex system is a system composed of many interacting parts, often called agents, which displays collective behavior that does not follow trivially from the behaviors of the individual parts. Examples include condensed matter systems, ecosystems, stock markets and economies, biological evolution, and indeed the whole of human society. Substantial progress has been made in the quantitative understanding of complex systems, particularly since the 1980s, using a combination of basic theory, much of it derived from physics, and computer simulation. The subject is a broad one, drawing on techniques and ideas from a wide range of areas. Here I give a survey of the main themes and methods of complex systems science and an annotated bibliography of resources, ranging from classic papers to recent books and reviews.

preprint · 2010 · arXiv

A message passing approach for general epidemic models

In most models of the spread of disease over contact networks it is assumed that the probabilities per unit time of disease transmission and recovery from disease are constant, implying exponential distributions of the time intervals for transmission and recovery. Time intervals for real diseases, however, have distributions that in most cases are far from exponential, which leads to disagreements, both qualitative and quantitative, with the models. In this paper, we study a generalized version of the SIR (susceptible-infected-recovered) model of epidemic disease that allows for arbitrary distributions of transmission and recovery times. Standard differential equation approaches cannot be used for this generalized model, but we show that the problem can be reformulated as a time-dependent message passing calculation on the appropriate contact network. The calculation is exact on trees (i.e., loopless networks) or locally tree-like networks (such as random graphs) in the large system size limit. On non-tree-like networks we show that the calculation gives a rigorous bound on the size of disease outbreaks. We demonstrate the method with applications to two specific models and the results compare favorably with numerical simulations.

preprint · 2010 · arXiv

Random graphs containing arbitrary distributions of subgraphs

Traditional random graph models of networks generate networks that are locally tree-like, meaning that all local neighborhoods take the form of trees. In this respect such models are highly unrealistic, most real networks having strongly non-tree-like neighborhoods that contain short loops, cliques, or other biconnected subgraphs. In this paper we propose and analyze a new class of random graph models that incorporates general subgraphs, allowing for non-tree-like neighborhoods while still remaining solvable for many fundamental network properties. Among other things we give solutions for the size of the giant component, the position of the phase transition at which the giant component appears, and percolation properties for both site and bond percolation on networks generated by the model.

preprint · 2010 · arXiv

Stochastic blockmodels and community structure in networks

Stochastic blockmodels have been proposed as a tool for detecting community structure in networks as well as for generating synthetic networks for use as benchmarks. Most blockmodels, however, ignore variation in vertex degree, making them unsuitable for applications to real-world networks, which typically display broad degree distributions that can significantly distort the results. Here we demonstrate how the generalization of blockmodels to incorporate this missing element leads to an improved objective function for community detection in complex networks. We also propose a heuristic algorithm for community detection using this objective function or its non-degree-corrected counterpart and show that the degree-corrected version dramatically outperforms the uncorrected one in both real-world and synthetic networks.
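
The abstract above does not spell out the objective function. As a minimal sketch, assuming the commonly quoted profile log-likelihood forms for the two blockmodels (sum over group pairs of m_rs log(m_rs / (kappa_r kappa_s)) for the degree-corrected model, with group sizes n_r in place of the degree sums kappa_r for the uncorrected one), the two scores for a given partition could be compared as follows. The function name `blockmodel_log_likelihood` and the toy graph are illustrative assumptions, not taken from the paper.

```python
from math import log

def blockmodel_log_likelihood(edges, groups, degree_corrected=True):
    """Score a partition of an undirected graph (assumed objective, see lead-in).

    edges  : list of (u, v) pairs, one per undirected edge
    groups : dict mapping each node to its group label
    """
    labels = sorted(set(groups.values()))
    # m[(r, s)] counts edge ends between groups r and s.
    m = {(r, s): 0 for r in labels for s in labels}
    for u, v in edges:
        r, s = groups[u], groups[v]
        m[(r, s)] += 1
        m[(s, r)] += 1  # a within-group edge is therefore counted twice
    if degree_corrected:
        # kappa_r: sum of the degrees of the nodes in group r
        size = {r: sum(m[(r, s)] for s in labels) for r in labels}
    else:
        # n_r: number of nodes in group r
        size = {r: sum(1 for g in groups.values() if g == r) for r in labels}
    return sum(
        m_rs * log(m_rs / (size[r] * size[s]))
        for (r, s), m_rs in m.items()
        if m_rs > 0  # empty group pairs contribute nothing
    )

# Toy example: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
good = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
bad = {0: "a", 1: "b", 2: "a", 3: "b", 4: "a", 5: "b"}
score_good = blockmodel_log_likelihood(edges, good)
score_bad = blockmodel_log_likelihood(edges, bad)
```

On this toy graph the partition that follows the two triangles scores higher than the alternating one under both variants. The heuristic search over partitions mentioned in the abstract is not sketched here.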

preprint · 2008 · arXiv

The first-mover advantage in scientific publication

Mathematical models of the scientific citation process predict a strong "first-mover" effect under which the first papers in a field will, essentially regardless of content, receive citations at a rate enormously higher than papers published later. Moreover papers are expected to retain this advantage in perpetuity -- they should receive more citations indefinitely, no matter how many other papers are published after them. We test this conjecture against data from a selection of fields and in several cases find a first-mover effect of a magnitude similar to that predicted by the theory. Were we wearing our cynical hat today, we might say that the scientist who wants to become famous is better off -- by a wide margin -- writing a modest paper in next year's hottest field than an outstanding paper in this year's. On the other hand, there are some papers, albeit only a small fraction, that buck the trend and attract significantly more citations than theory predicts despite having relatively late publication dates. We suggest that papers of this kind, though they often receive comparatively few citations overall, are probably worthy of our attention.

preprint · 2006 · arXiv

Exact solutions for models of evolving networks with addition and deletion of nodes

There has been considerable recent interest in the properties of networks, such as citation networks and the worldwide web, that grow by the addition of vertices, and a number of simple solvable models of network growth have been studied. In the real world, however, many networks, including the web, not only add vertices but also lose them. Here we formulate models of the time evolution of such networks and give exact solutions for a number of cases of particular interest. For the case of net growth and so-called preferential attachment -- in which newly appearing vertices attach to previously existing ones in proportion to vertex degree -- we show that the resulting networks have power-law degree distributions, but with an exponent that diverges as the growth rate vanishes. We conjecture that the low exponent values observed in real-world networks are thus the result of vigorous growth in which the rate of addition of vertices far exceeds the rate of removal. Were growth to slow in the future, for instance in a more mature future version of the web, we would expect to see exponents increase, potentially without bound.

preprint · 2003 · arXiv

The structure and function of complex networks

Inspired by empirical studies of networked systems such as the Internet, social networks, and biological networks, researchers have in recent years developed a variety of techniques and models to help us understand or predict the behavior of these systems. Here we review developments in this field, including such concepts as the small-world effect, degree distributions, clustering, network correlations, random graph models, models of network growth and preferential attachment, and dynamical processes taking place on networks.

preprint · 2002 · arXiv

Mixing patterns and community structure in networks

Common experience suggests that many networks might possess community structure - division of vertices into groups, with a higher density of edges within groups than between them. Here we describe a new computer algorithm that detects structure of this kind. We apply the algorithm to a number of real-world networks and show that they do indeed possess non-trivial community structure. We suggest a possible explanation for this structure in the mechanism of assortative mixing, which is the preferential association of network vertices with others that are like them in some way. We show by simulation that this mechanism can indeed account for community structure. We also look in detail at one particular example of assortative mixing, namely mixing by vertex degree, in which vertices with similar degree prefer to be connected to one another. We propose a measure for mixing of this type which we apply to a variety of networks, and also discuss the implications for network structure and the formation of a giant component in assortatively mixed networks.
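
The abstract above does not define its measure of mixing by degree. One common way to quantify it, assumed here for illustration, is the Pearson correlation coefficient between the degrees found at the two ends of each edge. The sketch below computes that quantity for a small undirected edge list; the function name `degree_assortativity` is hypothetical.

```python
def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each edge."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Count every undirected edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys share the same mean and variance by symmetry
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mean * mean
    var = sum(x * x for x in xs) / n - mean * mean
    if var == 0:
        raise ValueError("all edge ends have the same degree; correlation undefined")
    return cov / var

# A star is maximally disassortative: the hub only meets degree-1 leaves.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
r_star = degree_assortativity(star)

# A four-node path mixes degree-1 ends with degree-2 interior nodes.
path = [(0, 1), (1, 2), (2, 3)]
r_path = degree_assortativity(path)
```

Positive values indicate that nodes of similar degree tend to be connected to one another, and negative values indicate the opposite.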

preprint · 2000 · arXiv

The structure of scientific collaboration networks

We investigate the structure of scientific collaboration networks. We consider two scientists to be connected if they have authored a paper together, and construct explicit networks of such connections using data drawn from a number of databases, including MEDLINE (biomedical research), the Los Alamos e-Print Archive (physics), and NCSTRL (computer science). We show that these collaboration networks form "small worlds" in which randomly chosen pairs of scientists are typically separated by only a short path of intermediate acquaintances. We further give results for mean and distribution of numbers of collaborators of authors, demonstrate the presence of clustering in the networks, and highlight a number of apparent differences in the patterns of collaboration between the fields studied.
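
The construction described above (two scientists are connected if they have authored a paper together) can be sketched in a few lines. This is a toy illustration on made-up author lists, not the paper's code or data; the function names and the `papers` example are assumptions.

```python
from collections import deque
from itertools import combinations

def build_coauthor_graph(papers):
    """Map each author to the set of people they have shared a paper with."""
    graph = {}
    for authors in papers:
        for a in authors:
            graph.setdefault(a, set())  # keep sole authors as isolated nodes
        for a, b in combinations(set(authors), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def mean_collaborators(graph):
    """Average number of distinct coauthors per author."""
    return sum(len(nbrs) for nbrs in graph.values()) / len(graph)

def shortest_path_length(graph, source, target):
    """Breadth-first search distance; None if the two authors are not connected."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return None

# Hypothetical author lists, one per paper.
papers = [["A", "B"], ["B", "C"], ["C", "D", "E"], ["F"]]
graph = build_coauthor_graph(papers)
avg = mean_collaborators(graph)
d_ae = shortest_path_length(graph, "A", "E")
d_af = shortest_path_length(graph, "A", "F")
```

Averaging such distances over many randomly chosen pairs gives the typical separation that the small-world claim refers to.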

preprint · 1999 · arXiv

Models of Extinction: A Review

We review recent work aimed at modeling species extinction over geological time. We discuss a number of models which, rather than dealing with the direct causes of particular extinction events, attempt to predict overall statistical trends, such as the relative frequencies of large and small extinctions, or the distribution of the lifetimes of species, genera or higher taxa. We also describe the available fossil and other data, and compare the trends visible in these data with the predictions of the models.

preprint · 1999 · arXiv

The physical limits of communication

It has been well-known since the pioneering work of Claude Shannon in the 1940s that a message transmitted with optimal efficiency over a channel of limited bandwidth is indistinguishable from random noise to a receiver who is unfamiliar with the language in which the message is written. In this letter we demonstrate an equivalent result about electromagnetic transmissions. We show that when electromagnetic radiation is used as the transmission medium, the most information-efficient format for a given message is indistinguishable from black-body radiation to a receiver who is unfamiliar with that format. The characteristic temperature of the radiation is set by the amount of energy used to make the transmission. If information is not encoded in the direction of the radiation, but only its timing, energy or polarization, then the most efficient format has the form of a one-dimensional black-body spectrum which is easily distinguished from the three-dimensional case.

preprint · 1998 · arXiv

Error estimation in the histogram Monte Carlo method

We examine the sources of error in the histogram reweighting method for Monte Carlo data analysis. We demonstrate that, in addition to the standard statistical error which has been studied elsewhere, there are two other sources of error, one arising through correlations in the reweighted samples, and one arising from the finite range of energies sampled by a simulation of finite length. We demonstrate that while the former correction is usually negligible by comparison with statistical fluctuations, the latter may not be, and give criteria for judging the range of validity of histogram extrapolations based on the size of this latter correction.
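
For context, the basic single-histogram reweighting estimate that the abstract analyzes can be sketched as follows, assuming samples drawn from a Boltzmann distribution at inverse temperature `beta_sim` and reweighted to `beta_new`. The function name and the two-sample example are illustrative assumptions; the error estimates discussed in the abstract are not implemented here.

```python
from math import exp

def reweight_mean(energies, observable, beta_sim, beta_new):
    """Estimate the mean of `observable` at beta_new from samples taken at beta_sim.

    Each sample i receives weight exp(-(beta_new - beta_sim) * E_i).
    """
    delta = beta_new - beta_sim
    exponents = [-delta * e for e in energies]
    shift = max(exponents)  # subtract the largest exponent to avoid overflow
    weights = [exp(x - shift) for x in exponents]
    total = sum(weights)
    return sum(w * q for w, q in zip(weights, observable)) / total

# Two samples with energies 0 and 1, reweighted from beta = 1 to beta = 2.
energies = [0.0, 1.0]
unchanged = reweight_mean(energies, energies, beta_sim=1.0, beta_new=1.0)
shifted = reweight_mean(energies, energies, beta_sim=1.0, beta_new=2.0)
```

When `beta_new` is far from `beta_sim`, the weights concentrate on the few samples at the edge of the sampled energy range, which is the regime where the finite-range error described above becomes a concern.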

preprint · 1996 · arXiv

The repton model of gel electrophoresis

We discuss the repton model of agarose gel electrophoresis of DNA. We review previous results, both analytic and numerical, as well as presenting a new numerical algorithm for the efficient simulation of the model, and suggesting a new approach to the model's analytic solution.

preprint · 1995 · arXiv

Phason elasticity of a three-dimensional quasicrystal: transfer-matrix method

We introduce a new transfer matrix method for calculating the thermodynamic properties of random-tiling models of quasicrystals in any number of dimensions, and describe how it may be used to calculate the phason elastic properties of these models, which are related to experimental measurables such as phason Debye-Waller factors, and diffuse scattering wings near Bragg peaks. We apply our method to the canonical-cell model of the icosahedral phase, making use of results from a previously presented calculation in which the possible structures for this model under specific periodic boundary conditions were cataloged using a computational technique. We give results for the configurational entropy density and the two fundamental elastic constants for a range of system sizes. The method is general enough to allow a similar calculation to be performed for any other random tiling model.

45 published items

Clustering of heterogeneous populations of networks

Cutting Through the Noise to Infer Autonomous System Topology

Representative community divisions of networks

Bayesian inference of network structure from unreliable data

Consistency of community structure in complex networks

Improved mutual information measure for classification and community detection

Community detection in networks: Modularity optimization and maximum likelihood are equivalent

Estimating the number of communities in a network

Community detection in networks with unequal groups

Generalized communities in networks

Localization and centrality in networks

Multiway spectral community detection in networks

Structural inference for uncertain networks

Structure and inference in annotated networks

Equitable random graphs

Identification of core-periphery structure in networks

Percolation on sparse networks

Coauthorship and citation in scientific publishing

Community detection and graph partitioning

First-principles multiway spectral partitioning of graphs

Interacting epidemics and coinfection on contact networks

Prediction of highly cited papers

Spectra of random graphs with community structure and arbitrary degrees

Spectral community detection in sparse networks

Spectral methods for network community detection and graph partitioning

The small-world effect is a modern phenomenon

Friendship networks and social status

Graph spectra and the detectability of community structure in networks

Spectra of random graphs with arbitrary expected degrees

An efficient and principled method for detecting communities in networks

Competing epidemics on complex networks

Complex Systems: A Survey

A message passing approach for general epidemic models

Random graphs containing arbitrary distributions of subgraphs

Stochastic blockmodels and community structure in networks

The first-mover advantage in scientific publication

Exact solutions for models of evolving networks with addition and deletion of nodes

The structure and function of complex networks

Mixing patterns and community structure in networks

The structure of scientific collaboration networks

Models of Extinction: A Review

The physical limits of communication

Error estimation in the histogram Monte Carlo method

The repton model of gel electrophoresis

Phason elasticity of a three-dimensional quasicrystal: transfer-matrix method