Source author record

Tiago P. Peixoto

Tiago P. Peixoto appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


physics.soc-ph, Social and Information Networks, physics.data-an, Machine Learning, cond-mat.stat-mech, cond-mat.dis-nn, Biological Physics, physics.comp-ph, Applications, Molecular Networks, q-fin.GN, Computer Science and Game Theory, Methodology, nlin.CG, physics.gen-ph, physics.geo-ph, Populations and Evolution, Quantitative Methods

Catalog footprint

What is connected

27 works

18 topics

4 close collaborators


preprint, 2022, arXiv

Disentangling homophily, community structure and triadic closure in networks

Network homophily, the tendency of similar nodes to be connected, and transitivity, the tendency of two nodes being connected if they share a common neighbor, are conflated properties in network analysis, since one mechanism can drive the other. Here we present a generative model and corresponding inference procedure that are capable of distinguishing between both mechanisms. Our approach is based on a variation of the stochastic block model (SBM) with the addition of triadic closure edges, and its inference can identify the most plausible mechanism responsible for the existence of every edge in the network, in addition to the underlying community structure itself. We show how the method can evade the detection of spurious communities caused solely by the formation of triangles in the network, and how it can improve the performance of edge prediction when compared to the pure version of the SBM without triadic closure.
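The notion of transitivity used in this abstract can be made concrete with a small sketch. The function below is not taken from the paper; it computes the standard global transitivity of an undirected simple graph (the fraction of connected triples that are closed into triangles). The name `transitivity` and the edge-list input format are assumptions chosen for illustration.

```python
from itertools import combinations


def transitivity(edges):
    """Global transitivity of an undirected simple graph.

    Hypothetical helper for illustration: returns the fraction of
    connected triples (paths of length two) whose end points are
    also connected, i.e. 3 * triangles / connected triples.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    closed = 0   # triples whose end points share an edge
    triples = 0  # all paths of length two, counted by their center node
    for nbrs in adj.values():
        for a, b in combinations(nbrs, 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    return closed / triples if triples else 0.0


# A triangle with one pendant node attached.
print(transitivity([(0, 1), (1, 2), (0, 2), (2, 3)]))
```

A high value of this statistic does not by itself say whether triangles come from triadic closure or from dense groups, which is the ambiguity the abstract addresses.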

preprint, 2022, arXiv

Hypergraph reconstruction from network data

Networks can describe the structure of a wide variety of complex systems by specifying which pairs of entities in the system are connected. While such pairwise representations are flexible, they are not necessarily appropriate when the fundamental interactions involve more than two entities at the same time. Pairwise representations nonetheless remain ubiquitous, because higher-order interactions are often not recorded explicitly in network data. Here, we introduce a Bayesian approach to reconstruct latent higher-order interactions from ordinary pairwise network data. Our method is based on the principle of parsimony and only includes higher-order structures when there is sufficient statistical evidence for them. We demonstrate its applicability to a wide range of datasets, both synthetic and empirical.

preprint, 2022, arXiv

Ordered community detection in directed networks

We develop a method to infer community structure in directed networks where the groups are ordered in a latent one-dimensional hierarchy that determines the preferred edge direction. Our nonparametric Bayesian approach is based on a modification of the stochastic block model (SBM), which can take advantage of rank alignment and coherence to produce parsimonious descriptions of networks that combine ordered hierarchies with arbitrary mixing patterns between groups. Since our model also includes directed degree correction, we can use it to distinguish non-local hierarchical structure from local in- and out-degree imbalance -- thus removing a source of conflation present in most ranking methods. We also demonstrate how we can reliably compare with the results obtained with the unordered SBM variant to determine whether a hierarchical ordering is statistically warranted in the first place. We illustrate the application of our method on a wide variety of empirical networks across several domains.

preprint, 2022, arXiv

Systematic assessment of the quality of fit of the stochastic block model for empirical networks

We perform a systematic analysis of the quality of fit of the stochastic block model (SBM) for 275 empirical networks spanning a wide range of domains and orders of magnitude in size. We employ posterior predictive model checking as a criterion to assess the quality of fit, which involves comparing networks generated by the inferred model with the empirical network, according to a set of network descriptors. We observe that the SBM is capable of providing an accurate description for the majority of networks considered, but falls short of saturating all modeling requirements. In particular, networks possessing a large diameter and slow-mixing random walks tend to be badly described by the SBM. However, contrary to what is often assumed, networks with a high abundance of triangles can be well described by the SBM in many cases. We demonstrate that simple network descriptors can be used to evaluate whether or not the SBM can provide a sufficiently accurate representation, potentially pointing to possible model extensions that can systematically improve the expressiveness of this class of models.

preprint, 2020, arXiv

Latent Poisson models for networks with heterogeneous density

Empirical networks are often globally sparse, with a small average number of connections per node, when compared to the total size of the network. However, this sparsity tends not to be homogeneous, and networks can also be locally dense, for example with a few nodes connecting to a large fraction of the rest of the network, or with small groups of nodes with a large probability of connections between them. Here we show how latent Poisson models which generate hidden multigraphs can be effective at capturing this density heterogeneity, while being more tractable mathematically than some of the alternatives that model simple graphs directly. We show how these latent multigraphs can be reconstructed from data on simple graphs, and how this allows us to disentangle disassortative degree-degree correlations from the constraints of imposed degree sequences, and to improve the identification of community structure in empirically relevant scenarios.

preprint, 2020, arXiv

Merge-split Markov chain Monte Carlo for community detection

We present a Markov chain Monte Carlo scheme based on merges and splits of groups that is capable of efficiently sampling from the posterior distribution of network partitions, defined according to the stochastic block model (SBM). We demonstrate how schemes based on the move of single nodes between groups systematically fail at correctly sampling from the posterior distribution even on small networks, and how our merge-split approach behaves significantly better, and improves the mixing time of the Markov chain by several orders of magnitude in typical cases. We also show how the scheme can be straightforwardly extended to nested versions of the SBM, yielding asymptotically exact samples of hierarchical network partitions.

preprint, 2016, arXiv

Network structure, metadata and the prediction of missing nodes and annotations

The empirical validation of community detection methods is often based on available annotations on the nodes that serve as putative indicators of the large-scale network structure. Most often, the suitability of the annotations as topological descriptors itself is not assessed, and without this it is not possible to ultimately distinguish between actual shortcomings of the community detection algorithms on one hand, and the incompleteness, inaccuracy or structured nature of the data annotations themselves on the other. In this work we present a principled method to assess both aspects simultaneously. We construct a joint generative model for the data and metadata, and a nonparametric Bayesian framework to infer its parameters from annotated datasets. We assess the quality of the metadata not according to its direct alignment with the network communities, but rather in its capacity to predict the placement of edges in the network. We also show how this feature can be used to predict the connections to missing nodes when only the metadata is available, as well as missing metadata. By investigating a wide range of datasets, we show that while there are seldom exact agreements between metadata tokens and the inferred data groups, the metadata is often informative of the network structure nevertheless, and can improve the prediction of missing nodes. This shows that the method uncovers meaningful patterns in both the data and metadata, without requiring or expecting a perfect agreement between the two.

preprint, 2015, arXiv

Generalized communities in networks

A substantial volume of research has been devoted to studies of community structure in networks, but communities are not the only possible form of large-scale network structure. Here we describe a broad extension of community structure that encompasses traditional communities but includes a wide range of generalized structural patterns as well. We describe a principled method for detecting this generalized structure in empirical network data and demonstrate with real-world examples how it can be used to learn new things about the shape and meaning of networks.

preprint, 2015, arXiv

Inferring the mesoscale structure of layered, edge-valued and time-varying networks

Many network systems are composed of interdependent but distinct types of interactions, which cannot be fully understood in isolation. These different types of interactions are often represented as layers, attributes on the edges or as a time-dependence of the network structure. Although they are crucial for a more comprehensive scientific understanding, these representations offer substantial challenges. Namely, it is an open problem how to precisely characterize the large or mesoscale structure of network systems in relation to these additional aspects. Furthermore, the direct incorporation of these features invariably increases the effective dimension of the network description, and hence aggravates the problem of overfitting, i.e. the use of overly-complex characterizations that mistake purely random fluctuations for actual structure. In this work, we propose a robust and principled method to tackle these problems, by constructing generative models of modular network structure, incorporating layered, attributed and time-varying properties, as well as a nonparametric Bayesian methodology to infer the parameters from data and select the most appropriate model according to statistical evidence. We show that the method is capable of revealing hidden structure in layered, edge-valued and time-varying networks, and that the most appropriate level of granularity with respect to the additional dimensions can be reliably identified. We illustrate our approach on a variety of empirical systems, including a social network of physicians, the voting correlations of deputies in the Brazilian national congress, the global airport network, and a proximity network of high-school students.

preprint, 2015, arXiv

Model selection and hypothesis testing for large-scale network models with overlapping groups

The effort to understand network systems in increasing detail has resulted in a diversity of methods designed to extract their large-scale structure from data. Unfortunately, many of these methods yield diverging descriptions of the same network, making both the comparison and understanding of their results a difficult challenge. A possible solution to this outstanding issue is to shift the focus away from ad hoc methods and move towards more principled approaches based on statistical inference of generative models. As a result, we face instead the more well-defined task of selecting between competing generative processes, which can be done under a unified probabilistic framework. Here, we consider the comparison between a variety of generative models including features such as degree correction, where nodes with arbitrary degrees can belong to the same group, and community overlap, where nodes are allowed to belong to more than one group. Because such model variants possess an increasing number of parameters, they become prone to overfitting. In this work, we present a method of model selection based on the minimum description length criterion and posterior odds ratios that is capable of fully accounting for the increased degrees of freedom of the larger models, and selects the best one according to the statistical evidence available in the data. In applying this method to many empirical unweighted networks from different fields, we observe that community overlap is very often not supported by statistical evidence and is selected as a better model only for a minority of them. On the other hand, we find that degree correction tends to be almost universally favored by the available data, implying that intrinsic node properties (as opposed to group properties) are often an essential ingredient of network formation.

preprint, 2015, arXiv

Sampling motif-constrained ensembles of networks

The statistical significance of network properties is conditioned on null models which satisfy specified properties but that are otherwise random. Exponential random graph models are a principled theoretical framework to generate such constrained ensembles, but which often fail in practice, either due to model inconsistency, or due to the impossibility to sample networks from them. These problems affect the important case of networks with prescribed clustering coefficient or number of small connected subgraphs (motifs). In this paper we use the Wang-Landau method to obtain a multicanonical sampling that overcomes both these problems. We sample, in polynomial time, networks with arbitrary degree sequences from ensembles with imposed motif counts. Applying this method to social networks, we investigate the relation between transitivity and homophily, and we quantify the correlation between different types of motifs, finding that single motifs can explain up to 60% of the variation of motif profiles.

preprint, 2014, arXiv

Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models

We present an efficient algorithm for the inference of stochastic block models in large networks. The algorithm can be used as an optimized Markov chain Monte Carlo (MCMC) method, with a fast mixing time and a much reduced susceptibility to getting trapped in metastable states, or as a greedy agglomerative heuristic, with an almost linear $O(N\ln^2N)$ complexity, where $N$ is the number of nodes in the network, independent of the number of blocks being inferred. We show that the heuristic is capable of delivering results which are indistinguishable from the more exact and numerically expensive MCMC method in many artificial and empirical networks, despite being much faster. The method is entirely unbiased towards any specific mixing pattern, and in particular it does not favor assortative community structures.

preprint, 2014, arXiv

Hierarchical Block Structures and High-resolution Model Selection in Large Networks

Discovering and characterizing the large-scale topological features in empirical networks are crucial steps in understanding how complex systems function. However, most existing methods used to obtain the modular structure of networks suffer from serious problems, such as being oblivious to the statistical evidence supporting the discovered patterns, which results in the inability to separate actual structure from noise. In addition to this, one also observes a resolution limit on the size of communities, where smaller but well-defined clusters are not detectable when the network becomes large. This phenomenon occurs not only for the very popular approach of modularity optimization, which lacks built-in statistical validation, but also for more principled methods based on statistical inference and model selection, which do incorporate statistical validation in a formally correct way. Here we construct a nested generative model that, through a complete description of the entire network hierarchy at multiple scales, is capable of avoiding this limitation, and enables the detection of modular structure at levels far beyond those possible with current approaches. Even with this increased resolution, the method is based on the principle of parsimony, and is capable of separating signal from noise, and thus will not lead to the identification of spurious modules even on sparse networks. Furthermore, it fully generalizes other approaches in that it is not restricted to purely assortative mixing patterns, directed or undirected graphs, and ad hoc hierarchical structures such as binary trees. Despite its general character, the approach is tractable, and can be combined with advanced techniques of community detection to yield an efficient algorithm that scales well for very large networks.

preprint, 2013, arXiv

Eigenvalue Spectra of Modular Networks

A large variety of dynamical processes that take place on networks can be expressed in terms of the spectral properties of some linear operator which reflects how the dynamical rules depend on the network topology. Often such spectral features are theoretically obtained by considering only local node properties, such as degree distributions. Many networks, however, possess large-scale modular structures that can drastically influence their spectral characteristics, and which are neglected in such simplified descriptions. Here we obtain in a unified fashion the spectrum of a large family of operators, including the adjacency, Laplacian and normalized Laplacian matrices, for networks with generic modular structure, in the limit of large degrees. We focus on the conditions necessary for the merging of the isolated eigenvalues with the continuous band of the spectrum, after which the planted modular structure can no longer be easily detected by spectral methods. This is a crucial transition point which determines when a modular structure is strong enough to affect a given dynamical process. We show that this transition happens in general at different points for the different matrices, and hence the detectability threshold can vary significantly depending on the operator chosen. Equivalently, the sensitivity to the modular structure of the different dynamical processes associated with each matrix will be different, given the same large-scale structure present in the network. Furthermore, we show that, with the exception of the Laplacian matrix, the different transitions coalesce into the same point for the special case where the modules are homogeneous, but separate otherwise.
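For reference, the three operators named in this abstract are usually defined as follows for an undirected network with adjacency matrix $A$ and degrees $k_i$. These are the textbook conventions, stated here as an assumption; the paper may normalize differently (for example, with the random-walk form $I - D^{-1}A$ in place of the symmetric one).

```latex
\begin{align}
  A_{ij} &=
    \begin{cases}
      1 & \text{if nodes } i \text{ and } j \text{ are connected,} \\
      0 & \text{otherwise,}
    \end{cases} \\
  D_{ij} &= k_i\,\delta_{ij}, \qquad k_i = \sum_j A_{ij}, \\
  L &= D - A, \\
  \mathcal{L} &= I - D^{-1/2} A D^{-1/2}.
\end{align}
```

Here $L$ is the (combinatorial) Laplacian and $\mathcal{L}$ the symmetric normalized Laplacian; the latter is only defined when every node has nonzero degree.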

preprint, 2013, arXiv

Entropy of stochastic blockmodel ensembles

Stochastic blockmodels are generative network models where the vertices are separated into discrete groups, and the probability of an edge existing between two vertices is determined solely by their group membership. In this paper, we derive expressions for the entropy of stochastic blockmodel ensembles. We consider several ensemble variants, including the traditional model as well as the newly introduced degree-corrected version [Karrer et al. Phys. Rev. E 83, 016107 (2011)], which imposes a degree sequence on the vertices, in addition to the block structure. The imposed degree sequence is implemented both as "soft" constraints, where only the expected degrees are imposed, and as "hard" constraints, where they are required to be the same on all samples of the ensemble. We also consider generalizations to multigraphs and directed graphs. We illustrate one of many applications of this measure by directly deriving a log-likelihood function from the entropy expression, and using it to infer latent block structure in observed data. Due to the general nature of the ensembles considered, the method works well for ensembles with intrinsic degree correlations (i.e. with entropic origin) as well as extrinsic degree correlations, which go beyond the block structure.
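The definition in the first sentence of this abstract can be illustrated with a minimal sampler. The sketch below covers only the traditional model for undirected simple graphs, not the degree-corrected, multigraph or directed variants the paper also treats. The function name `sample_sbm` and its argument layout are assumptions made for this example.

```python
import random


def sample_sbm(groups, probs, seed=None):
    """Sample an undirected simple graph from a traditional SBM.

    Hypothetical helper for illustration.
    groups[i] is the group label of vertex i, and probs[r][s] is the
    probability of an edge between a vertex in group r and a vertex
    in group s (probs is assumed symmetric).
    """
    rng = random.Random(seed)
    n = len(groups)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            # The edge probability depends only on group membership.
            if rng.random() < probs[groups[i]][groups[j]]:
                edges.append((i, j))
    return edges


# Two groups of three vertices: dense inside groups, sparse between them.
groups = [0, 0, 0, 1, 1, 1]
probs = [[0.9, 0.1],
         [0.1, 0.9]]
print(sample_sbm(groups, probs, seed=1))
```

The double loop costs $O(N^2)$ time, which is fine for a demonstration but not for the large sparse networks discussed elsewhere on this page.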

preprint, 2013, arXiv

No need for conspiracy: Self-organized cartel formation in a modified trust game

We investigate the dynamics of a trust game on a mixed population where individuals with the role of buyers are forced to play against a predetermined number of sellers, whom they choose dynamically. Agents with the role of sellers are also allowed to adapt the level of value for money of their products, based on payoff. The dynamics undergoes a transition at a specific value of the strategy update rate, above which an emergent cartel organization is observed, where sellers have similar values of below optimal value for money. This cartel organization is not due to an explicit collusion among agents; instead it arises spontaneously from the maximization of the individual payoffs. This dynamics is marked by large fluctuations and a high degree of unpredictability for most of the parameter space, and serves as a plausible qualitative explanation for observed elevated levels and fluctuations of certain commodity prices.

preprint, 2013, arXiv

Parsimonious module inference in large networks

We investigate the detectability of modules in large networks when the number of modules is not known in advance. We employ the minimum description length (MDL) principle which seeks to minimize the total amount of information required to describe the network, and avoid overfitting. According to this criterion, we obtain general bounds on the detectability of any prescribed block structure, given the number of nodes and edges in the sampled network. We also obtain that the maximum number of detectable blocks scales as $\sqrt{N}$, where $N$ is the number of nodes in the network, for a fixed average degree $\langle k \rangle$. We also show that the simplicity of the MDL approach yields an efficient multilevel Monte Carlo inference algorithm with a complexity of $O(\tau N\log N)$, if the number of blocks is unknown, and $O(\tau N)$ if it is known, where $\tau$ is the mixing time of the Markov chain. We illustrate the application of the method on a large network of actors and films with over $10^6$ edges, and a disassortative, bipartite block structure.

preprint, 2013, arXiv

Spontaneous centralization of control in a network of company ownerships

We introduce a model for the adaptive evolution of a network of company ownerships. In a recent work it has been shown that the empirical global network of corporate control is marked by a central, tightly connected "core" made of a small number of large companies which control a significant part of the global economy. Here we show how a simple, adaptive "rich get richer" dynamics can account for this characteristic, which incorporates the increased buying power of more influential companies, and in turn results in even higher control. We conclude that this kind of centralized structure can emerge without it being an explicit goal of these companies, or as a result of a well-organized strategy.

preprint, 2012, arXiv

Emergence of robustness against noise: A structural phase transition in evolved models of gene regulatory networks

We investigate the evolution of Boolean networks subject to a selective pressure which favors robustness against noise, as a model of evolved genetic regulatory systems. By mapping the evolutionary process into a statistical ensemble and minimizing its associated free energy, we find the structural properties which emerge as the selective pressure is increased and identify a phase transition from a random topology to a "segregated core" structure, where a smaller and more densely connected subset of the nodes is responsible for most of the regulation in the network. This segregated structure is very similar qualitatively to what is found in gene regulatory networks, where only a much smaller subset of genes --- those responsible for transcription factors --- is responsible for global regulation. We obtain the full phase diagram of the evolutionary process as a function of selective pressure and the average number of inputs per node. We compare the theoretical predictions with Monte Carlo simulations of evolved networks and with empirical data for Saccharomyces cerevisiae and Escherichia coli.

preprint, 2012, arXiv

Evolution of robust network topologies: Emergence of central backbones

We model the robustness against random failure or intentional attack of networks with arbitrary large-scale structure. We construct a block-based model which incorporates --- in a general fashion --- both connectivity and interdependence links, as well as arbitrary degree distributions and block correlations. By optimizing the percolation properties of this general class of networks, we identify a simple core-periphery structure as the topology most robust against random failure. In such networks, a distinct and small "core" of nodes with higher degree is responsible for most of the connectivity, functioning as a central "backbone" of the system. This centralized topology remains the optimal structure when other constraints are imposed, such as a given fraction of interdependence links and fixed degree distributions. This distinguishes simple centralized topologies as the most likely to emerge, when robustness against failure is the dominant evolutionary force.

preprint, 2012, arXiv

The behavior of noise-resilient Boolean networks with diverse topologies

The dynamics of noise-resilient Boolean networks with majority functions and diverse topologies is investigated. A wide class of possible topological configurations is parametrized as a stochastic blockmodel. For this class of networks, the dynamics always undergoes a phase transition from a non-ergodic regime, where a memory of its past states is preserved, to an ergodic regime, where no such memory exists and every microstate is equally probable. Both the average error on the network, as well as the critical value of noise where the transition occurs are investigated analytically, and compared to numerical simulations. The results for "partially dense" networks, comprised of relatively few, but dynamically important nodes, which have a number of inputs which greatly exceeds the average for the entire network, give very general upper bounds on the maximum resilience against noise attainable on globally sparse systems.
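A rough sketch of the kind of dynamics described here: one synchronous update of a Boolean network in which every node applies a majority function to its inputs. The noise model (each node's output is flipped independently with probability `noise`) is an assumption made for illustration and may differ from the one analysed in the paper, as are the function name `majority_step` and the tie-breaking rule.

```python
import random


def majority_step(state, inputs, noise, rng):
    """One synchronous update of a noisy majority-function Boolean network.

    Hypothetical helper for illustration.
    state  : list of 0/1 node values
    inputs : inputs[i] lists the nodes that node i reads from
    noise  : probability that a node's output is flipped (assumed noise model)
    rng    : a random.Random instance
    """
    new_state = []
    for node_inputs in inputs:
        ones = sum(state[j] for j in node_inputs)
        # Strict majority of ones gives 1; ties and minorities give 0.
        value = 1 if 2 * ones > len(node_inputs) else 0
        if rng.random() < noise:
            value = 1 - value
        new_state.append(value)
    return new_state


# Five nodes, each reading three others; start from the all-ones state.
inputs = [[1, 2, 3], [0, 2, 4], [0, 1, 3], [1, 2, 4], [0, 2, 3]]
state = [1, 1, 1, 1, 1]
rng = random.Random(0)
for _ in range(10):
    state = majority_step(state, inputs, noise=0.05, rng=rng)
print(state)
```

In this toy setting, a network that keeps most of its nodes at the initial value after many steps still carries a memory of its initial state, which is the property the non-ergodic regime in the abstract refers to.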

preprint, 2010, arXiv

Boolean networks with robust and reliable trajectories

We construct and investigate Boolean networks that follow a given reliable trajectory in state space, which is insensitive to fluctuations in the updating schedule, and which is also robust against noise. Robustness is quantified as the probability that the dynamics return to the reliable trajectory after a perturbation of the state of a single node. In order to achieve high robustness, we navigate through the space of possible update functions by using an evolutionary algorithm. We constrain the networks to having the minimum number of connections required to obtain the reliable trajectory. Surprisingly, we find that robustness always reaches values close to 100 percent during the evolutionary optimization process. The set of update functions can be evolved such that it differs only slightly from that of networks that were not optimized with respect to robustness. The state space of the optimized networks is dominated by the basin of attraction of the reliable trajectory.

preprint, 2010, arXiv

Redundancy and error resilience in Boolean Networks

We consider the effect of noise in sparse Boolean Networks with redundant functions. We show that they always exhibit a non-zero error level, and the dynamics undergoes a phase transition from non-ergodicity to ergodicity, as a function of noise, after which the system is no longer capable of preserving a memory of its initial state. We obtain upper bounds on the critical value of noise for networks of different sparsity.

preprint, 2010, arXiv

Spatiotemporal correlations of aftershock sequences

Aftershock sequences are of particular interest in seismic research since they may condition seismic activity in a given region over long time spans. While they are typically identified with periods of enhanced seismic activity after a large earthquake as characterized by the Omori law, our knowledge of the spatiotemporal correlations between events in an aftershock sequence is limited. Here, we study the spatiotemporal correlations of two aftershock sequences from California (Parkfield and Hector Mine) using the recently introduced concept of "recurrent" events. We find that both sequences have very similar properties and that most of them are captured by the space-time epidemic-type aftershock sequence (ETAS) model if one takes into account catalog incompleteness. However, the stochastic model does not capture the spatiotemporal correlations leading to the observed structure of seismicity on small spatial scales.
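For readers unfamiliar with the Omori law cited in this abstract, it is commonly written in its modified (Omori-Utsu) form, which gives the rate of aftershocks at time $t$ after the main shock. The symbols below follow the usual convention and are not taken from the paper.

```latex
\begin{equation}
  n(t) = \frac{K}{(t + c)^{p}},
\end{equation}
```

Here $K$, $c$ and $p$ are empirical constants fitted per sequence, with $p$ typically close to 1. The original Omori law corresponds to $p = 1$.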

preprint, 2010, arXiv

The phase diagram of random Boolean networks with nested canalizing functions

We obtain the phase diagram of random Boolean networks with nested canalizing functions. Using the annealed approximation, we obtain the evolution of the number $b_t$ of nodes with value one, and the network sensitivity $\lambda$, and we compare with numerical simulations of quenched networks. We find that, contrary to what was reported by Kauffman et al. [Proc. Natl. Acad. Sci. 2004 101 49 17102-7], these networks have a rich phase diagram, where both the "chaotic" and frozen phases are present, as well as an oscillatory regime of the value of $b_t$. We argue that the presence of only the frozen phase in the work of Kauffman et al. was due simply to the specific parametrization used, and is not an inherent feature of this class of functions. However, these networks are significantly more stable than the variants where all possible Boolean functions are allowed.

preprint, 2010, arXiv

Trust transitivity in social networks

Non-centralized recommendation-based decision making is a central feature of several social and technological processes, such as market dynamics, peer-to-peer file-sharing and the web of trust of digital certification. We investigate the properties of trust propagation on networks, based on a simple metric of trust transitivity. We investigate analytically the percolation properties of trust transitivity in random networks with arbitrary degree distribution, and compare with numerical realizations. We find that the existence of a non-zero fraction of absolute trust (i.e. entirely confident trust) is a requirement for the viability of global trust propagation in large systems: The average pair-wise trust is marked by a discontinuous transition at a specific fraction of absolute trust, below which it vanishes. Furthermore, we perform an extensive analysis of the Pretty Good Privacy (PGP) web of trust, in view of the concepts introduced. We compare different scenarios of trust distribution: community- and authority-centered. We find that these scenarios lead to sharply different patterns of trust propagation, due to the segregation of authority hubs and densely-connected communities. While the authority-centered scenario is more efficient, and leads to higher average trust values, it favours weakly-connected "fringe" nodes, which are directly trusted by authorities. The community-centered scheme, on the other hand, favours nodes with intermediate degrees, to the detriment of the authorities and their "fringe" peers.

preprint, 2009, arXiv

Boolean networks with reliable dynamics

We investigated the properties of Boolean networks that follow a given reliable trajectory in state space. A reliable trajectory is defined as a sequence of states which is independent of the order in which the nodes are updated. We explored numerically the topology, the update functions, and the state space structure of these networks, which we constructed using a minimum number of links and the simplest update functions. We found that the clustering coefficient is larger than in random networks, and that the probability distribution of three-node motifs is similar to that found in gene regulation networks. Among the update functions, only a subset of all possible functions occur, and they can be classified according to their probability. More homogeneous functions occur more often, leading to a dominance of canalizing functions. Finally, we studied the entire state space of the networks. We observed that with increasing system size, fixed points become more dominant, moving the networks close to the frozen phase.
