Source author record

Jean-Gabriel Young

Jean-Gabriel Young appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher: unclaimed source record

physics.soc-ph, Social and Information Networks, cond-mat.stat-mech, Applications, Machine Learning, Populations and Evolution, cs.CY, Genomics, math.DS, Molecular Networks, Networking and Internet Architecture, physics.data-an, Software Engineering

Catalog footprint

What is connected

18 works

13 topics

4 close collaborators

Preprint, 2022, arXiv

Clustering of heterogeneous populations of networks

Statistical methods for reconstructing networks from repeated measurements typically assume that all measurements are generated from the same underlying network structure. This need not be the case, however. People's social networks might be different on weekdays and weekends, for instance. Brain networks may differ between healthy patients and those with dementia or other conditions. Here we describe a Bayesian analysis framework for such data that allows for the fact that network measurements may be reflective of multiple possible structures. We define a finite mixture model of the measurement process and derive a fast Gibbs sampling procedure that samples exactly from the full posterior distribution of model parameters. The end result is a clustering of the measured networks into groups with similar structure. We demonstrate the method on both real and synthetic network populations.

Preprint, 2022, arXiv

Cutting Through the Noise to Infer Autonomous System Topology

The Border Gateway Protocol (BGP) is a distributed protocol that manages interdomain routing without requiring a centralized record of which autonomous systems (ASes) connect to which others. Many methods have been devised to infer the AS topology from publicly available BGP data, but none provide a general way to handle the fact that the data are notoriously incomplete and subject to error. This paper describes a method for reliably inferring AS-level connectivity in the presence of measurement error using Bayesian statistical inference acting on BGP routing tables from multiple vantage points. We employ a novel approach for counting AS adjacency observations in the AS-PATH attribute data from public route collectors, along with a Bayesian algorithm to generate a statistical estimate of the AS-level network. Our approach also gives us a way to evaluate the accuracy of existing reconstruction methods and to identify advantageous locations for new route collectors or vantage points.
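To make "counting AS adjacency observations in the AS-PATH attribute" concrete, here is a minimal Python sketch of a naive counting scheme. It is an illustration under stated assumptions, not the paper's novel counting approach: each AS-PATH is assumed to be a space-separated list of plain ASNs (no AS_SET segments), consecutive repeats from path prepending are collapsed, and each distinct link is counted at most once per path. The function name `count_as_adjacencies` is hypothetical.

```python
from collections import Counter


def count_as_adjacencies(as_paths):
    """Count how many AS-PATHs show each undirected AS-AS link.

    Naive illustrative scheme, not the method from the paper:
    - each path is a space-separated string of plain ASNs (no AS_SETs),
    - consecutive repeats caused by path prepending are collapsed,
    - a link is counted at most once per path.
    """
    counts = Counter()
    for path in as_paths:
        hops = []
        for token in path.split():
            asn = int(token)
            if not hops or hops[-1] != asn:  # drop prepended repeats
                hops.append(asn)
        # Store each link as a sorted pair so (a, b) and (b, a) coincide.
        links = {tuple(sorted(pair)) for pair in zip(hops, hops[1:])}
        counts.update(links)
    return counts


paths = ["3356 1299 64500", "3356 3356 1299 64501", "174 1299 64500"]
print(count_as_adjacencies(paths))
```

In a statistical treatment like the one described above, per-link counts of this kind would be the noisy observed data, not the inferred topology itself.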

Preprint, 2022, arXiv

Hypergraph reconstruction from network data

Networks can describe the structure of a wide variety of complex systems by specifying which pairs of entities in the system are connected. While such pairwise representations are flexible, they are not necessarily appropriate when the fundamental interactions involve more than two entities at the same time. Pairwise representations nonetheless remain ubiquitous, because higher-order interactions are often not recorded explicitly in network data. Here, we introduce a Bayesian approach to reconstruct latent higher-order interactions from ordinary pairwise network data. Our method is based on the principle of parsimony and only includes higher-order structures when there is sufficient statistical evidence for them. We demonstrate its applicability to a wide range of datasets, both synthetic and empirical.

Preprint, 2022, arXiv

The OCEAN mailing list data set: Network analysis spanning mailing lists and code repositories

Communication surrounding the development of an open source project largely occurs outside the software repository itself. Historically, large communities often used a collection of mailing lists to discuss the different aspects of their projects. Multimodal tool use, with software development and communication happening on different channels, complicates the study of open source projects as a sociotechnical system. Here, we combine and standardize mailing lists of the Python community, resulting in 954,287 messages from 1995 to the present. We share all scraping and cleaning code to facilitate reproduction of this work, as well as smaller datasets for the Golang (122,721 messages), Angular (20,041 messages) and Node.js (12,514 messages) communities. To showcase the usefulness of these data, we focus on the CPython repository and merge the technical layer (which GitHub account works on what file and with whom) with the social layer (messages from unique email addresses) by identifying 33% of GitHub contributors in the mailing list data. We then explore correlations between the valence of social messaging and the structure of the collaboration network. We discuss how these data provide a laboratory to test theories from standard organizational science in large open source projects.

Preprint, 2022, arXiv

The Promise of Cross-Species Coexpression Analysis in Studying the Coevolution and Ecology of Host-Symbiont Interactions

Measuring gene expression simultaneously in both hosts and symbionts offers a powerful approach to explore the biology underlying species interactions. Such dual or simultaneous RNAseq approaches have primarily been used to gain insight into gene function in model systems, but there is an opportunity to expand and apply these tools in new ways to understand ecological and evolutionary questions. By incorporating genetic diversity in both hosts and symbionts and studying how gene expression is correlated between partner species, we can gain new insight into host-symbiont coevolution and the ecology of species interactions. In this perspective, we explore how these relatively new tools could be applied to study such questions. We review the mechanisms that could generate patterns of cross-species gene coexpression, including indirect genetic effects and selective filters; discuss how these tools could be applied across different biological and temporal scales; and outline other methodological considerations and experimental possibilities.

Preprint, 2021, arXiv

Bayesian inference of network structure from unreliable data

Most empirical studies of complex networks do not return direct, error-free measurements of network structure. Instead, they typically rely on indirect measurements that are often error-prone and unreliable. A fundamental problem in empirical network science is how to make the best possible estimates of network structure given such unreliable data. In this paper we describe a fully Bayesian method for reconstructing networks from observational data in any format, even when the data contain substantial measurement error and when the nature and magnitude of that error are unknown. The method is introduced through pedagogical case studies using real-world example networks, and specifically tailored to allow straightforward, computationally efficient implementation with a minimum of technical input. Computer code implementing the method is publicly available.
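A common, simpler relative of this kind of model treats each node pair as measured N times, with a true edge reported with probability alpha (true-positive rate), a non-edge reported with probability beta (false-positive rate), and a prior edge density rho. A pair reported E times then has posterior edge probability Q = rho alpha^E (1-alpha)^(N-E) / [rho alpha^E (1-alpha)^(N-E) + (1-rho) beta^E (1-beta)^(N-E)]. The Python sketch below estimates alpha, beta and rho by expectation-maximization. It is a point-estimate illustration under those assumptions, not the fully Bayesian method or the published code the abstract refers to; `edge_posterior` and `fit_em` are hypothetical names, and `E` must contain an entry for every node pair, zeros included.

```python
import numpy as np


def edge_posterior(E, N, alpha, beta, rho):
    """Posterior probability that a pair is a true edge, given that it was
    reported as connected in E out of N measurements.

    Assumed model: true edges are reported with probability alpha,
    non-edges with probability beta, and the prior edge density is rho.
    """
    on = rho * alpha**E * (1 - alpha) ** (N - E)
    off = (1 - rho) * beta**E * (1 - beta) ** (N - E)
    return on / (on + off)


def fit_em(E, N, alpha=0.6, beta=0.4, rho=0.5, iters=200):
    """Expectation-maximization estimates of alpha, beta and rho.

    E holds one observation count per node pair (every pair, zeros
    included). Starting with alpha > beta makes the 'edge' component
    the one that is reported more often.
    """
    E = np.asarray(E, dtype=float)
    for _ in range(iters):
        Q = edge_posterior(E, N, alpha, beta, rho)  # E-step
        alpha = (E * Q).sum() / (N * Q.sum())  # M-step updates
        beta = (E * (1 - Q)).sum() / (N * (1 - Q).sum())
        rho = Q.mean()
    Q = edge_posterior(E, N, alpha, beta, rho)  # posteriors at final estimates
    return alpha, beta, rho, Q
```

Pairs whose `Q` is close to 1 are likely true edges. Unlike a fully Bayesian treatment, this returns single best estimates of the error rates rather than a distribution over them.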

Preprint, 2020, arXiv

Countering hate on social media: Large scale classification of hate and counter speech

Hateful rhetoric is plaguing online discourse, fostering extreme societal movements and possibly giving rise to real-world violence. A potential solution to this growing global problem is citizen-generated counter speech, where citizens actively engage in hate-filled conversations to attempt to restore civil, non-polarized discourse. However, its actual effectiveness in curbing the spread of hatred is unknown and hard to quantify. One major obstacle to researching this question is a lack of large labeled data sets for training automated classifiers to identify counter speech. Here we made use of a unique situation in Germany where self-labeling groups engaged in organized online hate and counter speech. We used an ensemble learning algorithm which pairs a variety of paragraph embeddings with regularized logistic regression functions to classify both hate and counter speech in a corpus of millions of relevant tweets from these two groups. Our pipeline achieved macro F1 scores on out-of-sample balanced test sets ranging from 0.76 to 0.97, in line with and in some cases exceeding the state of the art. On thousands of tweets, we used crowdsourcing to verify that the judgments made by the classifier are in close alignment with human judgment. We then used the classifier to discover hate and counter speech in more than 135,000 fully resolved Twitter conversations occurring from 2013 to 2018 and studied their frequency and interaction. Altogether, our results highlight the potential of automated methods to evaluate the impact of coordinated counter speech in stabilizing conversations on social media.

Preprint, 2020, arXiv

Inference for growing trees

One can often make inferences about a growing network from its current state alone. For example, it is generally possible to determine how a network changed over time or pick among plausible mechanisms explaining its growth. In practice, however, the extent to which such problems can be solved is limited by existing techniques, which are often inexact, inefficient, or both. In this article we derive exact and efficient inference methods for growing trees and demonstrate them in a series of applications: network interpolation, history reconstruction, model fitting, and model selection.

Preprint, 2019, arXiv

Improved mutual information measure for classification and community detection

The information-theoretic quantity known as mutual information finds wide use in classification and community detection analyses to compare two classifications of the same set of objects into groups. In the context of classification algorithms, for instance, it is often used to compare discovered classes to known ground truth and hence to quantify algorithm performance. Here we argue that the standard mutual information, as commonly defined, omits a crucial term which can become large under real-world conditions, producing results that can be substantially in error. We demonstrate how to correct this error and define a mutual information that works in all cases. We discuss practical implementation of the new measure and give some example applications.
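For reference, the standard mutual information that the abstract critiques can be computed from the contingency table of two labelings, as in the minimal Python sketch below. It implements only the conventional measure; the paper's corrected measure is not reproduced here, and the function name `mutual_information` is illustrative.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Standard mutual information (in nats) between two labelings of the
    same objects, computed from their contingency table."""
    n = len(x)
    joint = Counter(zip(x, y))  # contingency table n_ab
    nx, ny = Counter(x), Counter(y)  # group sizes in each labeling
    return sum(
        n_ab / n * log(n * n_ab / (nx[a] * ny[b]))
        for (a, b), n_ab in joint.items()
    )


truth = [0, 0, 1, 1]
print(mutual_information(truth, [0, 0, 1, 1]))  # perfect match
print(mutual_information(truth, [0, 1, 2, 3]))  # every object in its own group
```

Both calls give log 2 (about 0.693): under the standard measure, a labeling that puts every object in its own group scores exactly as well as a perfect match, which illustrates one known weakness of the conventional definition.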

Preprint, 2019, arXiv

Interacting contagions are indistinguishable from social reinforcement

From fake news to innovative technologies, many contagions spread via a process of social reinforcement, where multiple exposures are distinct from prolonged exposure to a single source. By contrast, biological agents such as Ebola or measles are typically thought to spread as simple contagions. Here, we demonstrate that interacting simple contagions are indistinguishable from complex contagions. In the social context, our results highlight the challenge of identifying and quantifying mechanisms, such as social reinforcement, in a world where innumerable ideas, memes and behaviors interact. In the biological context, this parallel allows the use of complex contagions to effectively quantify the non-trivial interactions of infectious diseases.

Preprint, 2016, arXiv

Growing networks of overlapping communities with internal structure

We introduce an intuitive model that describes both the emergence of community structure and the evolution of the internal structure of communities in growing social networks. The model comprises two complementary mechanisms: one mechanism accounts for the evolution of the internal link structure of a single community, and the second mechanism coordinates the growth of multiple overlapping communities. The first mechanism is based on the assumption that each node establishes links with its neighbors and introduces new nodes to the community at different rates. We demonstrate that this simple mechanism gives rise to an effective maximal degree within communities. This observation is related to the anthropological theory known as Dunbar's number, i.e., the empirical observation of a maximal number of ties which an average individual can sustain within its social groups. The second mechanism is based on a recently proposed generalization of preferential attachment to community structure, appropriately called structural preferential attachment (SPA). The combination of these two mechanisms into a single model (SPA+) allows us to reproduce a number of the global statistics of real networks: the distributions of community sizes, node memberships and degrees. The SPA+ model also predicts (a) three qualitative regimes for the degree distribution within overlapping communities and (b) strong correlations between the number of communities to which a node belongs and its number of connections within each community. We present empirical evidence that supports our findings in real complex networks.

Preprint, 2016, arXiv

On the constrained growth of complex scale-independent systems

Scale independence is a ubiquitous feature of complex systems which implies a highly skewed distribution of resources with no characteristic scale. Research has long focused on why systems as varied as protein networks, evolution and stock markets all feature scale independence. Assuming that they simply do, we focus here on describing how this behaviour emerges, in contrast to the more idealized models usually considered. We arrive at the conjecture that a minimal model to explain the growth towards scale independence involves only two coupled dynamical features: the first is the well-known preferential attachment principle and the second is a general form of delayed temporal scaling. While the first is sufficient, the second is present in all studied data and appears to maximize the speed of convergence to true scale independence. The delay in this temporal scaling acts as a coupling between population growth and individual activity. Together, these two dynamical properties appear to pave a precise evolution path, such that even an instantaneous snapshot of a distribution is enough to reconstruct the past of the system and predict its future. We validate our approach and confirm its usefulness in diverse spheres of human activity ranging from scientific and artistic productivity to sexual relations and online traffic.
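The preferential attachment ingredient can be illustrated with a Simon-style growth process, sketched below in Python. At each step, either a new individual appears with one unit of resource, or one unit goes to an existing individual chosen with probability proportional to its current share. The birth probability is passed in as a function `birth_prob(t)`, so a time-dependent form could be plugged in, but the specific delayed temporal scaling proposed in the paper is not reproduced here. The function name `simon_process` is hypothetical.

```python
import random


def simon_process(steps, birth_prob, seed=0):
    """Grow a population under preferential attachment (Simon's model).

    birth_prob(t) is the probability that step t introduces a new
    individual; otherwise one resource unit goes to an existing
    individual chosen with probability proportional to its share.
    Returns the list of resource counts, one per individual.
    """
    rng = random.Random(seed)
    shares = [1]  # individual 0 starts with one unit
    units = [0]  # owner of every unit allocated so far
    for t in range(1, steps + 1):
        if rng.random() < birth_prob(t):
            shares.append(1)
            units.append(len(shares) - 1)
        else:
            # A uniformly chosen unit belongs to its owner with probability
            # proportional to that owner's current share.
            owner = rng.choice(units)
            shares[owner] += 1
            units.append(owner)
    return shares


shares = simon_process(10_000, lambda t: 0.1)
print(len(shares), max(shares))
```

With a constant birth probability this is Simon's classic model, whose share distribution develops a power-law tail; sampling a random allocated unit is a simple way to implement proportional selection without recomputing weights.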

Preprint, 2015, arXiv

A shadowing problem in the detection of overlapping communities: lifting the resolution limit through a cascading procedure

Community detection is the process of assigning nodes and links to significant communities (e.g., clusters, functional modules), and its development has led to a better understanding of complex networks. When applied to sizable networks, we argue that most detection algorithms correctly identify prominent communities, but fail to do so across multiple scales. As a result, a significant fraction of the network is left uncharted. We show that this problem stems from larger or denser communities overshadowing smaller or sparser ones, and that this effect accounts for most of the undetected communities and unassigned links. We propose a generic cascading approach to community detection that circumvents the problem. Using real and artificial network datasets with three widely used community detection algorithms, we show how a simple cascading procedure allows for the detection of the missing communities. This work highlights a new detection limit of community structure, and we hope that our approach can inspire better community detection algorithms.

Preprint, 2015, arXiv

Complex networks as an emerging property of hierarchical preferential attachment

Real complex systems are not rigidly structured; no clear rules or blueprints exist for their construction. Yet, amidst their apparent randomness, complex structural properties universally emerge. We propose that an important class of complex systems can be modeled as an organization of many embedded levels (potentially infinite in number), all of them following the same universal growth principle known as preferential attachment. We give examples of such hierarchy in real systems, for instance in the pyramid of production entities of the film industry. More importantly, we show how real complex networks can be interpreted as a projection of our model, from which their scale independence, their clustering, their hierarchy, their fractality and their navigability naturally emerge. Our results suggest that complex networks, viewed as growing systems, can be quite simple, and that the apparent complexity of their structure is largely a reflection of their unobserved hierarchical nature.

Preprint, 2015, arXiv

General and exact approach to percolation on random graphs

We present a comprehensive and versatile theoretical framework to study site and bond percolation on clustered and correlated random graphs. Our contribution can be summarized in three main points. (i) We introduce a set of iterative equations that yield the exact distribution of the size and composition of components in finite-size quenched or random multitype graphs. (ii) We define a very general random graph ensemble that encompasses most of the models published to date and also makes it possible to model structural properties not yet included in a theoretical framework. Site and bond percolation on this ensemble is solved exactly in the infinite size limit using probability generating functions [i.e., the percolation threshold, the size and the composition of the giant (extensive) and small components]. Several examples and applications are also provided. (iii) Our approach can be adapted to model interdependent graphs (whose most striking feature is the emergence of an extensive component via a discontinuous phase transition) in an equally general fashion. We show how a graph can successively undergo a continuous then a discontinuous phase transition, and preliminary results suggest that clustering increases the amplitude of the discontinuity at the transition.
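The simplest special case of this kind of generating-function calculation is bond percolation on an uncorrelated, unclustered configuration-model graph with degree distribution p_k. With G0(x) = Σ_k p_k x^k and G1(x) = G0'(x)/G0'(1), the threshold is p_c = <k>/(<k^2> - <k>), and the fraction of nodes in the giant cluster is S = 1 - G0(u), where u is the smallest solution of u = 1 - p + p G1(u). The Python sketch below implements these textbook results only; it does not implement the clustered, correlated or multitype ensemble from the paper, and the function names are illustrative.

```python
import numpy as np


def percolation_threshold(pk):
    """Bond percolation threshold <k> / (<k^2> - <k>) of a configuration
    model with degree distribution pk[k]."""
    pk = np.asarray(pk, dtype=float)
    k = np.arange(len(pk))
    mean_k = (k * pk).sum()
    mean_k2 = (k**2 * pk).sum()
    return mean_k / (mean_k2 - mean_k)


def giant_component(pk, p, iters=1000):
    """Fraction of nodes in the giant cluster when each edge is kept
    independently with probability p."""
    pk = np.asarray(pk, dtype=float)
    k = np.arange(len(pk))
    mean_k = (k * pk).sum()

    def G0(x):  # generating function of the degree distribution
        return (pk * x**k).sum()

    def G1(x):  # generating function of the excess degree distribution
        return (k[1:] * pk[1:] * x ** (k[1:] - 1)).sum() / mean_k

    u = 0.0  # prob. that an edge does not lead to the giant cluster
    for _ in range(iters):
        u = 1 - p + p * G1(u)  # iterating from 0 reaches the smallest root
    return 1 - G0(u)
```

For a Poisson degree distribution with mean 3, <k^2> = 12, so the threshold is 3/9 = 1/3; below it the giant cluster vanishes, and with all edges kept it covers about 94% of nodes.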

Preprint, 2014, arXiv

Coexistence of phases and the observability of random graphs

In a recent Letter, Yang et al. [Phys. Rev. Lett. 109, 258701 (2012)] introduced the concept of observability transitions: the percolation-like emergence of a macroscopic observable component in graphs in which the state of a fraction of the nodes, and of their first neighbors, is monitored. We show how their concept of depth-L percolation (where the state of nodes up to a distance L from monitored nodes is known) can be mapped onto multitype random graphs, and use this mapping to exactly solve the observability problem for arbitrary L. We then demonstrate a non-trivial coexistence of an observable and of a non-observable extensive component. This coexistence suggests that monitoring a macroscopic portion of a graph does not prevent a macroscopic event from occurring unbeknownst to the observer. We also show that real complex systems behave quite differently with regard to observability depending on whether they are geographically constrained or not.

Preprint, 2013, arXiv

Global efficiency of local immunization on complex networks

Epidemics occur in all shapes and forms: infections propagating in our sparse sexual networks, rumours and diseases spreading through our much denser social interactions, or viruses circulating on the Internet. With the advent of large databases and efficient analysis algorithms, these processes can be better predicted and controlled. In this study, we use different characteristics of network organization to identify the influential spreaders in 17 empirical networks of diverse nature using two epidemic models. We find that a judicious choice of local measures, based either on the network's connectivity at a microscopic scale or on its community structure at a mesoscopic scale, compares favorably to global measures, such as betweenness centrality, in terms of efficiency, practicality and robustness. We also develop an analytical framework that highlights a transition in the characteristic scale of different epidemic regimes. This makes it possible to decide which local measure should govern immunization in a given scenario.

Preprint, 2013, arXiv

Percolation on random networks with arbitrary k-core structure

The k-core decomposition of a network has thus far mainly served as a powerful tool for the empirical study of complex networks. We now propose its explicit integration in a theoretical model. We introduce a Hard-core Random Network (HRN) model that generates maximally random networks with arbitrary degree distribution and arbitrary k-core structure. We then solve exactly the bond percolation problem on the HRN model and produce fast and precise analytical estimates for the corresponding real networks. Extensive comparison with selected databases reveals that our approach performs better than existing models, while requiring less input information.
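The k-core structure that such a model is built around can be measured on a real network by repeatedly removing a node of minimum remaining degree and recording the largest minimum degree seen so far. The Python sketch below is a simple quadratic-time version of that standard peeling procedure, written for clarity rather than speed; it assumes an edge-list input (so isolated nodes are not represented) and does not generate HRN networks or solve the percolation problem. The function name `core_numbers` is illustrative.

```python
from collections import defaultdict


def core_numbers(edges):
    """Core number of every node, by repeatedly peeling off a node of
    minimum remaining degree (quadratic time; for illustration only)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        node = min(remaining, key=lambda n: degree[n])
        k = max(k, degree[node])  # the core level never decreases
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


print(core_numbers([(1, 2), (2, 3), (3, 1), (1, 4)]))
```

In this example the pendant node 4 gets core number 1, while the triangle formed by nodes 1, 2 and 3 is the 2-core.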