Topic overview

Social and Information Networks

1937 works5841 researchers0 institutions

Topic snapshot

What this area looks like now

1937works
5841authors
0experts visible
0communities

Next steps

Move from topic reading into action

The graph preview below keeps the nearby papers, people and communities visible in the same reading flow.

Topic graph

See the topic as a live network

Open full explorer

Inspect nearby papers, researchers, institutions and communities without opening a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Papers in this area

24 featured work(s)

preprint2016arXiv

Diffusion of innovation in large scale graphs

Will a new smartphone application diffuse deeply in the population or will it sink into oblivion soon? To predict this, we argue that common models of spread of innovations based on cascade dynamics or epidemics may not be fully adequate. Therefore we propose a novel stochastic network dynamics modeling the spread of a new technological asset, whose adoption is based on the word-of-mouth and the persuasion strength increases the more the product is diffused. In this paper we carry on an analysis on large scale graphs to show off how the parameters of the model, the topology of the graph and, possibly, the initial diffusion of the asset, determine whether the spread of the asset is successful or not. In particular, by means of stochastic dominations and deterministic approximations, we provide some general results for a large class of expansive graphs. Finally we present numerical simulations trying to expand the analytical results we proved to even more general topologies.

preprint2015arXiv

Respondent-driven sampling bias induced by clustering and community structure in social networks

Sampling hidden populations is particularly challenging using standard sampling methods mainly because of the lack of a sampling frame. Respondent-driven sampling (RDS) is an alternative methodology that exploits the social contacts between peers to reach and weight individuals in these hard-to-reach populations. It is a snowball sampling procedure where the weight of the respondents is adjusted for the likelihood of being sampled due to differences in the number of contacts. In RDS, the structure of the social contacts thus defines the sampling process and affects its coverage, for instance by constraining the sampling within a sub-region of the network. In this paper we study the bias induced by network structures such as social triangles, community structure, and heterogeneities in the number of contacts, in the recruitment trees and in the RDS estimator. We simulate different scenarios of network structures and response-rates to study the potential biases one may expect in real settings. We find that the prevalence of the estimated variable is associated with the size of the network community to which the individual belongs. Furthermore, we observe that low-degree nodes may be under-sampled in certain situations if the sample and the network are of similar size. Finally, we also show that low response-rates lead to reasonably accurate average estimates of the prevalence but generate relatively large biases.

preprint2015arXiv

Single-Seed Cascades on Clustered Networks

We consider a dynamic network cascade process developed by Watts applied to a random networks with a specified amount of clustering, belonging to a class of random networks developed by Newman. We adapt existing tree-based methods to formulate an appropriate two-type branching process to describe the spread of a cascade started with a single active node, and obtain a fixed-point equation to implicitly express the extinction probability of such a cascade. In so doing, we also recover a special case of a formula of Hackett et al. giving conditions for certain extinction of the cascade.

preprint2016arXiv

A Game-Theoretic Approach for Detection of Overlapping Communities in Dynamic Complex Networks

Complex networks tend to display communities which are groups of nodes cohesively connected among themselves in one group and sparsely connected to the remainder of the network. Detecting such communities is an important computational problem, since it provides an insight into the functionality of networks. Further, investigating community structure in a dynamic network, where the network is subject to change, is even more challenging. This paper presents a game-theoretical technique for detecting community structures in dynamic as well as static complex networks. In our method, each node takes the role of a player that attempts to gain a higher payoff by joining one or more communities or switching between them. The goal of the game is to reveal community structure formed by these players by finding a Nash-equilibrium point among them. To the best of our knowledge, this is the first game-theoretic algorithm which is able to extract overlapping communities from either static or dynamic networks. We present the experimental results illustrating the effectiveness of the proposed method on both synthetic and real-world networks.

preprint2017arXiv

Divergent discourse between protests and counter-protests: #BlackLivesMatter and #AllLivesMatter

Since the shooting of Black teenager Michael Brown by White police officer Darren Wilson in Ferguson, Missouri, the protest hashtag #BlackLivesMatter has amplified critiques of extrajudicial killings of Black Americans. In response to #BlackLivesMatter, other Twitter users have adopted #AllLivesMatter, a counter-protest hashtag whose content argues that equal attention should be given to all lives regardless of race. Through a multi-level analysis of over 860,000 tweets, we study how these protests and counter-protests diverge by quantifying aspects of their discourse. We find that #AllLivesMatter facilitates opposition between #BlackLivesMatter and hashtags such as #PoliceLivesMatter and #BlueLivesMatter in such a way that historically echoes the tension between Black protesters and law enforcement. In addition, we show that a significant portion of #AllLivesMatter use stems from hijacking by #BlackLivesMatter advocates. Beyond simply injecting #AllLivesMatter with #BlackLivesMatter content, these hijackers use the hashtag to directly confront the counter-protest notion of "All lives matter." Our findings suggest that Black Lives Matter movement was able to grow, exhibit diverse conversations, and avoid derailment on social media by making discussion of counter-protest opinions a central topic of #AllLivesMatter, rather than the movement itself.

preprint2017arXiv

On imitation dynamics in potential population games

Imitation dynamics for population games are studied and their asymptotic properties analyzed. In the considered class of imitation dynamics - that encompass the replicator equation as well as other models previously considered in evolutionary biology - players have no global information about the game structure, and all they know is their own current utility and the one of fellow players contacted through pairwise interactions. For potential population games, global asymptotic stability of the set of Nash equilibria of the sub-game restricted to the support of the initial population configuration is proved. These results strengthen (from local to global asymptotic stability) existing ones and generalize them to a broader class of dynamics. The developed techniques highlight a certain structure of the problem and suggest possible generalizations from the fully mixed population case to imitation dynamics whereby agents interact on complex communication networks.

preprint2017arXiv

From Relational Data to Graphs: Inferring Significant Links using Generalized Hypergeometric Ensembles

The inference of network topologies from relational data is an important problem in data analysis. Exemplary applications include the reconstruction of social ties from data on human interactions, the inference of gene co-expression networks from DNA microarray data, or the learning of semantic relationships based on co-occurrences of words in documents. Solving these problems requires techniques to infer significant links in noisy relational data. In this short paper, we propose a new statistical modeling framework to address this challenge. It builds on generalized hypergeometric ensembles, a class of generative stochastic models that give rise to analytically tractable probability spaces of directed, multi-edge graphs. We show how this framework can be used to assess the significance of links in noisy relational data. We illustrate our method in two data sets capturing spatio-temporal proximity relations between actors in a social system. The results show that our analytical framework provides a new approach to infer significant links from relational data, with interesting perspectives for the mining of data on social systems.

preprint2018arXiv

Facebook Use of Sensitive Data for Advertising in Europe

The upcoming European General Data Protection Regulation (GDPR) prohibits the processing and exploitation of some categories of personal data (health, political orientation, sexual preferences, religious beliefs, ethnic origin, etc.) due to the obvious privacy risks that may be derived from a malicious use of such type of information. These categories are referred to as sensitive personal data. Facebook has been recently fined EUR 1.2M in Spain for collecting, storing and processing sensitive personal data for advertising purposes. This paper quantifies the portion of Facebook users in the European Union (EU) who are labeled with interests linked to sensitive personal data. The results of our study reveal that Facebook labels 73% EU users with sensitive interests. This corresponds to 40% of the overall EU population. We also estimate that a malicious third-party could unveil the identity of Facebook users that have been assigned a sensitive interest at a cost as low as EUR 0.015 per user. Finally, we propose and implement a web browser extension to inform Facebook users of the sensitive interests Facebook has assigned them.

preprint2017arXiv

Bias and high-dimensional adjustment in observational studies of peer effects

Peer effects, in which the behavior of an individual is affected by the behavior of their peers, are posited by multiple theories in the social sciences. Other processes can also produce behaviors that are correlated in networks and groups, thereby generating debate about the credibility of observational (i.e. nonexperimental) studies of peer effects. Randomized field experiments that identify peer effects, however, are often expensive or infeasible. Thus, many studies of peer effects use observational data, and prior evaluations of causal inference methods for adjusting observational data to estimate peer effects have lacked an experimental "gold standard" for comparison. Here we show, in the context of information and media diffusion on Facebook, that high-dimensional adjustment of a nonexperimental control group (677 million observations) using propensity score models produces estimates of peer effects statistically indistinguishable from those from using a large randomized experiment (220 million observations). Naive observational estimators overstate peer effects by 320% and commonly used variables (e.g., demographics) offer little bias reduction, but adjusting for a measure of prior behaviors closely related to the focal behavior reduces bias by 91%. High-dimensional models adjusting for over 3,700 past behaviors provide additional bias reduction, such that the full model reduces bias by over 97%. This experimental evaluation demonstrates that detailed records of individuals' past behavior can improve studies of social influence, information diffusion, and imitation; these results are encouraging for the credibility of some studies but also cautionary for studies of rare or new behaviors. More generally, these results show how large, high-dimensional data sets and statistical learning techniques can be used to improve causal inference in the behavioral sciences.

preprint2019arXiv

A Hierarchical Attention Model for Social Contextual Image Recommendation

Image based social networks are among the most popular social networking services in recent years. With tremendous images uploaded everyday, understanding users' preferences on user-generated images and making recommendations have become an urgent need. In fact, many hybrid models have been proposed to fuse various kinds of side information~(e.g., image visual representation, social network) and user-item historical behavior for enhancing recommendation performance. However, due to the unique characteristics of the user generated images in social image platforms, the previous studies failed to capture the complex aspects that influence users' preferences in a unified framework. Moreover, most of these hybrid models relied on predefined weights in combining different kinds of information, which usually resulted in sub-optimal recommendation performance. To this end, in this paper, we develop a hierarchical attention model for social contextual image recommendation. In addition to basic latent user interest modeling in the popular matrix factorization based recommendation, we identify three key aspects (i.e., upload history, social influence, and owner admiration) that affect each user's latent preferences, where each aspect summarizes a contextual factor from the complex relationships between users and images. After that, we design a hierarchical attention network that naturally mirrors the hierarchical relationship (elements in each aspects level, and the aspect level) of users' latent interests with the identified key aspects. Specifically, by taking embeddings from state-of-the-art deep learning models that are tailored for each kind of data, the hierarchical attention network could learn to attend differently to more or less content. Finally, extensive experimental results on real-world datasets clearly show the superiority of our proposed model.

preprint2019arXiv

Ensemble Clustering for Graphs: Comparisons and Applications

We recently proposed a new ensemble clustering algorithm for graphs (ECG) based on the concept of consensus clustering. We validated our approach by replicating a study comparing graph clustering algorithms over benchmark graphs, showing that ECG outperforms the leading algorithms. In this paper, we extend our comparison by considering a wider range of parameters for the benchmark, generating graphs with different properties. We provide new experimental results showing that the ECG algorithm alleviates the well-known resolution limit issue, and that it leads to better stability of the partitions. We also illustrate how the ensemble obtained with ECG can be used to quantify the presence of community structure in the graph, and to zoom in on the sub-graph most closely associated with seed vertices. Finally, we illustrate further applications of ECG by comparing it to previous results for community detection on weighted graphs, and community-aware anomaly detection.

preprint2019arXiv

Scalable Privacy-Compliant Virality Prediction on Twitter

The digital town hall of Twitter becomes a preferred medium of communication for individuals and organizations across the globe. Some of them reach audiences of millions, while others struggle to get noticed. Given the impact of social media, the question remains more relevant than ever: how to model the dynamics of attention in Twitter. Researchers around the world turn to machine learning to predict the most influential tweets and authors, navigating the volume, velocity, and variety of social big data, with many compromises. In this paper, we revisit content popularity prediction on Twitter. We argue that strict alignment of data acquisition, storage and analysis algorithms is necessary to avoid the common trade-offs between scalability, accuracy and privacy compliance. We propose a new framework for the rapid acquisition of large-scale datasets, high accuracy supervisory signal and multilanguage sentiment prediction while respecting every privacy request applicable. We then apply a novel gradient boosting framework to achieve state-of-the-art results in virality ranking, already before including tweet's visual or propagation features. Our Gradient Boosted Regression Tree is the first to offer explainable, strong ranking performance on benchmark datasets. Since the analysis focused on features available early, the model is immediately applicable to incoming tweets in 18 languages.

preprint2018arXiv

Analytical Formulation of the Block-Constrained Configuration Model

We provide a novel family of generative block-models for random graphs that naturally incorporates degree distributions: the block-constrained configuration model. Block-constrained configuration models build on the generalised hypergeometric ensemble of random graphs and extend the well-known configuration model by enforcing block-constraints on the edge generation process. The resulting models are analytically tractable and practical to fit even to large networks. These models provide a new, flexible tool for the study of community structure and for network science in general, where modelling networks with heterogeneous degree distributions is of central importance.

preprint2018arXiv

PAFit: an R Package for the Non-Parametric Estimation of Preferential Attachment and Node Fitness in Temporal Complex Networks

Many real-world systems are profitably described as complex networks that grow over time. Preferential attachment and node fitness are two simple growth mechanisms that not only explain certain structural properties commonly observed in real-world systems, but are also tied to a number of applications in modeling and inference. While there are statistical packages for estimating various parametric forms of the preferential attachment function, there is no such package implementing non-parametric estimation procedures. The non-parametric approach to the estimation of the preferential attachment function allows for comparatively finer-grained investigations of the `rich-get-richer' phenomenon that could lead to novel insights in the search to explain certain nonstandard structural properties observed in real-world networks. This paper introduces the R package PAFit, which implements non-parametric procedures for estimating the preferential attachment function and node fitnesses in a growing network, as well as a number of functions for generating complex networks from these two mechanisms. The main computational part of the package is implemented in C++ with OpenMP to ensure scalability to large-scale networks. We first introduce the main functionalities of PAFit through simulated examples, and then use the package to analyze a collaboration network between scientists in the field of complex networks. The results indicate the joint presence of `rich-get-richer' and `fit-get-richer' phenomena in the collaboration network. The estimated attachment function is observed to be near-linear, which we interpret as meaning that the chance an author gets a new collaborator is proportional to their current number of collaborators. Furthermore, the estimated author fitnesses reveal a host of familiar faces from the complex networks community among the field's topmost fittest network scientists.

preprint2018arXiv

Ensemble Clustering for Graphs

We propose an ensemble clustering algorithm for graphs (ECG), which is based on the Louvain algorithm and the concept of consensus clustering. We validate our approach by replicating a recently published study comparing graph clustering algorithms over artificial networks, showing that ECG outperforms the leading algorithms from that study. We also illustrate how the ensemble obtained with ECG can be used to quantify the presence of community structure in the graph.

preprint2019arXiv

An Unsupervised Framework for Comparing Graph Embeddings

Graph embedding is a transformation of vertices of a graph into set of vectors. Good embeddings should capture the graph topology, vertex-to-vertex relationship, and other relevant information about graphs, subgraphs, and vertices. If these objectives are achieved, they are meaningful, understandable, and compressed representations of networks. They also provide more options and tools for data scientists as machine learning on graphs is still quite limited. Finally, vector operations are simpler and faster than comparable operations on graphs. The main challenge is that one needs to make sure that embeddings well describe the properties of the graphs. In particular, the decision has to be made on the embedding dimensionality which highly impacts the quality of an embedding. As a result, selecting the best embedding is a challenging task and very often requires domain experts. In this paper, we propose a ``divergence score'' that can be assign to various embeddings to distinguish good ones from bad ones. This general framework provides a tool for an unsupervised graph embedding comparison. In order to achieve it, we needed to generalize the well-known Chung-Lu model to incorporate geometry which is interesting on its own rights. In order to test our framework, we did a number of experiments with synthetic networks as well as real-world networks, and various embedding algorithms.

preprint2018arXiv

Large-scale analysis of user exposure to online advertising in Facebook

Online advertising is the major source of income for a large portion of Internet Services. There exists a body of literature aiming at optimizing ads engagement, understanding the privacy and ethical implications of online advertising, etc. However, to the best of our knowledge, no previous work analyses at large scale the exposure of real users to online advertising. This paper performs a comprehensive analysis of the exposure of users to ads and advertisers using a dataset including more than 7M ads from 140K unique advertisers delivered to more than 5K users that was collected between October 2016 and May 2018. The study focuses on Facebook, which is the second largest advertising platform only to Google in terms of revenue, and accounts for more than 2.2B monthly active users. Our analysis reveals that Facebook users are exposed (in median) to 70 ads per week, which come from 12 advertisers. Ads represent between 10% and 15% of all the information received in users' newsfeed. A small increment of 1% in the portion of ads in the newsfeed could roughly represent a revenue increase of 8.17M USD per week for Facebook. Finally, we also reveal that Facebook users are overprofiled since in the best case only 22.76% of the interests Facebook assigns to users for advertising purpose are actually related to the ads those users receive.

preprint2019arXiv

Social Cards Probably Provide For Better Understanding Of Web Archive Collections

Used by a variety of researchers, web archive collections have become invaluable sources of evidence. If a researcher is presented with a web archive collection that they did not create, how do they know what is inside so that they can use it for their own research? Search engine results and social media links are represented as surrogates, small easily digestible summaries of the underlying page. Search engines and social media have a different focus, and hence produce different surrogates than web archives. Search engine surrogates help a user answer the question "Will this link meet my information need?" Social media surrogates help a user decide "Should I click on this?" Our use case is subtly different. We hypothesize that groups of surrogates together are useful for summarizing a collection. We want to help users answer the question of "What does the underlying collection contain?" But which surrogate should we use? With Mechanical Turk participants, we evaluate six different surrogate types against each other. We find that the type of surrogate does not influence the time to complete the task we presented the participants. Of particular interest are social cards, surrogates typically found on social media, and browser thumbnails, screen captures of web pages rendered in a browser. At $p=0.0569$, and $p=0.0770$, respectively, we find that social cards and social cards paired side-by-side with browser thumbnails probably provide better collection understanding than the surrogates currently used by the popular Archive-It web archiving platform. We measure user interactions with each surrogate and find that users interact with social cards less than other types. The results of this study have implications for our web archive summarization work, live web curation platforms, social media, and more.

preprint2019arXiv

Quantifying Triadic Closure in Multi-Edge Social Networks

Multi-edge networks capture repeated interactions between individuals. In social networks, such edges often form closed triangles, or triads. Standard approaches to measure this triadic closure, however, fail for multi-edge networks, because they do not consider that triads can be formed by edges of different multiplicity. We propose a novel measure of triadic closure for multi-edge networks of social interactions based on a shared partner statistic. We demonstrate that our operalization is able to detect meaningful closure in synthetic and empirical multi-edge networks, where common approaches fail. This is a cornerstone in driving inferential network analyses from the analysis of binary networks towards the analyses of multi-edge and weighted networks, which offer a more realistic representation of social interactions and relations.

preprint2019arXiv

Improving the robustness of online social networks: A simulation approach of network interventions

Online social networks (OSN) are prime examples of socio-technical systems in which individuals interact via a technical platform. OSN are very volatile because users enter and exit and frequently change their interactions. This makes the robustness of such systems difficult to measure and to control. To quantify robustness, we propose a coreness value obtained from the directed interaction network. We study the emergence of large drop-out cascades of users leaving the OSN by means of an agent-based model. For agents, we define a utility function that depends on their relative reputation and their costs for interactions. The decision of agents to leave the OSN depends on this utility. Our aim is to prevent drop-out cascades by influencing specific agents with low utility. We identify strategies to control agents in the core and the periphery of the OSN such that drop-out cascades are significantly reduced, and the robustness of the OSN is increased.

preprint2019arXiv

Graph-Hist: Graph Classification from Latent Feature Histograms With Application to Bot Detection

Neural networks are increasingly used for graph classification in a variety of contexts. Social media is a critical application area in this space, however the characteristics of social media graphs differ from those seen in most popular benchmark datasets. Social networks tend to be large and sparse, while benchmarks are small and dense. Classically, large and sparse networks are analyzed by studying the distribution of local properties. Inspired by this, we introduce Graph-Hist: an end-to-end architecture that extracts a graph's latent local features, bins nodes together along 1-D cross sections of the feature space, and classifies the graph based on this multi-channel histogram. We show that Graph-Hist improves state of the art performance on true social media benchmark datasets, while still performing well on other benchmarks. Finally, we demonstrate Graph-Hist's performance by conducting bot detection in social media. While sophisticated bot and cyborg accounts increasingly evade traditional detection methods, they leave artificial artifacts in their conversational graph that are detected through graph classification. We apply Graph-Hist to classify these conversational graphs. In the process, we confirm that social media graphs are different than most baselines and that Graph-Hist outperforms existing bot-detection models.

preprint2019arXiv

Twitter Watch: Leveraging Social Media to Monitor and Predict Collective-Efficacy of Neighborhoods

Sociologists associate the spatial variation of crime within an urban setting, with the concept of collective efficacy. The collective efficacy of a neighborhood is defined as social cohesion among neighbors combined with their willingness to intervene on behalf of the common good. Sociologists measure collective efficacy by conducting survey studies designed to measure individuals' perception of their community. In this work, we employ the curated data from a survey study (ground truth) and examine the effectiveness of substituting costly survey questionnaires with proxies derived from social media. We enrich a corpus of tweets mentioning a local venue with several linguistic and topological features. We then propose a pairwise learning to rank model with the goal of identifying a ranking of neighborhoods that is similar to the ranking obtained from the ground truth collective efficacy values. In our experiments, we find that our generated ranking of neighborhoods achieves 0.77 Kendall tau-x ranking agreement with the ground truth ranking. Overall, our results are up to 37% better than traditional baselines.

preprint2019arXiv

Rejection-Based Simulation of Non-Markovian Agents on Complex Networks

Stochastic models in which agents interact with their neighborhood according to a network topology are a powerful modeling framework to study the emergence of complex dynamic patterns in real-world systems. Stochastic simulations are often the preferred - sometimes the only feasible - way to investigate such systems. Previous research focused primarily on Markovian models where the random time until an interaction happens follows an exponential distribution. In this work, we study a general framework to model systems where each agent is in one of several states. Agents can change their state at random, influenced by their complete neighborhood, while the time to the next event can follow an arbitrary probability distribution. Classically, these simulations are hindered by high computational costs of updating the rates of interconnected agents and sampling the random residence times from arbitrary distributions. We propose a rejection-based, event-driven simulation algorithm to overcome these limitations. Our method over-approximates the instantaneous rates corresponding to inter-event times while rejection events counterbalance these over-approximations. We demonstrate the effectiveness of our approach on models of epidemic and information spreading.

preprint2019arXiv

PFaRA: a Platoon Forming and Routing Algorithm for Same-Day Deliveries

Platoons, vehicles that travel very close together acting as one, promise to improve road usage on freeways and city roads alike. We study platoon formation in the context of same-day delivery in urban environments. Multiple self-interested logistic service providers (LSP) carry out same-day deliveries by deploying autonomous electric vehicles that are capable of forming and traveling in platoons. The novel aspect that we consider in our research is heterogeneity of platoons in the sense that vehicles are equipped with different capabilities and constraints, and belong to different providers. Our aim is to examine how these platoons can form and their potential properties and benefits. We present a platoon forming and routing algorithm, called PFaRA, that finds longest common routes for multiple vehicles, while also respecting vehicle preferences and constraints. PFaRA consists of two parts, a speed clustering step and a linear optimisation step. To test the approach, a simulation was used, working with realistic urban network data and background traffic models. Our results showed that the performance of our approach is comparable to a simple route-matching one, but it leads to better utility values for vehicles and by extension the LSPs. We show that the grouping provided is viable and provides benefits to all vehicles participating in the platoon.

People in this topic

12 visible researcher(s)