Researcher profile

Rumi Ghosh

Rumi Ghosh contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
11 works
0 followers
5 topics
4 close collaborators

Published work

11 published items

preprint, 2013, arXiv

Limited Attention and Centrality in Social Networks

How does one find important or influential people in an online social network? Researchers have proposed a variety of centrality measures to identify individuals that are, for example, often visited by a random walk, infected in an epidemic, or receive many messages from friends. Recent research suggests that social media users' capacity to respond to an incoming message is constrained by their finite attention, which they divide over all incoming information, i.e., information sent by users they follow. We propose a new measure of centrality, a limited-attention version of Bonacich's Alpha-centrality, that models the effect of limited attention on epidemic diffusion. The new measure describes a process in which nodes broadcast messages to their out-neighbors, but the neighbors' ability to receive the message depends on the number of in-neighbors they have. We evaluate the proposed measure on real-world online social networks and show that it can better reproduce an empirical influence ranking of users than other popular centrality measures.

preprint, 2013, arXiv

Social Contagion: An Empirical Study of Information Spread on Digg and Twitter Follower Graphs

Social networks have emerged as a critical factor in information dissemination, search, marketing, expertise and influence discovery, and potentially an important tool for mobilizing people. Social media has made social networks ubiquitous, and also given researchers access to massive quantities of data for empirical analysis. These data sets offer a rich source of evidence for studying dynamics of individual and group behavior, the structure of networks and global patterns of the flow of information on them. However, in most previous studies, the structure of the underlying networks was not directly visible but had to be inferred from the flow of information from one individual to another. As a result, we do not yet understand dynamics of information spread on networks or how the structure of the network affects it. We address this gap by analyzing data from two popular social news sites. Specifically, we extract follower graphs of active Digg and Twitter users and track how interest in news stories cascades through the graph. We compare and contrast properties of information cascades on both sites and elucidate what they tell us about dynamics of information flow on networks.

preprint, 2012, arXiv

Impact of Dynamic Interactions on Multi-Scale Analysis of Community Structure in Networks

To find interesting structure in networks, community detection algorithms have to take into account not only the network topology, but also dynamics of interactions between nodes. We investigate this claim using the paradigm of synchronization in a network of coupled oscillators. As the network evolves to a global steady state, nodes belonging to the same community synchronize faster than nodes belonging to different communities. Traditionally, nodes in network synchronization models are coupled via one-to-one, or conservative interactions. However, social interactions are often one-to-many, as, for example, in social media, where users broadcast messages to all their followers. We formulate a novel model of synchronization in a network of coupled oscillators in which the oscillators are coupled via one-to-many, or non-conservative interactions. We study the dynamics of different interaction models and contrast their spectral properties. To find multi-scale community structure in a network of interacting nodes, we define a similarity function that measures the degree to which nodes are synchronized and use it to hierarchically cluster nodes. We study real-world social networks, including networks of two social media providers. To evaluate the quality of the discovered communities in a social media network we propose a community quality metric based on user activity. We find that conservative and non-conservative interaction models lead to dramatically different views of community structure even within the same network. Our work offers a novel mathematical framework for exploring the relationship between network structure, topology and dynamics.

preprint, 2011, arXiv

Entropy-based Classification of 'Retweeting' Activity on Twitter

Twitter is used for a variety of purposes, including information dissemination, marketing, political organizing, propaganda, spamming, promotion, and conversation. Characterizing these activities and categorizing associated user generated content is a challenging task. We present an information-theoretic approach to classification of user activity on Twitter. We focus on tweets that contain embedded URLs and study their collective 'retweeting' dynamics. We identify two features, time-interval and user entropy, which we use to classify retweeting activity. We achieve good separation of different activities using just these two features and are able to categorize content based on the collective user response it generates. We have identified five distinct categories of retweeting activity on Twitter: automatic/robotic activity, newsworthy information dissemination, advertising and promotion, campaigns, and parasitic advertisement. In the course of our investigations, we have shown how Twitter can be exploited for promotional and spam-like activities. The content-independent, entropy-based activity classification method is computationally efficient, scalable and robust to sampling and missing data. It has many applications, including automatic spam-detection, trend identification, trust management, user-modeling, social search and content classification on online social media.
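
As an illustration of the two features named in the abstract above, the following is a minimal sketch, not the authors' implementation. It assumes that both features are Shannon entropies of empirical distributions, that retweets arrive as (timestamp, user) pairs, and that time intervals are binned to whole minutes; the function names and the `bin_seconds` parameter are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    # p * log2(1/p) summed over the distinct values; an empty input gives 0.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def retweet_features(retweets, bin_seconds=60):
    """Return (time_interval_entropy, user_entropy) for one URL.

    `retweets` is a list of (timestamp_seconds, user_id) pairs sorted by time.
    The binning of intervals is an assumption made for this sketch.
    """
    times = [t for t, _ in retweets]
    # Gaps between consecutive retweets, bucketed into bins of `bin_seconds`.
    intervals = [(b - a) // bin_seconds for a, b in zip(times, times[1:])]
    users = [u for _, u in retweets]
    return shannon_entropy(intervals), shannon_entropy(users)


# One account retweeting at a fixed rate: both entropies are zero.
bot_like = [(0, "a"), (60, "a"), (120, "a")]
# Several distinct accounts at irregular times: both entropies are positive.
organic = [(0, "a"), (30, "b"), (100, "c"), (250, "d")]
print(retweet_features(bot_like))
print(retweet_features(organic))
```

Under these assumptions, regular activity from a single account scores low on both features, while a response spread over many users at irregular times scores high on both.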

preprint, 2011, arXiv

Non-Conservative Diffusion and its Application to Social Network Analysis

The random walk is fundamental to modeling dynamic processes on networks. Metrics based on the random walk have been used in many applications from image processing to Web page ranking. However, how appropriate are random walks to modeling and analyzing social networks? We argue that unlike a random walk, which conserves the quantity diffusing on a network, many interesting social phenomena, such as the spread of information or disease on a social network, are fundamentally non-conservative. When an individual infects her neighbor with a virus, the total amount of infection increases. We classify diffusion processes as conservative and non-conservative and show how these differences impact the choice of metrics used for network analysis, as well as our understanding of network structure and behavior. We show that Alpha-Centrality, which mathematically describes non-conservative diffusion, leads to new insights into the behavior of spreading processes on networks. We give a scalable approximate algorithm for computing the Alpha-Centrality in a massive graph. We validate our approach on real-world online social networks of Digg. We show that a non-conservative metric, such as Alpha-Centrality, produces better agreement with an empirical measure of influence than conservative metrics, such as PageRank. We hope that our investigation will inspire further exploration into the realms of conservative and non-conservative metrics in social network analysis.

preprint, 2011, arXiv

Using Proximity to Predict Activity in Social Networks

The structure of a social network contains information useful for predicting its evolution. Nodes that are "close" in some sense are more likely to become linked in the future than more distant nodes. We show that structural information can also help predict node activity. We use proximity to capture the degree to which two nodes are "close" to each other in the network. In addition to standard proximity metrics used in the link prediction task, such as neighborhood overlap, we introduce new metrics that model different types of interactions that can occur between network nodes. We argue that the "closer" nodes are in a social network, the more similar will be their activity. We study this claim using data about URL recommendation on social media sites Digg and Twitter. We show that structural proximity of two users in the follower graph is related to similarity of their activity, i.e., how many URLs they both recommend. We also show that given friends' activity, knowing their proximity to the user can help better predict which URLs the user will recommend. We compare the performance of different proximity metrics on the activity prediction task and find that some metrics lead to substantial performance improvements.
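
The abstract above names neighborhood overlap as a standard proximity metric. A minimal sketch of one common form, the Jaccard overlap of two nodes' neighbor sets, is given below. The paper's exact definition is not stated in the abstract and may differ; the function name and the dictionary-of-sets graph representation are assumptions made for illustration.

```python
def neighborhood_overlap(neighbors, u, v):
    """Jaccard overlap of the neighbor sets of nodes u and v.

    `neighbors` maps each node to the set of nodes it is linked to.
    Returns a value in [0, 1]; 0.0 when neither node has any neighbors.
    """
    a = neighbors.get(u, set())
    b = neighbors.get(v, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy graph: u and v share two of their four distinct neighbors.
graph = {
    "u": {"x", "y", "z"},
    "v": {"y", "z", "w"},
}
print(neighborhood_overlap(graph, "u", "v"))
```

On the toy graph the two users share two neighbors out of four distinct ones, so the overlap is one half.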

preprint, 2011, arXiv

What Stops Social Epidemics?

Theoretical progress in understanding the dynamics of spreading processes on graphs suggests the existence of an epidemic threshold below which no epidemics form and above which epidemics spread to a significant fraction of the graph. We have observed information cascades on the social media site Digg that spread fast enough for one initial spreader to infect hundreds of people, yet end up affecting only 0.1% of the entire network. We find that two effects, previously studied in isolation, combine cooperatively to drastically limit the final size of cascades on Digg. First, because of the highly clustered structure of the Digg network, most people who are aware of a story have been exposed to it via multiple friends. This structure lowers the epidemic threshold while moderately slowing the overall growth of cascades. In addition, we find that the mechanism for social contagion on Digg points to a fundamental difference between information spread and other contagion processes: despite multiple opportunities for infection within a social group, people are less likely to become spreaders of information with repeated exposure. The consequences of this mechanism become more pronounced for more clustered graphs. Ultimately, this effect severely curtails the size of social epidemics on Digg.

preprint, 2010, arXiv

A Framework for Quantitative Analysis of Cascades on Networks

How does information flow in online social networks? How do the structure and size of the information cascade evolve in time? How can we efficiently mine the information contained in cascade dynamics? We approach these questions empirically and present an efficient and scalable mathematical framework for quantitative analysis of cascades on networks. We define a cascade generating function that captures the details of the microscopic dynamics of the cascades. We show that this function can also be used to compute the macroscopic properties of cascades, such as their size, spread, diameter, number of paths, and average path length. We present an algorithm to efficiently compute the cascade generating function and demonstrate that while significantly compressing information within a cascade, it nevertheless allows us to accurately reconstruct its structure. We use this framework to study information dynamics on the social network of Digg. Digg allows users to post and vote on stories, and easily see the stories that friends have voted on. As a story spreads on Digg through voting, it generates cascades. We extract cascades of more than 3,500 Digg stories and calculate their macroscopic and microscopic properties. We identify several trends in cascade dynamics: spreading via chaining, branching and community. We discuss how these affect the spread of the story through the Digg social network. Our computational framework is general and offers a practical solution to quantitative analysis of the microscopic structure of even very large cascades.

preprint, 2010, arXiv

A Parameterized Centrality Metric for Network Analysis

A variety of metrics have been proposed to measure the relative importance of nodes in a network. One of these, alpha-centrality [Bonacich, 2001], measures the number of attenuated paths that exist between nodes. We introduce a normalized version of this metric and use it to study network structure, specifically, to rank nodes and find community structure of the network. In particular, we extend the modularity-maximization method [Newman and Girvan, 2004] for community detection to use this metric as the measure of node connectivity. Normalized alpha-centrality is a powerful tool for network analysis, since it contains a tunable parameter that sets the length scale of interactions. Studying how rankings and discovered communities change when this parameter is varied allows us to identify locally and globally important nodes and structures. We apply the proposed method to several benchmark networks and show that it leads to better insight into network structure than alternative methods.
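
The attenuated-path idea in the abstract above can be illustrated with a small sketch. It assumes one common form of alpha-centrality, c = e + alpha * A^T c with e a vector of ones, solved by fixed-point iteration; conventions (A versus its transpose, the choice of e) vary across papers, and the normalization introduced by the authors is not reproduced here. The function name and the edge-list representation are hypothetical.

```python
def alpha_centrality(edges, n, alpha, iters=100):
    """Approximate alpha-centrality by iterating c <- e + alpha * A^T c.

    `edges` is a list of directed (source, target) pairs over nodes 0..n-1,
    and e is the all-ones vector. Each iteration adds paths one step longer,
    attenuated by a further factor of alpha. The iteration converges only
    when alpha is smaller than 1 / (largest eigenvalue of A).
    """
    c = [1.0] * n
    for _ in range(iters):
        nxt = [1.0] * n  # the exogenous term e
        for s, t in edges:
            nxt[t] += alpha * c[s]  # node t receives attenuated score from s
        c = nxt
    return c


# Directed chain 0 -> 1 -> 2: later nodes collect more attenuated paths.
scores = alpha_centrality([(0, 1), (1, 2)], n=3, alpha=0.5)
print(scores)
```

On the chain, node 2 is reached by one path of length one and one path of length two, so its score exceeds that of node 1, which in turn exceeds that of node 0. Small values of alpha weight short paths most heavily, which matches the abstract's description of the parameter as setting the length scale of interactions.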

preprint, 2010, arXiv

Information Contagion: an Empirical Study of the Spread of News on Digg and Twitter Social Networks

Social networks have emerged as a critical factor in information dissemination, search, marketing, expertise and influence discovery, and potentially an important tool for mobilizing people. Social media has made social networks ubiquitous, and also given researchers access to massive quantities of data for empirical analysis. These data sets offer a rich source of evidence for studying dynamics of individual and group behavior, the structure of networks and global patterns of the flow of information on them. However, in most previous studies, the structure of the underlying networks was not directly visible but had to be inferred from the flow of information from one individual to another. As a result, we do not yet understand dynamics of information spread on networks or how the structure of the network affects it. We address this gap by analyzing data from two popular social news sites. Specifically, we extract social networks of active users on Digg and Twitter, and track how interest in news stories spreads among them. We show that social networks play a crucial role in the spread of information on these sites, and that network structure affects dynamics of information flow.

preprint, 2010, arXiv

Predicting Influential Users in Online Social Networks

Who are the influential people in an online social network? The answer to this question depends not only on the structure of the network, but also on details of the dynamic processes occurring on it. We classify these processes as conservative and non-conservative. A random walk on a network is an example of a conservative dynamic process, while information spread is non-conservative. The influence models used to rank network nodes can be similarly classified, depending on the dynamic process they implicitly emulate. We claim that in order to correctly rank network nodes, the influence model has to match the details of the dynamic process. We study a real-world network on the social news aggregator Digg, which allows users to post and vote for news stories. We empirically define influence as the number of in-network votes a user's post generates. This influence measure, and the resulting ranking, arises entirely from the dynamics of voting on Digg, which represents non-conservative information flow. We then compare predictions of different influence models with this empirical estimate of influence. The results show that non-conservative models are better able to predict influential users on Digg. We find that the normalized alpha-centrality metric is one of the best predictors of influence. We also present a simple algorithm for computing this metric and the associated mathematical formulation and analytical proofs.