Researcher profile

Neil Hurley

Neil Hurley contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
7 works
0 followers
4 topics
4 close collaborators

Published work

7 published items

Preprint | 2014 | arXiv

Be In The Know: Connecting News Articles to Relevant Twitter Conversations

In the era of data-driven journalism, data analytics can deliver tools to support journalists in connecting to new and developing news stories, e.g., as echoed in micro-blogs such as Twitter, the new citizen-driven media. In this paper, we propose a framework for tracking and automatically connecting news articles to Twitter conversations as captured by Twitter hashtags. For example, such a system could alert journalists about news stories that get a lot of Twitter reaction, so that they can investigate those conversations for new developments in the story, promote their article to a set of interested consumers, or discover general sentiment towards the story. Mapping articles to appropriate hashtags is nevertheless very challenging, due to the different language styles used in articles versus tweets, the streaming aspect of news and tweets, and user behavior when marking certain tweet terms as hashtags. As a case study, we continuously track the RSS feeds of Irish Times news articles and a focused Twitter stream over a two-month period, and present a system that assigns hashtags to each article, based on its Twitter echo. We propose a machine learning approach for classifying and ranking article-hashtag pairs. Our empirical study shows that our system delivers high precision for this task.

Preprint | 2013 | arXiv

Normalized Mutual Information to evaluate overlapping community finding algorithms

Given the increasing popularity of algorithms for overlapping clustering, in particular in social network analysis, quantitative measures are needed to measure the accuracy of a method. Given a set of true clusters, and the set of clusters found by an algorithm, these sets of clusters must be compared to see how similar or different the sets are. A normalized measure is desirable in many contexts, for example assigning a value of 0 where the two sets are totally dissimilar, and 1 where they are identical. A measure based on normalized mutual information [1] has recently become popular. We demonstrate unintuitive behaviour of this measure, and show how this can be corrected by using a more conventional normalization. We compare the results to those of other measures, such as the Omega index [2].
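As a rough illustration of a similarity measure normalized to the range 0 to 1, the sketch below computes normalized mutual information for the simpler case of two non-overlapping clusterings. The overlapping-cluster measure discussed in the abstract is more involved and is not reproduced here. Dividing by the larger of the two entropies is an assumption chosen for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a list of cluster labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two label lists over the same items."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log((c * n) / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )


def nmi_max(a, b):
    """Mutual information divided by the larger entropy (assumed normalization)."""
    denom = max(entropy(a), entropy(b))
    if denom == 0.0:
        # Both clusterings put every item in a single cluster, so they agree.
        return 1.0
    return mutual_information(a, b) / denom


same = nmi_max([0, 0, 1, 1], [0, 0, 1, 1])         # identical clusterings
independent = nmi_max([0, 0, 1, 1], [0, 1, 0, 1])  # statistically independent
```

Identical clusterings score 1 and independent ones score 0 under this normalization, which matches the endpoints the abstract asks of a normalized measure.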

Preprint | 2012 | arXiv

Partitioning Breaks Communities

Considering a clique as a conservative definition of community structure, we examine how graph partitioning algorithms interact with cliques. Many popular community-finding algorithms partition the entire graph into non-overlapping communities. We show that on a wide range of empirical networks, from different domains, significant numbers of cliques are split across the separate partitions produced by these algorithms. We then examine the largest connected component of the subgraph formed by retaining only edges in cliques, and apply partitioning strategies that explicitly minimise the number of cliques split. We further examine several modern overlapping community finding algorithms, in terms of the interaction between cliques and the communities they find, and in terms of the global overlap of the sets of communities they find. We conclude that, due to the connectedness of many networks, any community finding algorithm that produces partitions must fail to find at least some significant structures. Moreover, contrary to traditional intuition, in some empirical networks, strong ties and cliques frequently do cross community boundaries; much community structure is fundamentally overlapping and unpartitionable in nature.
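The notion of a clique being split across partitions can be made concrete with a small sketch. It assumes the cliques have already been enumerated and that the partition is given as a mapping from node to community label. The function name and the toy data are hypothetical.

```python
def count_split_cliques(cliques, partition):
    """Count cliques whose members fall into more than one partition block.

    cliques:   iterable of node collections
    partition: dict mapping each node to a single community label
    """
    split = 0
    for clique in cliques:
        labels = {partition[node] for node in clique}
        if len(labels) > 1:
            split += 1
    return split


# Two triangles sharing node 2. Any partition must assign node 2 to one
# side only, so at least one of the triangles ends up split.
cliques = [{0, 1, 2}, {2, 3, 4}]
partition = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B"}
n_split = count_split_cliques(cliques, partition)
```

In this toy case the second triangle is split however node 2 is assigned, which is the small-scale version of the abstract's point about partitioning and overlapping structure.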

Preprint | 2012 | arXiv

Percolation Computation in Complex Networks

K-clique percolation is an overlapping community finding algorithm which extracts particular structures, comprised of overlapping cliques, from complex networks. While it is conceptually straightforward, and can be elegantly expressed using clique graphs, certain aspects of k-clique percolation are computationally challenging in practice. In this paper we investigate aspects of empirical social networks, such as the large numbers of overlapping maximal cliques contained within them, that make clique percolation, and clique graph representations, computationally expensive. We motivate a simple algorithm to conduct clique percolation, and investigate its performance compared to current best-in-class algorithms. We present improvements to this algorithm, which allow us to perform k-clique percolation on much larger empirical datasets. Our approaches perform much better than existing algorithms on networks exhibiting pervasively overlapping community structure, especially for higher values of k. However, clique percolation remains a hard computational problem; current algorithms still scale worse than some other overlapping community finding algorithms.
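A naive version of k-clique percolation can be sketched as follows. This is a textbook-style approach, not the algorithm the paper proposes. It enumerates maximal cliques, keeps those with at least k nodes, and merges any two that share at least k - 1 nodes. All function names are hypothetical.

```python
def build_adjacency(edges):
    """Turn an undirected edge list into a dict of neighbour sets."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def k_clique_communities(adj, k):
    """Merge maximal cliques of size >= k that share at least k - 1 nodes."""
    cliques = [c for c in maximal_cliques(adj) if len(c) >= k]
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise comparison of cliques: quadratic in the number of cliques.
    for i in range(len(cliques)):
        for j in range(i + 1, len(cliques)):
            if len(cliques[i] & cliques[j]) >= k - 1:
                parent[find(i)] = find(j)

    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return list(groups.values())


# Triangles 0-1-2 and 1-2-3 share an edge; triangle 3-4-5 shares only node 3.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]
communities = k_clique_communities(build_adjacency(edges), k=3)
```

The pairwise comparison loop grows quadratically with the number of maximal cliques, which gives a feel for why networks with many overlapping cliques make this computation expensive.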

Preprint | 2011 | arXiv

Diffusion in Networks With Overlapping Community Structure

In this work we study diffusion in networks with community structure. We first replicate and extend work on networks with non-overlapping community structure. We then study diffusion on network models that have overlapping community structure. We study contagions in the standard SIR model, and complex contagions thought to be better models of some social diffusion processes. Finally, we investigate diffusion on empirical networks with known overlapping community structure, by analysing the structure of such networks, and by simulating contagion on them. We find that simple and complex contagions can spread fast in networks with overlapping community structure. We also find that short paths exist through overlapping community structure on empirical networks.
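For readers unfamiliar with the SIR model, a minimal discrete-time simulation on a graph might look like the sketch below. The update rule, the parameter names `beta` (per-contact infection probability) and `gamma` (per-step recovery probability), and the function name are assumptions for illustration. The paper's exact simulation setup is not given in the abstract.

```python
import random


def simulate_sir(adj, seeds, beta, gamma, rng):
    """Run a discrete-time SIR contagion until no node is infected.

    adj:   dict mapping node -> set of neighbours
    seeds: initially infected nodes
    beta:  probability an infected node infects a susceptible neighbour per step
    gamma: probability an infected node recovers per step (must be > 0)
    Returns (set of recovered nodes, number of steps taken).
    """
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must be in (0, 1] so the process terminates")
    susceptible = set(adj) - set(seeds)
    infected = set(seeds)
    recovered = set()
    steps = 0
    while infected:
        newly_infected = set()
        for u in infected:
            for v in adj[u]:
                if v in susceptible and rng.random() < beta:
                    newly_infected.add(v)
        newly_recovered = {u for u in infected if rng.random() < gamma}
        susceptible -= newly_infected
        infected = (infected - newly_recovered) | newly_infected
        recovered |= newly_recovered
        steps += 1
    return recovered, steps


# Path graph 0-1-2-3 with certain transmission and one-step recovery.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
reached, steps = simulate_sir(path, seeds=[0], beta=1.0, gamma=1.0,
                              rng=random.Random(0))
```

With `beta=1.0` and `gamma=1.0` the run is deterministic: the contagion advances one hop per step along the path. Intermediate values give stochastic outbreaks whose final size can be averaged over many runs.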

Preprint | 2011 | arXiv

Seeding for pervasively overlapping communities

In some social and biological networks, the majority of nodes belong to multiple communities. It has recently been shown that a number of the algorithms that are designed to detect overlapping communities do not perform well in such highly overlapping settings. Here, we consider one class of these algorithms, those which optimize a local fitness measure, typically by using a greedy heuristic to expand a seed into a community. We perform synthetic benchmarks which indicate that an appropriate seeding strategy becomes increasingly important as the extent of community overlap increases. We find that distinct cliques provide the best seeds. We find further support for this seeding strategy with benchmarks on a Facebook network and the yeast interactome.

Preprint | 2010 | arXiv

Detecting highly overlapping community structure by greedy clique expansion

In complex networks it is common for each node to belong to several communities, implying a highly overlapping community structure. Recent advances in benchmarking indicate that existing community assignment algorithms that are capable of detecting overlapping communities perform well only when the extent of community overlap is kept to modest levels. To overcome this limitation, we introduce a new community assignment algorithm called Greedy Clique Expansion (GCE). The algorithm identifies distinct cliques as seeds and expands these seeds by greedily optimizing a local fitness function. We perform extensive benchmarks on synthetic data to demonstrate that GCE's good performance is robust across diverse graph topologies. Significantly, GCE is the only algorithm to perform well on these synthetic graphs, in which every node belongs to multiple communities. Furthermore, when put to the task of identifying functional modules in protein interaction data, and college dorm assignments in Facebook friendship data, we find that GCE performs competitively.
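The expansion step, growing a seed clique by greedily optimizing a local fitness function, can be sketched as follows. The abstract does not state the fitness function, so the one used here, internal degree divided by total degree raised to a power `alpha`, is an assumption, as are the function names. Seed selection is not shown.

```python
def fitness(adj, community, alpha=1.0):
    """Assumed local fitness: k_in / (k_in + k_out) ** alpha."""
    k_in = k_out = 0
    for u in community:
        for v in adj[u]:
            if v in community:
                k_in += 1   # each internal edge is counted from both ends
            else:
                k_out += 1
    total = k_in + k_out
    return k_in / total ** alpha if total else 0.0


def greedy_expand(adj, seed, alpha=1.0):
    """Repeatedly add the neighbouring node that most improves fitness."""
    community = set(seed)
    while True:
        frontier = {v for u in community for v in adj[u]} - community
        best, best_fit = None, fitness(adj, community, alpha)
        for v in sorted(frontier):
            candidate_fit = fitness(adj, community | {v}, alpha)
            if candidate_fit > best_fit:
                best, best_fit = v, candidate_fit
        if best is None:
            return community  # no single addition improves fitness
        community.add(best)


# Two 4-cliques, {0,1,2,3} and {4,5,6,7}, joined by the single edge 3-4.
adj = {
    0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4},
    4: {3, 5, 6, 7}, 5: {4, 6, 7}, 6: {4, 5, 7}, 7: {4, 5, 6},
}
grown = greedy_expand(adj, seed={0, 1, 2})
```

Starting from the triangle {0, 1, 2}, the expansion absorbs node 3 (fitness rises from 6/9 to 12/13) and then stops, since adding node 4 would lower fitness to 14/17.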