Source author record

Derek Greene

Derek Greene appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Social and Information Networks, physics.soc-ph, cs.CY, Information Retrieval, Computation and Language, Digital Libraries, Machine Learning, Artificial Intelligence, eess.IV, Human-Computer Interaction, Neural and Evolutionary Computing, physics.data-an, physics.med-ph

Catalog footprint

What is connected

29 works

13 topics

4 close collaborators

preprint · 2026 · arXiv

Cultural Analytics for Good: Building Inclusive Evaluation Frameworks for Historical IR

This work bridges the fields of information retrieval and cultural analytics to support equitable access to historical knowledge. Using the British Library BL19 digital collection (more than 35,000 works from 1700-1899), we construct a benchmark for studying changes in language, terminology and retrieval in 19th-century fiction and non-fiction. Our approach combines expert-driven query design, paragraph-level relevance annotation, and Large Language Model (LLM) assistance to create a scalable evaluation framework grounded in human expertise. We focus on knowledge transfer from fiction to non-fiction, investigating how narrative understanding and semantic richness in fiction can improve retrieval for scholarly and factual materials. This interdisciplinary framework not only improves retrieval accuracy but also fosters interpretability, transparency, and cultural inclusivity in digital archives. Our work provides both practical evaluation resources and a methodological paradigm for developing retrieval systems that support richer, historically aware engagement with digital archives, ultimately working towards more emancipatory knowledge infrastructures.

preprint · 2026 · arXiv

MIRA: An LLM-Assisted Benchmark for Multi-Category Integrated Retrieval

Users increasingly expect modern search systems to offer a unified interface that seamlessly retrieves information from diverse data sources and formats. However, current information retrieval (IR) evaluation benchmarks have not kept pace with this development, primarily due to the lack of test collections that represent the diversity of contemporary search domains. We address this critical gap with MIRA, a novel benchmark based on a large-scale social science search platform. MIRA is designed for category-aware ranking across heterogeneous categories - Publications, Research Data, Variables, and Instruments & Tools - within a single, unified evaluation framework. The proposed collection is distinctive in several ways: (1) it is built upon real user queries, providing a more realistic basis for evaluation; (2) it covers scholarly items from four distinct categories, enabling multi-faceted evaluation; and (3) it leverages a Large Language Model to generate topic descriptions and narratives, as well as for relevance assessment with respect to these topics, substantially reducing the labor and cost of test collection generation. We release this resource to benefit the community by providing a foundational testbed for the research on multi-faceted, category-aware, integrated, or cross-category information retrieval.

preprint · 2022 · arXiv

An Analysis of Variations in the Effectiveness of Query Performance Prediction

A query performance predictor estimates the retrieval effectiveness of an IR system for a given query. An important characteristic of QPP evaluation is that, since the ground truth retrieval effectiveness for QPP evaluation can be measured with different metrics, the ground truth itself is not absolute, which is in contrast to other retrieval tasks, such as that of ad-hoc retrieval. Motivated by this argument, the objective of this paper is to investigate how such variances in the ground truth for QPP evaluation can affect the outcomes of QPP experiments. We consider this not only in terms of the absolute values of the evaluation metrics being reported (e.g. Pearson's $r$, Kendall's $τ$), but also with respect to the changes in the ranks of different QPP systems when ordered by the QPP metric scores. Our experiments reveal that the observed QPP outcomes can vary considerably, both in terms of the absolute evaluation metric values and also in terms of the relative system ranks. Through our analysis, we report the optimal combinations of QPP evaluation metric and experimental settings that are likely to lead to smaller variations in the observed results.
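The dependence on the choice of ground-truth metric can be illustrated with a small sketch. The per-query values below are hypothetical and not taken from the paper; the two helper functions are plain implementations of Pearson's r and Kendall's tau-a, written here only so the example is self-contained.

```python
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical values for five queries (illustration only).
qpp_scores = [0.9, 0.4, 0.7, 0.2, 0.6]        # predictor output per query
truth_ap = [0.50, 0.20, 0.45, 0.10, 0.30]     # ground truth measured as AP
truth_ndcg = [0.60, 0.55, 0.70, 0.30, 0.35]   # ground truth measured as nDCG

for name, truth in [("AP", truth_ap), ("nDCG", truth_ndcg)]:
    r = pearson_r(qpp_scores, truth)
    tau = kendall_tau(qpp_scores, truth)
    print(f"{name}: r={r:.3f} tau={tau:.3f}")
```

With these made-up numbers the same predictor is perfectly rank-correlated with one ground truth and only partly with the other, which is the kind of variation the abstract describes.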

preprint · 2022 · arXiv

Assessing Network Representations for Identifying Interdisciplinarity

Many studies have sought to identify interdisciplinary research as a function of the diversity of disciplines identified in an article's references or citations. However, given the constant evolution of the scientific landscape, disciplinary boundaries are shifting and blurring, making it increasingly difficult to describe research within a strict taxonomy. In this work, we explore the potential for graph learning methods to learn embedded representations for research papers that encode their 'interdisciplinarity' in a citation network. This facilitates the identification of interdisciplinary research without the use of disciplinary categories. We evaluate these representations and their ability to identify interdisciplinary research, according to their utility in interdisciplinary citation prediction. We find that those representations which preserve structural equivalence in the citation graph are best able to predict distant, interdisciplinary interactions in the network, according to multiple definitions of citation distance.

preprint · 2022 · arXiv

Author Multidisciplinarity and Disciplinary Roles in Field of Study Networks

When studying large research corpora, "distant reading" methods are vital to understand the topics and trends in the corresponding research space. In particular, given the recognised benefits of multidisciplinary research, it may be important to map schools or communities of diverse research topics, and to understand the multidisciplinary role that topics play within and between these communities. This work proposes Field of Study (FoS) networks as a novel network representation for use in scientometric analysis. We describe the formation of FoS networks, which relate research topics according to the authors who publish in them, from corpora of articles in which fields of study can be identified. FoS networks are particularly useful for the distant reading of large datasets of research papers when analysed through the lens of exploring multidisciplinary science. In an evolving scientific landscape, modular communities in FoS networks offer an alternative categorisation strategy for research topics and sub-disciplines, when compared to traditional prescribed discipline classification schemes. Furthermore, structural role analysis of FoS networks can highlight important characteristics of topics in such communities. To support this, we present two case studies which explore multidisciplinary research in corpora of varying size and scope; namely, 6,323 articles relating to network science research and 4,184,011 articles relating to research on the COVID-19 pandemic.

preprint · 2022 · arXiv

Deep-QPP: A Pairwise Interaction-based Deep Learning Model for Supervised Query Performance Prediction

Motivated by the recent success of end-to-end deep neural models for ranking tasks, we present here a supervised end-to-end neural approach for query performance prediction (QPP). In contrast to unsupervised approaches that rely on various statistics of document score distributions, our approach is entirely data-driven. Further, in contrast to weakly supervised approaches, our method also does not rely on the outputs from different QPP estimators. In particular, our model leverages information from the semantic interactions between the terms of a query and those in the top-documents retrieved with it. The architecture of the model comprises multiple layers of 2D convolution filters followed by a feed-forward layer of parameters. Experiments on standard test collections demonstrate that our proposed supervised approach outperforms other state-of-the-art supervised and unsupervised approaches.

preprint · 2022 · arXiv

The Structure of Interdisciplinary Science: Uncovering and Explaining Roles in Citation Graphs

Role discovery is the task of dividing the set of nodes on a graph into classes of structurally similar roles. Modern strategies for role discovery typically rely on graph embedding techniques, which are capable of recognising complex local structures. However, when working with large, real-world networks, it is difficult to interpret or validate a set of roles identified according to these methods. In this work, motivated by advancements in the field of explainable artificial intelligence (XAI), we propose a new framework for interpreting role assignments on large graphs using small subgraph structures known as graphlets. We demonstrate our methods on a large, multidisciplinary citation network, where we successfully identify a number of important citation patterns which reflect interdisciplinary research.

preprint · 2021 · arXiv

Collaboration in the Time of COVID: A Scientometric Analysis of Multidisciplinary SARS-CoV-2 Research

The novel coronavirus SARS-CoV-2 and the COVID-19 illness it causes have inspired unprecedented levels of multidisciplinary research in an effort to address a generational public health challenge. In this work we conduct a scientometric analysis of COVID-19 research, paying particular attention to the nature of collaboration that this pandemic has fostered among different disciplines. Increased multidisciplinary collaboration has been shown to produce greater scientific impact, albeit with higher co-ordination costs. As such, we consider a collection of over 166,000 COVID-19-related articles to assess the scale and diversity of collaboration in COVID-19 research, which we compare to non-COVID-19 controls before and during the pandemic. We show that COVID-19 research teams are not only significantly smaller than their non-COVID-19 counterparts, but they are also more diverse. Furthermore, we find that COVID-19 research has increased the multidisciplinarity of authors across most scientific fields of study, indicating that COVID-19 has helped to remove some of the barriers that usually exist between disparate disciplines. Finally, we highlight a number of interesting areas of multidisciplinary research during COVID-19, and propose methodologies for visualising the nature of multidisciplinary collaboration, which may have application beyond this pandemic.

preprint · 2020 · arXiv

Bone Segmentation in Contrast Enhanced Whole-Body Computed Tomography

Segmentation of bone regions allows for enhanced diagnostics, disease characterisation and treatment monitoring in CT imaging. In contrast-enhanced whole-body scans, accurate automatic segmentation is particularly difficult, as low-dose whole-body protocols reduce image quality and make contrast-enhanced regions more difficult to separate when relying on differences in pixel intensities. This paper outlines a U-net architecture with novel preprocessing techniques, based on the windowing of training data and the modification of sigmoid activation threshold selection, to successfully segment bone-bone marrow regions from low-dose contrast-enhanced whole-body CT scans. The proposed method achieved mean Dice coefficients of 0.979, 0.965, and 0.934 on two internal datasets and one external test dataset, respectively. We have demonstrated that appropriate preprocessing is important for differentiating between bone and contrast dye, and that excellent results can be achieved with limited data.
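The three ingredients named in the abstract (intensity windowing, a threshold on sigmoid outputs, and the Dice coefficient) can be sketched in a few lines. This is a toy illustration on flat lists, not the paper's pipeline: the window level and width, the probabilities, and the cutoffs are all hypothetical values chosen for the example.

```python
def window(pixels, level, width):
    """Clip intensities to [level - width/2, level + width/2], rescale to [0, 1]."""
    lo, hi = level - width / 2, level + width / 2
    return [(min(max(p, lo), hi) - lo) / (hi - lo) for p in pixels]


def threshold(probs, cutoff=0.5):
    """Turn per-pixel sigmoid outputs into a binary mask."""
    return [1 if p >= cutoff else 0 for p in probs]


def dice(pred, truth):
    """Dice coefficient 2*|A & B| / (|A| + |B|) for flat binary masks."""
    intersection = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 1.0 if total == 0 else 2 * intersection / total


# Hypothetical intensities, window settings, and model outputs.
windowed = window([-1000, 300, 2000], level=300, width=1400)
probs = [0.9, 0.6, 0.4, 0.1]
truth = [1, 1, 1, 0]

print(windowed)
print(dice(threshold(probs, 0.5), truth))
print(dice(threshold(probs, 0.3), truth))
```

In this toy case lowering the cutoff recovers a pixel that the default 0.5 threshold misses, which is why the threshold can be worth tuning alongside the preprocessing.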

preprint · 2020 · arXiv

Mitigating Gender Bias in Machine Learning Data Sets

Artificial Intelligence has the capacity to amplify and perpetuate societal biases and presents profound ethical implications for society. Gender bias has been identified in the context of employment advertising and recruitment tools, due to their reliance on underlying language processing and recommendation algorithms. Attempts to address such issues have involved testing learned associations, integrating concepts of fairness to machine learning and performing more rigorous analysis of training data. Mitigating bias when algorithms are trained on textual data is particularly challenging given the complex way gender ideology is embedded in language. This paper proposes a framework for the identification of gender bias in training data for machine learning. The work draws upon gender theory and sociolinguistics to systematically indicate levels of bias in textual training data and associated neural word embedding models, thus highlighting pathways for both removing bias from training data and critically assessing its impact.

preprint · 2016 · arXiv

Exploring the Political Agenda of the European Parliament Using a Dynamic Topic Modeling Approach

This study analyzes the political agenda of the European Parliament (EP) plenary, how it has evolved over time, and the manner in which Members of the European Parliament (MEPs) have reacted to external and internal stimuli when making plenary speeches. To unveil the plenary agenda and detect latent themes in legislative speeches over time, MEP speech content is analyzed using a new dynamic topic modeling method based on two layers of Non-negative Matrix Factorization (NMF). This method is applied to a new corpus of all English language legislative speeches in the EP plenary from the period 1999-2014. Our findings suggest that two-layer NMF is a valuable alternative to existing dynamic topic modeling approaches found in the literature, and can unveil niche topics and associated vocabularies not captured by existing methods. Substantively, our findings suggest that the political agenda of the EP evolves significantly over time and reacts to exogenous events such as EU Treaty referenda and the emergence of the Euro-crisis. MEP contributions to the plenary agenda are also found to be impacted upon by voting behaviour and the committee structure of the Parliament.

preprint · 2016 · arXiv

Indicators of Good Student Performance in Moodle Activity Data

In this paper we conduct an analysis of Moodle activity data focused on identifying early predictors of good student performance. The analysis shows that three relevant hypotheses are largely supported by the data. These hypotheses are: early submission is a good sign, a high level of activity is predictive of good results and evening activity is even better than daytime activity. We highlight some pathological examples where high levels of activity correlate with bad results.

preprint · 2015 · arXiv

Topic Stability over Noisy Sources

Topic modelling techniques such as LDA have recently been applied to speech transcripts and OCR output. These corpora may contain noisy or erroneous texts which may undermine topic stability. Therefore, it is important to know how well a topic modelling algorithm will perform when applied to noisy data. In this paper we show that different types of textual noise will have diverse effects on the stability of different topic models. From these observations, we propose guidelines for text corpus generation, with a focus on automatic speech transcription. We also suggest topic model selection methods for noisy corpora.

preprint · 2015 · arXiv

Unveiling the Political Agenda of the European Parliament Plenary: A Topical Analysis

This study analyzes political interactions in the European Parliament (EP) by considering how the political agenda of the plenary sessions has evolved over time and the manner in which Members of the European Parliament (MEPs) have reacted to external and internal stimuli when making Parliamentary speeches. It does so by considering the context in which speeches are made, and the content of those speeches. To detect latent themes in legislative speeches over time, speech content is analyzed using a new dynamic topic modeling method, based on two layers of matrix factorization. This method is applied to a new corpus of all English language legislative speeches in the EP plenary from the period 1999-2014. Our findings suggest that the political agenda of the EP has evolved significantly over time, is impacted upon by the committee structure of the Parliament, and reacts to exogenous events such as EU Treaty referenda and the emergence of the Euro-crisis, which have a significant impact on what is being discussed in Parliament.

preprint · 2014 · arXiv

A Latent Space Analysis of Editor Lifecycles in Wikipedia

Collaborations such as Wikipedia are a key part of the value of the modern Internet. At the same time there is concern that these collaborations are threatened by high levels of member turnover. In this paper we borrow ideas from topic analysis to map editor activity on Wikipedia over time into a latent space that offers an insight into the evolving patterns of editor behavior. This latent space representation reveals a number of different categories of editor (e.g. content experts, social networkers) and we show that it does provide a signal that predicts an editor's departure from the community. We also show that long term editors gradually diversify their participation by shifting edit preference from one or two namespaces to multiple namespaces and experience relatively soft evolution in their editor profiles, while short term editors generally distribute their contribution randomly among the namespaces and experience considerably fluctuating evolution in their editor profiles.

preprint · 2014 · arXiv

Adaptive Representations for Tracking Breaking News on Twitter

Twitter is often the most up-to-date source for finding and tracking breaking news stories. Therefore, there is considerable interest in developing filters for tweet streams in order to track and summarize stories. This is a non-trivial text analytics task as tweets are short, and standard retrieval methods often fail as stories evolve over time. In this paper we examine the effectiveness of adaptive mechanisms for tracking and summarizing breaking news stories. We evaluate the effectiveness of these mechanisms on a number of recent news events for which manually curated timelines are available. Assessments based on ROUGE metrics indicate that adaptive approaches are best suited for tracking evolving stories on Twitter.

preprint · 2014 · arXiv

An Analysis of Interactions Within and Between Extreme Right Communities in Social Media

Many extreme right groups have had an online presence for some time through the use of dedicated websites. This has been accompanied by increased activity in social media platforms in recent years, enabling the dissemination of extreme right content to a wider audience. In this paper, we present an analysis of the activity of a selection of such groups on Twitter, using network representations based on reciprocal follower and interaction relationships, while also analyzing topics found in their corresponding tweets. International relationships between certain extreme right groups across geopolitical boundaries are initially identified. Furthermore, we also discover stable communities of accounts within local interaction networks, in addition to associated topics, where the underlying extreme right ideology of these communities is often identifiable.

preprint · 2014 · arXiv

How Many Topics? Stability Analysis for Topic Models

Topic modeling refers to the task of discovering the underlying thematic structure in a text corpus, where the output is commonly presented as a report of the top terms appearing in each topic. Despite the diversity of topic modeling algorithms that have been proposed, a common challenge in successfully applying these techniques is the selection of an appropriate number of topics for a given corpus. Choosing too few topics will produce results that are overly broad, while choosing too many will result in the "over-clustering" of a corpus into many small, highly-similar topics. In this paper, we propose a term-centric stability analysis strategy to address this issue, the idea being that a model with an appropriate number of topics will be more robust to perturbations in the data. Using a topic modeling approach based on matrix factorization, evaluations performed on a range of corpora show that this strategy can successfully guide the model selection process.
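The stability idea can be sketched as follows: fit a reference model, fit further models on perturbed samples, and measure how well the top-term lists agree. The sketch below is a simplification under stated assumptions. It compares topics as unordered term sets with Jaccard similarity and brute-force matching, which is not necessarily the measure used in the paper, and the topic lists are hypothetical stand-ins for real model output.

```python
from itertools import permutations


def jaccard(a, b):
    """Jaccard similarity between two term lists, treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def agreement(topics_a, topics_b):
    """Mean Jaccard under the best one-to-one topic matching.

    Brute force over permutations, so only suitable for a small
    number of topics.
    """
    k = len(topics_a)
    best = 0.0
    for perm in permutations(range(k)):
        score = sum(jaccard(topics_a[i], topics_b[j])
                    for i, j in enumerate(perm)) / k
        best = max(best, score)
    return best


def stability(reference, perturbed_runs):
    """Average agreement between a reference model and perturbed runs."""
    total = sum(agreement(reference, run) for run in perturbed_runs)
    return total / len(perturbed_runs)


# Hypothetical top-term lists for k = 2 topics.
reference = [["goal", "match", "team"], ["vote", "party", "election"]]
run_1 = [["party", "vote", "election"], ["team", "goal", "match"]]
run_2 = [["goal", "match", "vote"], ["team", "party", "election"]]

print(stability(reference, [run_1, run_2]))
```

To use this for model selection, one would repeat the calculation for each candidate number of topics and prefer values where the score is high.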

preprint · 2014 · arXiv

Online Social Media in the Syria Conflict: Encompassing the Extremes and the In-Betweens

The Syria conflict has been described as the most socially mediated in history, with online social media playing a particularly important role. At the same time, the ever-changing landscape of the conflict leads to difficulties in applying analytical approaches taken by other studies of online political activism. Therefore, in this paper, we use an approach that does not require strong prior assumptions or the proposal of an advance hypothesis to analyze Twitter and YouTube activity of a range of protagonists to the conflict, in an attempt to reveal additional insights into the relationships between them. By means of a network representation that combines multiple data views, we uncover communities of accounts falling into four categories that broadly reflect the situation on the ground in Syria. A detailed analysis of selected communities within the anti-regime categories is provided, focusing on their central actors, preferred online platforms, and activity surrounding "real world" events. Our findings indicate that social media activity in Syria is considerably more convoluted than reported in many other studies of online political activism, suggesting that alternative analytical approaches can play an important role in this type of scenario.

preprint · 2014 · arXiv

TextLuas: Tracking and Visualizing Document and Term Clusters in Dynamic Text Data

For large volumes of text data collected over time, a key knowledge discovery task is identifying and tracking clusters. These clusters may correspond to emerging themes, popular topics, or breaking news stories in a corpus. Therefore, recently there has been increased interest in the problem of clustering dynamic data. However, there exists little support for the interactive exploration of the output of these analysis techniques, particularly in cases where researchers wish to simultaneously explore both the change in cluster structure over time and the change in the textual content associated with clusters. In this paper, we propose a model for tracking dynamic clusters characterized by the evolutionary events of each cluster. Motivated by this model, the TextLuas system provides an implementation for tracking these dynamic clusters and visualizing their evolution using a metro map metaphor. To provide overviews of cluster content, we adapt the tag cloud representation to the dynamic clustering scenario. We demonstrate the TextLuas system on two different text corpora, where they are shown to elucidate the evolution of key themes. We also describe how TextLuas was applied to a problem in bibliographic network research.

preprint · 2013 · arXiv

Discovering Latent Patterns from the Analysis of User-Curated Movie Lists

User content curation is becoming an important source of preference data, as well as providing information regarding the items being curated. One popular approach involves the creation of lists. On Twitter, these lists might contain accounts relevant to a particular topic, whereas on a community site such as the Internet Movie Database (IMDb), this might take the form of lists of movies sharing common characteristics. While list curation involves substantial combined effort on the part of users, researchers have rarely looked at mining the outputs of this kind of crowdsourcing activity. Here we study a large collection of movie lists from IMDb. We apply network analysis methods to a graph that reflects the degree to which pairs of movies are "co-listed", that is, assigned to the same lists. This allows us to uncover a more nuanced grouping of movies that goes beyond categorisation schemes based on attributes such as genre or director.
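A co-listing graph of the kind described can be built by counting, for every pair of items, the number of lists that contain both. The sketch below uses made-up lists and raw counts as edge weights; how the paper weights or filters edges is not stated in the abstract, so that choice is an assumption.

```python
from collections import Counter
from itertools import combinations


def colisting_counts(lists):
    """Map each unordered pair of items to the number of lists containing both."""
    counts = Counter()
    for members in lists:
        # Deduplicate and sort so each pair has one canonical key.
        for a, b in combinations(sorted(set(members)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical user-curated lists.
user_lists = [
    ["Alien", "Blade Runner", "Heat"],
    ["Alien", "Blade Runner"],
    ["Heat", "Ronin"],
]

edges = colisting_counts(user_lists)
for pair, weight in sorted(edges.items()):
    print(pair, weight)
```

The resulting weighted edge list can be passed to any community-finding routine to look for groupings that cut across genre or director.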

preprint · 2013 · arXiv

Normalized Mutual Information to evaluate overlapping community finding algorithms

Given the increasing popularity of algorithms for overlapping clustering, in particular in social network analysis, quantitative measures are needed to measure the accuracy of a method. Given a set of true clusters, and the set of clusters found by an algorithm, these sets of clusters must be compared to see how similar or different the sets are. A normalized measure is desirable in many contexts, for example assigning a value of 0 where the two sets are totally dissimilar, and 1 where they are identical. A measure based on normalized mutual information, [1], has recently become popular. We demonstrate unintuitive behaviour of this measure, and show how this can be corrected by using a more conventional normalization. We compare the results to that of other measures, such as the Omega index [2].

preprint · 2013 · arXiv

Producing a Unified Graph Representation from Multiple Social Network Views

In many social networks, several different link relations will exist between the same set of users. Additionally, attribute or textual information will be associated with those users, such as demographic details or user-generated content. For many data analysis tasks, such as community finding and data visualisation, the provision of multiple heterogeneous types of user data makes the analysis process more complex. We propose an unsupervised method for integrating multiple data views to produce a single unified graph representation, based on the combination of the k-nearest neighbour sets for users derived from each view. These views can be either relation-based or feature-based. The proposed method is evaluated on a number of annotated multi-view Twitter datasets, where it is shown to support the discovery of the underlying community structure in the data.
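One way to read the combination of k-nearest neighbour sets is sketched below. Each view is given as a user-to-user similarity table; the sketch takes the k most similar users per view and merges the resulting edges, weighting each edge by the number of views that support it. The aggregation rule, the similarity values, and the view names are assumptions made for illustration, not details taken from the paper.

```python
def knn_sets(similarity, k):
    """For each user, the k most similar other users in one view."""
    neighbours = {}
    for user, row in similarity.items():
        # Sort by descending similarity, breaking ties by name.
        ranked = sorted((v for v in row if v != user),
                        key=lambda v: (-row[v], v))
        neighbours[user] = set(ranked[:k])
    return neighbours


def unified_graph(views, k):
    """Undirected edges weighted by the number of views supporting them."""
    edges = {}
    for similarity in views:
        view_edges = set()
        for user, nbrs in knn_sets(similarity, k).items():
            for v in nbrs:
                view_edges.add(tuple(sorted((user, v))))
        for edge in view_edges:
            edges[edge] = edges.get(edge, 0) + 1
    return edges


# Hypothetical similarities: one relation-based view, one feature-based view.
follows = {"a": {"b": 0.9, "c": 0.1},
           "b": {"a": 0.9, "c": 0.2},
           "c": {"a": 0.1, "b": 0.2}}
text = {"a": {"b": 0.3, "c": 0.8},
        "b": {"a": 0.3, "c": 0.4},
        "c": {"a": 0.8, "b": 0.4}}

graph = unified_graph([follows, text], k=1)
print(sorted(graph.items()))
```

The single weighted graph can then be handed to a standard community-finding method in place of the separate views.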

preprint · 2013 · arXiv

The Extreme Right Filter Bubble

Due to its status as the most popular video sharing platform, YouTube plays an important role in the online strategy of extreme right groups, where it is often used to host associated content such as music and other propaganda. In this paper, we develop a categorization suitable for the analysis of extreme right channels found on YouTube. By combining this with an NMF-based topic modelling method, we categorize channels originating from links propagated by extreme right Twitter accounts. This method is also used to categorize related channels, which are determined using results returned by YouTube's related video service. We identify the existence of a "filter bubble", whereby users who access an extreme right YouTube video are highly likely to be recommended further extreme right content.

preprint · 2013 · arXiv

TwitterCrowds: Techniques for Exploring Topic and Sentiment in Microblogging Data

Analysts and social scientists in the humanities and industry require techniques to help visualize large quantities of microblogging data. Methods for the automated analysis of large scale social media data (on the order of tens of millions of tweets) are widely available, but few visualization techniques exist to support interactive exploration of the results. In this paper, we present extended descriptions of ThemeCrowds and SentireCrowds, two tag-based visualization techniques for this data. We subsequently introduce a new list equivalent for both of these techniques and present a number of case studies showing them in operation. Finally, we present a formal user study to evaluate the effectiveness of these list interface equivalents when comparing them to ThemeCrowds and SentireCrowds. We find that discovering topics associated with areas of strong positive or negative sentiment is faster when using a list interface. In terms of user preference, multilevel tag clouds were found to be more enjoyable to use. Despite both interfaces being usable for all tested tasks, we have evidence to support that list interfaces can be more efficient for tasks when an appropriate ordering is known beforehand.

preprint · 2013 · arXiv

Uncovering the Wider Structure of Extreme Right Communities Spanning Popular Online Networks

Recent years have seen increased interest in the online presence of extreme right groups. Although originally composed of dedicated websites, the online extreme right milieu now spans multiple networks, including popular social media platforms such as Twitter, Facebook and YouTube. Ideally therefore, any contemporary analysis of online extreme right activity requires the consideration of multiple data sources, rather than being restricted to a single platform. We investigate the potential for Twitter to act as a gateway to communities within the wider online network of the extreme right, given its facility for the dissemination of content. A strategy for representing heterogeneous network data with a single homogeneous network for the purpose of community detection is presented, where these inherently dynamic communities are tracked over time. We use this strategy to discover and analyze persistent English and German language extreme right communities.

preprint · 2012 · arXiv

Aggregating Content and Network Information to Curate Twitter User Lists

Twitter introduced user lists in late 2009, allowing users to be grouped according to meaningful topics or themes. Lists have since been adopted by media outlets as a means of organising content around news stories. Thus the curation of these lists is important - they should contain the key information gatekeepers and present a balanced perspective on a story. Here we address this list curation process from a recommender systems perspective. We propose a variety of criteria for generating user list recommendations, based on content analysis, network analysis, and the "crowdsourcing" of existing user lists. We demonstrate that these types of criteria are often only successful for datasets with certain characteristics. To resolve this issue, we propose the aggregation of these different "views" of a news story on Twitter to produce more accurate user recommendations to support the curation process.

preprint · 2012 · arXiv

Identifying Topical Twitter Communities via User List Aggregation

A particular challenge in the area of social media analysis is how to find communities within a larger network of social interactions. Here a community may be a group of microblogging users who post content on a coherent topic, or who are associated with a specific event or news story. Twitter provides the ability to curate users into lists, corresponding to meaningful topics or themes. Here we describe an approach for crowdsourcing the list building efforts of many different Twitter users, in order to identify topical communities. This approach involves the use of ensemble community finding to produce stable groupings of user lists, and by extension, individual Twitter users. We examine this approach in the context of a case study surrounding the detection of communities on Twitter relating to the London 2012 Olympics.

preprint · 2011 · arXiv

Supporting the Curation of Twitter User Lists

Twitter introduced lists in late 2009 as a means of curating tweets into meaningful themes. Lists were quickly adopted by media companies as a means of organising content around news stories. Thus the curation of these lists is important: they should contain the key information gatekeepers and present a balanced perspective on the story. Identifying members to add to a list on an emerging topic is a delicate process. From a network analysis perspective there are a number of views on the Twitter network that can be explored, e.g. followers, retweets, mentions, etc. We present a process for integrating these views in order to recommend authoritative commentators to include on a list. This process is evaluated on manually curated lists about unrest in Bahrain and the Iowa caucuses for the 2012 US election.
