Source author record

Barry Smyth

Barry Smyth appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Machine Learning Social and Information Networks Digital Libraries Computation and Language econ.EM Logic in Computer Science physics.soc-ph q-fin.CP q-fin.ST

Catalog footprint

What is connected

8works

10topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Author Multidisciplinarity and Disciplinary Roles in Field of Study Networks

When studying large research corpora, "distant reading" methods are vital to understand the topics and trends in the corresponding research space. In particular, given the recognised benefits of multidisciplinary research, it may be important to map schools or communities of diverse research topics, and to understand the multidisciplinary role that topics play within and between these communities. This work proposes Field of Study (FoS) networks as a novel network representation for use in scientometric analysis. We describe the formation of FoS networks, which relate research topics according to the authors who publish in them, from corpora of articles in which fields of study can be identified. FoS networks are particularly useful for the distant reading of large datasets of research papers when analysed through the lens of exploring multidisciplinary science. In an evolving scientific landscape, modular communities in FoS networks offer an alternative categorisation strategy for research topics and sub-disciplines, when compared to traditional prescribed discipline classification schemes. Furthermore, structural role analysis of FoS networks can highlight important characteristics of topics in such communities. To support this, we present two case studies which explore multidisciplinary research in corpora of varying size and scope; namely, 6,323 articles relating to network science research and 4,184,011 articles relating to research on the COVID-19-pandemic.

preprint2022arXiv

Exploring the Efficacy of Automatically Generated Counterfactuals for Sentiment Analysis

While state-of-the-art NLP models have been achieving the excellent performance of a wide range of tasks in recent years, important questions are being raised about their robustness and their underlying sensitivity to systematic biases that may exist in their training and test data. Such issues come to be manifest in performance problems when faced with out-of-distribution data in the field. One recent solution has been to use counterfactually augmented datasets in order to reduce any reliance on spurious patterns that may exist in the original data. Producing high-quality augmented data can be costly and time-consuming as it usually needs to involve human feedback and crowdsourcing efforts. In this work, we propose an alternative by describing and evaluating an approach to automatically generating counterfactual data for data augmentation and explanation. A comprehensive evaluation on several different datasets and using a variety of state-of-the-art benchmarks demonstrate how our approach can achieve significant improvements in model performance when compared to models training on the original data and even when compared to models trained with the benefit of human-generated augmented data.

preprint2022arXiv

NumHTML: Numeric-Oriented Hierarchical Transformer Model for Multi-task Financial Forecasting

Financial forecasting has been an important and active area of machine learning research because of the challenges it presents and the potential rewards that even minor improvements in prediction accuracy or forecasting may entail. Traditionally, financial forecasting has heavily relied on quantitative indicators and metrics derived from structured financial statements. Earnings conference call data, including text and audio, is an important source of unstructured data that has been used for various prediction tasks using deep earning and related approaches. However, current deep learning-based methods are limited in the way that they deal with numeric data; numbers are typically treated as plain-text tokens without taking advantage of their underlying numeric structure. This paper describes a numeric-oriented hierarchical transformer model to predict stock returns, and financial risk using multi-modal aligned earnings calls data by taking advantage of the different categories of numbers (monetary, temporal, percentages etc.) and their magnitude. We present the results of a comprehensive evaluation of NumHTML against several state-of-the-art baselines using a real-world publicly available dataset. The results indicate that NumHTML significantly outperforms the current state-of-the-art across a variety of evaluation metrics and that it has the potential to offer significant financial gains in a practical trading context.

preprint2022arXiv

Stock Embeddings: Learning Distributed Representations for Financial Assets

Identifying meaningful relationships between the price movements of financial assets is a challenging but important problem in a variety of financial applications. However with recent research, particularly those using machine learning and deep learning techniques, focused mostly on price forecasting, the literature investigating the modelling of asset correlations has lagged somewhat. To address this, inspired by recent successes in natural language processing, we propose a neural model for training stock embeddings, which harnesses the dynamics of historical returns data in order to learn the nuanced relationships that exist between financial assets. We describe our approach in detail and discuss a number of ways that it can be used in the financial domain. Furthermore, we present the evaluation results to demonstrate the utility of this approach, compared to several important benchmarks, in two real-world financial analytics tasks.

preprint2021arXiv

A Few Good Counterfactuals: Generating Interpretable, Plausible and Diverse Counterfactual Explanations

Counterfactual explanations provide a potentially significant solution to the Explainable AI (XAI) problem, but good, native counterfactuals have been shown to rarely occur in most datasets. Hence, the most popular methods generate synthetic counterfactuals using blind perturbation. However, such methods have several shortcomings: the resulting counterfactuals (i) may not be valid data-points (they often use features that do not naturally occur), (ii) may lack the sparsity of good counterfactuals (if they modify too many features), and (iii) may lack diversity (if the generated counterfactuals are minimal variants of one another). We describe a method designed to overcome these problems, one that adapts native counterfactuals in the original dataset, to generate sparse, diverse synthetic counterfactuals from naturally occurring features. A series of experiments are reported that systematically explore parametric variations of this novel method on common datasets to establish the conditions for optimal performance.

preprint2021arXiv

Collaboration in the Time of COVID: A Scientometric Analysis of Multidisciplinary SARS-CoV-2 Research

The novel coronavirus SARS-CoV-2 and the COVID-19 illness it causes have inspired unprecedented levels of multidisciplinary research in an effort to address a generational public health challenge. In this work we conduct a scientometric analysis of COVID-19 research, paying particular attention to the nature of collaboration that this pandemic has fostered among different disciplines. Increased multidisciplinary collaboration has been shown to produce greater scientific impact, albeit with higher co-ordination costs. As such, we consider a collection of over 166,000 COVID-19-related articles to assess the scale and diversity of collaboration in COVID-19 research, which we compare to non-COVID-19 controls before and during the pandemic. We show that COVID-19 research teams are not only significantly smaller than their non-COVID-19 counterparts, but they are also more diverse. Furthermore, we find that COVID-19 research has increased the multidisciplinarity of authors across most scientific fields of study, indicating that COVID-19 has helped to remove some of the barriers that usually exist between disparate disciplines. Finally, we highlight a number of interesting areas of multidisciplinary research during COVID-19, and propose methodologies for visualising the nature of multidisciplinary collaboration, which may have application beyond this pandemic.

preprint2020arXiv

Good Counterfactuals and Where to Find Them: A Case-Based Technique for Generating Counterfactuals for Explainable AI (XAI)

Recently, a groundswell of research has identified the use of counterfactual explanations as a potentially significant solution to the Explainable AI (XAI) problem. It is argued that (a) technically, these counterfactual cases can be generated by permuting problem-features until a class change is found, (b) psychologically, they are much more causally informative than factual explanations, (c) legally, they are GDPR-compliant. However, there are issues around the finding of good counterfactuals using current techniques (e.g. sparsity and plausibility). We show that many commonly-used datasets appear to have few good counterfactuals for explanation purposes. So, we propose a new case based approach for generating counterfactuals using novel ideas about the counterfactual potential and explanatory coverage of a case-base. The new technique reuses patterns of good counterfactuals, present in a case-base, to generate analogous counterfactuals that can explain new problems and their solutions. Several experiments show how this technique can improve the counterfactual potential and explanatory coverage of case-bases that were previously found wanting.

preprint2012arXiv

Aggregating Content and Network Information to Curate Twitter User Lists

Twitter introduced user lists in late 2009, allowing users to be grouped according to meaningful topics or themes. Lists have since been adopted by media outlets as a means of organising content around news stories. Thus the curation of these lists is important - they should contain the key information gatekeepers and present a balanced perspective on a story. Here we address this list curation process from a recommender systems perspective. We propose a variety of criteria for generating user list recommendations, based on content analysis, network analysis, and the "crowdsourcing" of existing user lists. We demonstrate that these types of criteria are often only successful for datasets with certain characteristics. To resolve this issue, we propose the aggregation of these different "views" of a news story on Twitter to produce more accurate user recommendations to support the curation process.

Barry Smyth

What is connected

Connect this record

See the researcher in context

Building this map preview

8 published item(s)

Author Multidisciplinarity and Disciplinary Roles in Field of Study Networks

Exploring the Efficacy of Automatically Generated Counterfactuals for Sentiment Analysis

NumHTML: Numeric-Oriented Hierarchical Transformer Model for Multi-task Financial Forecasting

Stock Embeddings: Learning Distributed Representations for Financial Assets

A Few Good Counterfactuals: Generating Interpretable, Plausible and Diverse Counterfactual Explanations

Collaboration in the Time of COVID: A Scientometric Analysis of Multidisciplinary SARS-CoV-2 Research

Good Counterfactuals and Where to Find Them: A Case-Based Technique for Generating Counterfactuals for Explainable AI (XAI)

Aggregating Content and Network Information to Curate Twitter User Lists