Researcher profile

Stephen Roberts

Stephen Roberts contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
18works
0followers
14topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

18 published item(s)

preprint2022arXiv

Enhancing Cross-Sectional Currency Strategies by Context-Aware Learning to Rank with Self-Attention

The performance of a cross-sectional currency strategy depends crucially on accurately ranking instruments prior to portfolio construction. While this ranking step is traditionally performed using heuristics, or by sorting the outputs produced by pointwise regression or classification techniques, strategies using Learning to Rank algorithms have recently presented themselves as competitive and viable alternatives. Although the rankers at the core of these strategies are learned globally and improve ranking accuracy on average, they ignore the differences between the distributions of asset features over the times when the portfolio is rebalanced. This flaw renders them susceptible to producing sub-optimal rankings, possibly at important periods when accuracy is actually needed the most. For example, this might happen during critical risk-off episodes, which consequently exposes the portfolio to substantial, unwanted drawdowns. We tackle this shortcoming with an analogous idea from information retrieval: that a query's top retrieved documents or the local ranking context provide vital information about the query's own characteristics, which can then be used to refine the initial ranked list. In this work, we use a context-aware Learning-to-rank model that is based on the Transformer architecture to encode top/bottom ranked assets, learn the context and exploit this information to re-rank the initial results. Backtesting on a slate of 31 currencies, our proposed methodology increases the Sharpe ratio by around 30% and significantly enhances various performance metrics. Additionally, this approach also improves the Sharpe ratio when separately conditioning on normal and risk-off market states.

preprint2022arXiv

One-Shot Transfer Learning of Physics-Informed Neural Networks

Solving differential equations efficiently and accurately sits at the heart of progress in many areas of scientific research, from classical dynamical systems to quantum mechanics. There is a surge of interest in using Physics-Informed Neural Networks (PINNs) to tackle such problems as they provide numerous benefits over traditional numerical approaches. Despite their potential benefits for solving differential equations, transfer learning has been under explored. In this study, we present a general framework for transfer learning PINNs that results in one-shot inference for linear systems of both ordinary and partial differential equations. This means that highly accurate solutions to many unknown differential equations can be obtained instantaneously without retraining an entire network. We demonstrate the efficacy of the proposed deep learning approach by solving several real-world problems, such as first- and second-order linear ordinary equations, the Poisson equation, and the time-dependent Schrodinger complex-value partial differential equation.

preprint2022arXiv

The ACM Multimedia 2022 Computational Paralinguistics Challenge: Vocalisations, Stuttering, Activity, & Mosquitoes

The ACM Multimedia 2022 Computational Paralinguistics Challenge addresses four different problems for the first time in a research competition under well-defined conditions: In the Vocalisations and Stuttering Sub-Challenges, a classification on human non-verbal vocalisations and speech has to be made; the Activity Sub-Challenge aims at beyond-audio human activity recognition from smartwatch sensor data; and in the Mosquitoes Sub-Challenge, mosquitoes need to be detected. We describe the Sub-Challenges, baseline feature extraction, and classifiers based on the usual ComPaRE and BoAW features, the auDeep toolkit, and deep feature extraction from pre-trained CNNs using the DeepSpectRum toolkit; in addition, we add end-to-end sequential modelling, and a log-mel-128-BNN.

preprint2021arXiv

Deep Learning for Portfolio Optimization

We adopt deep learning models to directly optimise the portfolio Sharpe ratio. The framework we present circumvents the requirements for forecasting expected returns and allows us to directly optimise portfolio weights by updating model parameters. Instead of selecting individual assets, we trade Exchange-Traded Funds (ETFs) of market indices to form a portfolio. Indices of different asset classes show robust correlations and trading them substantially reduces the spectrum of available assets to choose from. We compare our method with a wide range of algorithms with results showing that our model obtains the best performance over the testing period, from 2011 to the end of April 2020, including the financial instabilities of the first quarter of 2020. A sensitivity analysis is included to understand the relevance of input features and we further study the performance of our approach under different cost rates and different risk levels via volatility scaling.

preprint2021arXiv

Explicit Regularisation in Gaussian Noise Injections

We study the regularisation induced in neural networks by Gaussian noise injections (GNIs). Though such injections have been extensively studied when applied to data, there have been few studies on understanding the regularising effect they induce when applied to network activations. Here we derive the explicit regulariser of GNIs, obtained by marginalising out the injected noise, and show that it penalises functions with high-frequency components in the Fourier domain; particularly in layers closer to a neural network's output. We show analytically and empirically that such regularisation produces calibrated classifiers with large classification margins.

preprint2021arXiv

Improving VAEs' Robustness to Adversarial Attack

Variational autoencoders (VAEs) have recently been shown to be vulnerable to adversarial attacks, wherein they are fooled into reconstructing a chosen target image. However, how to defend against such attacks remains an open problem. We make significant advances in addressing this issue by introducing methods for producing adversarially robust VAEs. Namely, we first demonstrate that methods proposed to obtain disentangled latent representations produce VAEs that are more robust to these attacks. However, this robustness comes at the cost of reducing the quality of the reconstructions. We ameliorate this by applying disentangling methods to hierarchical VAEs. The resulting models produce high-fidelity autoencoders that are also adversarially robust. We confirm their capabilities on several different datasets and with current state-of-the-art VAE adversarial attacks, and also show that they increase the robustness of downstream tasks to attack.

preprint2021arXiv

Learning Bijective Feature Maps for Linear ICA

Separating high-dimensional data like images into independent latent factors, i.e independent component analysis (ICA), remains an open research problem. As we show, existing probabilistic deep generative models (DGMs), which are tailor-made for image data, underperform on non-linear ICA tasks. To address this, we propose a DGM which combines bijective feature maps with a linear ICA model to learn interpretable latent structures for high-dimensional data. Given the complexities of jointly training such a hybrid model, we introduce novel theory that constrains linear ICA to lie close to the manifold of orthogonal rectangular matrices, the Stiefel manifold. By doing so we create models that converge quickly, are easy to train, and achieve better unsupervised latent factor discovery than flow-based models, linear ICA, and Variational Autoencoders on images.

preprint2021arXiv

Relaxed-Responsibility Hierarchical Discrete VAEs

Successfully training Variational Autoencoders (VAEs) with a hierarchy of discrete latent variables remains an area of active research. Vector-Quantised VAEs are a powerful approach to discrete VAEs, but naive hierarchical extensions can be unstable when training. Leveraging insights from classical methods of inference we introduce \textit{Relaxed-Responsibility Vector-Quantisation}, a novel way to parameterise discrete latent variables, a refinement of relaxed Vector-Quantisation that gives better performance and more stable training. This enables a novel approach to hierarchical discrete variational autoencoders with numerous layers of latent variables (here up to 32) that we train end-to-end. Within hierarchical probabilistic deep generative models with discrete latent variables trained end-to-end, we achieve state-of-the-art bits-per-dim results for various standard datasets. % Unlike discrete VAEs with a single layer of latent variables, we can produce samples by ancestral sampling: it is not essential to train a second autoregressive generative model over the learnt latent representations to then sample from and then decode. % Moreover, that latter approach in these deep hierarchical models would require thousands of forward passes to generate a single sample. Further, we observe different layers of our model become associated with different aspects of the data.

preprint2021arXiv

Towards a Theoretical Understanding of the Robustness of Variational Autoencoders

We make inroads into understanding the robustness of Variational Autoencoders (VAEs) to adversarial attacks and other input perturbations. While previous work has developed algorithmic approaches to attacking and defending VAEs, there remains a lack of formalization for what it means for a VAE to be robust. To address this, we develop a novel criterion for robustness in probabilistic models: $r$-robustness. We then use this to construct the first theoretical results for the robustness of VAEs, deriving margins in the input space for which we can provide guarantees about the resulting reconstruction. Informally, we are able to define a region within which any perturbation will produce a reconstruction that is similar to the original reconstruction. To support our analysis, we show that VAEs trained using disentangling methods not only score well under our robustness metrics, but that the reasons for this can be interpreted through our theoretical results.

preprint2020arXiv

Adversarial Robustness Guarantees for Classification with Gaussian Processes

We investigate adversarial robustness of Gaussian Process Classification (GPC) models. Given a compact subset of the input space $T\subseteq \mathbb{R}^d$ enclosing a test point $x^*$ and a GPC trained on a dataset $\mathcal{D}$, we aim to compute the minimum and the maximum classification probability for the GPC over all the points in $T$. In order to do so, we show how functions lower- and upper-bounding the GPC output in $T$ can be derived, and implement those in a branch and bound optimisation algorithm. For any error threshold $ε> 0$ selected a priori, we show that our algorithm is guaranteed to reach values $ε$-close to the actual values in finitely many iterations. We apply our method to investigate the robustness of GPC models on a 2D synthetic dataset, the SPAM dataset and a subset of the MNIST dataset, providing comparisons of different GPC training techniques, and show how our method can be used for interpretability analysis. Our empirical analysis suggests that GPC robustness increases with more accurate posterior estimation.

preprint2020arXiv

DeepLOB: Deep Convolutional Neural Networks for Limit Order Books

We develop a large-scale deep learning model to predict price movements from limit order book (LOB) data of cash equities. The architecture utilises convolutional filters to capture the spatial structure of the limit order books as well as LSTM modules to capture longer time dependencies. The proposed network outperforms all existing state-of-the-art algorithms on the benchmark LOB dataset [1]. In a more realistic setting, we test our model by using one year market quotes from the London Stock Exchange and the model delivers a remarkably stable out-of-sample prediction accuracy for a variety of instruments. Importantly, our model translates well to instruments which were not part of the training set, indicating the model's ability to extract universal features. In order to better understand these features and to go beyond a "black box" model, we perform a sensitivity analysis to understand the rationale behind the model predictions and reveal the components of LOBs that are most relevant. The ability to extract robust features which translate well to other instruments is an important property of our model which has many other applications.

preprint2020arXiv

HumBug Zooniverse: a crowd-sourced acoustic mosquito dataset

Mosquitoes are the only known vector of malaria, which leads to hundreds of thousands of deaths each year. Understanding the number and location of potential mosquito vectors is of paramount importance to aid the reduction of malaria transmission cases. In recent years, deep learning has become widely used for bioacoustic classification tasks. In order to enable further research applications in this field, we release a new dataset of mosquito audio recordings. With over a thousand contributors, we obtained 195,434 labels of two second duration, of which approximately 10 percent signify mosquito events. We present an example use of the dataset, in which we train a convolutional neural network on log-Mel features, showcasing the information content of the labels. We hope this will become a vital resource for those researching all aspects of malaria, and add to the existing audio datasets for bioacoustic detection and signal processing.

preprint2020arXiv

Investment sizing with deep learning prediction uncertainties for high-frequency Eurodollar futures trading

In this work we show that prediction uncertainty estimates gleaned from deep learning models can be useful inputs for influencing the relative allocation of risk capital across trades. In this way, consideration of uncertainty is important because it permits the scaling of investment size across trade opportunities in a principled and data-driven way. We showcase this insight with a prediction model and find clear outperformance based on a Sharpe ratio metric, relative to trading strategies that either do not take uncertainty into account, or that utilize an alternative market-based statistic as a proxy for uncertainty. Of added novelty is our modelling of high-frequency data at the top level of the Eurodollar Futures limit order book for each trading day of 2018, whereby we predict interest rate curve changes on small time horizons. We are motivated to study the market for these popularly-traded interest rate derivatives since it is deep and liquid, and contributes to the efficient functioning of global finance -- though there is relatively little by way of its modelling contained in the academic literature. Hence, we verify the utility of prediction models and uncertainty estimates for trading applications in this complex and multi-dimensional asset price space.

preprint2020arXiv

Practical Bayesian Learning of Neural Networks via Adaptive Optimisation Methods

We introduce a novel framework for the estimation of the posterior distribution over the weights of a neural network, based on a new probabilistic interpretation of adaptive optimisation algorithms such as AdaGrad and Adam. We demonstrate the effectiveness of our Bayesian Adam method, Badam, by experimentally showing that the learnt uncertainties correctly relate to the weights' predictive capabilities by weight pruning. We also demonstrate the quality of the derived uncertainty measures by comparing the performance of Badam to standard methods in a Thompson sampling setting for multi-armed bandits, where good uncertainty measures are required for an agent to balance exploration and exploitation.

preprint2020arXiv

Ready Policy One: World Building Through Active Learning

Model-Based Reinforcement Learning (MBRL) offers a promising direction for sample efficient learning, often achieving state of the art results for continuous control tasks. However, many existing MBRL methods rely on combining greedy policies with exploration heuristics, and even those which utilize principled exploration bonuses construct dual objectives in an ad hoc fashion. In this paper we introduce Ready Policy One (RP1), a framework that views MBRL as an active learning problem, where we aim to improve the world model in the fewest samples possible. RP1 achieves this by utilizing a hybrid objective function, which crucially adapts during optimization, allowing the algorithm to trade off reward v.s. exploration at different stages of learning. In addition, we introduce a principled mechanism to terminate sample collection once we have a rich enough trajectory batch to improve the model. We rigorously evaluate our method on a variety of continuous control tasks, and demonstrate statistically significant gains over existing approaches.

preprint2020arXiv

SafePILCO: a software tool for safe and data-efficient policy synthesis

SafePILCO is a software tool for safe and data-efficient policy search with reinforcement learning. It extends the known PILCO algorithm, originally written in MATLAB, to support safe learning. We provide a Python implementation and leverage existing libraries that allow the codebase to remain short and modular, which is appropriate for wider use by the verification, reinforcement learning, and control communities.

preprint2020arXiv

Safety Guarantees for Planning Based on Iterative Gaussian Processes

Gaussian Processes (GPs) are widely employed in control and learning because of their principled treatment of uncertainty. However, tracking uncertainty for iterative, multi-step predictions in general leads to an analytically intractable problem. While approximation methods exist, they do not come with guarantees, making it difficult to estimate their reliability and to trust their predictions. In this work, we derive formal probability error bounds for iterative prediction and planning with GPs. Building on GP properties, we bound the probability that random trajectories lie in specific regions around the predicted values. Namely, given a tolerance $ε> 0 $, we compute regions around the predicted trajectory values, such that GP trajectories are guaranteed to lie inside them with probability at least $1-ε$. We verify experimentally that our method tracks the predictive uncertainty correctly, even when current approximation techniques fail. Furthermore, we show how the proposed bounds can be employed within a safe reinforcement learning framework to verify the safety of candidate control policies, guiding the synthesis of provably safe controllers.

preprint2019arXiv

Using Sparse Gaussian Processes for Predicting Robust Inertial Confinement Fusion Implosion Yields

Here we present the application of an advanced Sparse Gaussian Process based machine learning algorithm to the challenge of predicting the yields of inertial confinement fusion (ICF) experiments. The algorithm is used to investigate the parameter space of an extremely robust ICF design for the National Ignition Facility, the `Simplest Design'; deuterium-tritium gas in a plastic ablator with a Gaussian, Planckian drive. In particular we show that i) GPz has the potential to decompose uncertainty on predictions into uncertainty from lack of data and shot-to-shot variation, ii) permits the incorporation of science-goal specific cost-sensitive learning e.g. focussing on the high-yield parts of parameter space and iii) is very fast and effective in high dimensions.