Source author record

Tim Gebbie

Tim Gebbie appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


q-fin.CP; Machine Learning; q-fin.ST; Distributed, Parallel, and Cluster Computing; physics.data-an; q-fin.GN; q-fin.PM; q-fin.TR; cond-mat.stat-mech; Multiagent Systems; Neural and Evolutionary Computing; physics.soc-ph; q-fin.RM

Catalog footprint

What is connected

11 works

13 topics

4 close collaborators


preprint, 2021, arXiv

Agglomerative Likelihood Clustering

We consider the problem of fast time-series data clustering. Building on previous work modeling the correlation-based Hamiltonian of spin variables we present an updated fast non-expensive Agglomerative Likelihood Clustering algorithm (ALC). The method replaces the optimized genetic algorithm based approach (f-SPC) with an agglomerative recursive merging framework inspired by previous work in Econophysics and Community Detection. The method is tested on noisy synthetic correlated time-series data-sets with built-in cluster structure to demonstrate that the algorithm produces meaningful non-trivial results. We apply it to time-series data-sets as large as 20,000 assets and we argue that ALC can reduce compute time costs and resource usage cost for large scale clustering for time-series applications while being serialized, and hence has no obvious parallelization requirement. The algorithm can be an effective choice for state-detection for online learning in a fast non-linear data environment because the algorithm requires no prior information about the number of clusters.

preprint, 2021, arXiv

CoinTossX: An open-source low-latency high-throughput matching engine

We deploy and demonstrate the CoinTossX low-latency, high-throughput, open-source matching engine with orders sent using the Julia and Python languages. We show how this can be deployed for small-scale local desktop testing and discuss a larger scale, but local hosting, with multiple traded instruments managed concurrently and managed by multiple clients. We then demonstrate a cloud based deployment using Microsoft Azure, with large-scale industrial and simulation research use cases in mind. The system is exposed and interacted with via sockets using UDP SBE message protocols and can be monitored using a simple web browser interface using HTTP. We give examples showing how orders can be sent to the system and market data feeds monitored using the Julia and Python languages. The system is developed in Java with orders submitted as binary encodings (SBE) via UDP protocols using the Aeron Media Driver as the low-latency, high throughput message transport. The system separates the order-generation and simulation environments (e.g. agent-based model simulation) from the matching of orders, data-feeds and various modularised components of the order-book system. This ensures a more natural and realistic asynchronicity between events generating orders, and the events associated with order-book dynamics and market data-feeds. We promote the use of Julia as the preferred order submission and simulation environment.

preprint, 2021, arXiv

The efficient frontiers of mean-variance portfolio rules under distribution misspecification

Mean-variance portfolio decisions that combine prediction and optimisation have been shown to have poor empirical performance. Here, we consider the performance of various shrinkage methods by their efficient frontiers under different distributional assumptions to study the impact of reasonable departures from Normality. Namely, we investigate the impact of first-order auto-correlation, second-order auto-correlation, skewness, and excess kurtosis. We show that the shrinkage methods tend to re-scale the sample efficient frontier, which can change based on the nature of local perturbations from Normality. This re-scaling implies that the standard approach of comparing decision rules for a fixed level of risk aversion is problematic, and more so in a dynamic market setting. Our results suggest that comparing efficient frontiers has serious implications which oppose the prevailing thinking in the literature: namely, that sample estimators out-perform Stein type estimators of the mean, and that improving the prediction of the covariance has greater importance than improving that of the means.

preprint, 2020, arXiv

A Framework for Online Investment Algorithms

The artificial segmentation of an investment management process into a workflow with silos of offline human operators can restrict silos from collectively and adaptively pursuing a unified optimal investment goal. To meet the investor's objectives, an online algorithm can provide an explicit incremental approach that makes sequential updates as data arrives at the process level. This is in stark contrast to offline (or batch) processes that are focused on making component level decisions prior to process level integration. Here we present and report results for an integrated and online framework for algorithmic portfolio management. This article provides a workflow that can in turn be embedded into a process level learning framework. The workflow can be enhanced to refine signal generation and asset-class evolution and definitions. Our results confirm that we can use our framework in conjunction with resampling methods to outperform naive market capitalisation benchmarks while making clear the extent of back-test over-fitting. We consider such an online update framework to be a crucial step towards developing intelligent portfolio selection algorithms that integrate financial theory, investor views, and data analysis with process-level learning.

preprint, 2020, arXiv

Learning low-frequency temporal patterns for quantitative trading

We consider the viability of a modularised mechanistic online machine learning framework to learn signals in low-frequency financial time series data. The framework is proved on daily sampled closing time-series data from JSE equity markets. The input patterns are vectors of pre-processed sequences of daily, weekly and monthly or quarterly sampled feature changes. The data processing is split into a batch processed step where features are learnt using a stacked autoencoder via unsupervised learning, and then both batch and online supervised learning are carried out using these learnt features, with the output being a point prediction of measured time-series feature fluctuations. Weight initializations are implemented with restricted Boltzmann machine pre-training, and variance based initializations. Historical simulations are then run using an online feedforward neural network initialised with the weights from the batch training and validation step. The validity of the results is considered under a rigorous assessment of backtest overfitting using both combinatorially symmetrical cross validation and probabilistic and deflated Sharpe ratios. Results are used to develop a view on the phenomenology of financial markets and the value of complex historical data-analysis for trading under the unstable adaptive dynamics that characterise financial markets.
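
As an illustration of one of the overfitting diagnostics named in this abstract, the probabilistic Sharpe ratio is commonly written (following Bailey and López de Prado) as PSR = Φ((SR − SR*)·sqrt(n − 1) / sqrt(1 − γ3·SR + (γ4 − 1)/4·SR²)). The sketch below is a textbook form and not the authors' code; the function and parameter names are hypothetical, and the Sharpe ratios are assumed to be expressed at the same sampling frequency as the n observations.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probabilistic_sharpe_ratio(sr, sr_benchmark, n_obs, skew=0.0, kurtosis=3.0):
    """Probability that the true Sharpe ratio exceeds sr_benchmark.

    sr, sr_benchmark: non-annualised Sharpe ratios (assumed same frequency as the data)
    n_obs: number of return observations
    skew, kurtosis: sample skewness and (non-excess) kurtosis of the returns
    """
    radicand = 1.0 - skew * sr + (kurtosis - 1.0) / 4.0 * sr ** 2
    if radicand <= 0.0:
        raise ValueError("moment inputs give a non-positive variance term")
    z = (sr - sr_benchmark) * math.sqrt(n_obs - 1) / math.sqrt(radicand)
    return normal_cdf(z)
```

With Normal returns (skew 0, kurtosis 3) the correction term reduces to sqrt(1 + SR²/2); negative skew or fat tails widen it and lower the resulting probability.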

preprint, 2020, arXiv

Revisiting the Epps effect using volume time averaging: An exercise in R

We revisit and demonstrate the Epps effect using two well-known non-parametric covariance estimators: the Malliavin and Mancino (MM), and Hayashi and Yoshida (HY) estimators. We show the existence of the Epps effect in the top 10 stocks from the Johannesburg Stock Exchange (JSE) by various methods of aggregating Trade and Quote (TAQ) data. Concretely, we compare calendar time sampling with two volume time sampling methods: asset intrinsic volume time averaging, and volume time averaging synchronised in volume time across assets relative to the least and most liquid asset clocks. We reaffirm the argument made in much of the literature that the MM estimator is more representative of trade time reality because it does not over-estimate short-term correlations in an asynchronous event driven world. We confirm well known market phenomenology with the aim of providing some standardised R based simulation tools.
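
For readers unfamiliar with the HY estimator named in this abstract, a minimal sketch of its standard textbook definition follows: the integrated covariance is estimated by summing products of increments of the two assets over every pair of observation intervals that overlap. This is not the paper's R code; it is written in Python for illustration, the function name is hypothetical, and the inputs are assumed to be sorted observation times with matching log-prices.

```python
def hayashi_yoshida(times_x, values_x, times_y, values_y):
    """Hayashi-Yoshida covariance estimate for two asynchronously observed series.

    times_*: strictly increasing observation times
    values_*: log-prices observed at those times
    """
    total = 0.0
    for i in range(1, len(times_x)):
        dx = values_x[i] - values_x[i - 1]
        for j in range(1, len(times_y)):
            # Intervals (a, b] and (c, d] overlap when max(a, c) < min(b, d).
            if max(times_x[i - 1], times_y[j - 1]) < min(times_x[i], times_y[j]):
                total += dx * (values_y[j] - values_y[j - 1])
    return total
```

When both series share the same observation times, only matching intervals overlap and the sum reduces to the usual realised covariance. The double loop is quadratic in the number of observations and is meant for clarity rather than speed.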

preprint, 2019, arXiv

Fast Super-Paramagnetic Clustering

We map stock market interactions to spin models to recover their hierarchical structure using a simulated annealing based Super-Paramagnetic Clustering (SPC) algorithm. This is directly compared to a modified implementation of a maximum likelihood approach we call Fast Super-Paramagnetic Clustering (f-SPC). The methods are first applied to standard toy test-case problems, and then to a data-set of 447 stocks traded on the New York Stock Exchange (NYSE) over 1249 days. The signal to noise ratio of stock market correlation matrices is briefly considered. Our results approximately recover clusters representative of standard economic sectors and mixed ones whose dynamics shine light on the adaptive nature of financial markets and raise concerns relating to the effectiveness of industry based static financial market classification in the world of real-time data analytics. A key result is that we show that f-SPC maximum likelihood solutions converge to ones found within the Super-Paramagnetic Phase where the entropy is maximum, and those solutions are qualitatively better for high dimensionality data-sets.

preprint, 2015, arXiv

High-speed detection of emergent market clustering via an unsupervised parallel genetic algorithm

We implement a master-slave parallel genetic algorithm (PGA) with a bespoke log-likelihood fitness function to identify emergent clusters within price evolutions. We use graphics processing units (GPUs) to implement a PGA and visualise the results using disjoint minimal spanning trees (MSTs). We demonstrate that our GPU PGA, implemented on a commercially available general purpose GPU, is able to recover stock clusters in sub-second speed, based on a subset of stocks in the South African market. This represents a pragmatic choice for low-cost, scalable parallel computing and is significantly faster than a prototype serial implementation in an optimised C-based fourth-generation programming language, although the results are not directly comparable due to compiler differences. Combined with fast online intraday correlation matrix estimation from high frequency data for cluster identification, the proposed implementation offers cost-effective, near-real-time risk assessment for financial practitioners.

preprint, 2014, arXiv

Hierarchical causality in financial economics

Hierarchical analysis is considered and a multilevel model is presented in order to explore causality, chance and complexity in financial economics. A coupled system of models is used to describe multilevel interactions, consistent with market data: the lowest level is occupied by agents generating the prices of individual traded assets; the next level entails aggregation of stocks into markets; the third level combines shared risk factors with information variables and bottom-up, agent-generated structure, consistent with conditions for no-arbitrage pricing theory; the fourth level describes market factors which originate in the greater economy and the highest levels are described by regulated market structure and the customs and ethics which define the nature of acceptable transactions. A mechanism for emergence or innovation is considered and causal sources are discussed in terms of five causation classes.

preprint, 2013, arXiv

Factorising equity returns in an emerging market through exogenous shocks and capital flows

A technique from stochastic portfolio theory [Fernholz, 1998] is applied to analyse equity returns of Small, Mid and Large cap portfolios in an emerging market through periods of growth and regional crises, up to the onset of the global financial crisis. In particular, we factorize portfolios in the South African market in terms of distribution of capital, change of stock ranks in portfolios, and the effect due to dividends for the period Nov 1994 to May 2007. We discuss the results in the context of broader economic thinking to consider capital flows as risk factors, turning around more established approaches which use macroeconomic and socio-economic conditions to explain Foreign Direct Investment (into the economy) and Net Portfolio Investment (into equity and bond markets).

preprint, 2006, arXiv

An analysis of Cross-correlations in South African Market data

We apply random matrix theory to compare correlation matrix estimators C obtained from emerging market data. The correlation matrices are constructed from 10 years of daily data for stocks listed on the Johannesburg Stock Exchange (JSE) from January 1993 to December 2002. We test the spectral properties of C against random matrix predictions and find some agreement between the distributions of eigenvalues, nearest neighbour spacings, distributions of eigenvector components and the inverse participation ratios for eigenvectors. We show that interpolating both missing data and illiquid trading days with a zero-order hold increases agreement with RMT predictions. For the more realistic estimation of correlations in an emerging market, we suggest a pairwise measured-data correlation matrix. For the data set used, this approach suggests greater temporal stability for the leading eigenvectors. An interpretation of eigenvectors in terms of trading strategies is given in lieu of classification by economic sectors.
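
Two of the quantities mentioned in this abstract have simple closed forms, sketched below under stated assumptions. For a correlation matrix built from N uncorrelated series of length T (with T at least N), the Marchenko-Pastur law bounds the eigenvalues by σ²(1 ± sqrt(N/T))², and the inverse participation ratio of a normalised eigenvector u is the sum of u_i⁴. The function names are hypothetical and the code is a generic illustration, not the paper's analysis.

```python
import math


def marchenko_pastur_bounds(n_assets, n_obs, variance=1.0):
    """Eigenvalue bounds predicted for a purely random correlation matrix.

    Assumes n_obs >= n_assets and i.i.d. entries with the given variance.
    """
    if n_obs < n_assets:
        raise ValueError("this sketch assumes n_obs >= n_assets")
    ratio = math.sqrt(n_assets / n_obs)
    return variance * (1.0 - ratio) ** 2, variance * (1.0 + ratio) ** 2


def inverse_participation_ratio(vector):
    """Sum of fourth powers of the components after normalising to unit length.

    Close to 1/N for a vector spread evenly over N components, 1 for a
    vector concentrated on a single component.
    """
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        raise ValueError("zero vector has no participation ratio")
    return sum((v / norm) ** 4 for v in vector)
```

Eigenvalues of an empirical correlation matrix that fall outside these bounds are the usual candidates for genuine structure, and a large inverse participation ratio signals an eigenvector dominated by a few stocks.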
