Researcher profile

Nick Brown

Nick Brown contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2023arXiv

Task-based preemptive scheduling on FPGAs leveraging partial reconfiguration

FPGAs are an attractive type of accelerator for all-purpose HPC computing systems due to the possibility of deploying tailored hardware on demand. However, the common tools for programming and operating FPGAs are still complex to use, especially in scenarios where diverse types of tasks should be dynamically executed. In this work we present a programming abstraction with a simple interface that internally leverages High-Level Synthesis, Dynamic Partial Reconfiguration and synchronisation mechanisms to use an FPGA as a multi-tasking server with preemptive scheduling and priority queues. This leads to an improved use of the FPGA resources, allowing the execution of several different kernels concurrently and deploying the most urgent ones as fast as possible. The results of our experimental study show that our approach incurs only a 10% overhead in the worst case when using two reconfigurable regions, whilst providing a significant performance improvement of at least 24% over the traditional full reconfiguration approach.

preprint2022arXiv

Low-power option Greeks: Efficiency-driven market risk analysis using FPGAs

Quantitative finance is the use of mathematical models to analyse financial markets and securities. Typically requiring significant amounts of computation, an important question is the role that novel architectures can play in accelerating these models. In this paper we explore the acceleration of the industry standard Securities Technology Analysis Center's (STAC) derivatives risk analysis benchmark STAC-A2\texttrademark{} by porting the Heston stochastic volatility model and Longstaff and Schwartz path reduction onto a Xilinx Alveo U280 FPGA with a focus on efficiency-driven computing. Describing in detail the steps undertaken to optimise the algorithm for the FPGA, we then leverage the flexibility provided by the reconfigurable architecture to explore choices around numerical precision and representation. Insights gained are then exploited in our final performance and energy measurements, where for the efficiency improvement metric we achieve between an 8 times and 185 times improvement on the FPGA compared to two 24-core Intel Xeon Platinum CPUs. The result of this work is not only a show-case for the market risk analysis workload on FPGAs, but furthermore a set of efficiency driven techniques and lessons learnt that can be applied to quantitative finance and computational workloads on reconfigurable architectures more generally.

preprint2022arXiv

Predicting batch queue job wait times for informed scheduling of urgent HPC workloads

There is increasing interest in the use of HPC machines for urgent workloads to help tackle disasters as they unfold. Whilst batch queue systems are not ideal in supporting such workloads, many disadvantages can be worked around by accurately predicting when a waiting job will start to run. However there are numerous challenges in achieving such a prediction with high accuracy, not least because the queue's state can change rapidly and depend upon many factors. In this work we explore a novel machine learning approach for predicting queue wait times, hypothesising that such a model can capture the complex behaviour resulting from the queue policy and other interactions to generate accurate job start times. For ARCHER2 (HPE Cray EX), Cirrus (HPE 8600) and 4-cabinet (HPE Cray EX) we explore how different machine learning approaches and techniques improve the accuracy of our predictions, comparing against the estimation generated by Slurm. We demonstrate that our techniques deliver the most accurate predictions across our machines of interest, with the result of this work being the ability to predict job start times within one minute of the actual start time for around 65\% of jobs on ARCHER2 and 4-cabinet, and 76\% of jobs on Cirrus. When compared against what Slurm can deliver, this represents around 3.8 times better accuracy on ARCHER2 and 18 times better for Cirrus. Furthermore our approach can accurately predicting the start time for three quarters of all job within ten minutes of the actual start time on ARCHER2 and 4-cabinet, and for 90\% of jobs on Cirrus. Whilst the driver of this work has been to better facilitate placement of urgent workloads across HPC machines, the insights gained can be used to provide wider benefits to users and also enrich existing batch queue systems and inform policy too.

preprint2022arXiv

Programming abstractions for preemptive scheduling in FPGAs using partial reconfiguration

FPGAs are an attractive type of accelerator for all-purpose HPC computing systems due to the possibility of deploying tailored hardware on demand. However, the common tools for programming and operating FPGAs are still complex to use, specially in scenarios where diverse types of tasks should be dynamically executed. In this work we present a programming abstraction with a simple interface that internally leverages High-Level Synthesis, Dynamic Partial Reconfiguration and synchronisation mechanisms to use an FPGA as a multi-tasking server with preemptive scheduling and priority queues. This leads to a better use of the FPGA resources, allowing the execution of several kernels at the same time and deploying the most urgent ones as fast as possible. The results of our experimental study show that our approach incurs only a 1.66% overhead when using only one Reconfigurable Region (RR), and 4.04% when using two RRs, whilst presenting a significant performance improvement over the traditional non-preemptive full reconfiguration approach.

preprint2022arXiv

Workflows to driving high-performance interactive supercomputing for urgent decision making

Interactive urgent computing is a small but growing user of supercomputing resources. However there are numerous technical challenges that must be overcome to make supercomputers fully suited to the wide range of urgent workloads which could benefit from the computational power delivered by such instruments. An important question is how to connect the different components of an urgent workload; namely the users, the simulation codes, and external data sources, together in a structured and accessible manner. In this paper we explore the role of workflows from both the perspective of marshalling and control of urgent workloads, and at the individual HPC machine level. Ultimately requiring two workflow systems, by using a space weather prediction urgent use-cases, we explore the benefit that these two workflow systems provide especially when one exploits the flexibility enabled by them interoperating.

preprint2021arXiv

Compact Native Code Generation for Dynamic Languages on Micro-core Architectures

Micro-core architectures combine many simple, low memory, low power-consuming CPU cores onto a single chip. Potentially providing significant performance and low power consumption, this technology is not only of great interest in embedded, edge, and IoT uses, but also potentially as accelerators for data-center workloads. Due to the restricted nature of such CPUs, these architectures have traditionally been challenging to program, not least due to the very constrained amounts of memory (often around 32KB) and idiosyncrasies of the technology. However, more recently, dynamic languages such as Python have been ported to a number of micro-cores, but these are often delivered as interpreters which have an associated performance limitation. Targeting the four objectives of performance, unlimited code-size, portability between architectures, and maintaining the programmer productivity benefits of dynamic languages, the limited memory available means that classic techniques employed by dynamic language compilers, such as just-in-time (JIT), are simply not feasible. In this paper we describe the construction of a compilation approach for dynamic languages on micro-core architectures which aims to meet these four objectives, and use Python as a vehicle for exploring the application of this in replacing the existing micro-core interpreter. Our experiments focus on the metrics of performance, architecture portability, minimum memory size, and programmer productivity, comparing our approach against that of writing native C code. The outcome of this work is the identification of a series of techniques that are not only suitable for compiling Python code, but also applicable to a wide variety of dynamic languages on micro-cores.