Source author record

David Thomas

David Thomas appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Distributed, Parallel, and Cluster Computing; eess.SP; Information Theory; math.IT; Artificial Intelligence; Computational Engineering, Finance, and Science; eess.IV; Machine Learning; nlin.CD; Performance; physics.med-ph; Programming Languages; Systems and Control

Catalog footprint

What is connected

7 works

13 topics

4 close collaborators

preprint · 2026 · arXiv

Priming: Hybrid State Space Models From Pre-trained Transformers

Hybrid State-Space models combine Attention with recurrent State-Space Model (SSM) layers, balancing eidetic memory from Attention with compressed fading memory from SSMs. This yields smaller Key-Value caches and faster decoding than Transformers, along with a richer architectural design space. Exploring that design space at scale has so far required training from scratch, a barrier that has kept most large-model Hybrid research within a narrow range of architectures. We introduce Priming, a method that turns Hybrid architecture design from a pre-training problem into a knowledge transfer one. Priming initializes a Hybrid model from a pre-trained Transformer and, through short alignment and post-training phases, recovers downstream quality using less than 0.5% of the source model's pre-training token budget. Priming is agnostic to the source Transformer family (e.g., Qwen, Llama, Mistral), model class (dense or Mixture-of-Experts), and model scale. Priming enables us to run the first controlled comparison of SSM layer types at scale under identical conditions. We evaluate Gated KalmaNet (GKA), Gated DeltaNet (GDN), and Mamba-2, and show that their expressiveness hierarchy, GKA > GDN > Mamba-2, directly predicts downstream performance on long-context reasoning tasks. We scale Priming to 8B/32B reasoning models with native 128K contexts. Our Hybrid GKA 32B improves over its source Qwen3-32B by +3.8 average reasoning points, while staying within 1% of a Transformer post-trained on the same data and enabling up to 2.3x higher decode throughput. To foster research on Hybrid architectures, we release a model zoo of primed Hybrid models for long-context reasoning and instruction following, together with the Priming training and inference code (Sequence Parallelism algorithms for long-context training, optimized GKA kernels, and vLLM serving plugin), all under the Apache 2.0 License.

preprint · 2021 · arXiv

Near-field Image Transmission and EVM Measurements in Rich Scattering Environment

In this work, we present near-field image transmission and error vector magnitude (EVM) measurements in a rich scattering environment inside a metal enclosure. We examine the effect of loading the metal enclosure on the performance of an SDR-based near-field communication link. We focus on the key communication receiver parameters to observe the behavior of the near-field link in the presence of rich scattering and in the presence of loading with RF absorber cones. The near-field performance is measured by transmitting wideband OFDM-modulated packets containing image information. Our findings suggest that the performance of OFDM-based wideband near-field communication improves when the metal enclosure is loaded with RF absorbers. In particular, near-field EVM improves when the enclosure is loaded with RF absorber cones. Loading the metal enclosure increases the coherence bandwidth. Frequency selectivity was observed in the empty enclosure, which suggests a coherence bandwidth smaller than the signal bandwidth.

preprint · 2021 · arXiv

Statistical Characterization of Wireless MIMO Channels in Mode-Stirred Enclosures

We present the statistical characterization of a 2x2 Multiple-Input Multiple-Output wireless link operated in a mode-stirred enclosure, with channel state information available only at the receiver (agnostic transmitter). Our wireless channel measurements are conducted in the absence of line of sight, varying the inter-element spacing between the two antenna elements in both the transmit and receive arrays. The mode-stirred cavity is operated: i) at a low number of stirrer positions to create statistical inhomogeneity; ii) at two different loading conditions, empty and with absorbers, in order to mimic a wide range of realistic equipment-level enclosures. Our results show that two parallel channels are obtained within the confined space under both operating conditions. The statistical characterization of the wireless channel is presented in terms of coherence bandwidth, path loss, delay spread, Rician factor, and wideband channel capacity. It is found that the severe multipath fading supported by a highly reflecting environment creates an imbalance between the two Multiple-Input Multiple-Output channels, even in the presence of substantial losses. Furthermore, the channel capacity has a multi-modal distribution whose average and variance scale monotonically with the number of absorbers. Results are of interest for IoT devices, including wireless chip-to-chip and device-to-device communications, operating in highly reflective environments.

preprint · 2020 · arXiv

Physics-informed brain MRI segmentation

Magnetic Resonance Imaging (MRI) is one of the most flexible and powerful medical imaging modalities. This flexibility does however come at a cost; MRI images acquired at different sites and with different parameters exhibit significant differences in contrast and tissue appearance, resulting in downstream issues when quantifying brain anatomy or the presence of pathology. In this work, we propose to combine multiparametric MRI-based static-equation sequence simulations with segmentation convolutional neural networks (CNN), to make these networks robust to variations in acquisition parameters. Results demonstrate that, when given both the image and their associated physics acquisition parameters, CNNs can produce segmentations that exhibit robustness to acquisition variations. We also show that the proposed physics-informed methods can be used to bridge multi-centre and longitudinal imaging studies where imaging acquisition varies across a site or in time.

preprint · 2015 · arXiv

A Phase-Space Approach for Propagating Field-Field Correlation Functions

We show that radiation from complex and inherently random but correlated wave sources can be modelled efficiently by using an approach based on the Wigner distribution function. Our method exploits the connection between correlation functions and the Wigner function and admits in its simplest approximation a direct representation in terms of the evolution of ray densities in phase space. We show that next leading order corrections to the ray-tracing approximation lead to Airy-function type phase space propagators. By exploiting the exact Wigner function propagator, inherently wave-like effects such as evanescent decay or radiation from more heterogeneous sources as well as diffraction and reflections can be included and analysed. We discuss in particular the role of evanescent waves in the near-field of non-paraxial sources and give explicit expressions for the growth rate of the correlation length as a function of the distance from the source. Furthermore, results for the reflection of partially coherent sources from flat mirrors are given. We focus here on electromagnetic sources at microwave frequencies and modelling efforts in the context of electromagnetic compatibility.

preprint · 2014 · arXiv

A Domain Specific Approach to Heterogeneous Computing: From Availability to Accessibility

We advocate a domain specific software development methodology for heterogeneous computing platforms such as Multicore CPUs, GPUs and FPGAs. We argue that three specific benefits are realised from adopting such an approach: portable, efficient implementations across heterogeneous platforms; domain specific metrics of quality that characterise platforms in a form software developers will understand; automatic, optimal partitioning across the available computing resources. These three benefits allow a development methodology for software developers where they describe their computational problems in a single, easy to understand form, and after a modeling procedure on the available resources, select how they would like to trade between various domain specific metrics. Our work on the Forward Financial Framework ($F^3$) demonstrates this methodology in practice. We are able to execute a range of computational finance option pricing tasks efficiently upon a wide range of CPU, GPU and FPGA computing platforms. We can also create accurate financial domain metric models of walltime latency and statistical confidence. Furthermore, we believe that we can support automatic, optimal partitioning using this execution and modelling capability.

preprint · 2014 · arXiv

An Automatic Mixed Software Hardware Pipeline Builder for CPU-FPGA Platforms

Our toolchain for accelerating applications, called Courier-FPGA, is designed to make the processing power of CPU-FPGA platforms available to software programmers and non-expert users. It automatically gathers runtime information about library functions from a running target binary and constructs the function call graph, including input-output data. Then, it uses the corresponding predefined hardware modules if these are ready for the FPGA and prepares software functions on the CPU by using the Pipeline Generator. The Pipeline Generator builds a pipeline control program using Intel Threading Building Blocks to run both hardware modules and software functions in parallel. Finally, Courier-FPGA dynamically replaces the original functions in the binary and accelerates it by using the built pipeline. Courier-FPGA performs these acceleration processes without user intervention, source code tweaks or re-compilations of the binary. We describe the technical details of this mixed software hardware pipeline on CPU-FPGA platforms in this paper. In our case study, Courier-FPGA was used to accelerate the binary of a corner detection application using the Harris-Stephens method on the Zynq platform. A series of functions were off-loaded, and a speedup of 15.36 times was achieved by using the built pipeline.