Researcher profile

Francky Catthoor

Francky Catthoor contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
10works
0followers
7topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

10 published item(s)

preprint2026arXiv

Architectural Classification of XR Workloads: Cross-Layer Archetypes and Implications

Edge and mobile platforms for augmented and virtual reality, collectively referred to as extended reality (XR) must deliver deterministic ultra-low-latency performance under stringent power and area constraints. However, the diversity of XR workloads is rapidly increasing, characterized by heterogeneous operator types and complex dataflow structures. This trend poses significant challenges to conventional accelerator architectures centered around convolutional neural networks (CNNs), resulting in diminishing returns for traditional compute-centric optimization strategies. Despite the importance of this problem, a systematic architectural understanding of the full XR pipeline remains lacking. In this paper, we present an architectural classification of XR workloads using a cross-layer methodology that integrates model-based high-level design space exploration (DSE) with empirical profiling on commercial GPU and CPU hardware. By analyzing a representative set of workloads spanning 12 distinct XR kernels, we distill their complex architectural characteristics into a small set of cross-layer workload archetypes (e.g., capacity-limited and overhead-sensitive). Building on these archetypes, we further extract key architectural insights and provide actionable design guidelines for next-generation XR SoCs. Our study highlights that XR architecture design must shift from generic resource scaling toward phase-aware scheduling and elastic resource allocation in order to achieve greater energy efficiency and high performance in future XR systems.

preprint2023arXiv

Investigating methods to improve photovoltaic thermal models at second-to-minute timescales

This paper presents a range of methods to improve the accuracy of equation-based thermal models of PV modules at second-to-minute timescales. We present an RC-equivalent conceptual model for PV modules, where wind effects are captured. We show how the thermal time constant $τ$ of PV modules can be determined from measured data, and subsequently used to make static thermal models dynamic by applying the Exponential Weighted Mean (EWM) approach to irradiance and wind signals. On average, $τ$ is $6.3 \pm 1~$min for fixed-mount PV systems. Based on this conceptual model, the Filter- EWM - Mean Bias Error correction (FEM) methodology is developed. We propose two thermal models, WM1 and WM2, and compare these against the models of Ross, Sandia, and Faiman on twenty-four datasets of fifteen sites, with time resolutions ranging from 1$~$s to 1$~$h, the majority of these at 1$~$min resolution. The FEM methodology is shown to reduce model errors (RMSE and MAE) on average for all sites and models versus the standard steady-state equivalent by -1.1$~$K and -0.75$~$K respectively.

preprint2023arXiv

Timescales: the choreography of classical and unconventional computing

Tasks that one wishes to have done by a computer often come with conditions that relate to timescales. For instance, the processing must terminate within a given time limit; or a signal processing computer must integrate input information across several timescales; or a robot motor controller must react to sensor signals with a short latency. For classical digital computing machines such requirements pose no fundamental problems as long as the physical clock rate is fast enough for the fastest relevant timescales. However, when digital microchips become scaled down into the few-nanometer range where quantum noise and device mismatch become unavoidable, or when the digital computing paradigm is altogether abandoned in analog neuromorphic systems or other unconventional hardware bases, it can become difficult to relate timescale conditions to the physical hardware dynamics. Here we explore the relations between task-defined timescale conditions and physical hardware timescales in some depth. The article has two main parts. In the first part we develop an abstract model of a generic computational system that admits a unified discussion of computational timescales. This model is general enough to cover digital computing systems as well as analog neuromorphic or other unconventional ones. We identify four major types of timescales which require separate considerations: causal physical timescales; timescales of phenomenal change which characterize the ``speed'' of how something changes in time; timescales of reactivity which describe how fast a computing system can react to incoming trigger information; and memory timescales. In the second part we survey twenty known computational mechanisms that can be used to obtain desired task-related timescale characteristics from the physical givens of the available hardware.

preprint2022arXiv

A New Look at Spike-Timing-Dependent Plasticity Networks for Spatio-Temporal Feature Learning

We present new theoretical foundations for unsupervised Spike-Timing-Dependent Plasticity (STDP) learning in spiking neural networks (SNNs). In contrast to empirical parameter search used in most previous works, we provide novel theoretical grounds for SNN and STDP parameter tuning which considerably reduces design time. Using our generic framework, we propose a class of global, action-based and convolutional SNN-STDP architectures for learning spatio-temporal features from event-based cameras. We assess our methods on the N-MNIST, the CIFAR10-DVS and the IBM DVS128 Gesture datasets, all acquired with a real-world event camera. Using our framework, we report significant improvements in classification accuracy compared to both conventional state-of-the-art event-based feature descriptors (+8.2% on CIFAR10-DVS), and compared to state-of-the-art STDP-based systems (+9.3% on N-MNIST, +7.74% on IBM DVS128 Gesture). Our work contributes to both ultra-low-power learning in neuromorphic edge devices, and towards a biologically-plausible, optimization-based theory of cortical vision.

preprint2022arXiv

Learning to SLAM on the Fly in Unknown Environments: A Continual Learning Approach for Drones in Visually Ambiguous Scenes

Learning to safely navigate in unknown environments is an important task for autonomous drones used in surveillance and rescue operations. In recent years, a number of learning-based Simultaneous Localisation and Mapping (SLAM) systems relying on deep neural networks (DNNs) have been proposed for applications where conventional feature descriptors do not perform well. However, such learning-based SLAM systems rely on DNN feature encoders trained offline in typical deep learning settings. This makes them less suited for drones deployed in environments unseen during training, where continual adaptation is paramount. In this paper, we present a new method for learning to SLAM on the fly in unknown environments, by modulating a low-complexity Dictionary Learning and Sparse Coding (DLSC) pipeline with a newly proposed Quadratic Bayesian Surprise (QBS) factor. We experimentally validate our approach with data collected by a drone in a challenging warehouse scenario, where the high number of ambiguous scenes makes visual disambiguation hard.

preprint2022arXiv

VWR2A: A Very-Wide-Register Reconfigurable-Array Architecture for Low-Power Embedded Devices

Edge-computing requires high-performance energy-efficient embedded systems. Fixed-function or custom accelerators, such as FFT or FIR filter engines, are very efficient at implementing a particular functionality for a given set of constraints. However, they are inflexible when facing application-wide optimizations or functionality upgrades. Conversely, programmable cores offer higher flexibility, but often with a penalty in area, performance, and, above all, energy consumption. In this paper, we propose VWR2A, an architecture that integrates high computational density and low power memory structures (i.e., very-wide registers and scratchpad memories). VWR2A narrows the energy gap with similar or better performance on FFT kernels with respect to an FFT accelerator. Moreover, VWR2A flexibility allows to accelerate multiple kernels, resulting in significant energy savings at the application level.

preprint2021arXiv

MemPool-3D: Boosting Performance and Efficiency of Shared-L1 Memory Many-Core Clusters with 3D Integration

Three-dimensional integrated circuits promise power, performance, and footprint gains compared to their 2D counterparts, thanks to drastic reductions in the interconnects' length through their smaller form factor. We can leverage the potential of 3D integration by enhancing MemPool, an open-source many-core design with 256 cores and a shared pool of L1 scratchpad memory connected with a low-latency interconnect. MemPool's baseline 2D design is severely limited by routing congestion and wire propagation delay, making the design ideal for 3D integration. In architectural terms, we increase MemPool's scratchpad memory capacity beyond the sweet spot for 2D designs, improving performance in a common digital signal processing kernel. We propose a 3D MemPool design that leverages a smart partitioning of the memory resources across two layers to balance the size and utilization of the stacked dies. In this paper, we explore the architectural and the technology parameter spaces by analyzing the power, performance, area, and energy efficiency of MemPool instances in 2D and 3D with 1 MiB, 2 MiB, 4 MiB, and 8 MiB of scratchpad memory in a commercial 28 nm technology node. We observe a performance gain of 9.1% when running a matrix multiplication on the MemPool-3D design with 4 MiB of scratchpad memory compared to the MemPool 2D counterpart. In terms of energy efficiency, we can implement the MemPool-3D instance with 4 MiB of L1 memory on an energy budget 15% smaller than its 2D counterpart, and even 3.7% smaller than the MemPool-2D instance with one-fourth of the L1 scratchpad memory capacity.

preprint2020arXiv

Enabling Resource-Aware Mapping of Spiking Neural Networks via Spatial Decomposition

With growing model complexity, mapping Spiking Neural Network (SNN)-based applications to tile-based neuromorphic hardware is becoming increasingly challenging. This is because the synaptic storage resources on a tile, viz. a crossbar, can accommodate only a fixed number of pre-synaptic connections per post-synaptic neuron. For complex SNN models that have many pre-synaptic connections per neuron, some connections may need to be pruned after training to fit onto the tile resources, leading to a loss in model quality, e.g., accuracy. In this work, we propose a novel unrolling technique that decomposes a neuron function with many pre-synaptic connections into a sequence of homogeneous neural units, where each neural unit is a function computation node, with two pre-synaptic connections. This spatial decomposition technique significantly improves crossbar utilization and retains all pre-synaptic connections, resulting in no loss of the model quality derived from connection pruning. We integrate the proposed technique within an existing SNN mapping framework and evaluate it using machine learning applications on the DYNAP-SE state-of-the-art neuromorphic hardware. Our results demonstrate an average 60% lower crossbar requirement, 9x higher synapse utilization, 62% lower wasted energy on the hardware, and between 0.8% and 4.6% increase in model quality.

preprint2020arXiv

PyCARL: A PyNN Interface for Hardware-Software Co-Simulation of Spiking Neural Network

We present PyCARL, a PyNN-based common Python programming interface for hardware-software co-simulation of spiking neural network (SNN). Through PyCARL, we make the following two key contributions. First, we provide an interface of PyNN to CARLsim, a computationally-efficient, GPU-accelerated and biophysically-detailed SNN simulator. PyCARL facilitates joint development of machine learning models and code sharing between CARLsim and PyNN users, promoting an integrated and larger neuromorphic community. Second, we integrate cycle-accurate models of state-of-the-art neuromorphic hardware such as TrueNorth, Loihi, and DynapSE in PyCARL, to accurately model hardware latencies that delay spikes between communicating neurons and degrade performance. PyCARL allows users to analyze and optimize the performance difference between software-only simulation and hardware-software co-simulation of their machine learning models. We show that system designers can also use PyCARL to perform design-space exploration early in the product development stage, facilitating faster time-to-deployment of neuromorphic products. We evaluate the memory usage and simulation time of PyCARL using functionality tests, synthetic SNNs, and realistic applications. Our results demonstrate that for large SNNs, PyCARL does not lead to any significant overhead compared to CARLsim. We also use PyCARL to analyze these SNNs for a state-of-the-art neuromorphic hardware and demonstrate a significant performance deviation from software-only simulations. PyCARL allows to evaluate and minimize such differences early during model development.

preprint2020arXiv

Run-time Mapping of Spiking Neural Networks to Neuromorphic Hardware

In this paper, we propose a design methodology to partition and map the neurons and synapses of online learning SNN-based applications to neuromorphic architectures at {run-time}. Our design methodology operates in two steps -- step 1 is a layer-wise greedy approach to partition SNNs into clusters of neurons and synapses incorporating the constraints of the neuromorphic architecture, and step 2 is a hill-climbing optimization algorithm that minimizes the total spikes communicated between clusters, improving energy consumption on the shared interconnect of the architecture. We conduct experiments to evaluate the feasibility of our algorithm using synthetic and realistic SNN-based applications. We demonstrate that our algorithm reduces SNN mapping time by an average 780x compared to a state-of-the-art design-time based SNN partitioning approach with only 6.25\% lower solution quality.