Source author record

Carl Busart

Carl Busart appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Computer Vision eess.IV Hardware Architecture Distributed, Parallel, and Cluster Computing eess.AS Multiagent Systems Robotics Sound

Catalog footprint

What is connected

6works

9topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2024arXiv

Asynchronous Local Computations in Distributed Bayesian Learning

Due to the expanding scope of machine learning (ML) to the fields of sensor networking, cooperative robotics and many other multi-agent systems, distributed deployment of inference algorithms has received a lot of attention. These algorithms involve collaboratively learning unknown parameters from dispersed data collected by multiple agents. There are two competing aspects in such algorithms, namely, intra-agent computation and inter-agent communication. Traditionally, algorithms are designed to perform both synchronously. However, certain circumstances need frugal use of communication channels as they are either unreliable, time-consuming, or resource-expensive. In this paper, we propose gossip-based asynchronous communication to leverage fast computations and reduce communication overhead simultaneously. We analyze the effects of multiple (local) intra-agent computations by the active agents between successive inter-agent communications. For local computations, Bayesian sampling via unadjusted Langevin algorithm (ULA) MCMC is utilized. The communication is assumed to be over a connected graph (e.g., as in decentralized learning), however, the results can be extended to coordinated communication where there is a central server (e.g., federated learning). We theoretically quantify the convergence rates in the process. To demonstrate the efficacy of the proposed algorithm, we present simulations on a toy problem as well as on real world data sets to train ML models to perform classification tasks. We observe faster initial convergence and improved performance accuracy, especially in the low data range. We achieve on average 78% and over 90% classification accuracy respectively on the Gamma Telescope and mHealth data sets from the UCI ML repository.

preprint2024arXiv

PAHD: Perception-Action based Human Decision Making using Explainable Graph Neural Networks on SAR Images

Synthetic Aperture Radar (SAR) images are commonly utilized in military applications for automatic target recognition (ATR). Machine learning (ML) methods, such as Convolutional Neural Networks (CNN) and Graph Neural Networks (GNN), are frequently used to identify ground-based objects, including battle tanks, personnel carriers, and missile launchers. Determining the vehicle class, such as the BRDM2 tank, BMP2 tank, BTR60 tank, and BTR70 tank, is crucial, as it can help determine whether the target object is an ally or an enemy. While the ML algorithm provides feedback on the recognized target, the final decision is left to the commanding officers. Therefore, providing detailed information alongside the identified target can significantly impact their actions. This detailed information includes the SAR image features that contributed to the classification, the classification confidence, and the probability of the identified object being classified as a different object type or class. We propose a GNN-based ATR framework that provides the final classified class and outputs the detailed information mentioned above. This is the first study to provide a detailed analysis of the classification class, making final decisions more straightforward. Moreover, our GNN framework achieves an overall accuracy of 99.2\% when evaluated on the MSTAR dataset, improving over previous state-of-the-art GNN methods.

preprint2023arXiv

Accurate, Low-latency, Efficient SAR Automatic Target Recognition on FPGA

Synthetic aperture radar (SAR) automatic target recognition (ATR) is the key technique for remote-sensing image recognition. The state-of-the-art convolutional neural networks (CNNs) for SAR ATR suffer from \emph{high computation cost} and \emph{large memory footprint}, making them unsuitable to be deployed on resource-limited platforms, such as small/micro satellites. In this paper, we propose a comprehensive GNN-based model-architecture {co-design} on FPGA to address the above issues. \emph{Model design}: we design a novel graph neural network (GNN) for SAR ATR. The proposed GNN model incorporates GraphSAGE layer operators and attention mechanism, achieving comparable accuracy as the state-of-the-art work with near $1/100$ computation cost. Then, we propose a pruning approach including weight pruning and input pruning. While weight pruning through lasso regression reduces most parameters without accuracy drop, input pruning eliminates most input pixels with negligible accuracy drop. \emph{Architecture design}: to fully unleash the computation parallelism within the proposed model, we develop a novel unified hardware architecture that can execute various computation kernels (feature aggregation, feature transformation, graph pooling). The proposed hardware design adopts the Scatter-Gather paradigm to efficiently handle the irregular computation {patterns} of various computation kernels. We deploy the proposed design on an embedded FPGA (AMD Xilinx ZCU104) and evaluate the performance using MSTAR dataset. Compared with the state-of-the-art CNNs, the proposed GNN achieves comparable accuracy with $1/3258$ computation cost and $1/83$ model size. Compared with the state-of-the-art CPU/GPU, our FPGA accelerator achieves $14.8\times$/$2.5\times$ speedup (latency) and is $62\times$/$39\times$ more energy efficient.

preprint2022arXiv

Model-Architecture Co-Design for High Performance Temporal GNN Inference on FPGA

Temporal Graph Neural Networks (TGNNs) are powerful models to capture temporal, structural, and contextual information on temporal graphs. The generated temporal node embeddings outperform other methods in many downstream tasks. Real-world applications require high performance inference on real-time streaming dynamic graphs. However, these models usually rely on complex attention mechanisms to capture relationships between temporal neighbors. In addition, maintaining vertex memory suffers from intrinsic temporal data dependency that hinders task-level parallelism, making it inefficient on general-purpose processors. In this work, we present a novel model-architecture co-design for inference in memory-based TGNNs on FPGAs. The key modeling optimizations we propose include a light-weight method to compute attention scores and a related temporal neighbor pruning strategy to further reduce computation and memory accesses. These are holistically coupled with key hardware optimizations that leverage FPGA hardware. We replace the temporal sampler with an on-chip FIFO based hardware sampler and the time encoder with a look-up-table. We train our simplified models using knowledge distillation to ensure similar accuracy vis-á-vis the original model. Taking advantage of the model optimizations, we propose a principled hardware architecture using batching, pipelining, and prefetching techniques to further improve the performance. We also propose a hardware mechanism to ensure the chronological vertex updating without sacrificing the computation parallelism. We evaluate the performance of the proposed hardware accelerator on three real-world datasets.

preprint2022arXiv

SynchroSim: An Integrated Co-simulation Middleware for Heterogeneous Multi-robot System

With the advancement of modern robotics, autonomous agents are now capable of hosting sophisticated algorithms, which enables them to make intelligent decisions. But developing and testing such algorithms directly in real-world systems is tedious and may result in the wastage of valuable resources. Especially for heterogeneous multi-agent systems in battlefield environments where communication is critical in determining the system's behavior and usability. Due to the necessity of simulators of separate paradigms (co-simulation) to simulate such scenarios before deploying, synchronization between those simulators is vital. Existing works aimed at resolving this issue fall short of addressing diversity among deployed agents. In this work, we propose \textit{SynchroSim}, an integrated co-simulation middleware to simulate a heterogeneous multi-robot system. Here we propose a velocity difference-driven adjustable window size approach with a view to reducing packet loss probability. It takes into account the respective velocities of deployed agents to calculate a suitable window size before transmitting data between them. We consider our algorithm-specific simulator agnostic but for the sake of implementation results, we have used Gazebo as a Physics simulator and NS-3 as a network simulator. Also, we design our algorithm considering the Perception-Action loop inside a closed communication channel, which is one of the essential factors in a contested scenario with the requirement of high fidelity in terms of data transmission. We validate our approach empirically at both the simulation and system level for both line-of-sight (LOS) and non-line-of-sight (NLOS) scenarios. Our approach achieves a noticeable improvement in terms of reducing packet loss probability ($\approx$11\%), and average packet delay ($\approx$10\%) compared to the fixed window size-based synchronization approach.

preprint2022arXiv

TinyM$^2$Net: A Flexible System Algorithm Co-designed Multimodal Learning Framework for Tiny Devices

With the emergence of Artificial Intelligence (AI), new attention has been given to implement AI algorithms on resource constrained tiny devices to expand the application domain of IoT. Multimodal Learning has recently become very popular with the classification task due to its impressive performance for both image and audio event classification. This paper presents TinyM$^2$Net -- a flexible system algorithm co-designed multimodal learning framework for resource constrained tiny devices. The framework was designed to be evaluated on two different case-studies: COVID-19 detection from multimodal audio recordings and battle field object detection from multimodal images and audios. In order to compress the model to implement on tiny devices, substantial network architecture optimization and mixed precision quantization were performed (mixed 8-bit and 4-bit). TinyM$^2$Net shows that even a tiny multimodal learning model can improve the classification performance than that of any unimodal frameworks. The most compressed TinyM$^2$Net achieves 88.4% COVID-19 detection accuracy (14.5% improvement from unimodal base model) and 96.8% battle field object detection accuracy (3.9% improvement from unimodal base model). Finally, we test our TinyM$^2$Net models on a Raspberry Pi 4 to see how they perform when deployed to a resource constrained tiny device.

Carl Busart

What is connected

Connect this record

See the researcher in context

Building this map preview

6 published item(s)

Asynchronous Local Computations in Distributed Bayesian Learning

PAHD: Perception-Action based Human Decision Making using Explainable Graph Neural Networks on SAR Images

Accurate, Low-latency, Efficient SAR Automatic Target Recognition on FPGA

Model-Architecture Co-Design for High Performance Temporal GNN Inference on FPGA

SynchroSim: An Integrated Co-simulation Middleware for Heterogeneous Multi-robot System

TinyM$^2$Net: A Flexible System Algorithm Co-designed Multimodal Learning Framework for Tiny Devices