Source author record

Khaled Nabil Salama

Khaled Nabil Salama appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

eess.SP Hardware Architecture Machine Learning Neural and Evolutionary Computing physics.optics

Catalog footprint

What is connected

3works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Sparsity-Aware Streaming SNN Accelerator with Output-Channel Dataflow for Automatic Modulation Classification

The rapid advancement of wireless communication technologies, including 5G, emerging 6G networks, and the large-scale deployment of the Internet of Things (IoT), has intensified the need for efficient spectrum utilization. Automatic modulation classification (AMC) plays a vital role in cognitive radio systems by enabling real-time identification of modulation schemes for dynamic spectrum access and interference mitigation. While deep neural networks (DNNs) offer high classification accuracy, their computational and energy demands pose challenges for real-time edge deployment. Spiking neural networks (SNNs), with their event-driven nature, offer inherent energy efficiency, but achieving both high throughput and low power under constrained hardware resources remains challenging. This work proposes a sparsity-aware SNN streaming accelerator optimized for AMC tasks. Unlike traditional systolic arrays that exploit sparsity but suffer from low throughput, or streaming architectures that achieve high throughput but cannot fully utilize input and weight sparsity, our design integrates both advantages. By leveraging the fixed nature of kernels during inference, we apply the gated one-to-all product (GOAP) algorithm to compute only on non-zero input-weight intersections. Extra or empty iterations are precomputed and embedded into the inference dataflow, eliminating dynamic data fetches and enabling fully pipelined, control-free inter-layer execution. Implemented on an FPGA, our sparsity-aware output-channel dataflow streaming (SAOCDS) accelerator achieves 23.5 MS/s (approximately double the baseline throughput) on the RadioML 2016 dataset, while reducing dynamic power and maintaining comparable classification accuracy. These results demonstrate strong potential for real-time, low-power deployment in edge cognitive radio systems.

preprint2021arXiv

Efficient Training of Spiking Neural Networks with Temporally-Truncated Local Backpropagation through Time

Directly training spiking neural networks (SNNs) has remained challenging due to complex neural dynamics and intrinsic non-differentiability in firing functions. The well-known backpropagation through time (BPTT) algorithm proposed to train SNNs suffers from large memory footprint and prohibits backward and update unlocking, making it impossible to exploit the potential of locally-supervised training methods. This work proposes an efficient and direct training algorithm for SNNs that integrates a locally-supervised training method with a temporally-truncated BPTT algorithm. The proposed algorithm explores both temporal and spatial locality in BPTT and contributes to significant reduction in computational cost including GPU memory utilization, main memory access and arithmetic operations. We thoroughly explore the design space concerning temporal truncation length and local training block size and benchmark their impact on classification accuracy of different networks running different types of tasks. The results reveal that temporal truncation has a negative effect on the accuracy of classifying frame-based datasets, but leads to improvement in accuracy on dynamic-vision-sensor (DVS) recorded datasets. In spite of resulting information loss, local training is capable of alleviating overfitting. The combined effect of temporal truncation and local training can lead to the slowdown of accuracy drop and even improvement in accuracy. In addition, training deep SNNs models such as AlexNet classifying CIFAR10-DVS dataset leads to 7.26% increase in accuracy, 89.94% reduction in GPU memory, 10.79% reduction in memory access, and 99.64% reduction in MAC operations compared to the standard end-to-end BPTT.

preprint2016arXiv

Theory of diffusive light scattering cancellation cloaking

We report on a new concept of cloaking objects in diffusive light regime using the paradigm of the scattering cancellation and mantle cloaking techniques. We show numerically that an object can be made completely invisible to diffusive photon density waves, by tailoring the diffusivity constant of the spherical shell enclosing the object. This means that photons' flow outside the object and the cloak made of these spherical shells behaves as if the object were not present. Diffusive light invisibility may open new vistas in hiding hot spots in infrared thermography or tissue imaging.