Researcher profile

Khaled Nabil Salama

Khaled Nabil Salama contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 15 - UnverifiedVerification L1Unclaimed author
3works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

3 published item(s)

preprint2026arXiv

Sparsity-Aware Streaming SNN Accelerator with Output-Channel Dataflow for Automatic Modulation Classification

The rapid advancement of wireless communication technologies, including 5G, emerging 6G networks, and the large-scale deployment of the Internet of Things (IoT), has intensified the need for efficient spectrum utilization. Automatic modulation classification (AMC) plays a vital role in cognitive radio systems by enabling real-time identification of modulation schemes for dynamic spectrum access and interference mitigation. While deep neural networks (DNNs) offer high classification accuracy, their computational and energy demands pose challenges for real-time edge deployment. Spiking neural networks (SNNs), with their event-driven nature, offer inherent energy efficiency, but achieving both high throughput and low power under constrained hardware resources remains challenging. This work proposes a sparsity-aware SNN streaming accelerator optimized for AMC tasks. Unlike traditional systolic arrays that exploit sparsity but suffer from low throughput, or streaming architectures that achieve high throughput but cannot fully utilize input and weight sparsity, our design integrates both advantages. By leveraging the fixed nature of kernels during inference, we apply the gated one-to-all product (GOAP) algorithm to compute only on non-zero input-weight intersections. Extra or empty iterations are precomputed and embedded into the inference dataflow, eliminating dynamic data fetches and enabling fully pipelined, control-free inter-layer execution. Implemented on an FPGA, our sparsity-aware output-channel dataflow streaming (SAOCDS) accelerator achieves 23.5 MS/s (approximately double the baseline throughput) on the RadioML 2016 dataset, while reducing dynamic power and maintaining comparable classification accuracy. These results demonstrate strong potential for real-time, low-power deployment in edge cognitive radio systems.

preprint2021arXiv

Efficient Training of Spiking Neural Networks with Temporally-Truncated Local Backpropagation through Time

Directly training spiking neural networks (SNNs) has remained challenging due to complex neural dynamics and intrinsic non-differentiability in firing functions. The well-known backpropagation through time (BPTT) algorithm proposed to train SNNs suffers from large memory footprint and prohibits backward and update unlocking, making it impossible to exploit the potential of locally-supervised training methods. This work proposes an efficient and direct training algorithm for SNNs that integrates a locally-supervised training method with a temporally-truncated BPTT algorithm. The proposed algorithm explores both temporal and spatial locality in BPTT and contributes to significant reduction in computational cost including GPU memory utilization, main memory access and arithmetic operations. We thoroughly explore the design space concerning temporal truncation length and local training block size and benchmark their impact on classification accuracy of different networks running different types of tasks. The results reveal that temporal truncation has a negative effect on the accuracy of classifying frame-based datasets, but leads to improvement in accuracy on dynamic-vision-sensor (DVS) recorded datasets. In spite of resulting information loss, local training is capable of alleviating overfitting. The combined effect of temporal truncation and local training can lead to the slowdown of accuracy drop and even improvement in accuracy. In addition, training deep SNNs models such as AlexNet classifying CIFAR10-DVS dataset leads to 7.26% increase in accuracy, 89.94% reduction in GPU memory, 10.79% reduction in memory access, and 99.64% reduction in MAC operations compared to the standard end-to-end BPTT.

preprint2016arXiv

Theory of diffusive light scattering cancellation cloaking

We report on a new concept of cloaking objects in diffusive light regime using the paradigm of the scattering cancellation and mantle cloaking techniques. We show numerically that an object can be made completely invisible to diffusive photon density waves, by tailoring the diffusivity constant of the spherical shell enclosing the object. This means that photons' flow outside the object and the cloak made of these spherical shells behaves as if the object were not present. Diffusive light invisibility may open new vistas in hiding hot spots in infrared thermography or tissue imaging.