Source author record

Mostafa Rahimi Azghadi

Mostafa Rahimi Azghadi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Emerging Technologies Computer Vision Machine Learning Neural and Evolutionary Computing Artificial Intelligence eess.IV eess.SP Hardware Architecture Human-Computer Interaction Networking and Internet Architecture Other Computer Science

Catalog footprint

What is connected

15works

11topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

AFFormer: Adaptive Feature Fusion Transformer for V2X Cooperative Perception under Channel Impairments

Accurate 3D object detection is essential for ensuring the safety of autonomous vehicles. Cooperative perception, which leverages vehicle-to-everything (V2X) communication to share perceptual data, enhances detection but is vulnerable to channel impairments, such as noise, fading, and interference. To strengthen the reliability of intelligent transportation systems, this work improves the robustness of V2X cooperative perception under communication conditions that reflect common channel impairments. This paper proposes an Adaptive Feature Fusion Transformer (AFFormer), a Transformer-based framework that mitigates the adverse effects of corrupted features by modeling temporal, inter-agent, and spatial correlations. AFFormer introduces three key modules: Multi-Agent and Temporal Aggregation for context-aware fusion across agents and over time, Dual Spatial Attention for efficient modeling of spatial dependencies, and Uncertainty-Guided Fusion for entropy-driven refinement of fused features. A teacher-student knowledge distillation strategy further enhances robustness by aligning fused features with reliable early-collaboration supervision. AFFormer is validated on the V2XSet and DAIR-V2X datasets, where it consistently outperforms existing methods under both ideal and impaired communication conditions, demonstrating improved robustness to communication-induced feature degradation while maintaining a competitive efficiency-accuracy trade-off.

preprint2022arXiv

A lightweight Transformer-based model for fish landmark detection

Transformer-based models, such as the Vision Transformer (ViT), can outperform onvolutional Neural Networks (CNNs) in some vision tasks when there is sufficient training data. However, (CNNs) have a strong and useful inductive bias for vision tasks (i.e. translation equivariance and locality). In this work, we developed a novel model architecture that we call a Mobile fish landmark detection network (MFLD-net). We have made this model using convolution operations based on ViT (i.e. Patch embeddings, Multi-Layer Perceptrons). MFLD-net can achieve competitive or better results in low data regimes while being lightweight and therefore suitable for embedded and mobile devices. Furthermore, we show that MFLD-net can achieve keypoint (landmark) estimation accuracies on-par or even better than some of the state-of-the-art (CNNs) on a fish image dataset. Additionally, unlike ViT, MFLD-net does not need a pre-trained model and can generalise well when trained on a small dataset. We provide quantitative and qualitative results that demonstrate the model's generalisation capabilities. This work will provide a foundation for future efforts in developing mobile, but efficient fish monitoring systems and devices.

preprint2022arXiv

Computer Vision and Deep Learning for Fish Classification in Underwater Habitats: A Survey

Marine scientists use remote underwater video recording to survey fish species in their natural habitats. This helps them understand and predict how fish respond to climate change, habitat degradation, and fishing pressure. This information is essential for developing sustainable fisheries for human consumption, and for preserving the environment. However, the enormous volume of collected videos makes extracting useful information a daunting and time-consuming task for a human. A promising method to address this problem is the cutting-edge Deep Learning (DL) technology.DL can help marine scientists parse large volumes of video promptly and efficiently, unlocking niche information that cannot be obtained using conventional manual monitoring methods. In this paper, we provide an overview of the key concepts of DL, while presenting a survey of literature on fish habitat monitoring with a focus on underwater fish classification. We also discuss the main challenges faced when developing DL for underwater image processing and propose approaches to address them. Finally, we provide insights into the marine habitat monitoring research domain and shed light on what the future of DL for underwater image processing may hold. This paper aims to inform a wide range of readers from marine scientists who would like to apply DL in their research to computer scientists who would like to survey state-of-the-art DL-based underwater fish habitat monitoring literature.

preprint2022arXiv

Design Space Exploration of Dense and Sparse Mapping Schemes for RRAM Architectures

The impact of device and circuit-level effects in mixed-signal Resistive Random Access Memory (RRAM) accelerators typically manifest as performance degradation of Deep Learning (DL) algorithms, but the degree of impact varies based on algorithmic features. These include network architecture, capacity, weight distribution, and the type of inter-layer connections. Techniques are continuously emerging to efficiently train sparse neural networks, which may have activation sparsity, quantization, and memristive noise. In this paper, we present an extended Design Space Exploration (DSE) methodology to quantify the benefits and limitations of dense and sparse mapping schemes for a variety of network architectures. While sparsity of connectivity promotes less power consumption and is often optimized for extracting localized features, its performance on tiled RRAM arrays may be more susceptible to noise due to under-parameterization, when compared to dense mapping schemes. Moreover, we present a case study quantifying and formalizing the trade-offs of typical non-idealities introduced into 1-Transistor-1-Resistor (1T1R) tiled memristive architectures and the size of modular crossbar tiles using the CIFAR-10 dataset.

preprint2022arXiv

Navigating Local Minima in Quantized Spiking Neural Networks

Spiking and Quantized Neural Networks (NNs) are becoming exceedingly important for hyper-efficient implementations of Deep Learning (DL) algorithms. However, these networks face challenges when trained using error backpropagation, due to the absence of gradient signals when applying hard thresholds. The broadly accepted trick to overcoming this is through the use of biased gradient estimators: surrogate gradients which approximate thresholding in Spiking Neural Networks (SNNs), and Straight-Through Estimators (STEs), which completely bypass thresholding in Quantized Neural Networks (QNNs). While noisy gradient feedback has enabled reasonable performance on simple supervised learning tasks, it is thought that such noise increases the difficulty of finding optima in loss landscapes, especially during the later stages of optimization. By periodically boosting the Learning Rate (LR) during training, we expect the network can navigate unexplored solution spaces that would otherwise be difficult to reach due to local minima, barriers, or flat surfaces. This paper presents a systematic evaluation of a cosine-annealed LR schedule coupled with weight-independent adaptive moment estimation as applied to Quantized SNNs (QSNNs). We provide a rigorous empirical evaluation of this technique on high precision and 4-bit quantized SNNs across three datasets, demonstrating (close to) state-of-the-art performance on the more complex datasets. Our source code is available at this link: https://github.com/jeshraghian/QSNNs.

preprint2021arXiv

Internet of Underwater Things and Big Marine Data Analytics -- A Comprehensive Survey

The Internet of Underwater Things (IoUT) is an emerging communication ecosystem developed for connecting underwater objects in maritime and underwater environments. The IoUT technology is intricately linked with intelligent boats and ships, smart shores and oceans, automatic marine transportations, positioning and navigation, underwater exploration, disaster prediction and prevention, as well as with intelligent monitoring and security. The IoUT has an influence at various scales ranging from a small scientific observatory, to a midsized harbor, and to covering global oceanic trade. The network architecture of IoUT is intrinsically heterogeneous and should be sufficiently resilient to operate in harsh environments. This creates major challenges in terms of underwater communications, whilst relying on limited energy resources. Additionally, the volume, velocity, and variety of data produced by sensors, hydrophones, and cameras in IoUT is enormous, giving rise to the concept of Big Marine Data (BMD), which has its own processing challenges. Hence, conventional data processing techniques will falter, and bespoke Machine Learning (ML) solutions have to be employed for automatically learning the specific BMD behavior and features facilitating knowledge extraction and decision support. The motivation of this paper is to comprehensively survey the IoUT, BMD, and their synthesis. It also aims for exploring the nexus of BMD with ML. We set out from underwater data collection and then discuss the family of IoUT data communication techniques with an emphasis on the state-of-the-art research challenges. We then review the suite of ML solutions suitable for BMD handling and analytics. We treat the subject deductively from an educational perspective, critically appraising the material surveyed.

preprint2021arXiv

Towards Memristive Deep Learning Systems for Real-time Mobile Epileptic Seizure Prediction

The unpredictability of seizures continues to distress many people with drug-resistant epilepsy. On account of recent technological advances, considerable efforts have been made using different hardware technologies to realize smart devices for the real-time detection and prediction of seizures. In this paper, we investigate the feasibility of using Memristive Deep Learning Systems (MDLSs) to perform real-time epileptic seizure prediction on the edge. Using the MemTorch simulation framework and the Children's Hospital Boston (CHB)-Massachusetts Institute of Technology (MIT) dataset we determine the performance of various simulated MDLS configurations. An average sensitivity of 77.4% and a Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.85 are reported for the optimal configuration that can process Electroencephalogram (EEG) spectrograms with 7,680 samples in 1.408ms while consuming 0.0133W and occupying an area of 0.1269mm$^2$ in a 65nm Complementary Metal-Oxide-Semiconductor (CMOS) process.

preprint2020arXiv

Training Progressively Binarizing Deep Networks Using FPGAs

While hardware implementations of inference routines for Binarized Neural Networks (BNNs) are plentiful, current realizations of efficient BNN hardware training accelerators, suitable for Internet of Things (IoT) edge devices, leave much to be desired. Conventional BNN hardware training accelerators perform forward and backward propagations with parameters adopting binary representations, and optimization using parameters adopting floating or fixed-point real-valued representations--requiring two distinct sets of network parameters. In this paper, we propose a hardware-friendly training method that, contrary to conventional methods, progressively binarizes a singular set of fixed-point network parameters, yielding notable reductions in power and resource utilizations. We use the Intel FPGA SDK for OpenCL development environment to train our progressively binarizing DNNs on an OpenVINO FPGA. We benchmark our training approach on both GPUs and FPGAs using CIFAR-10 and compare it to conventional BNNs.

preprint2019arXiv

Accelerating Deterministic and Stochastic Binarized Neural Networks on FPGAs Using OpenCL

Recent technological advances have proliferated the available computing power, memory, and speed of modern Central Processing Units (CPUs), Graphics Processing Units (GPUs), and Field Programmable Gate Arrays (FPGAs). Consequently, the performance and complexity of Artificial Neural Networks (ANNs) is burgeoning. While GPU accelerated Deep Neural Networks (DNNs) currently offer state-of-the-art performance, they consume large amounts of power. Training such networks on CPUs is inefficient, as data throughput and parallel computation is limited. FPGAs are considered a suitable candidate for performance critical, low power systems, e.g. the Internet of Things (IOT) edge devices. Using the Xilinx SDAccel or Intel FPGA SDK for OpenCL development environment, networks described using the high-level OpenCL framework can be accelerated on heterogeneous platforms. Moreover, the resource utilization and power consumption of DNNs can be further enhanced by utilizing regularization techniques that binarize network weights. In this paper, we introduce, to the best of our knowledge, the first FPGA-accelerated stochastically binarized DNN implementations, and compare them to implementations accelerated using both GPUs and FPGAs. Our developed networks are trained and benchmarked using the popular MNIST and CIFAR-10 datasets, and achieve near state-of-the-art performance, while offering a >16-fold improvement in power consumption, compared to conventional GPU-accelerated networks. Both our FPGA-accelerated determinsitic and stochastic BNNs reduce inference times on MNIST and CIFAR-10 by >9.89x and >9.91x, respectively.

preprint2019arXiv

Variation-aware Binarized Memristive Networks

The quantization of weights to binary states in Deep Neural Networks (DNNs) can replace resource-hungry multiply accumulate operations with simple accumulations. Such Binarized Neural Networks (BNNs) exhibit greatly reduced resource and power requirements. In addition, memristors have been shown as promising synaptic weight elements in DNNs. In this paper, we propose and simulate novel Binarized Memristive Convolutional Neural Network (BMCNN) architectures employing hybrid weight and parameter representations. We train the proposed architectures offline and then map the trained parameters to our binarized memristive devices for inference. To take into account the variations in memristive devices, and to study their effect on the performance, we introduce variations in $R_{ON}$ and $R_{OFF}$. Moreover, we introduce means to mitigate the adverse effect of memristive variations in our proposed networks. Finally, we benchmark our BMCNNs and variation-aware BMCNNs using the MNIST dataset.

preprint2013arXiv

A Neuromorphic VLSI Design for Spike Timing and Rate Based Synaptic Plasticity

Triplet-based Spike Timing Dependent Plasticity (TSTDP) is a powerful synaptic plasticity rule that acts beyond conventional pair-based STDP (PSTDP). Here, the TSTDP is capable of reproducing the outcomes from a variety of biological experiments, while the PSTDP rule fails to reproduce them. Additionally, it has been shown that the behaviour inherent to the spike rate-based Bienenstock-Cooper-Munro (BCM) synaptic plasticity rule can also emerge from the TSTDP rule. This paper proposes an analog implementation of the TSTDP rule. The proposed VLSI circuit has been designed using the AMS 0.35 um CMOS process and has been simulated using design kits for Synopsys and Cadence tools. Simulation results demonstrate how well the proposed circuit can alter synaptic weights according to the timing difference amongst a set of different patterns of spikes. Furthermore, the circuit is shown to give rise to a BCM-like learning rule, which is a rate-based rule. To mimic implementation environment, a 1000 run Monte Carlo (MC) analysis was conducted on the proposed circuit. The presented MC simulation analysis and the simulation result from fine-tuned circuits show that, it is possible to mitigate the effect of process variations in the proof of concept circuit, however, a practical variation aware design technique is required to promise a high circuit performance in a large scale neural network. We believe that the proposed design can play a significant role in future VLSI implementations of both spike timing and rate based neuromorphic learning systems.

preprint2012arXiv

A Novel Design for Quantum-dot Cellular Automata Cells and Full Adders

Quantum dot Cellular Automata (QCA) is a novel and potentially attractive technology for implementing computing architectures at the nanoscale. The basic Boolean primitive in QCA is the majority gate. In this paper we present a novel design for QCA cells and another possible and unconventional scheme for majority gates. By applying these items, the hardware requirements for a QCA design can be reduced and circuits can be simpler in level and gate counts. As an example, a 1-bit QCA adder is constructed by applying our new scheme and is compared to the other existing implementation. Beside, some Boolean functions are expressed as examples and it has been shown, how our reduction method by using new proposed item, decreases gate counts and levels in comparison to the other previous methods.

preprint2012arXiv

A Novel Design for Quantum-dot Cellular Automata Cells and Full Adders

Erroneous submission in violation of copyright removed by arXiv admin.

preprint2012arXiv

Design and Implementation of BCM Rule Based on Spike-Timing Dependent Plasticity

The Bienenstock-Cooper-Munro (BCM) and Spike Timing-Dependent Plasticity (STDP) rules are two experimentally verified form of synaptic plasticity where the alteration of synaptic weight depends upon the rate and the timing of pre- and post-synaptic firing of action potentials, respectively. Previous studies have reported that under specific conditions, i.e. when a random train of Poissonian distributed spikes are used as inputs, and weight changes occur according to STDP, it has been shown that the BCM rule is an emergent property. Here, the applied STDP rule can be either classical pair-based STDP rule, or the more powerful triplet-based STDP rule. In this paper, we demonstrate the use of two distinct VLSI circuit implementations of STDP to examine whether BCM learning is an emergent property of STDP. These circuits are stimulated with random Poissonian spike trains. The first circuit implements the classical pair-based STDP, while the second circuit realizes a previously described triplet-based STDP rule. These two circuits are simulated using 0.35 um CMOS standard model in HSpice simulator. Simulation results demonstrate that the proposed triplet-based STDP circuit significantly produces the threshold-based behaviour of the BCM. Also, the results testify to similar behaviour for the VLSI circuit for pair-based STDP in generating the BCM.

preprint2012arXiv

Efficient Design of Triplet Based Spike-Timing Dependent Plasticity

Spike-Timing Dependent Plasticity (STDP) is believed to play an important role in learning and the formation of computational function in the brain. The classical model of STDP which considers the timing between pairs of pre-synaptic and post-synaptic spikes (p-STDP) is incapable of reproducing synaptic weight changes similar to those seen in biological experiments which investigate the effect of either higher order spike trains (e.g. triplet and quadruplet of spikes), or, simultaneous effect of the rate and timing of spike pairs on synaptic plasticity. In this paper, we firstly investigate synaptic weight changes using a p-STDP circuit and show how it fails to reproduce the mentioned complex biological experiments. We then present a new STDP VLSI circuit which acts based on the timing among triplets of spikes (t-STDP) that is able to reproduce all the mentioned experimental results. We believe that our new STDP VLSI circuit improves upon previous circuits, whose learning capacity exceeds current designs due to its capability of mimicking the outcomes of biological experiments more closely; thus plays a significant role in future VLSI implementation of neuromorphic systems.

Mostafa Rahimi Azghadi

What is connected

Connect this record

See the researcher in context

Building this map preview

15 published item(s)

AFFormer: Adaptive Feature Fusion Transformer for V2X Cooperative Perception under Channel Impairments

A lightweight Transformer-based model for fish landmark detection

Computer Vision and Deep Learning for Fish Classification in Underwater Habitats: A Survey

Design Space Exploration of Dense and Sparse Mapping Schemes for RRAM Architectures

Navigating Local Minima in Quantized Spiking Neural Networks

Internet of Underwater Things and Big Marine Data Analytics -- A Comprehensive Survey

Towards Memristive Deep Learning Systems for Real-time Mobile Epileptic Seizure Prediction

Training Progressively Binarizing Deep Networks Using FPGAs

Accelerating Deterministic and Stochastic Binarized Neural Networks on FPGAs Using OpenCL

Variation-aware Binarized Memristive Networks

A Neuromorphic VLSI Design for Spike Timing and Rate Based Synaptic Plasticity

A Novel Design for Quantum-dot Cellular Automata Cells and Full Adders

A Novel Design for Quantum-dot Cellular Automata Cells and Full Adders

Design and Implementation of BCM Rule Based on Spike-Timing Dependent Plasticity

Efficient Design of Triplet Based Spike-Timing Dependent Plasticity