Source author record

Wei D. Lu

Wei D. Lu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Artificial Intelligence · Neural and Evolutionary Computing · Emerging Technologies · Hardware Architecture · Machine Learning · Neurons and Cognition · Computation and Language · eess.IV · Quantitative Methods

Catalog footprint

What is connected

6 works

9 topics

4 close collaborators

preprint · 2026 · arXiv

ADEPT: Adaptive Dynamic Early-Exit Process for Transformers

The inference of large language models imposes significant computational workloads, often requiring the processing of billions of parameters. Although early-exit strategies have proven effective in reducing computational demands by halting inference earlier, they apply either to only the first token in the generation phase or at the prompt level in the prefill phase. Thus, the Key-Value (KV) cache for skipped layers remains a bottleneck for subsequent token generation, limiting the benefits of early exit. We introduce ADEPT (Adaptive Dynamic Early-exit Process for Transformers), a novel approach designed to overcome this issue and enable dynamic early exit in both the prefill and generation phases. The proposed adaptive token-level early-exit mechanism adjusts computation dynamically based on token complexity, optimizing efficiency without compromising performance. ADEPT further enhances the KV generation procedure by decoupling sequential dependencies in skipped layers, making token-level early exit more practical. Experimental results demonstrate that ADEPT improves efficiency by up to 25% in language generation tasks and achieves a 4x speed-up in downstream classification tasks, with up to a 45% improvement in performance.
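The token-level early-exit idea in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy `layers`, the `confidence` criterion, and the threshold are hypothetical and not taken from ADEPT, and the KV handling for skipped layers is not modeled.

```python
from typing import Callable, List, Tuple

Layer = Callable[[float], float]


def early_exit_forward(
    x: float,
    layers: List[Layer],
    confidence: Callable[[float], float],
    threshold: float,
) -> Tuple[float, int]:
    """Run layers in order and stop once the confidence score reaches the threshold.

    Returns the final hidden state and the number of layers actually executed.
    """
    used = 0
    for layer in layers:
        x = layer(x)
        used += 1
        if confidence(x) >= threshold:
            break  # exit early: the remaining layers are skipped for this token
    return x, used


# Toy stand-ins (hypothetical): each layer moves the state halfway toward 1.0,
# and confidence is how close the state is to 1.0.
layers: List[Layer] = [lambda h: h + 0.5 * (1.0 - h)] * 8
confidence = lambda h: 1.0 - abs(1.0 - h)

easy_state, easy_depth = early_exit_forward(0.85, layers, confidence, 0.9)
hard_state, hard_depth = early_exit_forward(0.0, layers, confidence, 0.9)
```

In this toy setting an "easy" token leaves after fewer layers than a "hard" one, which is the per-token adaptivity the abstract describes.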

preprint · 2022 · arXiv

Design Space Exploration of Dense and Sparse Mapping Schemes for RRAM Architectures

The impact of device and circuit-level effects in mixed-signal Resistive Random Access Memory (RRAM) accelerators typically manifests as performance degradation of Deep Learning (DL) algorithms, but the degree of impact varies based on algorithmic features. These include network architecture, capacity, weight distribution, and the type of inter-layer connections. Techniques are continuously emerging to efficiently train sparse neural networks, which may have activation sparsity, quantization, and memristive noise. In this paper, we present an extended Design Space Exploration (DSE) methodology to quantify the benefits and limitations of dense and sparse mapping schemes for a variety of network architectures. While sparsity of connectivity promotes lower power consumption and is often optimized for extracting localized features, its performance on tiled RRAM arrays may be more susceptible to noise due to under-parameterization when compared to dense mapping schemes. Moreover, we present a case study quantifying and formalizing the trade-offs of typical non-idealities introduced into 1-Transistor-1-Resistor (1T1R) tiled memristive architectures and the size of modular crossbar tiles using the CIFAR-10 dataset.
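One way to picture the dense versus sparse mapping trade-off is to count crossbar tiles. The sketch below assumes a dense scheme allocates every tile that covers the weight matrix, while a sparse scheme skips tiles containing only zeros. That rule and the function names are hypothetical illustrations and do not reproduce the paper's DSE methodology.

```python
import math
from typing import List


def dense_tile_count(rows: int, cols: int, tile: int) -> int:
    """Tiles needed to cover a rows x cols weight matrix with tile x tile crossbars."""
    return math.ceil(rows / tile) * math.ceil(cols / tile)


def sparse_tile_count(weights: List[List[float]], tile: int) -> int:
    """Tiles needed when all-zero tiles are skipped (assumed sparse mapping rule)."""
    rows, cols = len(weights), len(weights[0])
    count = 0
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            block = [
                weights[r][c]
                for r in range(r0, min(r0 + tile, rows))
                for c in range(c0, min(c0 + tile, cols))
            ]
            if any(w != 0.0 for w in block):
                count += 1
    return count


# A 4x4 block-diagonal weight matrix: only the two diagonal 2x2 tiles hold weights.
weights = [
    [1.0, 0.5, 0.0, 0.0],
    [0.2, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.7, 0.1],
    [0.0, 0.0, 0.3, 0.9],
]
dense_tiles = dense_tile_count(4, 4, tile=2)
sparse_tiles = sparse_tile_count(weights, tile=2)
```

Fewer active tiles is where the power saving would come from in this picture; the noise sensitivity discussed in the abstract is not modeled here.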

preprint · 2022 · arXiv

Gradient-based Neuromorphic Learning on Dynamical RRAM Arrays

We present MEMprop, the adoption of gradient-based learning to train fully memristive spiking neural networks (MSNNs). Our approach harnesses intrinsic device dynamics to trigger naturally arising voltage spikes. These spikes emitted by memristive dynamics are analog in nature, and thus fully differentiable, which eliminates the need for surrogate gradient methods that are prevalent in the spiking neural network (SNN) literature. Memristive neural networks typically either integrate memristors as synapses that map offline-trained networks, or otherwise rely on associative learning mechanisms to train networks of memristive neurons. We instead apply the backpropagation through time (BPTT) training algorithm directly on analog SPICE models of memristive neurons and synapses. Our implementation is fully memristive, in that synaptic weights and spiking neurons are both integrated on resistive RAM (RRAM) arrays without the need for additional circuits to implement spiking dynamics, e.g., analog-to-digital converters (ADCs) or thresholded comparators. As a result, higher-order electrophysical effects are fully exploited to use the state-driven dynamics of memristive neurons at run time. By moving towards non-approximate gradient-based learning, we obtain highly competitive accuracy amongst previously reported lightweight dense fully MSNNs on several benchmarks.

preprint · 2022 · arXiv

Navigating Local Minima in Quantized Spiking Neural Networks

Spiking and Quantized Neural Networks (NNs) are becoming exceedingly important for hyper-efficient implementations of Deep Learning (DL) algorithms. However, these networks face challenges when trained using error backpropagation, due to the absence of gradient signals when applying hard thresholds. The broadly accepted trick to overcoming this is through the use of biased gradient estimators: surrogate gradients which approximate thresholding in Spiking Neural Networks (SNNs), and Straight-Through Estimators (STEs), which completely bypass thresholding in Quantized Neural Networks (QNNs). While noisy gradient feedback has enabled reasonable performance on simple supervised learning tasks, it is thought that such noise increases the difficulty of finding optima in loss landscapes, especially during the later stages of optimization. By periodically boosting the Learning Rate (LR) during training, we expect the network can navigate unexplored solution spaces that would otherwise be difficult to reach due to local minima, barriers, or flat surfaces. This paper presents a systematic evaluation of a cosine-annealed LR schedule coupled with weight-independent adaptive moment estimation as applied to Quantized SNNs (QSNNs). We provide a rigorous empirical evaluation of this technique on high precision and 4-bit quantized SNNs across three datasets, demonstrating (close to) state-of-the-art performance on the more complex datasets. Our source code is available at this link: https://github.com/jeshraghian/QSNNs.
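The periodic learning-rate boost described in this abstract can be sketched as a cosine-annealed schedule that restarts every `period` steps. The restart form and the values of `period`, `lr_max`, and `lr_min` are assumptions for illustration; the paper's exact schedule and optimizer settings are not reproduced here.

```python
import math


def cosine_annealed_lr(step: int, period: int, lr_max: float, lr_min: float = 0.0) -> float:
    """Cosine decay from lr_max toward lr_min, restarting every `period` steps."""
    t = step % period  # position inside the current cycle
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t / period))


# Hypothetical settings: a 10-step cycle with a peak learning rate of 0.1.
schedule = [cosine_annealed_lr(s, period=10, lr_max=0.1) for s in range(25)]
```

At steps 10 and 20 the rate jumps back to its peak, which is the boost intended to help the optimizer leave a local minimum or flat region.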

preprint · 2022 · arXiv

The fine line between dead neurons and sparsity in binarized spiking neural networks

Spiking neural networks can compensate for quantization error by encoding information either in the temporal domain, or by processing discretized quantities in hidden states of higher precision. In theory, a wide dynamic range state-space enables multiple binarized inputs to be accumulated together, thus improving the representational capacity of individual neurons. This may be achieved by increasing the firing threshold, but make it too high and sparse spike activity turns into no spike emission. In this paper, we propose the use of 'threshold annealing' as a warm-up method for firing thresholds. We show it enables the propagation of spikes across multiple layers where neurons would otherwise cease to fire, and in doing so, achieve highly competitive results on four diverse datasets, despite using binarized weights. Source code is available at https://github.com/jeshraghian/snn-tha/
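A minimal sketch of the threshold warm-up idea follows. It assumes the threshold rises from a low starting value toward its final value along an exponential curve; that schedule shape, the parameter names, and the simple leaky integrate-and-fire neuron are illustrative assumptions, not the formulation in the paper or its repository.

```python
import math
from typing import List


def annealed_threshold(step: int, theta_start: float, theta_final: float, tau: float) -> float:
    """Threshold that starts at theta_start and approaches theta_final (assumed schedule)."""
    return theta_final - (theta_final - theta_start) * math.exp(-step / tau)


def count_spikes(inputs: List[float], threshold: float, beta: float = 0.9) -> int:
    """Leaky integrate-and-fire neuron with reset by subtraction; returns the spike count."""
    mem = 0.0
    spikes = 0
    for x in inputs:
        mem = beta * mem + x  # leak, then integrate the input
        if mem >= threshold:
            spikes += 1
            mem -= threshold
    return spikes


inputs = [0.3] * 20  # constant drive; the membrane can never exceed 0.3 / (1 - 0.9) = 3.0
early = count_spikes(inputs, annealed_threshold(0, 1.0, 5.0, tau=100.0))
late = count_spikes(inputs, annealed_threshold(10_000, 1.0, 5.0, tau=100.0))
```

With fixed inputs the neuron fires at the low starting threshold and goes silent at the high final one, which illustrates the dead-neuron risk. In training, the warm-up is meant to give the weights time to adapt before the threshold reaches its final value.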

preprint · 2019 · arXiv

A Real-Time Retinomorphic Simulator Using a Conductance-Based Discrete Neuronal Network

We present an optimized conductance-based retina microcircuit simulator which transforms light stimuli into a series of graded and spiking action potentials through phototransduction. We use discrete retinal neuron blocks based on a collation of single-compartment models and morphologically realistic formulations, and successfully achieve a biologically real-time simulator. This is done by optimizing the numerical methods employed to solve the system of over 270 nonlinear ordinary differential equations and parameters. Our simulator includes some of the most recent advances in compartmental modeling to include five intrinsic ion currents of each cell whilst ensuring real-time performance, in attaining the ion-current and membrane responses of the photoreceptor rod and cone cells, the bipolar and amacrine cells, their laterally connected electrical and chemical synapses, and the output ganglion cell. It exhibits dynamical retinal behavior such as spike-frequency adaptation, rebound activation, fast-spiking, and subthreshold responsivity. Light stimuli incident at the photoreceptor rod and cone cells are modulated through the system of differential equations, enabling the user to probe the neuronal response at any point in the network. This is in contrast to many other retina encoding schemes which prefer to 'black-box' the preceding stages to the spike train output. Our simulator is made available open source, with the hope that it will benefit neuroscientists and machine learning practitioners in better understanding the retina sub-circuitries, how retina cells optimize the representation of visual information, and in generating large datasets of biologically accurate graded and spiking responses.
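To give a sense of what a conductance-based single-compartment model involves, the sketch below integrates one membrane equation, C dV/dt = -g_leak (V - E_leak) + I_ext, with forward Euler. The single leak current, the parameter values, and the choice of integrator are assumptions for illustration; the simulator's actual equations and numerical methods are not reproduced here.

```python
from typing import List


def simulate_leak_membrane(
    i_ext: float,
    steps: int,
    dt: float = 0.01,       # time step in ms (hypothetical)
    c_m: float = 1.0,       # membrane capacitance in uF/cm^2 (hypothetical)
    g_leak: float = 0.1,    # leak conductance in mS/cm^2 (hypothetical)
    e_leak: float = -65.0,  # leak reversal potential in mV (hypothetical)
    v0: float = -65.0,      # initial membrane potential in mV
) -> List[float]:
    """Forward-Euler integration of a single-compartment membrane with one leak current."""
    v = v0
    trace = [v]
    for _ in range(steps):
        dv_dt = (-g_leak * (v - e_leak) + i_ext) / c_m
        v += dt * dv_dt
        trace.append(v)
    return trace


# 200 ms of simulated time; the potential settles near e_leak + i_ext / g_leak = -55 mV.
trace = simulate_leak_membrane(i_ext=1.0, steps=20_000)
```

Because every intermediate value is kept in `trace`, the response can be inspected at any time step, loosely mirroring the ability to probe the network that the abstract describes.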
