Source author record

Arman Kazemi

Arman Kazemi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Emerging Technologies Hardware Architecture Machine Learning eess.IV eess.SP Neural and Evolutionary Computing

Catalog footprint

What is connected

5works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

COSIME: FeFET based Associative Memory for In-Memory Cosine Similarity Search

In a number of machine learning models, an input query is searched across the trained class vectors to find the closest feature class vector in cosine similarity metric. However, performing the cosine similarities between the vectors in Von-Neumann machines involves a large number of multiplications, Euclidean normalizations and division operations, thus incurring heavy hardware energy and latency overheads. Moreover, due to the memory wall problem that presents in the conventional architecture, frequent cosine similarity-based searches (CSSs) over the class vectors requires a lot of data movements, limiting the throughput and efficiency of the system. To overcome the aforementioned challenges, this paper introduces COSIME, an general in-memory associative memory (AM) engine based on the ferroelectric FET (FeFET) device for efficient CSS. By leveraging the one-transistor AND gate function of FeFET devices, current-based translinear analog circuit and winner-take-all (WTA) circuitry, COSIME can realize parallel in-memory CSS across all the entries in a memory block, and output the closest word to the input query in cosine similarity metric. Evaluation results at the array level suggest that the proposed COSIME design achieves 333X and 90.5X latency and energy improvements, respectively, and realizes better classification accuracy when compared with an AM design implementing approximated CSS. The proposed in-memory computing fabric is evaluated for an HDC problem, showcasing that COSIME can achieve on average 47.1X and 98.5X speedup and energy efficiency improvements compared with an GPU implementation.

preprint2022arXiv

Experimentally realized memristive memory augmented neural network

Lifelong on-device learning is a key challenge for machine intelligence, and this requires learning from few, often single, samples. Memory augmented neural network has been proposed to achieve the goal, but the memory module has to be stored in an off-chip memory due to its size. Therefore the practical use has been heavily limited. Previous works on emerging memory-based implementation have difficulties in scaling up because different modules with various structures are difficult to integrate on the same chip and the small sense margin of the content addressable memory for the memory module heavily limited the degree of mismatch calculation. In this work, we implement the entire memory augmented neural network architecture in a fully integrated memristive crossbar platform and achieve an accuracy that closely matches standard software on digital hardware for the Omniglot dataset. The successful demonstration is supported by implementing new functions in crossbars in addition to widely reported matrix multiplications. For example, the locality-sensitive hashing operation is implemented in crossbar arrays by exploiting the intrinsic stochasticity of memristor devices. Besides, the content-addressable memory module is realized in crossbars, which also supports the degree of mismatches. Simulations based on experimentally validated models show such an implementation can be efficiently scaled up for one-shot learning on the Mini-ImageNet dataset. The successful demonstration paves the way for practical on-device lifelong learning and opens possibilities for novel attention-based algorithms not possible in conventional hardware.

preprint2020arXiv

A Device Non-Ideality Resilient Approach for Mapping Neural Networks to Crossbar Arrays

We propose a technology-independent method, referred to as adjacent connection matrix (ACM), to efficiently map signed weight matrices to non-negative crossbar arrays. When compared to same-hardware-overhead mapping methods, using ACM leads to improvements of up to 20% in training accuracy for ResNet-20 with the CIFAR-10 dataset when training with 5-bit precision crossbar arrays or lower. When compared with strategies that use two elements to represent a weight, ACM achieves comparable training accuracies, while also offering area and read energy reductions of 2.3x and 7x, respectively. ACM also has a mild regularization effect that improves inference accuracy in crossbar arrays without any retraining or costly device/variation-aware training.

preprint2020arXiv

A Hybrid FeMFET-CMOS Analog Synapse Circuit for Neural Network Training and Inference

An analog synapse circuit based on ferroelectric-metal field-effect transistors is proposed, that offers 6-bit weight precision. The circuit is comprised of volatile least significant bits (LSBs) used solely during training, and non-volatile most significant bits (MSBs) used for both training and inference. The design works at a 1.8V logic-compatible voltage, provides 10^10 endurance cycles, and requires only 250ps update pulses. A variant of LeNet trained with the proposed synapse achieves 98.2% accuracy on MNIST, which is only 0.4% lower than an ideal implementation of the same network with the same bit precision. Furthermore, the proposed synapse offers improvements of up to 26% in area, 44.8% in leakage power, 16.7% in LSB update pulse duration, and two orders of magnitude in endurance cycles, when compared to state-of-the-art hybrid synaptic circuits. Our proposed synapse can be extended to an 8-bit design, enabling a VGG-like network to achieve 88.8% accuracy on CIFAR-10 (only 0.8% lower than an ideal implementation of the same network).

preprint2020arXiv

Performance Analysis of Semi-supervised Learning in the Small-data Regime using VAEs

Extracting large amounts of data from biological samples is not feasible due to radiation issues, and image processing in the small-data regime is one of the critical challenges when working with a limited amount of data. In this work, we applied an existing algorithm named Variational Auto Encoder (VAE) that pre-trains a latent space representation of the data to capture the features in a lower-dimension for the small-data regime input. The fine-tuned latent space provides constant weights that are useful for classification. Here we will present the performance analysis of the VAE algorithm with different latent space sizes in the semi-supervised learning using the CIFAR-10 dataset.

Arman Kazemi

What is connected

Connect this record

See the researcher in context

Building this map preview

5 published item(s)

COSIME: FeFET based Associative Memory for In-Memory Cosine Similarity Search

Experimentally realized memristive memory augmented neural network

A Device Non-Ideality Resilient Approach for Mapping Neural Networks to Crossbar Arrays

A Hybrid FeMFET-CMOS Analog Synapse Circuit for Neural Network Training and Inference

Performance Analysis of Semi-supervised Learning in the Small-data Regime using VAEs