Source author record

Mahesh Chandra

Mahesh Chandra appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Hardware Architecture, Machine Learning, eess.SP, Computer Vision, physics.app-ph, Sound

Catalog footprint

What is connected

6 works

6 topics

4 close collaborators

Preprint, 2020, arXiv

A Novel Method for Scalable VLSI Implementation of Hyperbolic Tangent Function

Hyperbolic tangent and sigmoid functions are used as non-linear activation units in artificial and deep neural networks. Since these networks are computationally expensive, customized accelerators are designed to achieve the required performance at lower cost and power. The activation function and MAC units are the key building blocks of these neural networks. A low-complexity, accurate hardware implementation of the activation function is required to meet the performance and area targets of such neural network accelerators. Moreover, a scalable implementation is required, as recent studies show that DNNs may use different precision in different layers. This paper presents a novel method, based on trigonometric expansion properties of the hyperbolic function, for a hardware implementation that can be easily tuned for different accuracy and precision requirements.
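
The abstract does not say which expansion property the method relies on. One identity of this kind is the addition formula tanh(a + b) = (tanh a + tanh b) / (1 + tanh a · tanh b). The Python sketch below is an illustration under that assumption, not the paper's design: it splits the input into a coarse part looked up in a table and a small remainder approximated by a short Taylor series. The names `COARSE_STEP`, `X_MAX`, `COARSE_LUT` and `tanh_split` are hypothetical.

```python
import math

COARSE_STEP = 0.125  # assumed coarse grid spacing (hypothetical, not from the paper)
X_MAX = 4.0          # assumed clamp: larger inputs are treated as X_MAX

# Table of tanh at the coarse grid points 0, COARSE_STEP, ..., X_MAX
COARSE_LUT = [math.tanh(i * COARSE_STEP)
              for i in range(int(X_MAX / COARSE_STEP) + 1)]


def tanh_split(x):
    """Approximate tanh(x) as tanh(a + b) using the addition formula."""
    sign = -1.0 if x < 0 else 1.0      # tanh is odd, so work on |x|
    ax = min(abs(x), X_MAX)
    i = int(ax / COARSE_STEP)          # index of the coarse part a
    b = ax - i * COARSE_STEP           # remainder, 0 <= b < COARSE_STEP
    ta = COARSE_LUT[i]                 # tanh(a) from the table
    tb = b - b ** 3 / 3.0              # two-term Taylor approximation of tanh(b)
    return sign * (ta + tb) / (1.0 + ta * tb)
```

In a sketch like this, a finer `COARSE_STEP` or a longer series for the remainder buys accuracy at the cost of a larger table or more arithmetic, which is the kind of tuning knob a scalable implementation would expose.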

Preprint, 2020, arXiv

DRACO: Co-Optimizing Hardware Utilization, and Performance of DNNs on Systolic Accelerator

The number of processing elements (PEs) in a fixed-size systolic accelerator is well matched to large, compute-bound DNNs, whereas memory-bound DNNs suffer from PE underutilization and fail to achieve peak performance and energy efficiency. To mitigate this, specialized dataflow and/or micro-architectural techniques have been proposed. However, due to the longer development cycle and the rapid pace of evolution in the deep learning field, these hardware-based solutions can become obsolete and ineffective in dealing with PE underutilization for state-of-the-art DNNs. In this work, we address the challenge of PE underutilization on the algorithm front and propose data reuse aware co-optimization (DRACO). This improves the PE utilization of memory-bound DNNs without any need for dataflow or micro-architecture modifications. Furthermore, unlike previous co-optimization methods, DRACO not only maximizes performance and energy efficiency but also improves the predictive performance of DNNs. To the best of our knowledge, DRACO is the first work that resolves the resource underutilization challenge at the algorithm level and demonstrates a trade-off between computational efficiency, PE utilization, and predictive performance of DNNs. Compared to the state-of-the-art row stationary dataflow, DRACO achieves 41.8% and 42.6% improvement in average PE utilization and inference latency, respectively, with negligible loss in predictive performance in MobileNetV1 on a $64\times64$ systolic array. DRACO provides seminal insights for utilization-aware DNN design methodologies that can fully leverage the computation power of systolic array-based hardware accelerators.

Preprint, 2020, arXiv

Hardware Implementation of Hyperbolic Tangent Function using Catmull-Rom Spline Interpolation

Deep neural networks yield state-of-the-art results in many computer vision and human-machine interface tasks, such as object recognition and speech recognition. Since these networks are computationally expensive, customized accelerators are designed to achieve the required performance at lower cost and power. One of the key building blocks of these neural networks is a non-linear activation function such as sigmoid, hyperbolic tangent (tanh), or ReLU. A low-complexity, accurate hardware implementation of the activation function is required to meet the performance and area targets of neural network accelerators. This paper presents an implementation of the tanh function using Catmull-Rom spline interpolation. State-of-the-art results are achieved using this method with a comparatively smaller logic area.
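
As a rough software illustration of the idea, the following Python sketch interpolates tanh from a small table of uniformly spaced samples using the standard uniform Catmull-Rom basis. It is a floating-point sketch, not the paper's hardware design: the knot spacing `STEP`, the saturation bound `X_MAX` and the function name `tanh_catmull_rom` are assumptions chosen for the example.

```python
import math

STEP = 0.25   # assumed knot spacing (hypothetical; not from the paper)
X_MAX = 4.0   # assumed saturation bound: beyond it the output is clamped to +/-1
N_SEG = int(X_MAX / STEP)

# Knot i holds tanh((i - 1) * STEP), so the table has one extra knot on each
# side of [0, X_MAX] and every segment has the two neighbours it needs.
KNOTS = [math.tanh((i - 1) * STEP) for i in range(N_SEG + 3)]


def tanh_catmull_rom(x):
    """Approximate tanh(x) by uniform Catmull-Rom interpolation of KNOTS."""
    sign = -1.0 if x < 0 else 1.0     # tanh is odd, so only |x| is tabulated
    ax = abs(x)
    if ax >= X_MAX:
        return sign                   # saturate to +/-1 outside the table
    k = int(ax / STEP)                # segment index, 0 .. N_SEG - 1
    t = ax / STEP - k                 # position inside the segment, 0 <= t < 1
    p0, p1, p2, p3 = KNOTS[k:k + 4]   # four control points around the segment
    value = 0.5 * (
        2.0 * p1
        + (p2 - p0) * t
        + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t * t
        + (3.0 * p1 - p0 - 3.0 * p2 + p3) * t ** 3
    )
    return sign * value
```

The curve passes through the stored samples, so only the knot table needs to be kept; a smaller `STEP` gives a larger table and a smaller interpolation error.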

Preprint, 2020, arXiv

On the Impact of Partial Sums on Interconnect Bandwidth and Memory Accesses in a DNN Accelerator

Dedicated accelerators are being designed to address the huge resource requirements of deep neural network (DNN) applications. Power, performance and area (PPA) constraints limit the number of MACs available in these accelerators. Convolution layers, which require a huge number of MACs, are often partitioned into multiple iterative sub-tasks. This puts huge pressure on the available system resources, such as interconnect and memory bandwidth. Optimal partitioning of the feature maps for these sub-tasks can reduce the bandwidth requirement substantially. Some accelerators avoid off-chip or interconnect transfers by implementing local memories; however, the memory accesses are still performed, and a reduced bandwidth can help save power in such architectures. In this paper, we propose a first-order analytical method to partition the feature maps for optimal bandwidth and evaluate the impact of such partitioning on the bandwidth. This bandwidth can be saved by designing an active memory controller which can perform basic arithmetic operations. It is shown that optimal partitioning and an active memory controller can achieve up to 40% bandwidth reduction.

Preprint, 2019, arXiv

Na-ion diffusion and electrochemical performance of NaVO$_3$ anode in Li/Na batteries

We study Na-ion diffusion and electrochemical performance of NaVO$_3$ (NVO) as anode material in Li/Na-ion batteries with the specific capacity of $\approx$350 mAhg$^{-1}$ at the current density 11~mAg$^{-1}$ after 300 cycles. Remarkably, the capacity remains $\ge$200~mAhg$^{-1}$ even after 400~cycles at 44~mAg$^{-1}$ with Coulombic efficiency $>$99\%. The diffusion coefficient deduced from galvanostatic intermittent titration technique (GITT), electrochemical impedance spectroscopy (EIS) and cyclic voltammetry (CV) measurements for NVO as anode in a Li-ion battery is in the range of 10$^{-10}$--10$^{-12}$~cm$^2$s$^{-1}$. In the case of Na-ion batteries, the NVO electrode exhibits an initial capacity of 385~mAhg$^{-1}$ at 7~mAg$^{-1}$ current rate, but the capacity degradation is relatively faster in subsequent cycles. We find the diffusion coefficient of NVO--Na cells similar to that of NVO--Li. On the other hand, our charge-discharge measurements suggest that the overall performance of the NVO anode is better in Li-ion batteries than in Na-ion batteries. Moreover, we use density functional theory to simulate the energetics of Na vacancy formation in the bulk of the NVO structure, which is found to be 0.88~eV higher than that of the most stable (100) surface. Thus, Na-ion incorporation at the surface of the electrode material is more facile than in the bulk.

Preprint, 2010, arXiv

Spoken Language Identification Using Hybrid Feature Extraction Methods

This paper introduces and motivates the use of a hybrid robust feature extraction technique for a spoken language identification (LID) system. Speech recognizers use a parametric form of a signal to capture the most important distinguishable features of the speech signal for the recognition task. In this paper, Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction coefficients (PLP), along with two hybrid features, are used for language identification. The two hybrid features, Bark frequency cepstral coefficients (BFCC) and revised perceptual linear prediction coefficients (RPLP), were obtained from a combination of MFCC and PLP. Two different classifiers, vector quantization (VQ) with dynamic time warping (DTW) and a Gaussian mixture model (GMM), were used for classification. The experiment shows a better identification rate using hybrid feature extraction techniques compared to conventional feature extraction methods. BFCC has shown better performance than MFCC with both classifiers. RPLP with GMM has shown the best identification performance among all feature extraction techniques.