Researcher profile

Mahesh Chandra

Mahesh Chandra contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2020arXiv

A Novel Method for Scalable VLSI Implementation of Hyperbolic Tangent Function

Hyperbolic tangent and Sigmoid functions are used as non-linear activation units in the artificial and deep neural networks. Since, these networks are computationally expensive, customized accelerators are designed for achieving the required performance at lower cost and power. The activation function and MAC units are the key building blocks of these neural networks. A low complexity and accurate hardware implementation of the activation function is required to meet the performance and area targets of such neural network accelerators. Moreover, a scalable implementation is required as the recent studies show that the DNNs may use different precision in different layers. This paper presents a novel method based on trigonometric expansion properties of the hyperbolic function for hardware implementation which can be easily tuned for different accuracy and precision requirements.

preprint2020arXiv

DRACO: Co-Optimizing Hardware Utilization, and Performance of DNNs on Systolic Accelerator

The number of processing elements (PEs) in a fixed-sized systolic accelerator is well matched for large and compute-bound DNNs; whereas, memory-bound DNNs suffer from PE underutilization and fail to achieve peak performance and energy efficiency. To mitigate this, specialized dataflow and/or micro-architectural techniques have been proposed. However, due to the longer development cycle and the rapid pace of evolution in the deep learning fields, these hardware-based solutions can be obsolete and ineffective in dealing with PE underutilization for state-of-the-art DNNs. In this work, we address the challenge of PE underutilization at the algorithm front and propose data reuse aware co-optimization (DRACO). This improves the PE utilization of memory-bound DNNs without any additional need for dataflow/micro-architecture modifications. Furthermore, unlike the previous co-optimization methods, DRACO not only maximizes performance and energy efficiency but also improves the predictive performance of DNNs. To the best of our knowledge, DRACO is the first work that resolves the resource underutilization challenge at the algorithm level and demonstrates a trade-off between computational efficiency, PE utilization, and predictive performance of DNN. Compared to the state-of-the-art row stationary dataflow, DRACO achieves 41.8% and 42.6% improvement in average PE utilization and inference latency (respectively) with negligible loss in predictive performance in MobileNetV1 on a $64\times64$ systolic array. DRACO provides seminal insights for utilization-aware DNN design methodologies that can fully leverage the computation power of systolic array-based hardware accelerators.

preprint2020arXiv

Hardware Implementation of Hyperbolic Tangent Function using Catmull-Rom Spline Interpolation

Deep neural networks yield the state of the art results in many computer vision and human machine interface tasks such as object recognition, speech recognition etc. Since, these networks are computationally expensive, customized accelerators are designed for achieving the required performance at lower cost and power. One of the key building blocks of these neural networks is non-linear activation function such as sigmoid, hyperbolic tangent (tanh), and ReLU. A low complexity accurate hardware implementation of the activation function is required to meet the performance and area targets of the neural network accelerators. This paper presents an implementation of tanh function using the Catmull-Rom spline interpolation. State of the art results are achieved using this method with comparatively smaller logic area.

preprint2020arXiv

On the Impact of Partial Sums on Interconnect Bandwidth and Memory Accesses in a DNN Accelerator

Dedicated accelerators are being designed to address the huge resource requirement of the deep neural network (DNN) applications. The power, performance and area (PPA) constraints limit the number of MACs available in these accelerators. The convolution layers which require huge number of MACs are often partitioned into multiple iterative sub-tasks. This puts huge pressure on the available system resources such as interconnect and memory bandwidth. The optimal partitioning of the feature maps for these sub-tasks can reduce the bandwidth requirement substantially. Some accelerators avoid off-chip or interconnect transfers by implementing local memories; however, the memory accesses are still performed and a reduced bandwidth can help in saving power in such architectures. In this paper, we propose a first order analytical method to partition the feature maps for optimal bandwidth and evaluate the impact of such partitioning on the bandwidth. This bandwidth can be saved by designing an active memory controller which can perform basic arithmetic operations. It is shown that the optimal partitioning and active memory controller can achieve up to 40% bandwidth reduction.

preprint2019arXiv

Na-ion diffusion and electrochemical performance of NaVO$_3$ anode in Li/Na batteries

We study Na ion diffusion and electrochemical performance of NaVO$_3$ (NVO) as anode material in Li/Na--ion batteries with the specific capacity of $\approx$350 mAhg$^{-1}$ at the current density 11~mAg$^{-1}$ after 300 cycles. Remarkably, the capacity retains $\ge$200~mAhg$^{-1}$ even after 400~cycles at 44~mAg$^{-1}$ with Coulombic efficiency $>$99\%. The deduced diffusion coefficient from galvanostatic intermittent titration technique (GITT), electrochemical impedance spectroscopy (EIS) and cyclic voltammetry (CV) measurements for NVO as anode in Li--ion battery is in the range of 10$^{-10}$--10$^{-12}$~cm$^2$s$^{-1}$. In case of Na-ion batteries, the NVO electrode exhibits initial capacity of 385~mAhg$^{-1}$ at 7~mAg$^{-1}$ current rate, but the capacity degradation is relatively faster in subsequent cycles. We find the diffusion coefficient of NVO--Na cells similar to that of NVO--Li. On the other hand, our charge discharge measurements suggest that the overall performance of NVO anode is better in Li--ion battery than Na-ion. Moreover, we use the density functional theory to simulate the energetics of Na vacancy formation in the bulk of the NVO structure, which is found to be 0.88~eV higher than that of the most stable (100) surface. Thus, the Na ion incorporation at the surface of the electrode material is more facile compared to the bulk.