Source author record

Animesh Jain

Animesh Jain appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher

Unclaimed source record

Machine Learning; Artificial Intelligence; Computation and Language; Distributed, Parallel, and Cluster Computing; Performance; physics.acc-ph; physics.flu-dyn; Programming Languages

Catalog footprint

What is connected

5 works

8 topics

4 close collaborators

preprint 2022 arXiv

A physical model for indirect noise in non-isentropic nozzles: Transfer functions and stability

We propose a mathematical model from physical principles to predict the sound generated in nozzles with dissipation. The focus is on the sound generated from the acceleration of temperature inhomogeneities (also known as entropy waves), which is referred to as indirect noise. First, we model the dissipation caused by flow recirculation and wall friction with a friction factor, which enables us to derive quasi-one-dimensional equations from conservation laws. The model is valid for both compact nozzles and nozzles with a spatial extent. Second, the predictions from the proposed model are compared against the experimental data available in the literature. Third, we compute the nozzle transfer functions for a range of Helmholtz numbers and friction factors. It is found that the friction and the Helmholtz number have a significant effect on the gain/phase of the reflected and transmitted waves. The analysis is performed from subsonic to supersonic regimes (with and without shock waves). The acoustic transfer functions vary significantly because of non-isentropic effects and the Helmholtz number, in particular, in the subsonic-choked regime. Finally, we calculate the effect that the friction of a nozzle guide vane has on thermoacoustic stability. It is found that the friction and the Helmholtz number can change thermoacoustic stability from a linearly stable regime to a linearly unstable regime. The study opens up new possibilities for the accurate prediction of indirect noise in realistic nozzles with implications on both noise emissions and thermoacoustic stability.

preprint 2022 arXiv

Iterative Activation-based Structured Pruning

Deploying complex deep learning models on edge devices is challenging because they have substantial compute and memory resource requirements, whereas edge devices' resource budget is limited. To solve this problem, extensive pruning techniques have been proposed for compressing networks. Recent advances based on the Lottery Ticket Hypothesis (LTH) show that iterative model pruning tends to produce smaller and more accurate models. However, LTH research focuses on unstructured pruning, which is hardware-inefficient and difficult to accelerate on hardware platforms. In this paper, we investigate iterative pruning in the context of structured pruning because structurally pruned models map well on commodity hardware. We find that directly applying a structured weight-based pruning technique iteratively, called iterative L1-norm based pruning (ILP), does not produce accurate pruned models. To solve this problem, we propose two activation-based pruning methods, Iterative Activation-based Pruning (IAP) and Adaptive Iterative Activation-based Pruning (AIAP). We observe that, with only 1% accuracy loss, IAP and AIAP achieve 7.75X and 15.88X compression on LeNet-5, and 1.25X and 1.71X compression on ResNet-50, whereas ILP achieves 4.77X and 1.13X, respectively.
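The difference between weight-based and activation-based filter ranking can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the function names, the mean-absolute-activation criterion, and the fixed per-round pruning fraction are all assumptions made for the example, and the retraining step between rounds is only indicated by a comment.

```python
def l1_scores(filters):
    """Weight-based score: L1 norm (sum of absolute weights) of each filter."""
    return [sum(abs(w) for w in f) for f in filters]


def activation_scores(activations):
    """Activation-based score (assumed criterion): mean absolute activation
    of each filter over a batch of recorded outputs."""
    return [sum(abs(a) for a in acts) / len(acts) for acts in activations]


def keep_indices(scores, prune_fraction):
    """Return positions of the filters that survive after dropping the
    lowest-scoring fraction. Whole filters are removed (structured pruning)."""
    n_prune = int(len(scores) * prune_fraction)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    pruned = set(order[:n_prune])
    return [i for i in range(len(scores)) if i not in pruned]


def iterative_l1_prune(filters, prune_fraction, rounds):
    """Repeatedly drop the lowest-L1 filters; returns surviving original indices."""
    kept = list(range(len(filters)))
    for _ in range(rounds):
        scores = l1_scores([filters[i] for i in kept])
        survivors = keep_indices(scores, prune_fraction)
        kept = [kept[i] for i in survivors]
        # A real pipeline would retrain (or rewind) the remaining weights here
        # before scoring again; that step is omitted in this sketch.
    return kept


# Hypothetical toy layer with four filters of two weights each.
toy_filters = [[0.1, -0.1], [1.0, 2.0], [0.5, 0.5], [-3.0, 0.0]]
kept_after_one_round = iterative_l1_prune(toy_filters, prune_fraction=0.5, rounds=1)
```

Swapping `l1_scores` for `activation_scores` inside the loop gives the activation-based variant of the same sketch; it would need recorded activations per filter rather than the weights themselves.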

preprint 2020 arXiv

Efficient Execution of Quantized Deep Learning Models: A Compiler Approach

A growing number of applications implement predictive functions using deep learning models, which require heavy use of compute and memory. One popular technique for increasing resource efficiency is 8-bit integer quantization, in which 32-bit floating point numbers (fp32) are represented using shorter 8-bit integer numbers. Although deep learning frameworks such as TensorFlow, TFLite, MXNet, and PyTorch enable developers to quantize models with only a small drop in accuracy, they are not well suited to execute quantized models on a variety of hardware platforms. For example, TFLite is optimized to run inference on ARM CPU edge devices but it does not have efficient support for Intel CPUs and Nvidia GPUs. In this paper, we address the challenges of executing quantized deep learning models on diverse hardware platforms by proposing an augmented compiler approach. A deep learning compiler such as Apache TVM can enable the efficient execution of models from various frameworks on various targets. Many deep learning compilers today, however, are designed primarily for fp32 computation and cannot optimize a pre-quantized INT8 model. To address this issue, we created a new dialect called Quantized Neural Network (QNN) that extends the compiler's internal representation with a quantization context. With this quantization context, the compiler can generate efficient code for pre-quantized models on various hardware platforms. As implemented in Apache TVM, we observe that the QNN-augmented deep learning compiler achieves speedups of 2.35x, 2.15x, 1.35x and 1.40x on Intel Xeon Cascade Lake CPUs, Nvidia Tesla T4 GPUs, ARM Raspberry Pi3 and Pi4, respectively, against well-optimized fp32 execution, and comparable performance to the state-of-the-art framework-specific solutions.
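As a rough illustration of what 8-bit integer quantization means, the sketch below maps fp32 values to signed 8-bit integers with a scale and zero point, then maps them back. It uses the common affine scheme as an assumption; it is not taken from the paper or from the QNN dialect, and all function names are hypothetical.

```python
def choose_qparams(values, qmin=-128, qmax=127):
    """Pick a scale and zero point so that the value range, widened to
    include 0.0, maps onto the integer range [qmin, qmax]."""
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    if hi == lo:
        # All values are zero; any positive scale works.
        return 1.0, 0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """real -> int8: q = clamp(round(x / scale) + zero_point)."""
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(qvalues, scale, zero_point):
    """int8 -> real: x is approximately scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in qvalues]


# Hypothetical fp32 tensor flattened to a list.
weights = [-1.0, 0.0, 0.5, 2.0]
scale, zero_point = choose_qparams(weights)
q_weights = quantize(weights, scale, zero_point)
restored = dequantize(q_weights, scale, zero_point)
```

Each restored value differs from the original by at most about one quantization step (`scale`), which is the source of the small accuracy drop the abstract mentions.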

preprint 2020 arXiv

Optimizing Memory-Access Patterns for Deep Learning Accelerators

Deep learning (DL) workloads are moving towards accelerators for faster processing and lower cost. Modern DL accelerators are good at handling the large-scale multiply-accumulate operations that dominate DL workloads; however, it is challenging to make full use of the compute power of an accelerator since the data must be properly staged in a software-managed scratchpad memory. Failing to do so can result in significant performance loss. This paper proposes a systematic approach which leverages the polyhedral model to analyze all operators of a DL model together to minimize the number of memory accesses. Experiments show that our approach can substantially reduce the impact of memory accesses required by common neural-network models on a homegrown AWS machine-learning inference chip named Inferentia, which is available through Amazon EC2 Inf1 instances.

preprint 2011 arXiv

High-energy high-luminosity electron-ion collider eRHIC

In this paper, we describe a future electron-ion collider (EIC), based on the existing Relativistic Heavy Ion Collider (RHIC) hadron facility, with two intersecting superconducting rings, each 3.8 km in circumference. A new ERL accelerator, which provides a 5-30 GeV electron beam, will ensure a luminosity at the level of 10^33 to 10^34 cm^-2 s^-1.