Researcher profile

Ramakrishna Upadrasta

Ramakrishna Upadrasta contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 17 - UnverifiedVerification L1Unclaimed author
4works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

4 published item(s)

preprint2022arXiv

FUSED-PAGERANK: Loop-Fusion based Approximate PageRank

PageRank is a graph centrality metric that gives the importance of each node in a given graph. The PageRank algorithm provides important insights to understand the behavior of nodes through the connections they form with other nodes. It is an iterative algorithm that ranks the nodes in each iteration until all the node values converge. The PageRank algorithm is implemented using sparse storage format, which results in irregular memory accesses in the code. This key feature inhibits optimizations to improve its performance, and makes optimizing the PageRank algorithm a non-trivial problem. In this work we improve the performance of PageRank algorithm by reducing its irregular memory accesses. In this paper, we propose FUSED-PAGERANK algorithm, a compiler optimization oriented approximate technique that reduces the number of irregular memory accesses in the PageRank algorithm, improving its locality while making the convergence of the algorithm faster with better accuracy in results. In particular, we propose an approximate PageRank algorithm using Loop-Fusion. We believe that ours is the first work that formally applies traditional compiler optimization techniques for irregular memory access in the PageRank algorithm. We have verified our method by performing experiments on a variety of datasets: LAW graphs, SNAP datasets and synthesized datasets. On these benchmarks, we have achieved a maximum speedup (vs. -O3 optimization) of 2.05X, 2.23X, 1.74X with sequential version, and ~4.4X, ~2.61X, ~4.22X with parallel version of FUSED-PAGERANK algorithm in comparison with Edge-centric version of PageRank algorithm.

preprint2022arXiv

POSET-RL: Phase ordering for Optimizing Size and Execution Time using Reinforcement Learning

The ever increasing memory requirements of several applications has led to increased demands which might not be met by embedded devices. Constraining the usage of memory in such cases is of paramount importance. It is important that such code size improvements should not have a negative impact on the runtime. Improving the execution time while optimizing for code size is a non-trivial but a significant task. The ordering of standard optimization sequences in modern compilers is fixed, and are heuristically created by the compiler domain experts based on their expertise. However, this ordering is sub-optimal, and does not generalize well across all the cases. We present a reinforcement learning based solution to the phase ordering problem, where the ordering improves both the execution time and code size. We propose two different approaches to model the sequences: one by manual ordering, and other based on a graph called Oz Dependence Graph (ODG). Our approach uses minimal data as training set, and is integrated with LLVM. We show results on x86 and AArch64 architectures on the benchmarks from SPEC-CPU 2006, SPEC-CPU 2017 and MiBench. We observe that the proposed model based on ODG outperforms the current Oz sequence both in terms of size and execution time by 6.19% and 11.99% in SPEC 2017 benchmarks, on an average.

preprint2020arXiv

LLOV: A Fast Static Data-Race Checker for OpenMP Programs

In the era of Exascale computing, writing efficient parallel programs is indispensable and at the same time, writing sound parallel programs is very difficult. Specifying parallelism with frameworks such as OpenMP is relatively easy, but data races in these programs are an important source of bugs. In this paper, we propose LLOV, a fast, lightweight, language agnostic, and static data race checker for OpenMP programs based on the LLVM compiler framework. We compare LLOV with other state-of-the-art data race checkers on a variety of well-established benchmarks. We show that the precision, accuracy, and the F1 score of LLOV is comparable to other checkers while being orders of magnitude faster. To the best of our knowledge, LLOV is the only tool among the state-of-the-art data race checkers that can verify a C/C++ or FORTRAN program to be data race free.

preprint2020arXiv

PolyScientist: Automatic Loop Transformations Combined with Microkernels for Optimization of Deep Learning Primitives

At the heart of deep learning training and inferencing are computationally intensive primitives such as convolutions which form the building blocks of deep neural networks. Researchers have taken two distinct approaches to creating high performance implementations of deep learning kernels, namely, 1) library development exemplified by Intel MKL-DNN for CPUs, 2) automatic compilation represented by the TensorFlow XLA compiler. The two approaches have their drawbacks: even though a custom built library can deliver very good performance, the cost and time of development of the library can be high. Automatic compilation of kernels is attractive but in practice, till date, automatically generated implementations lag expert coded kernels in performance by orders of magnitude. In this paper, we develop a hybrid solution to the development of deep learning kernels that achieves the best of both worlds: the expert coded microkernels are utilized for the innermost loops of kernels and we use the advanced polyhedral technology to automatically tune the outer loops for performance. We design a novel polyhedral model based data reuse algorithm to optimize the outer loops of the kernel. Through experimental evaluation on an important class of deep learning primitives namely convolutions, we demonstrate that the approach we develop attains the same levels of performance as Intel MKL-DNN, a hand coded deep learning library.