
Venkatesan T. Chakaravarthy

Venkatesan T. Chakaravarthy appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Topics: Distributed, Parallel, and Cluster Computing; Data Structures and Algorithms; Computation and Language; Computational Complexity; Databases; Discrete Mathematics; Machine Learning

Catalog footprint: 8 works, 7 topics, 4 close collaborators


Preprint, 2020, arXiv

On Optimizing Distributed Tucker Decomposition for Sparse Tensors

The Tucker decomposition generalizes the notion of Singular Value Decomposition (SVD) to tensors, the higher-dimensional analogues of matrices. We study the problem of constructing the Tucker decomposition of sparse tensors on distributed memory systems via the HOOI procedure, a popular iterative method. The scheme used for distributing the input tensor among the processors (MPI ranks) critically influences the HOOI execution time. Prior work has proposed different distribution schemes: an offline scheme based on a sophisticated hypergraph partitioning method, and simple, lightweight alternatives that can be used in real time. While the hypergraph-based scheme typically results in faster HOOI execution, it is complex: the time taken to determine the distribution is an order of magnitude higher than the execution time of a single HOOI iteration. Our main contribution is a lightweight distribution scheme that achieves the best of both worlds. We show that the scheme is near-optimal on certain fundamental metrics associated with the HOOI procedure and, as a result, near-optimal on the computational load (FLOPs). Though the scheme may incur higher communication volume, computation time is the dominant factor, and as a result the scheme achieves better overall HOOI execution time. Our experimental evaluation on large real-life tensors (having up to 4 billion elements) shows that the scheme outperforms the prior schemes on HOOI execution time by a factor of up to 3. At the same time, its distribution time is comparable to that of the prior lightweight schemes and is typically less than the execution time of a single HOOI iteration.
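
To make the iterative procedure concrete, here is a minimal dense, single-process sketch of HOOI for a 3-way tensor in Python with NumPy. It assumes the factors are initialised from the truncated SVDs of the mode-n unfoldings (HOSVD) and that a fixed number of sweeps is run; it does not model sparsity, MPI ranks or any distribution scheme, which are the actual subject of the paper. All function names are illustrative only.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: rows indexed by `mode`, columns by the other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def leading_left_vectors(matrix, rank):
    """Top-`rank` left singular vectors of `matrix`."""
    u, _, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :rank]


def hooi(x, ranks, iterations=10):
    """Dense HOOI for a 3-way tensor; returns (core, [A, B, C])."""
    # Initialise each factor from the mode-n unfolding of x (HOSVD).
    a, b, c = (leading_left_vectors(unfold(x, n), ranks[n]) for n in range(3))
    for _ in range(iterations):
        # Update one factor with the other two held fixed: project x onto the
        # other factors' subspaces, then take the leading singular vectors.
        y = np.einsum("ijk,jq,kr->iqr", x, b, c)
        a = leading_left_vectors(unfold(y, 0), ranks[0])
        y = np.einsum("ijk,ip,kr->pjr", x, a, c)
        b = leading_left_vectors(unfold(y, 1), ranks[1])
        y = np.einsum("ijk,ip,jq->pqk", x, a, b)
        c = leading_left_vectors(unfold(y, 2), ranks[2])
    # Core tensor: x multiplied by the transposed factors along every mode.
    core = np.einsum("ijk,ip,jq,kr->pqr", x, a, b, c)
    return core, [a, b, c]


def reconstruct(core, factors):
    a, b, c = factors
    return np.einsum("pqr,ip,jq,kr->ijk", core, a, b, c)


# Usage: a random tensor of multilinear rank (2, 2, 2) is recovered exactly.
rng = np.random.default_rng(0)
x = reconstruct(rng.standard_normal((2, 2, 2)),
                [rng.standard_normal((n, 2)) for n in (6, 5, 4)])
core, factors = hooi(x, (2, 2, 2))
print(np.allclose(reconstruct(core, factors), x))
```

Each sweep updates one factor at a time; the tensor-times-matrix products inside each update are where the computational load discussed above comes from.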

Preprint, 2020, arXiv

PoWER-BERT: Accelerating BERT Inference via Progressive Word-vector Elimination

We develop a novel method, called PoWER-BERT, for improving the inference time of the popular BERT model while maintaining accuracy. It works by (a) exploiting redundancy pertaining to word-vectors (intermediate encoder outputs) and eliminating the redundant vectors; (b) determining which word-vectors to eliminate by developing a strategy for measuring their significance, based on the self-attention mechanism; and (c) learning how many word-vectors to eliminate by augmenting the BERT model and the loss function. Experiments on the standard GLUE benchmark show that PoWER-BERT achieves up to 4.5x reduction in inference time over BERT with <1% loss in accuracy. We show that PoWER-BERT offers a significantly better trade-off between accuracy and inference time compared to prior methods. We demonstrate that our method attains up to 6.8x reduction in inference time with <1% loss in accuracy when applied over ALBERT, a highly compressed version of BERT. The code for PoWER-BERT is publicly available at https://github.com/IBM/PoWER-BERT.
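
Step (b) can be sketched as follows, assuming that a word-vector's significance is the total attention it receives, summed over all heads and all query positions of a layer's attention matrices, and that a fixed number of vectors is retained (in PoWER-BERT that number is learned in step (c), which is not modelled here). This is a minimal NumPy sketch for a single input; `significance` and `eliminate` are hypothetical helper names, not functions from the PoWER-BERT repository.

```python
import numpy as np


def significance(attention):
    """Attention received by each position, summed over heads and queries.

    `attention` has shape (heads, seq_len, seq_len); row i of each head holds
    the softmax weights that query position i assigns to every key position.
    """
    return attention.sum(axis=(0, 1))


def eliminate(word_vectors, attention, keep):
    """Keep the `keep` most significant word-vectors, in their original order."""
    scores = significance(attention)
    kept = np.sort(np.argsort(-scores, kind="stable")[:keep])
    return word_vectors[kept], kept


# Usage: 2 heads, 4 positions; every query attends mostly to positions 0 and 2.
attn = np.full((2, 4, 4), 0.1)
attn[:, :, 0] = 0.4
attn[:, :, 2] = 0.4
vectors = np.arange(4 * 3, dtype=float).reshape(4, 3)  # 4 word-vectors of width 3
reduced, kept = eliminate(vectors, attn, keep=2)
print(kept)  # [0 2]
```

Keeping the surviving vectors in their original order leaves positional structure intact for the layers that follow.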

Preprint, 2016, arXiv

Subgraph Counting: Color Coding Beyond Trees

The problem of counting occurrences of query graphs in a large data graph, known as subgraph counting, is fundamental to several domains such as genomics and social network analysis. Many important special cases (e.g., triangle counting) have received significant attention. Color coding is a very general and powerful algorithmic technique for subgraph counting. Color coding has been shown to be effective in several applications, but scalable implementations are only known for the special case of tree queries (i.e., queries of treewidth one). In this paper we present the first efficient distributed implementation of color coding that goes beyond tree queries: our algorithm applies to any query graph of treewidth 2. Since tree queries can be solved in time linear in the size of the data graph, our contribution is the first step into the realm of color coding for queries that require superlinear running time in the worst case. This superlinear complexity leads to significant load-balancing problems on graphs with heavy-tailed degree distributions. Our algorithm structures the computation to work around high-degree nodes in the data graph and, as a result, achieves very good runtime and scalability on a diverse collection of data and query graph pairs. We also provide a theoretical analysis of our algorithmic techniques, showing asymptotic improvements in runtime on random graphs with power-law degree distributions, a popular model for real-world graphs.
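
For reference, the tree-query case that this work goes beyond can be illustrated with classical color coding for path queries: color each vertex uniformly at random with one of k colors, count the colorful occurrences (all colors distinct) by dynamic programming over color subsets, and rescale by k^k/k!, the inverse of the probability that a fixed k-vertex occurrence is colorful. This is a sequential Python sketch of that textbook technique for paths, not the paper's distributed treewidth-2 algorithm; all names are illustrative.

```python
import math
import random


def count_colorful_paths(adj, colors, k):
    """Count undirected paths on k vertices whose vertices have distinct colors.

    adj: adjacency lists; colors[v] is in range(k).
    """
    # dp[v] maps a color-set bitmask to the number of colorful paths that end
    # at v and use exactly those colors.
    dp = [{1 << colors[v]: 1} for v in range(len(adj))]
    for _ in range(k - 1):
        nxt = [dict() for _ in adj]
        for v, table in enumerate(dp):
            for mask, count in table.items():
                for u in adj[v]:
                    bit = 1 << colors[u]
                    if not mask & bit:  # extend only with an unused color
                        nxt[u][mask | bit] = nxt[u].get(mask | bit, 0) + count
        dp = nxt
    total = sum(sum(table.values()) for table in dp)
    # A path with k >= 2 vertices is counted once from each of its two ends.
    return total // 2 if k > 1 else total


def estimate_path_count(adj, k, trials=500, seed=0):
    """Color-coding estimate of the number of paths on k vertices."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        colors = [rng.randrange(k) for _ in adj]
        hits += count_colorful_paths(adj, colors, k)
    # A fixed path is colorful with probability k!/k^k, so rescale.
    return hits / trials * k**k / math.factorial(k)


triangle = [[1, 2], [0, 2], [0, 1]]
print(count_colorful_paths(triangle, [0, 1, 2], 3))  # 3
```

Tree queries admit the same kind of bottom-up dynamic program; the abstract's point is that queries of treewidth 2 no longer run in linear time, which is where load balancing around high-degree nodes becomes the main difficulty.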

Preprint, 2013, arXiv

Distributed and Parallel Algorithms for Set Cover Problems with Small Neighborhood Covers

In this paper, we study a class of set cover problems that satisfy a special property, which we call the small neighborhood cover property. This class encompasses several well-studied problems, including vertex cover, interval cover, bag interval cover and tree cover. We design unified distributed and parallel algorithms that can handle any set cover problem falling under this framework and yield constant-factor approximations. These algorithms run in a polylogarithmic number of communication rounds in the distributed setting and are in NC in the parallel setting.

Preprint, 2012, arXiv

Density Functions subject to a Co-Matroid Constraint

In this paper, we consider the problem of finding the densest subset subject to co-matroid constraints. We are given a monotone supermodular set function $f$ defined over a universe $U$, and the density of a subset $S$ is defined to be $f(S)/|S|$. This generalizes the concept of graph density. Co-matroid constraints are the following: given a matroid $\mathcal{M}$, a set $S$ is feasible if and only if the complement of $S$ is independent in the matroid. Under such constraints, the problem becomes NP-hard. The specific case of graph density has been considered in the literature under specific co-matroid constraints, for example, the cardinality matroid and the partition matroid. We show a 2-approximation for finding the densest subset subject to co-matroid constraints. Thus, for instance, we improve the known approximation guarantees for partition matroids.
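
To make the definitions concrete, the following Python sketch takes f(S) to be the number of edges induced by S (the graph-density case) and uses the cardinality matroid of rank r, under which S is feasible if and only if at most r elements of the universe lie outside S. It finds the optimum by exhaustive search, so it only illustrates the objective and the constraint on tiny inputs; it is not the paper's 2-approximation. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def edge_density(edges, subset):
    """Graph density f(S)/|S|, with f(S) = number of edges inside S."""
    inside = sum(1 for u, v in edges if u in subset and v in subset)
    return Fraction(inside, len(subset))


def densest_with_cardinality_comatroid(vertices, edges, r):
    """Exhaustive search; S is feasible iff its complement has at most r elements."""
    best = None
    for size in range(max(1, len(vertices) - r), len(vertices) + 1):
        for subset in combinations(vertices, size):
            d = edge_density(edges, set(subset))
            if best is None or d > best[1]:
                best = (set(subset), d)
    return best


# Usage: a 4-clique on {0, 1, 2, 3} plus a pendant vertex 4 attached to 0.
vertices = [0, 1, 2, 3, 4]
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4)]
print(densest_with_cardinality_comatroid(vertices, edges, 1))
```

With r = 1 the clique (density 3/2) may drop the pendant vertex; with r = 0 the whole vertex set (density 7/5) is the only feasible choice.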

Preprint, 2012, arXiv

Distributed Algorithms for Scheduling on Line and Tree Networks

We have a set of processors (or agents) and a set of graph networks defined over some vertex set. Each processor can access a subset of the graph networks. Each processor has a demand specified as a pair of vertices $\langle u, v\rangle$, along with a profit; the processor wishes to send data between $u$ and $v$. Towards that goal, the processor needs to select a graph network accessible to it and a path connecting $u$ and $v$ within the selected network. The processor requires exclusive access to the chosen path in order to route the data. Thus, the processors compete for routes/channels. A feasible solution selects a subset of demands and schedules each selected demand on a graph network accessible to the processor owning the demand; the solution also specifies the paths to use for this purpose. The requirement is that for any two demands scheduled on the same graph network, their chosen paths must be edge-disjoint. The goal is to output a solution having the maximum aggregate profit. Prior work has addressed the above problem in a distributed setting for the special case where all the graph networks are simply paths (i.e., line networks). Distributed constant-factor approximation algorithms are known for this case. The main contributions of this paper are twofold. First, we design a distributed constant-factor approximation algorithm for the more general case of tree networks. The core component of our algorithm is a tree-decomposition technique, which may be of independent interest. Second, for the case of line networks, we improve the known approximation guarantees by a factor of 5. Our algorithms can also handle the capacitated scenario, wherein the demands and edges have bandwidth requirements and capacities, respectively.
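
The feasibility condition and objective can be made concrete for line networks. In the Python sketch below, each line network has integer vertices, edge i joins vertices i and i+1, a demand (u, v) occupies edges u through v-1 (on a line the path is unique), and capacities are unit, so the capacitated variant is not modelled. It enumerates all assignments exhaustively and therefore only illustrates the problem on tiny instances; it is not the distributed approximation algorithm from the paper, and all names are hypothetical.

```python
from itertools import product


def edges_of(demand):
    """Edges used on a line network: edge i joins vertices i and i + 1."""
    u, v = sorted(demand)
    return set(range(u, v))


def best_schedule(demands, access):
    """Exhaustive search for the maximum-profit feasible schedule.

    demands: list of ((u, v), profit), one per processor.
    access: list of network ids accessible to each processor.
    Returns (profit, assignment), where assignment[i] is a network id or None.
    """
    best = (0, [None] * len(demands))
    for assignment in product(*[[None, *nets] for nets in access]):
        used = {}  # network id -> set of edges already occupied
        feasible = True
        for (demand, _), net in zip(demands, assignment):
            if net is None:
                continue
            path = edges_of(demand)
            if used.setdefault(net, set()) & path:  # paths must be edge-disjoint
                feasible = False
                break
            used[net] |= path
        if feasible:
            profit = sum(p for (_, p), net in zip(demands, assignment) if net is not None)
            if profit > best[0]:
                best = (profit, list(assignment))
    return best


# Usage: two line networks "A" and "B"; only the second processor can use "B".
demands = [((0, 3), 5), ((2, 4), 4), ((1, 2), 3)]
access = [["A"], ["A", "B"], ["A"]]
print(best_schedule(demands, access))  # (9, ['A', 'B', None])
```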

Preprint, 2012, arXiv

Mapping Strategies for the PERCS Architecture

The PERCS system was designed by IBM in response to a DARPA challenge that called for a high-productivity, high-performance computing system. The IBM PERCS architecture is a two-level direct network having low diameter and high bisection bandwidth. Mapping and routing strategies play an important role in the performance of applications on such a topology. In this paper, we study mapping strategies for the PERCS architecture, which determine how to map the tasks of a given job onto the physical processing nodes. We develop and present fundamental principles for designing good mapping strategies that minimize congestion. This is achieved via a theoretical study of some common communication patterns under both the direct and indirect routing mechanisms supported by the architecture.

Preprint, 2010, arXiv

On the Complexity of the $k$-Anonymization Problem

We study the problem of anonymizing tables containing personal information before releasing them for public use. One of the formulations considered in this context is the $k$-anonymization problem: given a table, suppress a minimum number of cells so that in the transformed table, each row is identical to at least $k-1$ other rows. The problem is known to be NP-hard and MAXSNP-hard, but in the known reductions the number of columns in the constructed tables is arbitrarily large. However, in practical settings the number of columns is much smaller. We therefore study the complexity of the practical setting in which the number of columns $m$ is small. We show that the problem is NP-hard even when the number of columns $m$ is a constant ($m=3$). We also prove MAXSNP-hardness for this restricted version and derive that the problem cannot be approximated within a factor of 6238/6237. Our reduction uses alphabets $\Sigma$ of arbitrarily large size. A natural question is whether the problem remains NP-hard when both $m$ and $|\Sigma|$ are small. We prove that the $k$-anonymization problem is in P when both $m$ and $|\Sigma|$ are constants.
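
The objective can be illustrated with a standard reformulation: the released rows fall into groups of identical rows, each of size at least k, and within a group every column on which the original rows disagree must be suppressed in all rows of the group. The Python sketch below finds the optimum by enumerating every partition of the rows, which is exponential and only meant for tiny tables (consistent with the hardness results above); the function names are hypothetical.

```python
def suppression_cost(rows, group):
    """Cells suppressed when the rows in `group` are made identical."""
    columns = range(len(rows[0]))
    differing = sum(1 for c in columns if len({rows[i][c] for i in group}) > 1)
    return len(group) * differing


def set_partitions(items):
    """Yield every partition of `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        yield [[first]] + partition  # `first` in a block of its own
        for i in range(len(partition)):  # or added to an existing block
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


def optimal_k_anonymization(rows, k):
    """Minimum suppressions so that every row matches at least k - 1 others."""
    best = None
    for partition in set_partitions(list(range(len(rows)))):
        if all(len(block) >= k for block in partition):
            cost = sum(suppression_cost(rows, block) for block in partition)
            if best is None or cost < best[0]:
                best = (cost, partition)
    return best


# Usage: pairing rows {0, 1} and {2, 3} suppresses only column 1 of rows 0 and 1.
rows = [("a", "x"), ("a", "y"), ("b", "y"), ("b", "y")]
print(optimal_k_anonymization(rows, 2)[0])  # 2
```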
