Researcher profile

Bapi Chatterjee

Bapi Chatterjee contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
3topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2022arXiv

Scaling the Wild: Decentralizing Hogwild!-style Shared-memory SGD

Powered by the simplicity of lock-free asynchrony, Hogwilld! is a go-to approach to parallelize SGD over a shared-memory setting. Despite its popularity and concomitant extensions, such as PASSM+ wherein concurrent processes update a shared model with partitioned gradients, scaling it to decentralized workers has surprisingly been relatively unexplored. To our knowledge, there is no convergence theory of such methods, nor systematic numerical comparisons evaluating speed-up. In this paper, we propose an algorithm incorporating decentralized distributed memory computing architecture with each node running multiprocessing parallel shared-memory SGD itself. Our scheme is based on the following algorithmic tools and features: (a) asynchronous local gradient updates on the shared-memory of workers, (b) partial backpropagation, and (c) non-blocking in-place averaging of the local models. We prove that our method guarantees ergodic convergence rates for non-convex objectives. On the practical side, we show that the proposed method exhibits improved throughput and competitive accuracy for standard image classification benchmarks on the CIFAR-10, CIFAR-100, and Imagenet datasets. Our code is available at https://github.com/bapi/LPP-SGD.

preprint2020arXiv

Asynchronous Optimization Methods for Efficient Training of Deep Neural Networks with Guarantees

Asynchronous distributed algorithms are a popular way to reduce synchronization costs in large-scale optimization, and in particular for neural network training. However, for nonsmooth and nonconvex objectives, few convergence guarantees exist beyond cases where closed-form proximal operator solutions are available. As most popular contemporary deep neural networks lead to nonsmooth and nonconvex objectives, there is now a pressing need for such convergence guarantees. In this paper, we analyze for the first time the convergence of stochastic asynchronous optimization for this general class of objectives. In particular, we focus on stochastic subgradient methods allowing for block variable partitioning, where the shared-memory-based model is asynchronously updated by concurrent processes. To this end, we first introduce a probabilistic model which captures key features of real asynchronous scheduling between concurrent processes; under this model, we establish convergence with probability one to an invariant set for stochastic subgradient methods with momentum. From the practical perspective, one issue with the family of methods we consider is that it is not efficiently supported by machine learning frameworks, as they mostly focus on distributed data-parallel strategies. To address this, we propose a new implementation strategy for shared-memory based training of deep neural networks, whereby concurrent parameter servers are utilized to train a partitioned but shared model in single- and multi-GPU settings. Based on this implementation, we achieve on average 1.2x speed-up in comparison to state-of-the-art training methods for popular image classification tasks without compromising accuracy.

preprint2020arXiv

Dynamic Graph Operations: A Consistent Non-blocking Approach

Graph algorithms enormously contribute to the domains such as blockchains, social networks, biological networks, telecommunication networks, and several others. The ever-increasing demand of data-volume, as well as speed of such applications, have essentially transported these applications from their comfort zone: static setting, to a challenging territory of dynamic updates. At the same time, mainstreaming of multi-core processors have entailed that the dynamic applications should be able to exploit concurrency as soon as parallelization gets inhibited. Thus, the design and implementation of efficient concurrent dynamic graph algorithms have become significant. This paper reports a novel library of concurrent shared-memory algorithms for breadth-first search (BFS), single-source shortest-path (SSSP), and betweenness centrality (BC) in a dynamic graph. The presented algorithms are provably non-blocking and linearizable. We extensively evaluate C++ implementations of the algorithms through several micro-benchmarks. The experimental results demonstrate the scalability with the number of threads. Our experiments also highlight the limitations of static graph analytics methods in a dynamic setting.

preprint2020arXiv

Elastic Consistency: A General Consistency Model for Distributed Stochastic Gradient Descent

Machine learning has made tremendous progress in recent years, with models matching or even surpassing humans on a series of specialized tasks. One key element behind the progress of machine learning in recent years has been the ability to train machine learning models in large-scale distributed shared-memory and message-passing environments. Many of these models are trained employing variants of stochastic gradient descent (SGD) based optimization. In this paper, we introduce a general consistency condition covering communication-reduced and asynchronous distributed SGD implementations. Our framework, called elastic consistency enables us to derive convergence bounds for a variety of distributed SGD methods used in practice to train large-scale machine learning models. The proposed framework de-clutters the implementation-specific convergence analysis and provides an abstraction to derive convergence bounds. We utilize the framework to analyze a sparsification scheme for distributed SGD methods in an asynchronous setting for convex and non-convex objectives. We implement the distributed SGD variant to train deep CNN models in an asynchronous shared-memory setting. Empirical results show that error-feedback may not necessarily help in improving the convergence of sparsified asynchronous distributed SGD, which corroborates an insight suggested by our convergence analysis.

preprint2020arXiv

Stochastic Gradient Langevin with Delayed Gradients

Stochastic Gradient Langevin Dynamics (SGLD) ensures strong guarantees with regards to convergence in measure for sampling log-concave posterior distributions by adding noise to stochastic gradient iterates. Given the size of many practical problems, parallelizing across several asynchronously running processors is a popular strategy for reducing the end-to-end computation time of stochastic optimization algorithms. In this paper, we are the first to investigate the effect of asynchronous computation, in particular, the evaluation of stochastic Langevin gradients at delayed iterates, on the convergence in measure. For this, we exploit recent results modeling Langevin dynamics as solving a convex optimization problem on the space of measures. We show that the rate of convergence in measure is not significantly affected by the error caused by the delayed gradient information used for computation, suggesting significant potential for speedup in wall clock time. We confirm our theoretical results with numerical experiments on some practical problems.