Source author record

Sahasrajit Sarmasarkar

Sahasrajit Sarmasarkar appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Information Theory; math.IT; Data Structures and Algorithms; Distributed, Parallel, and Cluster Computing; Machine Learning

Catalog footprint

What is connected

2 works

5 topics

3 close collaborators

Preprint, 2022, arXiv

On Gradient Coding with Partial Recovery

We consider a generalization of the gradient coding framework where a dataset is divided across $n$ workers and each worker transmits to a master node one or more linear combinations of the gradients over its assigned data subsets. Unlike the conventional framework, which requires the master node to recover the sum of the gradients over all the data subsets in the presence of straggler workers, we relax the goal to computing the sum of at least an $\alpha$ fraction of the gradients. We begin by deriving a lower bound on the computation load of any scheme and propose two strategies which achieve this lower bound, albeit at the cost of high communication load and a number of data partitions which can be polynomial in $n$. We then propose schemes based on cyclic assignment which use $n$ data partitions and have a lower communication load. When each worker transmits a single linear combination, we prove lower bounds on the computation load of any scheme using $n$ data partitions. Finally, we describe a class of schemes which achieve different intermediate operating points for the computation and communication load, and provide simulation results to demonstrate the empirical performance of our schemes.
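As a rough illustration of the cyclic assignment mentioned in the abstract, the sketch below places $n$ partitions on $n$ workers and measures which fraction of partitions is still held by at least one non-straggler. It is not the paper's scheme: the function names, the parameter `d` (partitions per worker, used here as a stand-in for the computation load), and the coverage measure are assumptions made for illustration. Coverage only upper-bounds the fraction $\alpha$ that any decoder could recover under this placement; the linear encoding and decoding steps are not modeled.

```python
def cyclic_assignment(n, d):
    """Hypothetical placement: worker i holds partitions i, i+1, ..., i+d-1 (mod n)."""
    return [[(i + j) % n for j in range(d)] for i in range(n)]


def covered_fraction(assignment, stragglers):
    """Fraction of partitions held by at least one worker that is not a straggler."""
    n = len(assignment)
    covered = set()
    for worker, parts in enumerate(assignment):
        if worker not in stragglers:
            covered.update(parts)
    return len(covered) / n


assignment = cyclic_assignment(n=6, d=2)
# Workers 0 and 1 are the only holders of partition 1, so losing both
# leaves 5 of the 6 partitions covered.
fraction = covered_fraction(assignment, stragglers={0, 1})
```

With `d = 2`, losing any single worker still leaves every partition covered, while losing two adjacent workers drops exactly one partition. This is the kind of trade-off between per-worker load and recoverable fraction that the abstract describes.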

Preprint, 2021, arXiv

Query complexity of heavy hitter estimation

We consider the problem of identifying the subset $\mathcal{S}^{\gamma}_{\mathcal{P}}$ of elements in the support of an underlying distribution $\mathcal{P}$ whose probability value is larger than a given threshold $\gamma$, by actively querying an oracle to gain information about a sequence $X_1, X_2, \ldots$ of i.i.d. samples drawn from $\mathcal{P}$. We consider two query models: (a) each query is an index $i$ and the oracle returns the value $X_i$, and (b) each query is a pair $(i,j)$ and the oracle gives a binary answer indicating whether $X_i = X_j$. For each of these query models, we design sequential estimation algorithms which, at each round, either decide what query to send to the oracle depending on the entire history of responses or decide to stop and output an estimate of $\mathcal{S}^{\gamma}_{\mathcal{P}}$, which is required to be correct with some pre-specified large probability. We provide upper bounds on the query complexity of the algorithms for any distribution $\mathcal{P}$ and also derive lower bounds on the optimal query complexity under the two query models. We also consider noisy versions of the two query models and propose robust estimators which can effectively counter the noise in the oracle responses.
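To make query model (a) concrete, here is a minimal sketch of a sequential estimator: it queries $X_1, X_2, \ldots$ in order, tracks empirical frequencies, and stops once every observed frequency is separated from $\gamma$ by a confidence radius. The function name, the `oracle` callable, the `max_queries` budget, and the Hoeffding-style radius are all assumptions for illustration. This is not the algorithm from the paper, and the stopping rule as written carries no proven correctness guarantee.

```python
import math
from collections import Counter


def estimate_heavy_hitters(oracle, gamma, delta=0.05, max_queries=100_000):
    """Hypothetical estimator for query model (a): oracle(i) returns sample i."""
    counts = Counter()
    for t in range(1, max_queries + 1):
        counts[oracle(t - 1)] += 1
        # Assumed Hoeffding-style radius; shrinks roughly like sqrt(log(t) / t).
        radius = math.sqrt(math.log(4 * t * t / delta) / (2 * t))
        # Stop when every seen element is clearly above or below gamma, and
        # unseen elements (empirical frequency 0) are clearly below it.
        separated = all(abs(c / t - gamma) > radius for c in counts.values())
        if separated and radius < gamma:
            heavy = {x for x, c in counts.items() if c / t > gamma}
            return heavy, t
    raise RuntimeError("query budget exhausted before the stopping rule fired")
```

The number of queries used grows as the gap between $\gamma$ and the nearest probability value shrinks, which is the quantity the abstract's query-complexity bounds are concerned with. Model (b), where only pairwise equality answers are available, would need a different bookkeeping step and is not covered by this sketch.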