Researcher profile

Parimal Parag

Parimal Parag contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
9works
0followers
8topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

9 published item(s)

preprint2026arXiv

Self-Creating Random Walks for Decentralized Learning under Pac-Man Attacks

Random walk (RW)-based algorithms have long been popular in distributed systems due to low overheads and scalability, with recent growing applications in decentralized learning. However, their reliance on local interactions makes them inherently vulnerable to malicious behavior. In this work, we investigate an adversarial threat that we term the ``Pac-Man'' attack, in which a malicious node probabilistically terminates any RW that visits it. This stealthy behavior gradually eliminates active RWs from the network, effectively halting the learning process without triggering failure alarms. To counter this threat, we propose the CREATE-IF-LATE (CIL) algorithm, which is a fully decentralized, resilient mechanism that enables self-creating RWs and prevents RW extinction in the presence of Pac-Man. Our theoretical analysis shows that the CIL algorithm guarantees several desirable properties, such as (i) non-extinction of the RW population, (ii) almost sure boundedness of the RW population, and (iii) convergence of RW-based stochastic gradient descent even in the presence of Pac-Man with a quantifiable deviation from the true optimum. Moreover, the learning process experiences at most a linear time delay due to Pac-Man interruptions and RW regeneration. Our extensive empirical results on both synthetic and public benchmark datasets validate our theoretical findings.

preprint2022arXiv

Adaptive Stochastic Gradient Descent for Fast and Communication-Efficient Distributed Learning

We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on $n$ workers, each having a subset of the data. Distributed SGD may suffer from the effect of stragglers, i.e., slow or unresponsive workers who cause delays. One solution studied in the literature is to wait at each iteration for the responses of the fastest $k<n$ workers before updating the model, where $k$ is a fixed parameter. The choice of the value of $k$ presents a trade-off between the runtime (i.e., convergence rate) of SGD and the error of the model. Towards optimizing the error-runtime trade-off, we investigate distributed SGD with adaptive~$k$, i.e., varying $k$ throughout the runtime of the algorithm. We first design an adaptive policy for varying $k$ that optimizes this trade-off based on an upper bound on the error as a function of the wall-clock time that we derive. Then, we propose and implement an algorithm for adaptive distributed SGD that is based on a statistical heuristic. Our results show that the adaptive version of distributed SGD can reach lower error values in less time compared to non-adaptive implementations. Moreover, the results also show that the adaptive version is communication-efficient, where the amount of communication required between the master and the workers is less than that of non-adaptive versions.

preprint2021arXiv

Latency-Redundancy Tradeoff in Distributed Read-Write Systems

Data is replicated and stored redundantly over multiple servers for availability in distributed databases. We focus on databases with frequent reads and writes, where both read and write latencies are important. This is in contrast to databases designed primarily for either read or write applications. Redundancy has contrasting effects on read and write latency. Read latency can be reduced by potential parallel access from multiple servers, whereas write latency increases as a larger number of replicas have to be updated. We quantify this tradeoff between read and write latency as a function of redundancy, and provide a closed-form approximation when the request arrival is Poisson and the service is memoryless. We empirically show that this approximation is tight across all ranges of system parameters. Thus, we provide guidelines for redundancy selection in distributed databases.

preprint2021arXiv

Optimal Pricing in Multi Server Systems

We study optimal service pricing in server farms where customers arrive according to a renewal process and have independent and identical ($i.i.d.$) exponential service times and $i.i.d.$ valuations of the service. The service provider charges a time varying service fee aiming at maximizing its revenue rate. The customers that find free servers and service fees lesser than their valuation join for the service else they leave without waiting. We consider both finite server and infinite server farms. We solve the optimal pricing problems using the framework of Markov decision problems. We show that the optimal prices depend on the number of free servers. We propose algorithms to compute the optimal prices. We also establish several properties of the optimal prices and the corresponding revenue rates in the case of Poisson customer arrivals. We illustrate all our findings via numerical results.

preprint2020arXiv

Latency optimal storage and scheduling of replicated fragments for memory-constrained servers

We consider the setting of distributed storage system where a single file is subdivided into smaller fragments of same size which are then replicated with a common replication factor across servers of identical cache size. An incoming file download request is sent to all the servers, and the download is completed whenever request gathers all the fragments. At each server, we are interested in determining the set of fragments to be stored, and the sequence in which fragments should be accessed, such that the mean file download time for a request is minimized. We model the fragment download time as an exponential random variable independent and identically distributed for all fragments across all servers, and show that the mean file download time can be lower bounded in terms of the expected number of useful servers summed over all distinct fragment downloads. We present deterministic storage schemes that attempt to maximize the number of useful servers. We show that finding the optimal sequence of accessing the fragments is a Markov decision problem, whose complexity grows exponentially with the number of fragments. We propose heuristic algorithms that determine the sequence of access to the fragments which are empirically shown to perform well.

preprint2020arXiv

Tracking an Auto-Regressive Process with Limited Communication per Unit Time

Samples from a high-dimensional AR[1] process are observed by a sender which can communicate only finitely many bits per unit time to a receiver. The receiver seeks to form an estimate of the process value at every time instant in real-time. We consider a time-slotted communication model in a slow-sampling regime where multiple communication slots occur between two sampling instants. We propose a successive update scheme which uses communication between sampling instants to refine estimates of the latest sample and study the following question: Is it better to collect communication of multiple slots to send better refined estimates, making the receiver wait more for every refinement, or to be fast but loose and send new information in every communication opportunity? We show that the fast but loose successive update scheme with ideal spherical codes is universally optimal asymptotically for a large dimension. However, most practical quantization codes for fixed dimensions do not meet the ideal performance required for this optimality, and they typically will have a bias in the form of a fixed additive error. Interestingly, our analysis shows that the fast but loose scheme is not an optimal choice in the presence of such errors, and a judiciously chosen frequency of updates outperforms it.

preprint2019arXiv

Optimal Server Selection for Straggler Mitigation

The performance of large-scale distributed compute systems is adversely impacted by stragglers when the execution time of a job is uncertain. To manage stragglers, we consider a multi-fork approach for job scheduling, where additional parallel servers are added at forking instants. In terms of the forking instants and the number of additional servers, we compute the job completion time and the cost of server utilization when the task processing times are assumed to have a shifted exponential distribution. We use this study to provide insights into the scheduling design of the forking instants and the associated number of additional servers to be started. Numerical results demonstrate orders of magnitude improvement in cost in the regime of low completion times as compared to the prior works.

preprint2018arXiv

Minimizing Latency for Secure Coded Computing Using Secret Sharing via Staircase Codes

We consider the setting of a Master server, M, who possesses confidential data (e.g., personal, genomic or medical data) and wants to run intensive computations on it, as part of a machine learning algorithm for example. The Master wants to distribute these computations to untrusted workers who have volunteered or are incentivized to help with this task. However, the data must be kept private and not revealed to the individual workers. Some of the workers may be stragglers, e.g., slow or busy, and will take a random time to finish the task assigned to them. We are interested in reducing the delays experienced by the Master. We focus on linear computations as an essential operation in many iterative algorithms such as principal component analysis, support vector machines and other gradient-descent based algorithms. A classical solution is to use a linear secret sharing scheme, such as Shamir&#39;s scheme, to divide the data into secret shares on which the workers can perform linear computations. However, classical codes can provide straggler mitigation assuming a worst-case scenario of a fixed number of stragglers. We propose a solution based on new secure codes, called Staircase codes, introduced previously by two of the authors. Staircase codes allow flexibility in the number of stragglers up to a given maximum, and universally achieve the information theoretic limit on the download cost by the Master, leading to latency reduction. Under the shifted exponential model, we find upper and lower bounds on the Master&#39;s mean waiting time. We derive the distribution of the Master&#39;s waiting time, and its mean, for systems with up to two stragglers. For systems with any number of stragglers, we derive an expression that can give the exact distribution, and the mean, of the waiting time of the Master. We show that Staircase codes always outperform classical secret sharing codes.

preprint2017arXiv

Minimizing Latency for Secure Distributed Computing

We consider the setting of a master server who possesses confidential data (genomic, medical data, etc.) and wants to run intensive computations on it, as part of a machine learning algorithm for example. The master wants to distribute these computations to untrusted workers who have volunteered or are incentivized to help with this task. However, the data must be kept private (in an information theoretic sense) and not revealed to the individual workers. The workers may be busy, or even unresponsive, and will take a random time to finish the task assigned to them. We are interested in reducing the aggregate delay experienced by the master. We focus on linear computations as an essential operation in many iterative algorithms. A known solution is to use a linear secret sharing scheme to divide the data into secret shares on which the workers can compute. We propose to use instead new secure codes, called Staircase codes, introduced previously by two of the authors. We study the delay induced by Staircase codes which is always less than that of secret sharing. The reason is that secret sharing schemes need to wait for the responses of a fixed fraction of the workers, whereas Staircase codes offer more flexibility in this respect. For instance, for codes with rate $R=1/2$ Staircase codes can lead to up to $40\%$ reduction in delay compared to secret sharing.