Researcher profile

Mohammad Ali Maddah-Ali

Mohammad Ali Maddah-Ali contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
20works
0followers
8topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

20 published item(s)

preprint2026arXiv

\mathsf{VISTA}: Decentralized Machine Learning in Adversary Dominated Environments

Decentralized machine learning often relies on outsourcing computations, such as gradient evaluations, to untrusted worker nodes. Existing robust aggregation methods can mitigate malicious behavior under honest-majority assumptions, but may fail when adversaries control a majority of the workers. We study this adversary-dominated setting through an incentive-oriented framework in which reports are accepted and rewarded only when they are mutually consistent up to a threshold. This turns the adversary from a pure saboteur into a rational agent that trades off increasing estimation error against the risk of rejection and loss of reward. We consider iterative optimization under this model. Unlike one-shot computation, iterative learning requires long-horizon decisions: permissive acceptance rules enable faster early progress but admit more adversarial corruption, while strict rules improve estimation accuracy but cause frequent rejections. We propose \mathsf{VISTA}, an adaptive algorithm that tunes the acceptance threshold using the optimization history. Numerical results show that \mathsf{VISTA} improves convergence over static thresholds. We also provide a rigorous convergence analysis showing that, with suitable incentive-aware adaptation, adversary-dominated decentralized learning can retain the asymptotic convergence behavior of standard SGD without relying on an honest majority.

preprint2026arXiv

Game of Coding: Coding Theory in the Presence of Rational Adversaries, Motivated by Decentralized Machine Learning

Coding theory plays a crucial role in enabling reliable communication, storage, and computation. Classical approaches assume a worst-case adversarial model and ensure error correction and data recovery only when the number of honest nodes exceeds the number of adversarial ones by some margin. However, in some emerging decentralized applications, particularly in decentralized machine learning (DeML), participating nodes are rewarded for accepted contributions. This incentive structure naturally gives rise to rational adversaries who act strategically rather than behaving in purely malicious ways. In this paper, we first motivate the need for coding in the presence of rational adversaries, particularly in the context of outsourced computation in decentralized systems. We contrast this need with existing approaches and highlight their limitations. We then introduce the game of coding, a novel game-theoretic framework that extends coding theory to trust-minimized settings where honest nodes are not in the majority. Focusing on repetition coding, we highlight two key features of this framework: (1) the ability to achieve a non-zero probability of data recovery even when adversarial nodes are in the majority, and (2) Sybil resistance, i.e., the equilibrium remains unchanged even as the number of adversarial nodes increases. Finally, we explore scenarios in which the adversary's strategy is unknown and outline several open problems for future research.

preprint2026arXiv

Game of Coding: Sybil Resistant Decentralized Machine Learning with Minimal Trust Assumption

Coding theory plays a crucial role in ensuring data integrity and reliability across various domains, from communication to computation and storage systems. However, its reliance on trust assumptions for data recovery, which requires the number of honest nodes to exceed adversarial nodes by a certain margin, poses significant challenges, particularly in emerging decentralized systems where trust is a scarce resource. To address this, the game of coding framework was introduced, offering insights into strategies for data recovery within incentive-oriented environments. In such environments, participant nodes are rewarded as long as the system remains functional (live). This incentivizes adversaries to maximize their rewards (utility) by ensuring that the decoder, as the data collector (DC), successfully recovers the data, preferably with a high estimation error. This rational behavior is leveraged in a game-theoretic framework, where the equilibrium leads to a robust and resilient system, referred to as the game of coding. The focus of the earliest version of the game of coding was limited to scenarios involving only two nodes. In this paper, we generalize the game of coding framework to scenarios with $N \ge 2$ nodes, exploring critical aspects of system behavior. Specifically, we (i) demonstrate that the adversary's utility at equilibrium is non-increasing with additional adversarial nodes, ensuring no gain for the adversary and no pain for the DC, thus establishing the game of coding framework's Sybil resistance; (ii) show that increasing the number of honest nodes does not always enhance the DC's utility, providing examples and proposing an algorithm to identify and mitigate this counterintuitive effect; and (iii) outline the optimal strategies for both the DC and the adversary, demonstrating that the system achieves enhanced liveness at equilibrium.

preprint2026arXiv

Learning from Acceptance: Cumulative Regret in the Game of Coding

Classical coding-theoretic guarantees often rely on trust assumptions, such as requiring sufficiently many honest nodes compared with adversarial ones. These assumptions are difficult to enforce in open decentralized systems where participants are not centrally certified. At the same time, such environments often contain incentive mechanisms: participants may be rewarded only when their submitted data are accepted and the system remains functional. This changes the role of an adversary. Rather than acting as a pure saboteur, a strategic adversary may submit data that are consistent enough to be accepted while still degrading the quality of the final estimate. The game-of-coding framework models this strategic interaction between a data collector (DC) and an adversary. Existing works on the game of coding mostly consider the complete-information case, where the DC knows how the adversary trades off acceptance and estimation error. In this paper, we study an incomplete-information version of the game of coding in which the DC, acting as a Stackelberg leader, does not know the adversary's utility trade-off and must learn through repeated interaction. Prior work on the unknown-adversary setting considered an explore-then-commit objective, where only the final selected acceptance rule is evaluated. In contrast, we study the full learning trajectory: every acceptance rule used during the algorithm is executed and contributes to performance. We propose an algorithm that refines its search around promising acceptance rules, prove that it achieves sublinear cumulative regret, and evaluate its performance through numerical experiments.

preprint2022arXiv

Distributed Attribute-based Private Access Control

In attribute-based access control, users with certain verified attributes will gain access to some particular data. Concerning with privacy of the users' attributes, we study the problem of distributed attribute-based private access control (DAPAC) with multiple authorities, where each authority will learn and verify only one of the attributes. To investigate its fundamental limits, we introduce an information theoretic DAPAC framework, with $N \in \mathbb{N}$, $N\geq 2$, replicated non-colluding servers (authorities) and some users. Each user has an attribute vector $\mathbf{v^*}=(v_1^*, ..., v_N^*)$ of dimension $N$ and is eligible to retrieve a message $W^{\mathbf{v}^*}$, available in all servers. Each server $n\in [N]$ is able to only observe and verify the $n$'th attribute of a user. In response, it sends a function of its data to the user. The system must satisfy the following conditions: (1) Correctness: the user with attribute vector $\mathbf{v^*}$ is able to retrieve his intended message $W^{\mathbf{v}^*}$ from the servers' response, (2) Data Secrecy: the user will not learn anything about the other messages, (3) Attribute Privacy: each Server~$n$ learns nothing beyond attribute $n$ of the user. The capacity of the DAPAC is defined as the ratio of the file size and the aggregated size of the responses, maximized over all feasible schemes. We obtain a lower bound on the capacity of this problem by proposing an achievable algorithm with rate $\frac{1}{2K}$, where $K$ is the size of the alphabet of each attribute.

preprint2022arXiv

SwiftAgg: Communication-Efficient and Dropout-Resistant Secure Aggregation for Federated Learning with Worst-Case Security Guarantees

We propose SwiftAgg, a novel secure aggregation protocol for federated learning systems, where a central server aggregates local models of $N$ distributed users, each of size $L$, trained on their local data, in a privacy-preserving manner. Compared with state-of-the-art secure aggregation protocols, SwiftAgg significantly reduces the communication overheads without any compromise on security. Specifically, in presence of at most $D$ dropout users, SwiftAgg achieves a users-to-server communication load of $(T+1)L$ and a users-to-users communication load of up to $(N-1)(T+D+1)L$, with a worst-case information-theoretic security guarantee, against any subset of up to $T$ semi-honest users who may also collude with the curious server. The key idea of SwiftAgg is to partition the users into groups of size $D+T+1$, then in the first phase, secret sharing and aggregation of the individual models are performed within each group, and then in the second phase, model aggregation is performed on $D+T+1$ sequences of users across the groups. If a user in a sequence drops out in the second phase, the rest of the sequence remain silent. This design allows only a subset of users to communicate with each other, and only the users in a single group to directly communicate with the server, eliminating the requirements of 1) all-to-all communication network across users; and 2) all users communicating with the server, for other secure aggregation protocols. This helps to substantially slash the communication costs of the system.

preprint2022arXiv

SwiftAgg+: Achieving Asymptotically Optimal Communication Loads in Secure Aggregation for Federated Learning

We propose SwiftAgg+, a novel secure aggregation protocol for federated learning systems, where a central server aggregates local models of $N \in \mathbb{N}$ distributed users, each of size $L \in \mathbb{N}$, trained on their local data, in a privacy-preserving manner. SwiftAgg+ can significantly reduce the communication overheads without any compromise on security, and achieve optimal communication loads within diminishing gaps. Specifically, in presence of at most $D=o(N)$ dropout users, SwiftAgg+ achieves a per-user communication load of $(1+\mathcal{O}(\frac{1}{N}))L$ symbols and a server communication load of $(1+\mathcal{O}(\frac{1}{N}))L$ symbols, with a worst-case information-theoretic security guarantee, against any subset of up to $T=o(N)$ semi-honest users who may also collude with the curious server. Moreover, the proposed SwiftAgg+ allows for a flexible trade-off between communication loads and the number of active communication links. In particular, for $T<N-D$ and for any $K\in\mathbb{N}$, SwiftAgg+ can achieve the server communication load of $(1+\frac{T}{K})L$ symbols, and per-user communication load of up to $(1+\frac{T+D}{K})L$ symbols, where the number of pair-wise active connections in the network is $\frac{N}{2}(K+T+D+1)$.

preprint2021arXiv

CodedSketch: A Coding Scheme for Distributed Computation of Approximated Matrix Multiplication

In this paper, we propose CodedSketch, as a distributed straggler-resistant scheme to compute an approximation of the multiplication of two massive matrices. The objective is to reduce the recovery threshold, defined as the total number of worker nodes that we need to wait for to be able to recover the final result. To exploit the fact that only an approximated result is required, in reducing the recovery threshold, some sorts of pre-compression are required. However, compression inherently involves some randomness that would lose the structure of the matrices. On the other hand, considering the structure of the matrices is crucial to reduce the recovery threshold. In CodedSketch, we use count--sketch, as a hash-based compression scheme, on the rows of the first and columns of the second matrix, and a structured polynomial code on the columns of the first and rows of the second matrix. This arrangement allows us to exploit the gain of both in reducing the recovery threshold. To increase the accuracy of computation, multiple independent count--sketches are needed. This independency allows us to theoretically characterize the accuracy of the result and establish the recovery threshold achieved by the proposed scheme. To guarantee the independency of resulting count--sketches in the output, while keeping its cost on the recovery threshold minimum, we use another layer of structured codes.

preprint2021arXiv

Fundamental Limits of Distributed Encoding

In general coding theory, we often assume that error is observed in transferring or storing encoded symbols, while the process of encoding itself is error-free. Motivated by recent applications of coding theory, in this paper, we consider the case where the process of encoding is distributed and prone to error. We introduce the problem of distributed encoding, comprising of $K\in\mathbb{N}$ isolated source nodes and $N\in\mathbb{N}$ encoding nodes. Each source node has one symbol from a finite field and sends it to all encoding nodes. Each encoding node stores an encoded symbol, as a function of the received symbols. However, some of the source nodes are controlled by the adversary and may send different symbols to different encoding nodes. Depending on the number of adversarial nodes, denoted by $β\in\mathbb{N}$, and the number of symbols that each one generates, denoted by $v\in\mathbb{N}$, the process of decoding from the encoded symbols could be impossible. Assume that a decoder connects to an arbitrary subset of $t \in\mathbb{N}$ encoding nodes and wants to decode the symbols of the honest nodes correctly, without necessarily identifying the sets of honest and adversarial nodes. In this paper, we study $t^*\in\mathbb{N}$, the minimum of $t$, which is a function of $K$, $N$, $β$, and $v$. We show that when the encoding nodes use linear coding, $t^*_{\textrm{linear}}=K+2β(v-1)$, if $N\ge K+2β(v-1)$, and $t^*_{\textrm{linear}}=N$, if $N\le K+2β(v-1)$. In order to achieve $t^*_{\textrm{linear}}$, we use random linear coding and show that in any feasible solution that the decoder finds, the messages of the honest nodes are decoded correctly. For the converse of the fundamental limit, we show that when the adversary behaves in a particular way, it can always confuse the decoder between two feasible solutions that differ in the message of at least one honest node.

preprint2021arXiv

Multi-Party Proof Generation in QAP-based zk-SNARKs

Zero-knowledge succinct non-interactive argument of knowledge (zkSNARK) allows a party, known as the prover, to convince another party, known as the verifier, that he knows a private value $v$, without revealing it, such that $F(u,v)=y$ for some function $F$ and public values $u$ and $y$. There are various versions of zk-SNARK, among them, Quadratic Arithmetic Program (QAP)-based zk-SNARK has been widely used in practice, specially in Blockchain technology. This is attributed to two desirable features; its fixed-size proof and the very light computation load of the verifier. However, the computation load of the prover in QAP-based zkSNARKs, is very heavy, even-though it is designed to be very efficient. This load can be beyond the prover&#39;s computation power to handle, and has to be offloaded to some external servers. In the existing offloading solutions, either (i) the load of computation, offloaded to each sever, is a fraction of the prover&#39;s primary computation (e.g., DZIK), however the servers need to be trusted, (ii) the servers are not required to be trusted, but the computation complexity imposed to each one is the same as the prover&#39;s primary computation (e.g., Trinocchio). In this paper, we present a scheme, which has the benefits of both solutions. In particular, we propose a secure multi-party proof generation algorithm where the prover can delegate its task to $N $ servers, where (i) even if a group of $T \in \mathbb{N}$ servers, $T\le N$, collude, they cannot gain any information about the secret value $v$, (ii) the computation complexity of each server is less than $1/(N-T)$ of the prover&#39;s primary computation. The design is such that we don&#39;t lose the efficiency of the prover&#39;s algorithm in the process of delegating the tasks to external servers.

preprint2021arXiv

Optimal Communication-Computation Trade-Off in Heterogeneous Gradient Coding

Gradient coding allows a master node to derive the aggregate of the partial gradients, calculated by some worker nodes over the local data sets, with minimum communication cost, and in the presence of stragglers. In this paper, for gradient coding with linear encoding, we characterize the optimum communication cost for heterogeneous distributed systems with \emph{arbitrary} data placement, with $s \in \mathbb{N}$ stragglers and $a \in \mathbb{N}$ adversarial nodes. In particular, we show that the optimum communication cost, normalized by the size of the gradient vectors, is equal to $(r-s-2a)^{-1}$, where $r \in \mathbb{N}$ is the minimum number that a data partition is replicated. In other words, the communication cost is determined by the data partition with the minimum replication, irrespective of the structure of the placement. The proposed achievable scheme also allows us to target the computation of a polynomial function of the aggregated gradient matrix. It also allows us to borrow some ideas from approximation computing and propose an approximate gradient coding scheme for the cases when the repetition in data placement is smaller than what is needed to meet the restriction imposed on communication cost or when the number of stragglers appears to be more than the presumed value in the system design.

preprint2021arXiv

The Capacity Region of Distributed Multi-User Secret Sharing

In this paper, we study the problem of distributed multi-user secret sharing, including a trusted master node, $N\in \mathbb{N}$ storage nodes, and $K$ users, where each user has access to the contents of a subset of storage nodes. Each user has an independent secret message with certain rate, defined as the size of the message normalized by the size of a storage node. Having access to the secret messages, the trusted master node places encoded shares in the storage nodes, such that (i) each user can recover its own message from the content of the storage nodes that it has access to, (ii) each user cannot gain any information about the message of any other user. We characterize the capacity region of the distributed multi-user secret sharing, defined as the set of all achievable rate tuples, subject to the correctness and privacy constraints. In the achievable scheme, for each user, the master node forms a polynomial with the degree equal to the number of its accessible storage nodes minus one, where the value of this polynomial at certain points are stored as the encoded shares. The message of that user is embedded in some of the coefficients of the polynomial. The remaining coefficients are determined such that the content of each storage node serves as the encoded shares for all users that have access to that storage node.

preprint2020arXiv

A Hybrid-Order Distributed SGD Method for Non-Convex Optimization to Balance Communication Overhead, Computational Complexity, and Convergence Rate

In this paper, we propose a method of distributed stochastic gradient descent (SGD), with low communication load and computational complexity, and still fast convergence. To reduce the communication load, at each iteration of the algorithm, the worker nodes calculate and communicate some scalers, that are the directional derivatives of the sample functions in some \emph{pre-shared directions}. However, to maintain accuracy, after every specific number of iterations, they communicate the vectors of stochastic gradients. To reduce the computational complexity in each iteration, the worker nodes approximate the directional derivatives with zeroth-order stochastic gradient estimation, by performing just two function evaluations rather than computing a first-order gradient vector. The proposed method highly improves the convergence rate of the zeroth-order methods, guaranteeing order-wise faster convergence. Moreover, compared to the famous communication-efficient methods of model averaging (that perform local model updates and periodic communication of the gradients to synchronize the local models), we prove that for the general class of non-convex stochastic problems and with reasonable choice of parameters, the proposed method guarantees the same orders of communication load and convergence rate, while having order-wise less computational complexity. Experimental results on various learning problems in neural networks applications demonstrate the effectiveness of the proposed approach compared to various state-of-the-art distributed SGD methods.

preprint2020arXiv

BlockMarkchain: A Secure Decentralized Data Market with a Constant Load on the Blockchain

In this paper, we develop BlockMarkchain, as a secure data market place, where individual data sellers can exchange certified data with buyers, in a secure environment, without any mutual trust among the parties, and without trusting on a third party, as a mediator. To develop this platform, we rely on a smart contract, deployed on a secure public blockchain. The main challenges here are to verify the validity of data and to prevent malicious behavior of the parties, while preserving the privacy of the data and taking into account the limited computing and storage resources available on the blockchain. In BlockMarkchain, the buyer has the option to dispute the honesty of the seller and prove the invalidity of the data to the smart contract. The smart contract evaluates the buyer&#39;s claim and punishes the dishonest party by forfeiting his/her deposit in favor of the honest party. BlockMarkchain enjoys several salient features including (i) the certified data has never been revealed on the public blockchain, (ii) the size of data posted on the blockchain, the load of computation on the blockchain, and the cost of communication with the blockchain is constant and negligible, and (iii) the computation cost of verifications on the parties is not expensive.

preprint2020arXiv

Coded Secure Multi-Party Computation for Massive Matrices with Adversarial Nodes

In this work, we consider the problem of secure multi-party computation (MPC), consisting of $Γ$ sources, each has access to a large private matrix, $N$ processing nodes or workers, and one data collector or master. The master is interested in the result of a polynomial function of the input matrices. Each source sends a randomized functions of its matrix, called as its share, to each worker. The workers process their shares in interaction with each other, and send some results to the master such that it can derive the final result. There are several constraints: (1) each worker can store a function of each input matrix, with the size of $\frac{1}{m}$ fraction of that input matrix, (2) up to $t$ of the workers, for some integer $t$, are adversary and may collude to gain information about the private inputs or can do malicious actions to make the final result incorrect. The objective is to design an MPC scheme with the minimum number the workers, called the recovery threshold, such that the final result is correct, workers learn no information about the input matrices, and the master learns nothing beyond the final result. In this paper, we propose an MPC scheme that achieves the recovery threshold of $3t+2m-1$ workers, which is order-wise less than the recovery threshold of the conventional methods. The challenge in dealing with this set up is that when nodes interact with each other, the malicious messages that adversarial nodes generate propagate through the system, and can mislead the honest nodes. To deal with this challenge, we design some subroutines that can detect erroneous messages, and correct or drop them.

preprint2020arXiv

Corella: A Private Multi Server Learning Approach based on Correlated Queries

The emerging applications of machine learning algorithms on mobile devices motivate us to offload the computation tasks of training a model or deploying a trained one to the cloud or at the edge of the network. One of the major challenges in this setup is to guarantee the privacy of the client data. Various methods have been proposed to protect privacy in the literature. Those include (i) adding noise to the client data, which reduces the accuracy of the result, (ii) using secure multiparty computation (MPC), which requires significant communication among the computing nodes or with the client, (iii) relying on homomorphic encryption (HE) methods, which significantly increases computation load at the servers. In this paper, we propose $\textit{Corella}$ as an alternative approach to protect the privacy of data. The proposed scheme relies on a cluster of servers, where at most $T \in \mathbb{N}$ of them may collude, each running a learning model (e.g., a deep neural network). Each server is fed with the client data, added with $\textit{strong}$ noise, independent from user data. The variance of the noise is set to be large enough to make the information leakage to any subset of up to $T$ servers information-theoretically negligible. On the other hand, the added noises for different servers are $\textit{correlated}$. This correlation among the queries allows the parameters of the models running on different servers to be $\textit{trained}$ such that the client can mitigate the contribution of the noises by combining the outputs of the servers, and recover the final result with high accuracy and with a minor computational effort. Simulation results for various datasets demonstrate the accuracy of the proposed approach for the classification, using deep neural networks, and the autoencoder, as supervised and unsupervised learning tasks, respectively.

preprint2020arXiv

DRL-Based QoS-Aware Resource Allocation Scheme for Coexistence of Licensed and Unlicensed Users in LTE and Beyond

In this paper, we employ deep reinforcement learning to develop a novel radio resource allocation and packet scheduling scheme for different Quality of Service (QoS) requirements applicable to LTEadvanced and 5G networks. In addition, regarding the scarcity of spectrum in below 6GHz bands, the proposed algorithm dynamically allocates the resource blocks (RBs) to licensed users in a way to mostly preserve the continuity of unallocated RBs. This would improve the efficiency of communication among the unlicensed entities by increasing the chance of uninterrupted communication and reducing the load of coordination overheads. The optimization problem is formulated as a Markov Decision Process (MDP), observing the entire queue of the demands, where failing to meet QoS constraints penalizes the goal with a multiplicative factor. Furthermore, a notion of continuity for unallocated resources is taken into account as an additive term in the objective function. Considering the variations in both channel coefficients and users requests, we utilize a deep reinforcement learning algorithm as an online and numerically efficient approach to solve the MDP. Numerical results show that the proposed method achieves higher average spectral efficiency, while considering delay budget and packet loss ratio, compared to the conventional greedy min-delay and max-throughput schemes, in which a fixed part of the spectrum is forced to be vacant for unlicensed entities.

preprint2020arXiv

Private Sequential Function Computation

Consider a system, including a user, $N$ servers, and $K$ basic functions which are known at all of the servers. Using the combination of those basic functions, it is possible to construct a wide class of functions. The user wishes to compute a particular combination of the basic functions, by offloading the computation to $N$ servers, while the servers should not obtain any information about which combination of the basic functions is to be computed. The objective is to minimize the total number of queries asked by the user from the servers to achieve the desired result. As a first step toward this problem, in this paper, we consider the case where the user is interested in a class of functions which are composition of the basic functions, while each basic function appears in the composition exactly once. This means that in this case, to ensure privacy, we only require to hide to the order of the basic functions in the desired composition of the user. We further assume that the basic functions are linear and can be represented by (possibly large-scale) matrices. We call this problem as private sequential function computation. We study the capacity $C$, defined as the supremum of the number of desired computations, normalized by the number of computations done at the servers, subject to the privacy constraint. We prove that $(1-\frac{1}{N})/ (1-\frac{1}{\max(K,N)}) \le C \le 1$. For the achievability, we show that the user can retrieve the desired order of composition, by choosing a proper order of inquiries among different servers, while keeping the order of computations for each server fixed, irrespective of the desired order of composition. In the end, we develop an information-theoretic converse which results in an upper bound on the capacity.

preprint2020arXiv

Secure Coded Multi-Party Computation for Massive Matrix Operations

In this paper, we consider a secure multi-party computation problem (MPC), where the goal is to offload the computation of an arbitrary polynomial function of some massive private matrices (inputs) to a cluster of workers. The workers are not reliable. Some of them may collude to gain information about the input data (semi-honest workers). The system is initialized by sharing a (randomized) function of each input matrix to each server. Since the input matrices are massive, each share&#39;s size is assumed to be at most $1/k$ fraction of the input matrix, for some $k \in \mathbb{N}$. The objective is to minimize the number of workers needed to perform the computation task correctly, such that even if an arbitrary subset of $t-1$ workers, for some $t \in \mathbb{N}$, collude, they cannot gain any information about the input matrices. We propose a sharing scheme, called \emph{polynomial sharing}, and show that it admits basic operations such as adding and multiplication of matrices and transposing a matrix. By concatenating the procedures for basic operations, we show that any polynomial function of the input matrices can be calculated, subject to the problem constraints. We show that the proposed scheme can offer order-wise gain in terms of the number of workers needed, compared to the approaches formed by the concatenation of job splitting and conventional MPC approaches.

preprint2020arXiv

Straggler Mitigation in Distributed Matrix Multiplication: Fundamental Limits and Optimal Coding

We consider the problem of massive matrix multiplication, which underlies many data analytic applications, in a large-scale distributed system comprising a group of worker nodes. We target the stragglers&#39; delay performance bottleneck, which is due to the unpredictable latency in waiting for slowest nodes (or stragglers) to finish their tasks. We propose a novel coding strategy, named \emph{entangled polynomial code}, for designing the intermediate computations at the worker nodes in order to minimize the recovery threshold (i.e., the number of workers that we need to wait for in order to compute the final output). We demonstrate the optimality of entangled polynomial code in several cases, and show that it provides orderwise improvement over the conventional schemes for straggler mitigation. Furthermore, we characterize the optimal recovery threshold among all linear coding strategies within a factor of $2$ using \emph{bilinear complexity}, by developing an improved version of the entangled polynomial code. In particular, while evaluating bilinear complexity is a well-known challenging problem, we show that optimal recovery threshold for linear coding strategies can be approximated within a factor of $2$ of this fundamental quantity. On the other hand, the improved version of the entangled polynomial code enables further and orderwise reduction in the recovery threshold, compared to its basic version. Finally, we show that the techniques developed in this paper can also be extended to several other problems such as coded convolution and fault-tolerant computing, leading to tight characterizations.