Researcher profile

Nicholas Woolsey

Nicholas Woolsey contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
7 works
0 followers
6 topics
4 close collaborators

Published work

7 published items

preprint | 2020 | arXiv

A Combinatorial Design for Cascaded Coded Distributed Computing on General Networks

Coding-theoretic approaches have been developed to significantly reduce the communication load in modern distributed computing systems. In particular, coded distributed computing (CDC), introduced by Li et al., can efficiently trade computation resources to reduce the communication load in MapReduce-like computing systems. For the more general cascaded CDC, Map computations are repeated at r nodes to significantly reduce the communication load among nodes tasked with computing Q Reduce functions s times. In this paper, we propose a novel low-complexity combinatorial design for cascaded CDC which 1) determines both input file and output function assignments, 2) requires significantly fewer input files and output functions, and 3) operates on heterogeneous networks where nodes have varying storage and computing capabilities. We provide an analytical characterization of the computation-communication tradeoff, from which we show the proposed scheme can outperform the state-of-the-art scheme proposed by Li et al. for homogeneous networks. Further, when the network is heterogeneous, we show that the performance of the proposed scheme can be better than its homogeneous counterpart. In addition, the proposed scheme is optimal within a constant factor of the information-theoretic converse bound while fixing the input file and the output function assignments.

preprint | 2020 | arXiv

A New Combinatorial Coded Design for Heterogeneous Distributed Computing

Coded Distributed Computing (CDC), introduced by Li et al. in 2015, offers an efficient approach to trade computing power to reduce the communication load in general distributed computing frameworks such as MapReduce and Spark. In particular, increasing the computation load in the Map phase by a factor of r can create coded multicasting opportunities to reduce the communication load in the Shuffle phase by the same factor. However, the CDC scheme is designed for homogeneous settings, where the storage, computation load and communication load on the computing nodes are the same. In addition, it requires an exponentially large number of input files (data batches), reduce functions and multicasting groups relative to the number of nodes to achieve the promised gain. We address the CDC limitations by proposing a novel CDC approach based on a combinatorial design, which accommodates heterogeneous networks where nodes have varying storage and computing capabilities. In addition, the proposed approach requires exponentially fewer input files compared to the original CDC scheme proposed by Li et al. Meanwhile, the resulting computation-communication trade-off maintains the multiplicative gain compared to conventional uncoded unicast and asymptotically achieves the optimal performance proposed by Li et al.
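
As a rough illustration of the trade-off this abstract describes, the Python sketch below compares uncoded and coded Shuffle loads on a homogeneous network of K nodes with computation load r. The closed forms 1 - r/K and (1/r)(1 - r/K) are the homogeneous expressions from Li et al.'s original CDC analysis, not formulas stated in the abstract, and the function names are hypothetical. The sketch does not model the combinatorial design proposed in this preprint.

```python
from fractions import Fraction


def uncoded_load(K, r):
    """Normalized Shuffle load without coding (assumed form: 1 - r/K)."""
    if not 1 <= r <= K:
        raise ValueError("computation load r must satisfy 1 <= r <= K")
    return 1 - Fraction(r, K)


def coded_load(K, r):
    """Normalized Shuffle load with coded multicasting: (1/r)(1 - r/K)."""
    return uncoded_load(K, r) / r


# Hypothetical example: 10 nodes, each file mapped at r = 2 nodes.
K, r = 10, 2
print(uncoded_load(K, r))  # 4/5
print(coded_load(K, r))    # 2/5
```

In this toy setting the coded load is smaller than the uncoded load by exactly the factor r, which matches the "same factor" reduction mentioned in the abstract.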

preprint | 2020 | arXiv

Cache-aided Interference Management using Hypercube Combinatorial Cache Design with Reduced Subpacketizations and Order Optimal Sum-Degrees of Freedom

We consider a cache-aided interference network which consists of a library of $N$ files, $K_T$ transmitters and $K_R$ receivers (users), each equipped with a local cache of size $M_T$ and $M_R$ files respectively, and connected via a discrete-time additive white Gaussian noise (AWGN) channel. Each receiver requests an arbitrary file from the library. The objective is to design a cache placement without knowing the receivers' requests and a communication scheme such that the sum Degrees of Freedom (sum-DoF) of the delivery is maximized. This network model with one-shot transmission was first investigated by Naderializadeh et al., who proposed a scheme that achieves a one-shot sum-DoF of $\min\{\frac{M_TK_T+K_RM_R}{N}, K_R\}$, which is optimal within a constant of $2$. One of the biggest limitations of this scheme is that it requires a high subpacketization level. This paper attempts to design new algorithms to reduce the file subpacketization in such a network without hurting the sum-DoF. In particular, we propose a new approach for both prefetching and linearly coded delivery based on a combinatorial design called hypercube. The proposed approach reduces the subpacketization exponentially in terms of $K_R M/N$ and achieves the identical one-shot sum-DoF when $\frac{M_TK_T+K_RM_R}{N} \leq K_R$.
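
The sum-DoF expression quoted in this abstract can be evaluated directly. The Python sketch below is only a calculator for $\min\{\frac{M_TK_T+K_RM_R}{N}, K_R\}$; the function name and the example parameters are hypothetical, and integer cache sizes are assumed so that exact fractions can be used.

```python
from fractions import Fraction


def one_shot_sum_dof(N, K_T, K_R, M_T, M_R):
    """Evaluate min{(M_T*K_T + K_R*M_R)/N, K_R} exactly (integer inputs assumed)."""
    aggregate_cache = M_T * K_T + K_R * M_R
    return min(Fraction(aggregate_cache, N), Fraction(K_R))


# Hypothetical network: 10 files, 2 transmitters, 4 receivers.
print(one_shot_sum_dof(N=10, K_T=2, K_R=4, M_T=5, M_R=2))    # 9/5
print(one_shot_sum_dof(N=10, K_T=2, K_R=4, M_T=10, M_R=10))  # 4
```

The first call falls in the regime where the sum-DoF grows with the aggregate cache size; the second is capped by the number of receivers.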

preprint | 2020 | arXiv

Cache-aided Interference Management Using Hypercube Combinatorial Cache Designs

We consider a cache-aided interference network which consists of a library of $N$ files, $K_T$ transmitters and $K_R$ receivers (users), each equipped with a local cache of size $M_T$ and $M_R$ files respectively, and connected via a discrete-time additive white Gaussian noise channel. Each receiver requests an arbitrary file from the library. The objective is to design a cache placement without knowing the receivers' requests and a communication scheme such that the sum Degrees of Freedom (sum-DoF) of the delivery is maximized. This network model has been investigated by Naderializadeh et al., who proposed prefetching and delivery schemes that achieve a sum-DoF of $\min\{\frac{M_TK_T+K_RM_R}{N}, K_R\}$. One of the biggest limitations of this scheme is that it requires a high subpacketization level. This paper is, to our knowledge, the first attempt in the literature to reduce the file subpacketization in such a network. In particular, we propose a new approach for both prefetching and linear delivery schemes based on a combinatorial design called hypercube. We show that the required number of packets per file can be exponentially reduced compared to the state-of-the-art scheme proposed by Naderializadeh et al., or the NMA scheme. When $M_TK_T+K_RM_R \geq K_R$, the achievable one-shot sum-DoF using this approach is $\frac{M_TK_T+K_RM_R}{N}$, which shows that 1) the one-shot sum-DoF scales linearly with the aggregate cache size in the network and 2) it is within a factor of $2$ of the information-theoretic optimum. Surprisingly, the identical and near-optimal sum-DoF performance can be achieved using the hypercube approach with much lower file subpacketization.

preprint | 2020 | arXiv

Coded Elastic Computing on Machines with Heterogeneous Storage and Computation Speed

We study the optimal design of heterogeneous Coded Elastic Computing (CEC) where machines have varying computation speeds and storage. CEC, introduced by Yang et al. in 2018, is a framework that mitigates the impact of elastic events, where machines can join and leave at arbitrary times. In CEC, data is distributed among machines using a Maximum Distance Separable (MDS) code such that subsets of machines can perform the desired computations. However, state-of-the-art CEC designs only operate on homogeneous networks where machines have the same speeds and storage. This may not be practical. In this work, based on an MDS storage assignment, we develop a novel computation assignment approach for heterogeneous CEC networks to minimize the overall computation time. We first consider the scenario where machines have heterogeneous computing speeds but the same storage, and then the scenario where both heterogeneities are present. We propose a novel combinatorial optimization formulation and solve it exactly by decomposing it into a convex optimization problem for finding the optimal computation load and a "filling problem" for finding the exact computation assignment. A low-complexity "filling algorithm" is adapted and can be completed within a number of iterations at most equal to the number of available machines.
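
To give a feel for the "optimal computation load" step, the Python sketch below solves a simplified toy version of the problem. It rests on assumptions that the abstract does not state: each machine's load is a fraction between 0 and 1 of its stored data, the loads must sum to a hypothetical redundancy parameter total_load, and a machine with load mu and speed s finishes at time mu / s. Under those assumptions the max finishing time is minimized by giving every machine the same finishing time c and capping fast machines at load 1. This is not the paper's algorithm, and it does not cover the "filling problem".

```python
from fractions import Fraction


def optimal_loads(speeds, total_load):
    """Minimize max(load_i / speed_i) with 0 <= load_i <= 1 and sum(load_i) = total_load."""
    n = len(speeds)
    if not 0 < total_load <= n:
        raise ValueError("total_load must lie in (0, number of machines]")
    speeds = [Fraction(s) for s in speeds]
    capped = set()  # machines already assigned their full load of 1
    while True:
        free = [i for i in range(n) if i not in capped]
        remaining = Fraction(total_load) - len(capped)
        # Common finishing time if the free machines share the remaining load.
        c = remaining / sum(speeds[i] for i in free)
        over = [i for i in free if c * speeds[i] > 1]
        if not over:
            break
        capped.update(over)  # cap machines that would exceed their storage
    loads = [Fraction(1) if i in capped else c * speeds[i] for i in range(n)]
    return loads, c


# Hypothetical example: one fast machine and two slow ones, total load 2.
loads, finish_time = optimal_loads([6, 1, 1], 2)
print(loads, finish_time)
```

In the example the fast machine is capped at a load of 1 and the two slow machines split the remaining load equally, so the overall finishing time is set by the slow machines.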

preprint | 2020 | arXiv

FLCD: A Flexible Low Complexity Design of Coded Distributed Computing

We propose a flexible low-complexity design (FLCD) of coded distributed computing (CDC) with empirical evaluation on Amazon Elastic Compute Cloud (Amazon EC2). CDC can expedite MapReduce-like computation by trading increased map computations to reduce communication load and shuffle time. A main novelty of FLCD is to utilize the design freedom in defining map and reduce functions to develop asymptotically homogeneous systems that support varying intermediate value (IV) sizes under a general MapReduce framework. Compared to existing designs with constant IV sizes, FLCD offers greater flexibility in adapting to network parameters and significantly reduces the implementation complexity by requiring fewer input files and shuffle groups. The FLCD scheme is the first proposed low-complexity CDC design that can operate on a network with an arbitrary number of nodes and computation load. We perform empirical evaluations of the FLCD by executing the TeraSort algorithm on an Amazon EC2 cluster. This is the first time that theoretical predictions of the CDC shuffle time are validated by empirical evaluations. The evaluations demonstrate a 2.0x to 4.24x speedup compared to conventional uncoded MapReduce, a 12% to 52% reduction in total time, and a wider range of operating network parameters compared to existing CDC schemes.

preprint | 2020 | arXiv

Heterogeneous Computation Assignments in Coded Elastic Computing

We study the optimal design of a heterogeneous coded elastic computing (CEC) network where machines have varying relative computation speeds. CEC, introduced by Yang et al., is a framework which mitigates the impact of elastic events, where machines join and leave the network. A set of data is distributed among storage-constrained machines using a Maximum Distance Separable (MDS) code such that any subset of machines of a specific size can perform the desired computations. This design eliminates the need to re-distribute the data after each elastic event. In this work, we develop a process for an arbitrary heterogeneous computing network to minimize the overall computation time by defining an optimal computation load, or number of computations assigned to each machine. We then present an algorithm to define a specific computation assignment among the machines that makes use of the MDS code and meets the optimal computation load.
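
The MDS property mentioned in this abstract, that any sufficiently large subset of machines can complete the computation, can be illustrated with a small Python sketch. It assumes a polynomial-evaluation (Reed-Solomon-style) code over the rationals and a matrix-vector product as the desired computation; the abstract does not say which MDS code or computation the paper uses, and all function names here are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def lagrange_eval(points, values, x):
    """Evaluate at x the unique polynomial passing through (points[i], values[i])."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(points, values)):
        term = Fraction(yi)
        for j, xj in enumerate(points):
            if j != i:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total


def encode_rows(rows, n):
    """Encode k data rows into n coded rows; machine m (1..n) stores coded row m."""
    k, width = len(rows), len(rows[0])
    data_points = list(range(1, k + 1))
    return [[lagrange_eval(data_points, [row[c] for row in rows], m)
             for c in range(width)] for m in range(1, n + 1)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def recover_products(machine_ids, partial_results, k):
    """Recover the k uncoded products from the results of any k machines."""
    return [lagrange_eval(machine_ids, partial_results, j) for j in range(1, k + 1)]


rows = [[1, 2], [3, 4], [5, 6]]  # k = 3 data rows
v = [1, 1]
coded = encode_rows(rows, n=5)   # 5 machines, any 3 suffice
expected = [dot(row, v) for row in rows]
for ids in combinations(range(1, 6), 3):
    results = [dot(coded[m - 1], v) for m in ids]
    assert recover_products(list(ids), results, 3) == expected
```

Because every 3-machine subset recovers the same products, machines can leave or join without the stored data being redistributed, which is the behaviour the abstract attributes to the MDS-coded design.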