Source author record

Arsany Guirguis

Arsany Guirguis appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning; Networking and Internet Architecture; Distributed, Parallel, and Cluster Computing

Catalog footprint

What is connected

5 works

3 topics

4 close collaborators

Preprint, 2020, arXiv

Fast Machine Learning with Byzantine Workers and Servers

Machine Learning (ML) solutions are nowadays distributed and are prone to various types of component failures, which can be encompassed in so-called Byzantine behavior. This paper introduces LiuBei, a Byzantine-resilient ML algorithm that does not trust any individual component in the network (neither workers nor servers), nor does it induce additional communication rounds (on average), compared to standard non-Byzantine resilient algorithms. LiuBei builds upon gradient aggregation rules (GARs) to tolerate a minority of Byzantine workers. Besides, LiuBei replicates the parameter server on multiple machines instead of trusting it. We introduce a novel filtering mechanism that enables workers to filter out replies from Byzantine server replicas without requiring communication with all servers. Such a filtering mechanism is based on network synchrony, Lipschitz continuity of the loss function, and the GAR used to aggregate workers' gradients. We also introduce a protocol, scatter/gather, to bound drifts between models on correct servers with a small number of communication messages. We theoretically prove that LiuBei achieves Byzantine resilience to both servers and workers and guarantees convergence. We build LiuBei using TensorFlow, and we show that LiuBei tolerates Byzantine behavior with an accuracy loss of around 5% and around 24% convergence overhead compared to vanilla TensorFlow. We moreover show that the throughput gain of LiuBei compared to another state-of-the-art Byzantine-resilient ML algorithm (that assumes network asynchrony) is 70%.
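To make the notion of a gradient aggregation rule (GAR) concrete, here is a minimal sketch of one well-known robust rule, the coordinate-wise median. The abstract does not say which GAR LiuBei uses, so this is a generic illustration rather than the paper's method; the function name `coordinate_wise_median` and the sample gradients are assumptions made for this example.

```python
from statistics import median


def coordinate_wise_median(gradients):
    """Aggregate worker gradients by taking the median of each coordinate.

    Illustrative GAR only (not necessarily the one used by LiuBei).
    `gradients` is a non-empty list of equal-length lists of floats.
    """
    if not gradients:
        raise ValueError("need at least one gradient")
    dim = len(gradients[0])
    if any(len(g) != dim for g in gradients):
        raise ValueError("all gradients must have the same dimension")
    # A minority of arbitrary (Byzantine) values cannot drag the median
    # of a coordinate outside the range spanned by the honest values.
    return [median(g[i] for g in gradients) for i in range(dim)]


# Three honest workers and two Byzantine workers sending extreme values.
honest = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9]]
byzantine = [[1000.0, -1000.0], [-1000.0, 1000.0]]
aggregated = coordinate_wise_median(honest + byzantine)
```

A plain average of the same five vectors would be dominated by the two extreme inputs, which is why robust rules replace averaging when some workers are not trusted.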

Preprint, 2020, arXiv

Garfield: System Support for Byzantine Machine Learning

We present Garfield, a library that transparently makes machine learning (ML) applications Byzantine-resilient, even when they were initially built with popular (but fragile) frameworks such as TensorFlow and PyTorch. Garfield relies on a novel object-oriented design that reduces the coding effort and addresses the vulnerability of the shared-graph architecture followed by classical ML frameworks. Garfield encompasses various communication patterns and supports computations on CPUs and GPUs, which makes it possible to address the general question of the practical cost of Byzantine resilience in SGD-based ML applications. We report on the usage of Garfield on three main ML architectures: (a) a single server with multiple workers, (b) several servers and workers, and (c) peer-to-peer settings. Using Garfield, we highlight several interesting facts about the cost of Byzantine resilience. In particular, (a) Byzantine resilience, unlike crash resilience, induces an accuracy loss, (b) the throughput overhead comes more from communication than from robust aggregation, and (c) tolerating Byzantine servers costs more than tolerating Byzantine workers.

Preprint, 2020, arXiv

Genuinely Distributed Byzantine Machine Learning

Machine Learning (ML) solutions are nowadays distributed, according to the so-called server/worker architecture. One server holds the model parameters while several workers train the model. Clearly, such an architecture is prone to various types of component failures, which can all be encompassed within the spectrum of Byzantine behavior. Several approaches have been proposed recently to tolerate Byzantine workers. Yet all require trusting a central parameter server. We initiate in this paper the study of the "general" Byzantine-resilient distributed machine learning problem where no individual component is trusted. We show that this problem can be solved in an asynchronous system, despite the presence of $\frac{1}{3}$ Byzantine parameter servers and $\frac{1}{3}$ Byzantine workers (which is optimal). We present a new algorithm, ByzSGD, which solves the general Byzantine-resilient distributed machine learning problem by relying on three major schemes. The first, Scatter/Gather, is a communication scheme whose goal is to bound the maximum drift among models on correct servers. The second, Distributed Median Contraction (DMC), leverages the geometric properties of the median in high-dimensional spaces to bring parameters within the correct servers back close to each other, ensuring learning convergence. The third, Minimum-Diameter Averaging (MDA), is a statistically robust gradient aggregation rule whose goal is to tolerate Byzantine workers. MDA requires a looser bound on the variance of non-Byzantine gradient estimates than existing alternatives (e.g., Krum). Interestingly, ByzSGD ensures Byzantine resilience without adding communication rounds (on a normal path), compared to vanilla non-Byzantine alternatives. ByzSGD requires, however, a larger number of messages which, we show, can be reduced if we assume synchrony.
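The idea behind Minimum-Diameter Averaging can be sketched in a few lines: among all subsets of n - f gradients, pick the one whose largest pairwise distance (its diameter) is smallest, then average it. The brute-force search below is an illustrative sketch under that reading of the rule, not the authors' implementation; the function name `minimum_diameter_average` and the sample inputs are assumptions, and the exhaustive subset search is only practical for small n.

```python
from itertools import combinations
from math import dist


def minimum_diameter_average(gradients, f):
    """Average the (n - f)-subset of gradients with the smallest diameter.

    `gradients` is a list of equal-length lists of floats and `f` is the
    assumed maximum number of Byzantine workers. Brute force: the number
    of candidate subsets grows combinatorially with n.
    """
    n = len(gradients)
    if not 0 <= f < n:
        raise ValueError("f must satisfy 0 <= f < number of gradients")
    keep = n - f

    def diameter(subset):
        # Largest Euclidean distance between any two members of the subset.
        return max((dist(a, b) for a, b in combinations(subset, 2)), default=0.0)

    best = min(combinations(gradients, keep), key=diameter)
    dim = len(best[0])
    return [sum(g[i] for g in best) / keep for i in range(dim)]


# Three tightly clustered honest gradients and one far-away outlier.
sample = [[1.0, 1.0], [1.5, 0.5], [0.5, 1.5], [100.0, -100.0]]
result = minimum_diameter_average(sample, f=1)
```

In this sample the outlier is excluded because any subset containing it has a much larger diameter than the subset of the three clustered vectors.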

Preprint, 2016, arXiv

Cooperation-based Routing in Cognitive Radio Networks

Primary user activity is a major bottleneck for existing routing protocols in cognitive radio networks. Typical routing protocols avoid areas that are highly congested with primary users, leaving only a small fraction of available links for secondary route construction. In addition, wireless links are prone to channel impairments such as multipath fading, which causes the quality of the available links to fluctuate strongly. In this paper, we investigate using cooperative communication mechanisms to reveal new routing opportunities, enhance route quality, and enable true coexistence of primary and secondary networks. As a result, we propose Undercover: a cooperative routing protocol that utilizes the available location information to assist in the routing process. Specifically, our protocol revisits a fundamental assumption made by state-of-the-art routing protocols designed for cognitive radio networks. Using Undercover, secondary users can transmit in regions of primary user activity by using cooperative communication techniques to null out transmission at primary receivers via beamforming. In addition, the quality of secondary links is enhanced using cooperative diversity. To account for the excessive levels of interference typically incurred by cooperative transmissions, we make our protocol interference-aware: cooperative transmissions are penalized in accordance with the number of negatively affected secondary flows. We evaluate the performance of our proposed protocol via NS2 simulations, which show that it can enhance network goodput by up to 250% compared to other popular cognitive routing protocols, with minimal added overhead.

Preprint, 2014, arXiv

Primary User-aware Network Coding for Multi-hop Cognitive Radio Networks

Network coding has proved its efficiency in increasing network performance for traditional ad-hoc networks. In this paper, we investigate using network coding to enhance the throughput of multi-hop cognitive radio networks. We formulate the network coding throughput maximization problem as a graph theory problem, where different constraints and primary users' characteristics are mapped to the graph structure. We then show that the optimal solution to this problem is NP-hard and propose a heuristic algorithm to efficiently solve it. Evaluation of the proposed algorithm through NS2 simulations shows that we can increase the throughput of the constrained secondary users' network by 150% to 200% for a wide range of scenarios covering different primary users' densities, traffic loads, and spectrum availability.
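For readers unfamiliar with network coding, the textbook two-way relay example shows where the throughput gain comes from: a relay broadcasts the XOR of two packets once instead of forwarding each packet separately, and each endpoint recovers the other's packet using its own. This is a generic illustration only and does not reproduce the paper's primary-user-aware formulation or heuristic; the helper name `xor_bytes` and the packet contents are assumptions.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length packets."""
    if len(a) != len(b):
        raise ValueError("packets must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# Node A and node B each send one packet to the relay.
packet_a = b"hello"
packet_b = b"world"

# The relay broadcasts a single coded packet instead of two plain ones.
coded = xor_bytes(packet_a, packet_b)

# Each endpoint XORs the coded packet with its own packet to decode.
decoded_at_a = xor_bytes(coded, packet_a)  # A recovers B's packet
decoded_at_b = xor_bytes(coded, packet_b)  # B recovers A's packet
```

Without coding the relay needs two transmissions to deliver both packets; with coding it needs one, which is the kind of saving the throughput maximization problem tries to exploit.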
