Source author record

Yashar Ganjali

Yashar Ganjali appears in the imported research catalog. Authorship, coauthor, and topic links are available, although ownership of the profile has not yet been claimed.

Researcher (unclaimed source record)

Topics: Networking and Internet Architecture; Distributed, Parallel, and Cluster Computing; Machine Learning

Catalog footprint

2 works, 3 topics, 4 close collaborators


Preprint, 2026, arXiv

DBLP: Phase-Aware Bounded-Loss Transport for Burst-Resilient Distributed ML Training

Distributed machine learning (ML) training has become a necessity given the prevalence of models with billions to trillions of parameters. While prior work has improved training efficiency from the ML perspective at the application layer, it often fails to address transient congestion events at the network layer, which introduce severe tail latency and training-time variability and thereby undermine the quality of service (QoS) of distributed ML training systems. Existing network optimizations treat all gradients equally and thus fail to bring sufficient model-training insight into communication protocol design. In this paper, we present the Dynamic Bounded-Loss Protocol (DBLP), a burst-resilient, training-phase-aware, and hardware-agnostic transport protocol that incorporates model-level loss-tolerance properties into gradient communication. By dynamically adjusting gradient loss tolerance across training phases, DBLP reduces overall training time and mitigates tail-latency collapse during transient high-loss events (i.e., microbursts). Compared with the current state-of-the-art solution (baseline), DBLP tolerates significantly higher loss while achieving comparable test accuracy, and it reduces end-to-end training time by 24.4% on average and by up to 33.9%. During microburst events, DBLP achieves up to a 5.88× speedup in single-round communication latency over the baseline, preventing burst-induced tail-latency spikes and maintaining stable training performance.
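To make phase-dependent gradient loss tolerance concrete, here is a minimal Python sketch. It assumes a receiver that aggregates a gradient split into fixed-size chunks and accepts an incomplete gradient (zero-filling lost chunks) whenever the lost fraction stays within the current phase's tolerance. The functions `loss_tolerance` and `assemble_gradient`, the three-phase schedule, and the 1% and 10% thresholds are hypothetical illustrations. The abstract does not specify DBLP's actual schedule, thresholds, or recovery mechanism.

```python
from typing import Dict, List, Optional


def loss_tolerance(step: int, total_steps: int,
                   warmup_frac: float = 0.1, final_frac: float = 0.1,
                   low: float = 0.01, high: float = 0.10) -> float:
    """Hypothetical phase schedule (not DBLP's): strict during warm-up and
    final convergence, looser during the middle phase of training."""
    progress = step / total_steps
    if progress < warmup_frac or progress >= 1.0 - final_frac:
        return low
    return high


def assemble_gradient(chunks: Dict[int, List[float]], num_chunks: int,
                      chunk_len: int, tolerance: float) -> Optional[List[float]]:
    """Bounded-loss assembly (illustrative): if the fraction of missing chunks
    is within `tolerance`, return the flattened gradient with lost chunks
    zero-filled; otherwise return None so the caller can wait or retransmit."""
    missing = num_chunks - len(chunks)
    if missing / num_chunks > tolerance:
        return None
    grad: List[float] = []
    for i in range(num_chunks):
        # Lost chunks contribute zeros instead of blocking the aggregation round.
        grad.extend(chunks.get(i, [0.0] * chunk_len))
    return grad
```

For example, with 10 chunks of length 2 and chunk 3 lost, the gradient is accepted at step 500 of 1000 (tolerance 10%) but rejected at step 50 (tolerance 1%), where the caller would fall back to waiting or retransmission. Zero-filling is only one possible choice; other recovery strategies would work equally well with the same acceptance test.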

Preprint, 2022, arXiv

DWTCP: Ultra Low Latency Congestion Control Protocol for Data Centers

Congestion control algorithms rely on a variety of congestion signals (packet loss, Explicit Congestion Notification, delay, etc.) to achieve fast convergence, high utilization, and fairness among flows. A key limitation of these signals is that they either arrive late or incur significant overhead. An ideal congestion control algorithm must discover any available bandwidth in the network, detect congestion as soon as link utilization approaches full capacity, and react in time to avoid queuing and packet drops, all without significant overhead. To this end, this work proposes Scout, a service that leverages priority queues to infer bandwidth availability and link busyness at the host. The key observation is that as the high-priority queue (HPQ) gets busier, the low-priority queue (LPQ) is served less. Therefore, the state of the link can be observed from the LPQ, and congestion can be detected several RTTs earlier than by observing the HPQ. We propose a new transport protocol, Double-Window Transmission Control Protocol (DWTCP), that builds on the Scout service to dynamically adjust its congestion window. Our testbed and simulation-based evaluations demonstrate that Scout enables a data center transport to achieve high throughput, near-zero queues, lower latency, and high fairness.
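The Scout observation can be illustrated with a toy slot-based model. The model assumes a strict-priority link that transmits one packet per slot and a host that always keeps a low-priority scout probe queued. The functions `simulate_strict_priority` and `first_detection`, the 4-slot window, and the 0.9 busyness threshold are hypothetical. This sketch covers only the LPQ-based inference. It does not model DWTCP's two congestion windows, which the abstract does not describe in detail.

```python
from typing import List, Optional, Tuple


def simulate_strict_priority(hpq_arrivals: List[int]) -> Tuple[List[int], List[int]]:
    """Toy strict-priority link serving one packet per slot. The LPQ is assumed
    to always hold a scout probe. Returns, per slot, whether the LPQ was served
    (1/0) and the HPQ backlog left after that slot."""
    hpq_queue = 0
    lpq_served: List[int] = []
    hpq_len: List[int] = []
    for arrivals in hpq_arrivals:
        hpq_queue += arrivals
        if hpq_queue > 0:
            hpq_queue -= 1          # HPQ always wins the slot
            lpq_served.append(0)
        else:
            lpq_served.append(1)    # idle slot goes to the scout probe
        hpq_len.append(hpq_queue)
    return lpq_served, hpq_len


def first_detection(lpq_served: List[int], hpq_len: List[int],
                    window: int = 4, busy_threshold: float = 0.9
                    ) -> Tuple[Optional[int], Optional[int]]:
    """Return (first slot where LPQ-inferred busyness crosses the threshold,
    first slot where the HPQ backlog is non-zero)."""
    scout_slot: Optional[int] = None
    for t in range(window - 1, len(lpq_served)):
        recent = lpq_served[t - window + 1 : t + 1]
        busyness = 1.0 - sum(recent) / window   # fraction of slots LPQ was starved
        if busyness >= busy_threshold:
            scout_slot = t
            break
    queue_slot = next((t for t, q in enumerate(hpq_len) if q > 0), None)
    return scout_slot, queue_slot


if __name__ == "__main__":
    # HPQ load: 50% for 20 slots, 100% for 20 slots, then 200% (overload).
    arrivals = [1, 0] * 10 + [1] * 20 + [2] * 10
    served, qlen = simulate_strict_priority(arrivals)
    print(first_detection(served, qlen))  # (23, 40)
```

In this run, the scout signal crosses the threshold at slot 23, while the HPQ queue first becomes non-empty at slot 40. The LPQ starves as soon as the link saturates, before any high-priority backlog forms. That gap is the early-warning effect the abstract describes, here measured in slots rather than RTTs.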