Source author record

Greg Steinbrecher

Greg Steinbrecher appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Networking and Internet Architecture; Artificial Intelligence; Distributed, Parallel, and Cluster Computing

Catalog footprint

What is connected

3 works

3 topics

4 close collaborators

preprint · 2026 · arXiv

Collective Communication for 100k+ GPUs

The increasing scale of large language models (LLMs) necessitates highly efficient collective communication frameworks, particularly as training workloads extend to hundreds of thousands of GPUs. Traditional communication methods face significant throughput and latency limitations at this scale, hindering both the development and deployment of state-of-the-art models. This paper presents the NCCLX collective communication framework, developed at Meta, engineered to optimize performance across the full LLM lifecycle, from the synchronous demands of large-scale training to the low-latency requirements of inference. The framework is designed to support complex workloads on clusters exceeding 100,000 GPUs, ensuring reliable, high-throughput, and low-latency data exchange. Empirical evaluation on the Llama4 model demonstrates substantial improvements in communication efficiency. This research contributes a robust solution for enabling the next generation of LLMs to operate at unprecedented scales.

preprint · 2026 · arXiv

Resilient AI Supercomputer Networking using MRC and SRv6

Tail latency dominates the performance of synchronous pretraining jobs when running at very large scales. We describe a three-pronged approach: (1) a new RDMA-based transport protocol, MRC, sprays across many paths and actively load-balances between them, eliminating the issue of flow collisions, (2) the use of multi-plane Clos topologies to get the benefits of high switch radix and redundancy, allowing training clusters well over 100K GPUs to be built as two-tier topologies while increasing physical redundancy, and (3) the use of static source routing using SRv6 to allow MRC the freedom to bypass failures by itself. We describe our experiences running MRC and static SRv6 routing in production in OpenAI and Microsoft's largest training clusters, where it has been used to train the latest frontier models. We demonstrate how MRC allows AI training jobs to ride out many network failures that previously would have interrupted training.

preprint · 2012 · arXiv

Cross-Layer Design to Maintain Earthquake Sensor Network Connectivity After Loss of Infrastructure

We present the design of a cross-layer protocol to maintain connectivity in an earthquake monitoring and early warning sensor network in the absence of communications infrastructure. Such systems, by design, warn of events that severely damage or destroy communications infrastructure. However, the data they provide is of critical importance to emergency and rescue decision making in the immediate aftermath of such events, as is continued early warning of aftershocks, tsunamis, or other subsequent dangers. Utilizing a beyond line-of-sight (BLOS) HF physical layer, we propose an adaptable cross-layer network design that meets these specialized requirements. We are able to provide ultra high connectivity (UHC) early warning on strict time deadlines under worst-case channel conditions while providing sufficient capacity for continued seismic data collection from a 1000-sensor network.