Source author record

Cristian Lumezanu

Cristian Lumezanu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Catalog footprint

What is connected

4works
5topics
4close collaborators

Actions

Connect this record

Log in to claim

Research graph

See the researcher in context

Open full explorer

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

4 published item(s)

preprint2026arXiv

Collective Communication for 100k+ GPUs

The increasing scale of large language models (LLMs) necessitates highly efficient collective communication frameworks, particularly as training workloads extend to hundreds of thousands of GPUs. Traditional communication methods face significant throughput and latency limitations at this scale, hindering both the development and deployment of state-of-the-art models. This paper presents the NCCLX collective communication framework, developed at Meta, engineered to optimize performance across the full LLM lifecycle, from the synchronous demands of large-scale training to the low-latency requirements of inference. The framework is designed to support complex workloads on clusters exceeding 100,000 GPUs, ensuring reliable, high-throughput, and low-latency data exchange. Empirical evaluation on the Llama4 model demonstrates substantial improvements in communication efficiency. This research contributes a robust solution for enabling the next generation of LLMs to operate at unprecedented scales.

preprint2022arXiv

Deep Federated Anomaly Detection for Multivariate Time Series Data

Despite the fact that many anomaly detection approaches have been developed for multivariate time series data, limited effort has been made on federated settings in which multivariate time series data are heterogeneously distributed among different edge devices while data sharing is prohibited. In this paper, we investigate the problem of federated unsupervised anomaly detection and present a Federated Exemplar-based Deep Neural Network (Fed-ExDNN) to conduct anomaly detection for multivariate time series data on different edge devices. Specifically, we first design an Exemplar-based Deep Neural network (ExDNN) to learn local time series representations based on their compatibility with an exemplar module which consists of hidden parameters learned to capture varieties of normal patterns on each edge device. Next, a constrained clustering mechanism (FedCC) is employed on the centralized server to align and aggregate the parameters of different local exemplar modules to obtain a unified global exemplar module. Finally, the global exemplar module is deployed together with a shared feature encoder to each edge device and anomaly detection is conducted by examining the compatibility of testing data to the exemplar module. Fed-ExDNN captures local normal time series patterns with ExDNN and aggregates these patterns by FedCC, and thus can handle the heterogeneous data distributed over different edge devices simultaneously. Thoroughly empirical studies on six public datasets show that ExDNN and Fed-ExDNN can outperform state-of-the-art anomaly detection algorithms and federated learning techniques.

preprint2022arXiv

Ordinal-Quadruplet: Retrieval of Missing Classes in Ordinal Time Series

In this paper, we propose an ordered time series classification framework that is robust against missing classes in the training data, i.e., during testing we can prescribe classes that are missing during training. This framework relies on two main components: (1) our newly proposed ordinal-quadruplet loss, which forces the model to learn latent representation while preserving the ordinal relation among labels, (2) testing procedure, which utilizes the property of latent representation (order preservation). We conduct experiments based on real world multivariate time series data and show the significant improvement in the prediction of missing labels even with 40% of the classes are missing from training. Compared with the well-known triplet loss optimization augmented with interpolation for missing information, in some cases, we nearly double the accuracy.

preprint2011arXiv

Modeling Tiered Pricing in the Internet Transit Market

ISPs are increasingly selling "tiered" contracts, which offer Internet connectivity to wholesale customers in bundles, at rates based on the cost of the links that the traffic in the bundle is traversing. Although providers have already begun to implement and deploy tiered pricing contracts, little is known about how such pricing affects ISPs and their customers. While contracts that sell connectivity on finer granularities improve market efficiency, they are also more costly for ISPs to implement and more difficult for customers to understand. In this work we present two contributions: (1) we develop a novel way of mapping traffic and topology data to a demand and cost model; and (2) we fit this model on three large real-world networks: an European transit ISP, a content distribution network, and an academic research network, and run counterfactuals to evaluate the effects of different pricing strategies on both the ISP profit and the consumer surplus. We highlight three core findings. First, ISPs gain most of the profits with only three or four pricing tiers and likely have little incentive to increase granularity of pricing even further. Second, we show that consumer surplus follows closely, if not precisely, the increases in ISP profit with more pricing tiers. Finally, the common ISP practice of structuring tiered contracts according to the cost of carrying the traffic flows (e.g., offering a discount for traffic that is local) can be suboptimal and that dividing contracts based on both traffic demand and the cost of carrying it into only three or four tiers yields near-optimal profit for the ISP.