Source author record

Rong Pan

Rong Pan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Artificial Intelligence; Machine Learning; Networking and Internet Architecture; Applications; Computation and Language; Computer Science and Game Theory; Computer Vision; Databases; Distributed, Parallel, and Cluster Computing; Information Retrieval; Multiagent Systems; Programming Languages; Systems and Control

Catalog footprint

What is connected

9 works

13 topics

4 close collaborators


Preprint, 2026, arXiv

Resilient AI Supercomputer Networking using MRC and SRv6

Tail latency dominates the performance of synchronous pretraining jobs at very large scales. We describe a three-pronged approach: (1) a new RDMA-based transport protocol, MRC, which sprays across many paths and actively load-balances between them, eliminating the issue of flow collisions; (2) the use of multi-plane Clos topologies to get the benefits of high switch radix and redundancy, allowing training clusters of well over 100K GPUs to be built as two-tier topologies while increasing physical redundancy; and (3) the use of static source routing with SRv6 to give MRC the freedom to bypass failures by itself. We describe our experiences running MRC and static SRv6 routing in production in OpenAI's and Microsoft's largest training clusters, where it has been used to train the latest frontier models. We demonstrate how MRC allows AI training jobs to ride out many network failures that previously would have interrupted training.

Preprint, 2022, arXiv

Auction-Based Ex-Post-Payment Incentive Mechanism Design for Horizontal Federated Learning with Reputation and Contribution Measurement

Federated learning trains models across devices with distributed data while protecting privacy and obtaining a model similar to that of centralized ML. A large number of workers with data and computing power is the foundation of federated learning. However, the inevitable costs prevent self-interested workers from serving for free. Moreover, due to data isolation, task publishers lack effective methods to select, evaluate, and pay reliable workers with high-quality data. Therefore, we design an auction-based incentive mechanism for horizontal federated learning with reputation and contribution measurement. By designing a reasonable method of measuring contribution, we establish the reputation of workers, which is easy to lose and difficult to improve. Through reverse auctions, workers bid for tasks, and the task publisher selects workers by combining reputation and bid price. Under the budget constraint, winning workers are paid based on performance. We prove that our mechanism satisfies individual rationality for the honest worker, budget feasibility, truthfulness, and computational efficiency.
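To make the selection step in the abstract above concrete, here is a minimal sketch of a reverse auction in which a task publisher combines reputation and bid price under a budget. The ranking rule (reputation per unit of bid price), the greedy budget check, and all names are illustrative assumptions, not the paper's mechanism. In particular, this sketch charges each winner its own bid and makes no truthfulness claim.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    worker: str
    price: float       # asking price submitted by the worker
    reputation: float  # hypothetical score in (0, 1], higher is better


def select_workers(bids, budget):
    """Greedy selection: best reputation-per-price first, while the budget allows."""
    ranked = sorted(bids, key=lambda b: b.reputation / b.price, reverse=True)
    winners, spent = [], 0.0
    for b in ranked:
        if spent + b.price <= budget:
            winners.append(b.worker)
            spent += b.price
    return winners, spent


bids = [
    Bid("a", 4.0, 0.8),
    Bid("b", 2.0, 0.9),
    Bid("c", 5.0, 0.5),
    Bid("d", 3.0, 0.45),
]
winners, spent = select_workers(bids, budget=7.0)
print(winners, spent)
```

With these made-up bids, workers "b" and "a" are selected and the remaining bids no longer fit the budget.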

Preprint, 2022, arXiv

Design Strategies and Approximation Methods for High-Performance Computing Variability Management

Performance variability management is an active research area in high-performance computing (HPC). We focus on input/output (I/O) variability. To study performance variability, computer scientists often use grid-based designs (GBDs) to collect I/O variability data and use mathematical approximation methods to build a prediction model. Mathematical approximation models could be biased, particularly if extrapolations are needed. Space-filling designs (SFDs) and surrogate models such as Gaussian process (GP) are popular for data collection and building predictive models. The applicability of SFDs and surrogates to HPC variability needs investigation. We investigate their applicability in the HPC setting in terms of design efficiency, prediction accuracy, and scalability. We first customize the existing SFDs so that they can be applied in the HPC setting. We conduct a comprehensive investigation of design strategies and the prediction ability of approximation methods. We use both synthetic data simulated from three test functions and real data from the HPC setting. We then compare different methods in terms of design efficiency, prediction accuracy, and scalability. In synthetic and real data analysis, GP with SFDs performs best in most scenarios. With respect to approximation models, GP is recommended if the data are collected by SFDs. If data are collected using GBDs, both GP and Delaunay can be considered. With the best choice of approximation method, the performance of SFDs and GBDs depends on the properties of the underlying surface. For the cases in which SFDs perform better, the number of design points needed for SFDs is about half or less of that needed for GBDs to achieve the same prediction accuracy. SFDs that can be tailored to high dimensions and non-smooth surfaces are recommended, especially when large numbers of input factors need to be considered in the model.

Preprint, 2022, arXiv

Online Auction-Based Incentive Mechanism Design for Horizontal Federated Learning with Budget Constraint

Federated learning makes it possible for parties with isolated data to train a model collaboratively and efficiently while satisfying privacy protection. To obtain a high-quality model, an incentive mechanism is necessary to motivate more high-quality workers with data and computing power. Existing incentive mechanisms are applied in offline scenarios, where the task publisher collects all bids and selects workers before the task. In practice, however, workers arrive online in different orders before or during the task. Therefore, we propose a reverse auction-based online incentive mechanism for horizontal federated learning with a budget constraint. Workers submit bids when they arrive online. The task publisher, with a limited budget, leverages the information of the arrived workers to decide whether to select the new worker. Theoretical analysis proves that our mechanism satisfies budget feasibility, computational efficiency, individual rationality, consumer sovereignty, time truthfulness, and cost truthfulness with a sufficient budget. The experimental results show that our online mechanism is efficient and can obtain high-quality models.
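The online setting in the abstract above differs from the offline one in that each decision is made when a worker arrives, without seeing later bids. The sketch below illustrates that constraint only. The fixed price-per-quality threshold, the `quality` score, and every name are hypothetical; the paper's actual decision rule and payment scheme are not reproduced here.

```python
def online_select(arrivals, budget, max_price_per_quality):
    """Decide on each worker at arrival time, using only the remaining budget.

    arrivals: iterable of (worker, price, quality) tuples in arrival order.
    """
    remaining = budget
    accepted = []
    for worker, price, quality in arrivals:
        affordable = price <= remaining
        good_value = quality > 0 and price / quality <= max_price_per_quality
        if affordable and good_value:
            accepted.append(worker)
            remaining -= price  # the decision is irrevocable once made
    return accepted, remaining


arrivals = [
    ("w1", 3.0, 1.0),  # too expensive per unit of quality
    ("w2", 2.0, 2.0),
    ("w3", 4.0, 4.0),
    ("w4", 1.0, 2.0),  # good value, but arrives after the budget is spent
]
accepted, remaining = online_select(arrivals, budget=6.0, max_price_per_quality=1.5)
print(accepted, remaining)
```

The last worker in this made-up sequence shows the cost of deciding online: a good bid can be turned away because earlier acceptances already used the budget.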

Preprint, 2020, arXiv

Data Migration using Datalog Program Synthesis

This paper presents a new technique for migrating data between different schemas. Our method expresses the schema mapping as a Datalog program and automatically synthesizes a Datalog program from simple input-output examples to perform data migration. This approach can transform data between different types of schemas (e.g., relational-to-graph, document-to-relational) and performs synthesis efficiently by leveraging the semantics of Datalog. We implement the proposed technique as a tool called Dynamite and show its effectiveness by evaluating Dynamite on 28 realistic data migration scenarios.
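As a rough illustration of what it means to express a schema mapping as a Datalog rule and check it against an input-output example, the sketch below evaluates one non-recursive rule by joining two relations. The relations `Author` and `Wrote`, the target `Edge`, and the data are invented for this example. The code does not perform synthesis and is not how Dynamite works internally; it only shows the consistency check that a candidate rule must pass.

```python
def eval_rule(authors, wrote):
    """Evaluate the rule  Edge(name, title) :- Author(aid, name), Wrote(aid, title).

    Both body atoms share the variable `aid`, so the rule is a join on it.
    """
    return {
        (name, title)
        for (aid, name) in authors
        for (wrote_aid, title) in wrote
        if aid == wrote_aid
    }


# Input example: two source relations (hypothetical relational schema).
authors = {(1, "Ada"), (2, "Bob")}
wrote = {(1, "Paper A"), (2, "Paper B"), (1, "Paper C")}

# Output example: the edges expected in the target graph schema.
expected = {("Ada", "Paper A"), ("Ada", "Paper C"), ("Bob", "Paper B")}

# A candidate rule is consistent with the example if it reproduces the output.
consistent = eval_rule(authors, wrote) == expected
print(consistent)
```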

Preprint, 2020, arXiv

Evaluating reliability of complex systems for Predictive maintenance

Predictive Maintenance (PdM) can only be implemented when online knowledge of the system condition is available, and this has become available with the deployment of on-equipment sensors. To date, most studies on predicting the remaining useful lifetime of a system have focused on either single-component systems or systems with deterministic reliability structures. This assumption does not apply to some realistic problems, where there are uncertainties in the reliability structures of complex systems. In this paper, a PdM scheme is developed by employing a Discrete Time Markov Chain (DTMC) for forecasting the health of monitored components and a Bayesian Network (BN) for modeling the multi-component system reliability. As a result, probabilistic inferences can be made on the status of both the system and its components, and PdM can be scheduled at both levels.
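The DTMC forecasting step mentioned in the abstract above can be sketched with the standard Markov chain update: the state distribution after k steps is the current distribution multiplied by the transition matrix k times. The three health states and the transition probabilities below are made up for illustration and do not come from the paper; the Bayesian Network part is not shown.

```python
def step(dist, P):
    """One DTMC step: new_dist[j] = sum over i of dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def forecast(dist, P, steps):
    """Propagate a state distribution forward by the given number of steps."""
    for _ in range(steps):
        dist = step(dist, P)
    return dist


# Hypothetical component states: 0 = healthy, 1 = degraded, 2 = failed (absorbing).
P = [
    [0.90, 0.08, 0.02],
    [0.00, 0.85, 0.15],
    [0.00, 0.00, 1.00],
]
start = [1.0, 0.0, 0.0]  # the component is observed healthy now

after_one = forecast(start, P, 1)
after_ten = forecast(start, P, 10)
print("P(failed) after 10 steps:", after_ten[2])
```

The forecast failure probability per component is the kind of quantity that could then feed a system-level reliability model.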

Preprint, 2016, arXiv

Chinese/English mixed Character Segmentation as Semantic Segmentation

OCR character segmentation for multilingual printed documents is difficult due to the diversity of characters across languages. Previous approaches mainly focus on monolingual texts and are not suitable for multilingual cases. In this work, we tackle the Chinese/English mixed case in particular by reframing it as a semantic segmentation problem. We take advantage of fully convolutional networks (FCN), a successful architecture in the field of semantic segmentation. Given a wide enough receptive field, FCN can utilize the necessary context around a horizontal position to determine whether it is a splitting point or not. As a deep neural architecture, FCN can automatically learn useful features from raw text line images. Although trained on synthesized samples with simulated random disturbance, our FCN model generalizes well to real-world samples. The experimental results show that our model significantly outperforms the previous methods.

Preprint, 2015, arXiv

Tag-Weighted Topic Model For Large-scale Semi-Structured Documents

To date, a massive number of Semi-Structured Documents (SSDs) have accumulated during the evolution of the Internet. These SSDs contain both unstructured features (e.g., plain text) and metadata (e.g., tags). Most previous works focused on modeling the unstructured text, and recently, some other methods have been proposed to model the unstructured text with specific tags. Building a general model for SSDs remains an important problem in terms of both model fitness and efficiency. We propose a novel method to model SSDs, the Tag-Weighted Topic Model (TWTM). TWTM is a framework that leverages both tag and word information, not only to learn the document-topic and topic-word distributions, but also to infer the tag-topic distributions for text mining tasks. We present an efficient variational inference method with an EM algorithm for estimating the model parameters. We also propose three large-scale solutions for our model on the MapReduce distributed computing platform for modeling large-scale SSDs. The experimental results show the effectiveness, efficiency, and robustness of our model in comparison with state-of-the-art methods in document modeling, tag prediction, and text classification. We also show the performance of the three distributed solutions in terms of time and accuracy on document modeling.

Preprint, 2013, arXiv

Probe and Adapt: Rate Adaptation for HTTP Video Streaming At Scale

Today, the technology for video streaming over the Internet is converging towards a paradigm named HTTP-based adaptive streaming (HAS). HAS comes with two unique features. First, by riding on top of HTTP/TCP, it leverages the network-friendly TCP to achieve firewall/NAT traversal and bandwidth sharing. Second, by pre-encoding and storing the video at a number of discrete bitrate levels, it introduces video bitrate adaptivity in a scalable way, with video encoding excluded from the closed-loop adaptation. Conventional wisdom holds that the TCP throughput observed by a HAS client indicates the available network bandwidth, and thus can be used as a reliable reference for video bitrate selection. We argue that this no longer holds true when HAS becomes a substantial fraction of Internet traffic. We show that when multiple HAS clients compete at a network bottleneck, the presence of competing clients and the discrete nature of the video bitrates together prevent a client from correctly perceiving its fair-share bandwidth. Through analysis and real experiments, we demonstrate that this fundamental limitation leads to, for example, video rate oscillation that negatively impacts the video watching experience. We therefore argue that it is necessary to implement at the application layer a "probe-and-adapt" mechanism for HAS video rate adaptation, which is akin to, but orthogonal to, the transport-layer network rate adaptation achieved by TCP. We present PANDA, a client-side rate adaptation algorithm for HAS, as an embodiment of this idea. Our testbed results show that, compared to conventional algorithms, PANDA is able to reduce the instability of video rate by 60% at a given risk of buffer underrun.
