Source author record

Xiaoyi Lu

Xiaoyi Lu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher

Unclaimed source record

Distributed, Parallel, and Cluster Computing; Performance; Databases; Machine Learning

Catalog footprint

What is connected

4 works

4 topics

4 close collaborators

preprint, 2026, arXiv

SplitFT: An Adaptive Federated Split Learning System For LLMs Fine-Tuning

Federated Split Learning has been identified as an efficient approach to address the computational resource constraints of clients in classical federated learning, while guaranteeing data privacy for distributed model training across data owners. However, it faces critical challenges when this training strategy is applied to fine-tuning large language models (LLMs). One challenge is setting the cut layer adaptively across different clients to address data and device heterogeneity, which significantly affects system performance. Another is efficiently reducing the communication overhead during fine-tuning. No existing work addresses these challenges. To bridge this gap, we propose SplitFT, an adaptive federated split learning system for LLM fine-tuning. SplitFT enables different clients to set different cut layers according to their computation resources and trained model performance. SplitFT also reduces the LoRA rank in the cut layer to reduce the communication overhead. In addition, to simulate the heterogeneous data found in real-world applications for our proposed split federated learning system, we propose a length-based Dirichlet approach to divide the training data among clients. Extensive experimental results on various popular benchmarks show that our proposed approach outperforms the state-of-the-art approach in fine-tuning time efficiency and model performance.
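The abstract does not say how the length-based Dirichlet split works, so the following is only a sketch under stated assumptions: samples are grouped into length buckets (playing the role that class labels play in the common label-based Dirichlet partition), and each bucket is divided among clients using proportions drawn from a symmetric Dirichlet distribution. The function names, the bucketing scheme, and the parameters `num_bins` and `alpha` are hypothetical and are not taken from SplitFT.

```python
import random


def dirichlet(alpha, k, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) over k categories
    by normalizing k independent Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def length_dirichlet_partition(lengths, num_clients, num_bins=4, alpha=0.5, seed=0):
    """Assign sample indices to clients so that each client sees a skewed
    mix of sequence lengths. Assumed scheme, not the paper's algorithm."""
    rng = random.Random(seed)
    # Sort sample indices by length and cut them into equal-sized buckets.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    bin_size = max(1, -(-len(order) // num_bins))  # ceiling division
    buckets = [order[i:i + bin_size] for i in range(0, len(order), bin_size)]

    clients = [[] for _ in range(num_clients)]
    for bucket in buckets:
        rng.shuffle(bucket)
        props = dirichlet(alpha, num_clients, rng)
        start, cumulative = 0, 0.0
        for c in range(num_clients):
            cumulative += props[c]
            # The last client takes whatever remains, so no index is dropped.
            if c == num_clients - 1:
                end = len(bucket)
            else:
                end = round(cumulative * len(bucket))
            clients[c].extend(bucket[start:end])
            start = end
    return clients


if __name__ == "__main__":
    toy_lengths = [(i * 37) % 101 + 1 for i in range(100)]  # made-up lengths
    parts = length_dirichlet_partition(toy_lengths, num_clients=5)
    print([len(p) for p in parts])
```

In this sketch a smaller `alpha` concentrates each length bucket on fewer clients (more heterogeneity), while a larger `alpha` spreads it more evenly.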

preprint, 2022, arXiv

Arcadia: A Fast and Reliable Persistent Memory Replicated Log

The performance properties of byte-addressable persistent memory (PMEM) have the potential to significantly improve system performance over a wide spectrum of applications. But persistent memory brings considerable new challenges to the programmer: only 8-byte write atomicity, out of order flush and availability limited by node failure. It's possible to work with the atomicity and ordering constraints of PMEM directly by carefully sequencing the order of store operations and inserting explicit flush and fence operations at each ordering point. But this is tedious and error-prone: too many flush operations defeat the performance benefits of PMEM, and even with generous use, it is difficult to prove that a given program is crash-consistent. Logging is a great abstraction to deal with these issues but prior work on PMEM logging has not successfully hidden the idiosyncrasies of PMEM. Moreover, shortcomings in the log interface and design have prevented attainment of full PMEM performance. We believe that a log design that hides the idiosyncrasies from programmers while delivering full performance is key to success. In this paper, we present the design and implementation of Arcadia, a generic replicated log on PMEM to address these problems. Arcadia handles atomicity, integrity, and replication of log records to reduce programmer burden. Our design has several novel aspects including concurrent log writes with in-order commit, atomicity and integrity primitives for local and remote PMEM writes, and a frequency-based log force policy for providing low overhead persistence with guaranteed bounded loss of uncommitted records. Our evaluation shows that Arcadia outperforms state-of-the-art PMEM logs, such as PMDK's libpmemlog, FLEX, and Query Fresh by several times while providing stronger log record durability guarantees. We expect Arcadia to become the leading off-the-shelf PMEM log design.
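The idea of concurrent log writes with in-order commit can be illustrated with a small in-memory sketch. It is not Arcadia's implementation and involves no persistent memory, flushes, fences, or replication. It assumes only that writers reserve a sequence number, fill in their records in any order, and that the commit point advances only over a contiguous prefix of finished records. The class and method names are hypothetical.

```python
import threading


class InOrderLog:
    """Toy log: records may finish out of order, but commit in order."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_lsn = 0      # next sequence number to hand out
        self._finished = {}     # lsn -> payload, written but not yet committed
        self._committed = []    # payloads in sequence-number order

    def reserve(self):
        """Reserve the next log sequence number for a writer."""
        with self._lock:
            lsn = self._next_lsn
            self._next_lsn += 1
            return lsn

    def complete(self, lsn, payload):
        """Mark a reserved record as written, then advance the commit point
        over every contiguous finished record starting at the current point."""
        with self._lock:
            self._finished[lsn] = payload
            while len(self._committed) in self._finished:
                next_lsn = len(self._committed)
                self._committed.append(self._finished.pop(next_lsn))

    def committed(self):
        """Return a copy of the committed records, in order."""
        with self._lock:
            return list(self._committed)


if __name__ == "__main__":
    log = InOrderLog()
    a, b, c = log.reserve(), log.reserve(), log.reserve()
    log.complete(c, "third")   # finishes first, but cannot commit yet
    log.complete(a, "first")   # commits only the first record
    log.complete(b, "second")  # closes the gap; the rest commit together
    print(log.committed())
```

A record that finishes early stays invisible until every earlier record has finished, which is what keeps the committed prefix free of gaps.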

preprint, 2014, arXiv

On Big Data Benchmarking

Big data systems address the challenges of capturing, storing, managing, analyzing, and visualizing big data. Within this context, developing benchmarks to evaluate and compare big data systems has become an active topic for both research and industry communities. To date, most state-of-the-art big data benchmarks are designed for specific types of systems. Based on our experience, however, we argue that, given the complexity, diversity, and rapid evolution of big data systems, big data benchmarks must include diverse data and workloads for the sake of fairness. Given this motivation, in this paper, we first propose the key requirements and challenges in developing big data benchmarks from the perspectives of generating data with the 4V properties (i.e., volume, velocity, variety and veracity) of big data, as well as generating tests with comprehensive workloads for big data systems. We then present a methodology for big data benchmarking designed to address these challenges. Next, we summarize and compare the state of the art, followed by our vision for future research directions.

preprint, 2014, arXiv

Performance Benefits of DataMPI: A Case Study with BigDataBench

Apache Hadoop and Spark are gaining prominence in Big Data processing and analytics. Both are widely deployed at Internet companies. On the other hand, high-performance data analysis requirements are causing academic and industrial communities to adopt state-of-the-art HPC technologies to solve Big Data problems. Recently, we proposed a key-value pair based communication library, DataMPI, which extends MPI to support Hadoop/Spark-like Big Data computing jobs. In this paper, we use BigDataBench, a Big Data benchmark suite, to comprehensively characterize the performance and resource utilization of Hadoop, Spark and DataMPI. From our experiments, we observe that DataMPI achieves job execution time speedups of up to 55% and 39% over Hadoop and Spark, respectively. Most of the benefits come from the high-efficiency communication mechanisms in DataMPI. We also observe that DataMPI uses resources (CPU, memory, disk and network I/O) more efficiently than the other two frameworks.
