Source author record

Xinqi Li

Xinqi Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Artificial Intelligence; Computation and Language; cond-mat.mes-hall; cond-mat.mtrl-sci; Distributed, Parallel, and Cluster Computing; Machine Learning

Catalog footprint

What is connected

3 works

6 topics

4 close collaborators

Preprint, 2026, arXiv

PSD: Pushing the Pareto Frontier of Diffusion LLMs via Parallel Speculative Decoding

Diffusion large language models (dLLMs) generate text by iteratively denoising masked token sequences. Although dLLMs can predict all masked positions in parallel within each step, the large number of denoising iterations still makes inference expensive. This cost can be reduced spatially by unmasking multiple tokens per step, or temporally by collapsing multiple denoising steps into one verification call. We propose Parallel Speculative Decoding (PSD), a training-free framework that jointly improves inference along both axes. Using the confidence scores from a single forward pass, PSD selects positions to unmask via a configurable, adaptive unmasking policy and constructs multi-depth speculative drafts without extra model calls. A final batched verification pass then applies hierarchical acceptance, keeping the deepest draft that remains consistent with the updated predictions. Experiments on three dLLMs across reasoning and code generation tasks show that PSD achieves favorable trade-offs between inference efficiency and generation quality, reaching up to $5.5\times$ tokens per forward pass with accuracy comparable to greedy decoding.
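One way to picture the draft-then-verify loop described in this abstract is the toy sketch below. It is an illustration under stated assumptions, not the authors' implementation: the threshold-based unmasking policy, the names `build_drafts`, `hierarchical_accept` and `toy_verify`, and the rule that a deeper draft is kept only if its newly added tokens match the predictions recomputed from the shallower draft are all hypothetical readings of the abstract.

```python
from typing import Callable, Dict, List, Optional

MASK = None  # hypothetical marker for a still-masked position
Seq = List[Optional[str]]


def build_drafts(seq: Seq, preds: Dict[int, str], conf: Dict[int, float],
                 thresholds: List[float]) -> List[Seq]:
    """Build nested drafts from the outputs of ONE forward pass.

    `thresholds` must be decreasing: draft k unmasks every masked position
    whose confidence is at least thresholds[k], so deeper drafts are bolder.
    (A fixed threshold ladder is an assumed stand-in for the paper's policy.)
    """
    drafts = []
    for t in thresholds:
        draft = list(seq)
        for pos, tok in preds.items():
            if seq[pos] is MASK and conf[pos] >= t:
                draft[pos] = tok
        drafts.append(draft)
    return drafts


def hierarchical_accept(drafts: List[Seq],
                        verify: Callable[[Seq], Dict[int, str]]) -> Seq:
    """Keep the deepest draft whose added tokens agree with updated predictions."""
    accepted = drafts[0]  # the shallowest draft is the ordinary decoding step
    for prev, cur in zip(drafts, drafts[1:]):
        updated = verify(prev)  # a real system would batch these calls together
        added = [i for i in range(len(cur))
                 if prev[i] is MASK and cur[i] is not MASK]
        if all(updated.get(i) == cur[i] for i in added):
            accepted = cur
        else:
            break  # deeper drafts build on a rejected one, so stop here
    return accepted


# Toy data: the "model" used for verification simply knows the target sentence.
TARGET = ["the", "cat", "sat", "on", "mat"]


def toy_verify(draft: Seq) -> Dict[int, str]:
    return {i: TARGET[i] for i, tok in enumerate(draft) if tok is MASK}


start: Seq = [MASK] * 5
first_pass_preds = {0: "the", 1: "cat", 2: "sat", 3: "in", 4: "mat"}
first_pass_conf = {0: 0.95, 1: 0.90, 2: 0.70, 3: 0.40, 4: 0.60}

drafts = build_drafts(start, first_pass_preds, first_pass_conf, [0.9, 0.6, 0.3])
accepted = hierarchical_accept(drafts, toy_verify)
print(accepted)
```

In this toy run the second draft is accepted and the third is rejected, because the low-confidence guess at position 3 disagrees with the recomputed prediction; that position stays masked for a later step.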

Preprint, 2022, arXiv

OneFlow: Redesign the Distributed Deep Learning Framework from Scratch

Deep learning frameworks such as TensorFlow and PyTorch provide a productive interface for expressing and training a deep neural network (DNN) model on a single device or using data parallelism. Still, they may not be flexible or efficient enough in training emerging large models on distributed devices, which require more sophisticated parallelism beyond data parallelism. Plugins or wrappers have been developed to strengthen these frameworks for model or pipeline parallelism, but they complicate the usage and implementation of distributed deep learning. Aiming at a simple, neat redesign of distributed deep learning frameworks for various parallelism paradigms, we present OneFlow, a novel distributed training framework based on an SBP (split, broadcast and partial-value) abstraction and the actor model. SBP enables much easier programming of data parallelism and model parallelism than existing frameworks, and the actor model provides a succinct runtime mechanism to manage the complex dependencies imposed by resource constraints, data movement and computation in distributed deep learning. We demonstrate the general applicability and efficiency of OneFlow for training various large DNN models with case studies and extensive experiments. The results show that OneFlow outperforms many well-known customized libraries built on top of the state-of-the-art frameworks. The code of OneFlow is available at: https://github.com/Oneflow-Inc/oneflow.
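The split, broadcast and partial-value placements named in this abstract can be illustrated on a single matrix product. The sketch below uses plain Python lists and two simulated devices; it does not use OneFlow's API, and the helper names (`matmul`, `split_rows`, `split_cols`) are hypothetical. It assumes that "partial-value" means each device holds an addend of the full result.

```python
def matmul(a, b):
    """Dense matrix product on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_rows(m, n):
    """Split a matrix into n equal row blocks (split along axis 0)."""
    step = len(m) // n
    return [m[i * step:(i + 1) * step] for i in range(n)]


def split_cols(m, n):
    """Split a matrix into n equal column blocks (split along axis 1)."""
    step = len(m[0]) // n
    return [[row[i * step:(i + 1) * step] for row in m] for i in range(n)]


X = [[1, 2], [3, 4], [5, 6], [7, 8]]  # activations, 4 x 2
W = [[1, 0], [2, 1]]                  # weights, 2 x 2
Y = matmul(X, W)                      # single-device reference result

# Data parallelism: X is split by rows, W is broadcast to both devices.
# Each device holds a row block of Y, so Y is split along axis 0.
shards = [matmul(x, W) for x in split_rows(X, 2)]
y_data_parallel = [row for shard in shards for row in shard]

# Model parallelism: X is broadcast, W is split by columns.
# Each device holds a column block of Y, so Y is split along axis 1.
shards = [matmul(X, w) for w in split_cols(W, 2)]
y_model_parallel = [sum((shard[r] for shard in shards), []) for r in range(len(X))]

# Partial value: X is split by columns and W by rows.
# Each device holds a full-shape addend of Y; an elementwise sum recovers Y.
shards = [matmul(x, w) for x, w in zip(split_cols(X, 2), split_rows(W, 2))]
y_partial_sum = [[sum(vals) for vals in zip(*rows)] for rows in zip(*shards)]

print(Y == y_data_parallel == y_model_parallel == y_partial_sum)
```

All three placements reproduce the same product; they differ only in how the inputs are laid out across devices and in how the per-device outputs must be combined.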

Preprint, 2016, arXiv

Energy gaps of atomically precise armchair graphene nanoribbons

Graphene nanoribbons (GNRs) are one-dimensional (1D) structures that exhibit a rich variety of electronic properties [1-17]. Therefore, they are predicted to be the building blocks of next-generation nanoelectronic devices. Theoretically, it has been demonstrated that armchair GNRs can be divided into three families according to their electronic structures, i.e., $N_a = 3p$, $N_a = 3p + 1$, and $N_a = 3p + 2$ (here $N_a$ is the number of dimer lines across the ribbon width and $p$ is an integer), and the energy gaps of the three families are quite different even for the same $p$ [1,3-6]. However, a systematic experimental verification of this fundamental prediction is still lacking, owing to very limited atomic-level control of the width of the armchair GNRs investigated [7,9,10,13,17]. Here, we studied the electronic structures of armchair GNRs with atomically well-defined widths ranging from $N_a = 6$ to $N_a = 26$ using a scanning tunnelling microscope (STM). Our results demonstrated explicitly that all the studied armchair GNRs exhibit semiconducting gaps due to quantum confinement and, more importantly, that the observed gaps as a function of $N_a$ are well grouped into the three categories, as predicted by density-functional theory calculations [3]. This result indicates that the electronic properties of armchair GNRs can be tuned dramatically by simply adding or removing one carbon dimer line along the ribbon width.
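The three-family grouping in this abstract depends only on the dimer-line count modulo 3, which a few lines of Python can make concrete. This is a minimal sketch: the function name `agnr_family` is hypothetical, no gap values are computed, and enumerating every integer width from 6 to 26 is an assumption, since the abstract gives the range but not the list of ribbons actually measured.

```python
def agnr_family(n_a: int):
    """Return (family label, p) for an armchair GNR with n_a dimer lines."""
    p, remainder = divmod(n_a, 3)
    label = ("3p", "3p + 1", "3p + 2")[remainder]
    return label, p


# Group the width range quoted in the abstract (assumed to be every integer).
families = {}
for n_a in range(6, 27):
    label, _ = agnr_family(n_a)
    families.setdefault(label, []).append(n_a)

for label, widths in families.items():
    print(label, widths)
```

Adding or removing a single dimer line moves a ribbon from one family to the next, which is the tuning knob the abstract refers to.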