Source author record

Yibin Luo

Yibin Luo appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Artificial Intelligence · astro-ph.GA · Distributed, Parallel, and Cluster Computing · Machine Learning

Catalog footprint

What is connected

2 works

4 topics

4 close collaborators


Preprint · 2026 · arXiv

Efficient Training on Multiple Consumer GPUs with RoundPipe

Fine-tuning Large Language Models (LLMs) on consumer-grade GPUs is highly cost-effective, but it is constrained by limited GPU memory and slow PCIe interconnects. Pipeline parallelism (PP) combined with CPU offloading mitigates these hardware bottlenecks by reducing communication overhead. However, existing PP schedules suffer from an inherent limitation termed the weight binding issue: binding uneven model stages (e.g., a large LM head) to fixed GPUs limits the pipeline's throughput to that of the most heavily loaded GPU, leading to severe pipeline bubbles. In this paper, we propose RoundPipe, a novel pipeline schedule that breaks the weight binding constraint on consumer GPU servers. RoundPipe treats GPUs as a pool of stateless execution workers and dynamically dispatches computation stages across devices in a round-robin manner, achieving a near-zero-bubble pipeline. To ensure training correctness and system efficiency, RoundPipe integrates a priority-aware transfer scheduling engine, a fine-grained distributed event-based synchronization protocol, and an automated layer partitioning algorithm. Evaluations on a server with 8× RTX 4090 GPUs show that RoundPipe achieves 1.48–2.16× speedups over state-of-the-art baselines when fine-tuning models from 1.7B to 32B parameters. Notably, RoundPipe enables LoRA fine-tuning of Qwen3-235B with a 31K sequence length on a single server. RoundPipe is publicly available as an open-source Python library with comprehensive documentation.
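To make the weight binding bottleneck concrete, here is a minimal, idealized load model in Python. It is not RoundPipe's scheduler: the stage costs are hypothetical, the stage-to-GPU mapping and the helper names (`bound_time_per_microbatch`, `round_robin_time_per_microbatch`) are illustrative, and the model counts only compute load per GPU, ignoring data dependencies, weight and activation transfers over PCIe, and warm-up bubbles.

```python
from collections import defaultdict

# Hypothetical per-stage compute costs (arbitrary units). The last stage
# stands in for a large LM head, as in the abstract's example.
STAGE_COSTS = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0]


def bound_time_per_microbatch(stage_costs, stage_to_gpu):
    """Steady-state time per micro-batch when each stage is pinned to one GPU.

    Every GPU must run its own stages for every micro-batch, so in steady
    state the most heavily loaded GPU sets the pace.
    """
    load = defaultdict(float)
    for stage, gpu in enumerate(stage_to_gpu):
        load[gpu] += stage_costs[stage]
    return max(load.values())


def round_robin_time_per_microbatch(stage_costs, num_gpus, num_microbatches):
    """Average per-micro-batch load on the busiest GPU under round-robin dispatch.

    Stage executions are handed to GPUs in rotation, regardless of which
    stage they belong to, so no GPU is tied to the heavy stage. Idealized
    assumption: moving weights to a stateless worker costs nothing.
    """
    load = [0.0] * num_gpus
    task = 0
    for _ in range(num_microbatches):
        for cost in stage_costs:
            load[task % num_gpus] += cost
            task += 1
    return max(load) / num_microbatches


if __name__ == "__main__":
    # Pinned mapping: GPU 3 owns only the heavy stage, GPUs 0-2 own two each.
    bound = bound_time_per_microbatch(STAGE_COSTS, [0, 0, 1, 1, 2, 2, 3])
    rr = round_robin_time_per_microbatch(STAGE_COSTS, num_gpus=4, num_microbatches=8)
    print(bound, rr)  # 3.0 2.25
```

With the pinned mapping, the GPU holding the heavy stage needs 3.0 units per micro-batch while the other three sit idle for a third of each step; rotating stage executions across all four workers brings the busiest GPU down to the average of 9 / 4 = 2.25 units. In a real system the stateless workers must also receive the weights for each stage they run, a cost this toy model leaves out.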

Preprint · 2022 · arXiv

An overdensity of red galaxies around the hyperluminous dust-obscured quasar W1835+4355 at z = 2.3

The Wide-field Infrared Survey Explorer (WISE) all-sky survey has discovered a new population of hot dust-obscured galaxies (Hot DOGs), which have been confirmed to be dusty quasars. Previous statistical studies have found significant overdensities of sub-millimeter and mid-IR selected galaxies around Hot DOGs, indicating that they may reside in dense regions. Here we present near-infrared (J and Ks band) observations over a 7.5′ × 7.5′ field centered on the Hot DOG W1835+4355 at z ~ 2.3, obtained with the wide-field infrared camera on the Palomar 200-inch telescope. We use the color criterion J − Ks > 2.3 for objects with Ks < 20 to select distant red galaxies (DRGs). We find a significant excess in the number density of DRGs in the W1835+4355 field compared to three control fields, by a factor of about 2. The overdensity of red galaxies around W1835+4355 is consistent with the multi-wavelength environment of Hot DOGs, suggesting that Hot DOGs may be good tracers of dense regions at high redshift. We also find that W1835+4355 does not reside in the densest part of the dense environment that it traces. One possible scenario is that W1835+4355 is undergoing a merging process, which lowers the local number density of galaxies in its surrounding region.
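The DRG selection and the field-to-control comparison can be sketched in a few lines of Python. This is a minimal illustration, assuming a simple catalog of J and Ks magnitudes per object in a single magnitude system; the `Source` class, the helper names, and the pooled-density ratio are hypothetical choices, and the magnitudes are toy values, not data from the paper. It omits completeness corrections, masking, and the significance estimate a real analysis would need.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """One detected object with J and Ks magnitudes (same magnitude system)."""
    j_mag: float
    ks_mag: float


def is_drg(src, color_cut=2.3, ks_limit=20.0):
    """DRG selection from the abstract: J - Ks > 2.3 for objects with Ks < 20."""
    return src.ks_mag < ks_limit and (src.j_mag - src.ks_mag) > color_cut


def count_drgs(sources):
    return sum(1 for s in sources if is_drg(s))


def overdensity(target, target_area, controls):
    """Target-field DRG surface density divided by the pooled control density.

    `controls` is a list of (sources, area_arcmin2) pairs. Pooling counts and
    areas is one simple (hypothetical) choice, not necessarily the paper's.
    """
    n_ctrl = sum(count_drgs(srcs) for srcs, _ in controls)
    a_ctrl = sum(area for _, area in controls)
    if n_ctrl == 0:
        raise ValueError("no DRGs in control fields; ratio undefined")
    # (n_t / A_t) / (n_c / A_c), rearranged into a single division.
    return (count_drgs(target) * a_ctrl) / (target_area * n_ctrl)


if __name__ == "__main__":
    FIELD_AREA = 7.5 * 7.5  # arcmin^2, the field size quoted in the abstract
    # Toy magnitudes for illustration only; not measurements from the paper.
    target = [Source(22.0, 19.0), Source(21.8, 19.2),
              Source(21.0, 19.5), Source(23.0, 20.5)]
    controls = [
        ([Source(22.0, 19.0), Source(20.5, 19.0)], FIELD_AREA),
        ([Source(20.0, 18.0)], FIELD_AREA),
        ([Source(21.5, 18.5)], FIELD_AREA),
    ]
    print(overdensity(target, FIELD_AREA, controls))  # 3.0
```

Here two of the four toy target objects pass both cuts (the others are too blue or fainter than Ks = 20), against two DRGs spread over three equal-area control fields, so the toy ratio is 3.0.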