Researcher profile

Yibin Luo

Yibin Luo contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 13 - Unverified
Verification L1
Unclaimed author
2 works
0 followers
4 topics
4 close collaborators

Published work

2 published items

preprint, 2026, arXiv

Efficient Training on Multiple Consumer GPUs with RoundPipe

Fine-tuning Large Language Models (LLMs) on consumer-grade GPUs is highly cost-effective, yet constrained by limited GPU memory and slow PCIe interconnects. Pipeline parallelism (PP) combined with CPU offloading mitigates these hardware bottlenecks by reducing communication overhead. However, existing PP schedules suffer from an inherent limitation termed the weight binding issue. Binding uneven model stages (e.g., the LM head is large) to GPUs limits the pipeline's throughput to that of the GPU with the heaviest load, leading to severe pipeline bubbles. In this paper, we propose RoundPipe, a novel pipeline schedule that breaks the weight binding constraint on consumer GPU servers. RoundPipe treats GPUs as a pool of stateless execution workers and dynamically dispatches computation stages across devices in a round-robin manner, achieving a near-zero-bubble pipeline. To ensure training correctness and system efficiency, RoundPipe integrates a priority-aware transfer scheduling engine, a fine-grained distributed event-based synchronization protocol, and an automated layer partitioning algorithm. Evaluations on an 8$\times$ RTX 4090 server demonstrate that RoundPipe achieves 1.48--2.16$\times$ speedups over state-of-the-art baselines when fine-tuning 1.7B to 32B models. Remarkably, RoundPipe enables LoRA fine-tuning of the Qwen3-235B model with 31K sequence length on a single server. RoundPipe is publicly available as an open-source Python library with comprehensive documentation.
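
The weight binding issue described in this abstract can be illustrated with a toy load model. The sketch below is not RoundPipe's schedule or API: the stage costs, the function names, and the rotation rule `(m + s) % num_gpus` are all hypothetical, and the model ignores weight transfers, stage dependencies, and the backward pass. It only compares per-GPU busy time when each stage is pinned to one GPU against busy time when stages rotate across GPUs.

```python
def bound_load(stage_costs, num_microbatches):
    """Per-GPU busy time when stage s is pinned to GPU s (weight binding)."""
    return [cost * num_microbatches for cost in stage_costs]


def rotating_load(stage_costs, num_microbatches, num_gpus):
    """Per-GPU busy time when stage s of microbatch m runs on GPU (m + s) % num_gpus.

    This rotation rule is an illustrative assumption, not the paper's algorithm.
    """
    load = [0.0] * num_gpus
    for m in range(num_microbatches):
        for s, cost in enumerate(stage_costs):
            load[(m + s) % num_gpus] += cost
    return load


# Hypothetical costs: the last stage (for example, one holding the LM head) is heavy.
stage_costs = [1.0, 1.0, 1.0, 3.0]
bound = bound_load(stage_costs, num_microbatches=8)
rotated = rotating_load(stage_costs, num_microbatches=8, num_gpus=4)

print("bound busiest GPU:  ", max(bound))
print("rotated busiest GPU:", max(rotated))
```

In this toy model the busiest GPU sets a lower bound on total run time, so spreading the heavy stage across all devices lowers that bound. Whether a real schedule reaches it depends on the transfer and synchronization costs that the abstract says RoundPipe addresses separately.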

preprint, 2022, arXiv

An overdensity of red galaxies around the hyperluminous dust-obscured quasar W1835$+$4355 at $z=2.3$

The \emph{Wide-field Infrared Survey Explorer} all-sky survey has discovered a new population of hot dust-obscured galaxies (Hot DOGs), which have been confirmed to be dusty quasars. Previous statistical studies have found significant overdensities of sub-millimeter and mid-IR selected galaxies around Hot DOGs, indicating they may reside in dense regions. Here we present near-infrared ($J$ and $K_s$ bands) observations over a $7.5'\times 7.5'$ field centered on the Hot DOG W1835$+$4355 at $z \sim 2.3$ using the wide-field infrared camera on the Palomar 200-inch telescope. We use the color criterion $J-K_s>2.3$ for objects with $K_s<20$ to select Distant Red Galaxies (DRGs). We find a significant excess in the number density of DRGs in the W1835$+$4355 field compared to three control fields, by a factor of about 2. The overdensity of red galaxies around W1835$+$4355 is consistent with the multi-wavelength environment of Hot DOGs, suggesting that Hot DOGs may be a good tracer of dense regions at high redshift. We find that W1835$+$4355 does not reside in the densest region of the dense environment traced by itself. A possible scenario is that W1835$+$4355 is undergoing a merging process, which lowers the local number density of galaxies in its surrounding region.
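
The DRG selection in this abstract is a pair of magnitude cuts, sketched below. Only the thresholds ($J-K_s>2.3$, $K_s<20$) and the $7.5'\times 7.5'$ field size come from the abstract. The catalog entries, the dictionary layout, and the function names are hypothetical placeholders, not data or code from the paper.

```python
def select_drgs(catalog, color_cut=2.3, ks_limit=20.0):
    """Return sources with Ks < ks_limit and J - Ks > color_cut."""
    return [
        src for src in catalog
        if src["Ks"] < ks_limit and (src["J"] - src["Ks"]) > color_cut
    ]


def surface_density(count, area_arcmin2):
    """Number of sources per square arcminute."""
    return count / area_arcmin2


# Made-up magnitudes, for illustration only.
catalog = [
    {"id": "a", "J": 22.0, "Ks": 19.5},  # red and bright enough: selected
    {"id": "b", "J": 21.0, "Ks": 19.5},  # too blue (J - Ks = 1.5)
    {"id": "c", "J": 23.5, "Ks": 20.5},  # red but fainter than the Ks limit
    {"id": "d", "J": 22.4, "Ks": 19.9},  # red and bright enough: selected
]

drgs = select_drgs(catalog)
field_area = 7.5 * 7.5  # arcmin^2, the field size quoted in the abstract
print([src["id"] for src in drgs], surface_density(len(drgs), field_area))
```

An overdensity factor like the one quoted in the abstract would then be the ratio of this surface density to the same quantity measured in control fields.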