Researcher profile

Yifeng Yu

Yifeng Yu contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
8 works
0 followers
10 topics
4 close collaborators

Published work

8 published items

preprint, 2026, arXiv

BubbleSpec: Turning Long-Tail Bubbles into Speculative Rollout Drafts for Synchronous Reinforcement Learning

Reinforcement Learning (RL) has become a cornerstone for improving the performance of Large Language Models (LLMs). However, its rollout phase constitutes a significant efficiency bottleneck, arising mainly from long-tail bubbles across data-parallel ranks, particularly in long-context scenarios where faster GPUs remain idle while waiting for stragglers. Existing solutions, such as partial rollout or asynchronous RL, mitigate these bubbles by compromising the algorithm's strict synchronous nature. We instead propose BubbleSpec, a novel framework that accelerates RL rollouts while strictly preserving mathematical exactness. Rather than attempting to eliminate bubbles, BubbleSpec exploits them: it uses the idle time windows of faster ranks to pre-generate rollout results for subsequent steps, which then serve as drafts for speculative decoding. Unlike prior speculative methods that rely on historical epoch similarity and warm-ups, BubbleSpec is agnostic to dataset size and provides immediate acceleration from the onset of training. Extensive evaluations demonstrate that BubbleSpec reduces decoding steps by 50% and increases rollout throughput by up to 1.8x. Critically, BubbleSpec is compatible with various RL frameworks and strategies because it preserves the strict synchronous property of RL algorithms.

preprint, 2026, arXiv

Diffusion Models with Heavy-Tailed Targets: Score Estimation and Sampling Guarantees

Score-based diffusion models have become a powerful framework for generative modeling, with score estimation as a central statistical bottleneck. Existing guarantees for score estimation largely focus on light-tailed targets or rely on restrictive assumptions such as compact support, which are often violated by heavy-tailed data in practice. In this work, we study conventional (Gaussian) score-based diffusion models when the target distribution is heavy-tailed and belongs to a Sobolev class with smoothness parameter $\beta>0$. We consider both exponential and polynomial tail decay, indexed by a tail parameter $\gamma$. Using kernel density estimation, we derive sharp minimax rates for score estimation, revealing a qualitative dichotomy: under exponential tails, the rate matches the light-tailed case up to polylogarithmic factors, whereas under polynomial tails the rate depends explicitly on $\gamma$. We further provide sampling guarantees for the associated continuous reverse dynamics. In total variation, the generated distribution converges at the minimax optimal rate $n^{-\beta/(2\beta+d)}$ under exponential tails (up to logarithmic factors), and at a $\gamma$-dependent rate under polynomial tails. Whether the latter sampling rate is minimax optimal remains an open question. These results characterize the statistical limits of score estimation and the resulting sampling accuracy for heavy-tailed targets, extending diffusion theory beyond the light-tailed setting.

preprint, 2026, arXiv

InfoLaw: Information Scaling Laws for Large Language Models with Quality-Weighted Mixture Data and Repetition

Upweighting high-quality data in LLM pretraining often improves performance, but in data-limited regimes, especially under overtraining, stronger upweighting increases repetition and can degrade performance. However, standard scaling laws do not reliably extrapolate across mixture recipes or under repetition, leaving the selection of optimal data recipes at scale underdetermined. To address this, we introduce InfoLaw (Information Scaling Laws), a data-aware scaling framework that predicts loss from consumed tokens, model size, data mixture weights, and repetition. The key idea is to model pretraining as information accumulation, where quality controls information density and repetition induces scale-dependent diminishing returns. We first collect model performance after training on datasets that vary in scale, quality distribution, and repetition level. We then build a model of information such that the accumulated information accurately predicts that performance. InfoLaw predicts performance on unseen data recipes and larger-scale runs (up to 7B, 425B tokens) with 0.15% mean and 0.96% max absolute error in loss, and it extrapolates reliably across overtraining levels, enabling efficient data-recipe selection under varying compute budgets.

preprint, 2026, arXiv

On the Limits of Latent Reuse in Diffusion Models

Diffusion models are often trained in low-dimensional latent spaces, which are then reused for related but shifted datasets. In this work, we study when such latent reuse remains reliable under distribution shift. We consider a source-target setting in which both datasets are approximately low-dimensional but may lie near different subspaces. We show that freezing and reusing a source latent space induces a target-domain score error governed by two quantities: the principal-angle misalignment between the source and target subspaces, and the target ambient noise amplified by the diffusion time scale. Motivated by these limits, we further study mixed source-target training and characterize how the required shared latent dimension depends on the relative geometry of the two distributions. Our results provide theoretical guidance on when latent reuse is reliable and when learning a shared representation may be necessary.

preprint, 2022, arXiv

Differentiability of effective fronts in the continuous setting in two dimensions

We study the effective front associated with first-order front propagations in two dimensions ($n=2$) in the periodic setting with continuous coefficients. Our main result states that the boundary of the effective front is differentiable at every irrational point. Equivalently, the stable norm associated with a continuous $\mathbb{Z}^2$-periodic Riemannian metric is differentiable at irrational points. This conclusion was obtained decades ago for smooth metrics ([3,5]). To the best of our knowledge, our result provides the first nontrivial property of the effective fronts in the continuous setting, which is the standard assumption in the PDE theory. Combined with the sufficiency result in [12], our result implies that for continuous coefficients, a polygon can be an effective front if and only if it is centrally symmetric with rational vertices and nonempty interior.

preprint, 2022, arXiv

Optimal convergence rate for periodic homogenization of convex Hamilton-Jacobi equations

In this paper, we show that the rate of convergence in periodic homogenization of convex Hamilton-Jacobi equations is always $O(\varepsilon)$, which is optimal. This is a natural extension of a result concerning stable norms in metric geometry [4] that is essentially equivalent to the homogenization of convex static Hamilton-Jacobi equations. Another interesting question in this direction is whether the $O(\varepsilon)$ rate holds in the nonconvex setting. We present a special nonconvex example with an $O(\varepsilon)$ convergence rate, which relies on identifying the shape of the effective Hamiltonian and on game-theoretic interpretation formulas.

preprint, 2022, arXiv

Remarks on optimal rates of convergence in periodic homogenization of linear elliptic equations in non-divergence form

We study and characterize the optimal rates of convergence in periodic homogenization of linear elliptic equations in non-divergence form. We show that the optimal rate of convergence is either $O(\varepsilon)$ or $O(\varepsilon^2)$, depending on the diffusion matrix $A$, source term $f$, and boundary data $g$. Moreover, we show that the set of diffusion matrices $A$ that give the optimal rate $O(\varepsilon)$ is open and dense in the set of $C^{2,\alpha}$ periodic, symmetric, and positive definite matrices, which means that generically, the optimal rate is $O(\varepsilon)$.