Researcher profile

Youbang Sun

Youbang Sun contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 15 - UnverifiedVerification L1Unclaimed author
3works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

3 published item(s)

preprint2026arXiv

Post-Trained MoE Can Skip Half Experts via Self-Distillation

Mixture-of-Experts (MoE) scales language models efficiently through sparse expert activation, and its dynamic variant further reduces computation by adjusting the activated experts in an input-dependent manner. Existing dynamic MoE methods usually rely on pre-training from scratch or task-specific adaptation, leaving the practical conversion of fully trained MoE underexplored. Enabling such adaptation would directly alleviate the inference costs by allowing easy tokens to bypass unnecessary expert during serving. This paper introduces Zero-Expert Self-Distillation Adaptation (ZEDA), a low-cost framework that transforms post-trained static MoE models into efficient dynamic ones. To stabilize this architectural conversion, ZEDA injects parameter-free zero-output experts into each MoE layer and adapts the augmented model through two-stage self-distillation, utilizing the original MoE as a frozen teacher and applying a group-level balancing loss. On Qwen3-30B-A3B and GLM-4.7-Flash across 11 benchmarks spanning math, code, and instruction following, ZEDA eliminates over 50% of expert FLOPs at marginal accuracy loss. It outperforms the strongest dynamic MoE baseline by 6.1 and 4.0 points on the two models, and delivers ~1.20$\times$ end-to-end inference speedup.

preprint2022arXiv

On Centralized and Distributed Mirror Descent: Convergence Analysis Using Quadratic Constraints

Mirror descent (MD) is a powerful first-order optimization technique that subsumes several optimization algorithms including gradient descent (GD). In this work, we develop a semi-definite programming (SDP) framework to analyze the convergence rate of MD in centralized and distributed settings under both strongly convex and non-strongly convex assumptions. We view MD with a dynamical system lens and leverage quadratic constraints (QCs) to provide explicit convergence rates based on Lyapunov stability. For centralized MD under strongly convex assumption, we develop a SDP that certifies exponential convergence rates. We prove that the SDP always has a feasible solution that recovers the optimal GD rate as a special case. We complement our analysis by providing the $O(1/k)$ convergence rate for convex problems. Next, we analyze the convergence of distributed MD and characterize the rate using SDP. To the best of our knowledge, the numerical rate of distributed MD has not been previously reported in the literature. We further prove an $O(1/k)$ convergence rate for distributed MD in the convex setting. Our numerical experiments on strongly convex problems indicate that our framework certifies superior convergence rates compared to the existing rates for distributed GD.

preprint2020arXiv

Distributed Mirror Descent with Integral Feedback: Asymptotic Convergence Analysis of Continuous-time Dynamics

This work addresses distributed optimization, where a network of agents wants to minimize a global strongly convex objective function. The global function can be written as a sum of local convex functions, each of which is associated with an agent. We propose a continuous-time distributed mirror descent algorithm that uses purely local information to converge to the global optimum. Unlike previous work on distributed mirror descent, we incorporate an integral feedback in the update, allowing the algorithm to converge with a constant step-size when discretized. We establish the asymptotic convergence of the algorithm using Lyapunov stability analysis. We further illustrate numerical experiments that verify the advantage of adopting integral feedback for improving the convergence rate of distributed mirror descent.