Researcher profile

Feiyu Zhang

Feiyu Zhang contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 13 - UnverifiedVerification L1Unclaimed author
2works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

2 published item(s)

preprint2026arXiv

SPECTRE: Hybrid Ordinary-Parallel Speculative Serving for Resource-Efficient LLM Inference

LLM serving platforms are increasingly deployed as multi-model cloud systems, where user demand is often long-tailed: a few popular large models receive most requests, while many smaller tail models remain underutilized. We propose \textbf{SPECTRE} (Parallel \textbf{SPEC}ulative Decoding with a Multi-\textbf{T}enant \textbf{RE}mote Drafter), a serving framework that reuses underutilized tail-model services as remote drafters for heavily loaded large-model services through speculative decoding. SPECTRE enables draft generation and target-side verification to run in parallel, and makes such parallelism effective through three techniques: a hybrid ordinary-parallel speculative decoding strategy guided by a threshold derived from throughput analysis, speculative priority scheduling to preserve draft--target overlap under multi-tenant traffic, and draft-side prompt compression to reduce draft latency. We implement SPECTRE in \texttt{SGLang} and evaluate it across multiple draft--target model pairs, reasoning benchmarks, real-world long-context workloads, and a wide range of batch sizes. Results show that SPECTRE consistently improves large-model serving throughput while causing only minor interference to the native workloads of tail-model services. In large-model deployments, including Qwen3-235B-A22B with TP=8, SPECTRE achieves up to \textbf{2.28$\times$ speedup} over autoregressive decoding and up to an additional \textbf{66\% relative improvement} over the strongest speculative decoding baselines. Talk is cheap, we show you the code: https://github.com/sgl-project/sglang/pull/22272.

preprint2022arXiv

On sampling Kaczmarz-Motzkin methods for solving large-scale nonlinear systems

In this paper, for solving large-scale nonlinear equations we propose a nonlinear sampling Kaczmarz-Motzkin (NSKM) method. Based on the local tangential cone condition and the Jensen's inequality, we prove convergence of our method with two different assumptions. Then, for solving nonlinear equations with the convex constraints we present two variants of the NSKM method: the projected sampling Kaczmarz-Motzkin (PSKM) method and the accelerated projected sampling Kaczmarz-Motzkin (APSKM) method. With the use of the nonexpansive property of the projection and the convergence of the NSKM method, the convergence analysis is obtained. Numerical results show that the NSKM method with the sample of the suitable size outperforms the nonlinear randomized Kaczmarz (NRK) method in terms of calculation times. The APSKM and PSKM methods are practical and promising for the constrained nonlinear problem.