Researcher profile

Zhongyi Huang

Zhongyi Huang contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
6 works
0 followers
10 topics
4 close collaborators

Published work

6 published items

Preprint · 2026 · arXiv

SPECTRE: Hybrid Ordinary-Parallel Speculative Serving for Resource-Efficient LLM Inference

LLM serving platforms are increasingly deployed as multi-model cloud systems, where user demand is often long-tailed: a few popular large models receive most requests, while many smaller tail models remain underutilized. We propose SPECTRE (Parallel SPECulative Decoding with a Multi-Tenant REmote Drafter), a serving framework that reuses underutilized tail-model services as remote drafters for heavily loaded large-model services through speculative decoding. SPECTRE enables draft generation and target-side verification to run in parallel, and makes such parallelism effective through three techniques: a hybrid ordinary-parallel speculative decoding strategy guided by a threshold derived from throughput analysis, speculative priority scheduling to preserve draft–target overlap under multi-tenant traffic, and draft-side prompt compression to reduce draft latency. We implement SPECTRE in SGLang and evaluate it across multiple draft–target model pairs, reasoning benchmarks, real-world long-context workloads, and a wide range of batch sizes. Results show that SPECTRE consistently improves large-model serving throughput while causing only minor interference with the native workloads of tail-model services. In large-model deployments, including Qwen3-235B-A22B with TP=8, SPECTRE achieves up to 2.28× speedup over autoregressive decoding and up to an additional 66% relative improvement over the strongest speculative decoding baselines. Talk is cheap; we show you the code: https://github.com/sgl-project/sglang/pull/22272.
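SPECTRE builds on speculative decoding. In that scheme a cheap drafter proposes several tokens, and the large target model then verifies them. The Python below is a rough sketch of the generic draft-then-verify step with greedy acceptance. It is not SPECTRE's parallel, multi-tenant implementation in SGLang. The target keeps the longest drafted prefix it agrees with and then contributes one token of its own. The `Model` interface and the toy counting models are hypothetical stand-ins for real LLMs.

```python
from typing import Callable, List

# Hypothetical interface: a model maps a token context to its greedy next token.
Model = Callable[[List[int]], int]


def draft(draft_model: Model, prefix: List[int], gamma: int) -> List[int]:
    """Autoregressively propose `gamma` tokens with the (cheap) draft model."""
    ctx, proposal = list(prefix), []
    for _ in range(gamma):
        token = draft_model(ctx)
        proposal.append(token)
        ctx.append(token)
    return proposal


def verify(target_model: Model, prefix: List[int], proposal: List[int]) -> List[int]:
    """Greedy verification: accept drafted tokens while they match the target.

    A real server scores all drafted positions in one batched forward pass of
    the target; calling the model position by position here is only for clarity.
    """
    ctx, accepted = list(prefix), []
    for token in proposal:
        expected = target_model(ctx)
        if expected != token:
            accepted.append(expected)  # first mismatch: emit the target's token instead
            return accepted
        accepted.append(token)
        ctx.append(token)
    accepted.append(target_model(ctx))  # all drafts accepted: one extra "bonus" token
    return accepted


# Hypothetical toy models: the target counts upward mod 10; the drafter errs after a 3.
def target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def drafter(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10


print(verify(target, [0], draft(drafter, [0], gamma=4)))  # [1, 2, 3, 4]
print(verify(target, [5], draft(drafter, [5], gamma=4)))  # [6, 7, 8, 9, 0]
```

Every verification step yields at least one target-approved token, so the output matches plain greedy decoding of the target. The saving comes from accepting several drafted tokens per target pass.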

Preprint · 2022 · arXiv

A Uniform Convergent Petrov-Galerkin method for a Class of Turning Point Problems

In this paper, we propose a numerical method for turning point problems in one dimension based on the Petrov-Galerkin finite element method (PGFEM). We first give an a priori estimate for the turning point problem with a single boundary turning point. We then solve it with PGFEM, where the test functions are the solutions to piecewise approximate dual problems. We prove that our method has a first-order convergence rate in both the $L^\infty$ norm and an energy norm when the exact solutions to the dual problems are selected as test functions. Numerical results show that our scheme is efficient for turning point problems with different types of singularities, and the observed convergence agrees with our theoretical results.
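The sketch below gives orientation on the kind of setting such a scheme works in. The abstract does not state the model problem, coefficients or notation, so all of them are assumptions here. The sketch shows the generic Petrov-Galerkin structure with locally defined dual (adjoint) test functions, not the paper's exact construction.

```latex
% Assumed model problem with a turning point at the boundary x = 0
-\varepsilon u'' + b(x)\,u' + c(x)\,u = f(x), \quad x \in (0,1), \qquad
u(0) = u(1) = 0, \qquad 0 < \varepsilon \ll 1, \quad b(0) = 0.

% Bilinear form after multiplying by v and integrating by parts
B(u,v) = \varepsilon\,(u',v') + (b\,u',v) + (c\,u,v).

% Petrov-Galerkin: piecewise-linear trial space V_h = span{phi_j}, different test functions psi_j
\text{Find } u_h \in V_h:\quad B(u_h,\psi_j) = (f,\psi_j), \qquad j = 1,\dots,N-1.

% Each psi_j solves an approximate dual (adjoint) problem on each element,
% with piecewise-constant approximations \bar b, \bar c of the coefficients
-\varepsilon\,\psi_j'' - \bar b\,\psi_j' + \bar c\,\psi_j = 0 \ \text{on } (x_{i-1},x_i), \qquad
\psi_j(x_i) = \delta_{ij}.
```

The adjoint operator here is $L^*v = -\varepsilon v'' - (bv)' + cv$. Freezing $b$ as a constant $\bar b$ on each element reduces $-(\bar b\,\psi_j)'$ to $-\bar b\,\psi_j'$.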

Preprint · 2022 · arXiv

Two New Piggybacking Designs with Lower Repair Bandwidth

Piggybacking codes are a special class of MDS array codes that achieve small repair bandwidth with small sub-packetization. They do so by first creating some instances of an $(n,k)$ MDS code, such as a Reed-Solomon (RS) code, and then designing the piggyback functions. In this paper, we propose a new piggybacking design that defines the piggyback functions over some instances of both an $(n,k)$ MDS code and an $(n,k')$ MDS code, where $k\geq k'$. We show that this new design can significantly reduce the repair bandwidth for single-node failures. When $k=k'$, we design a piggybacking code that is an MDS code. We show that it has lower repair bandwidth for single-node failures than all existing piggybacking codes when the number of parity nodes $r=n-k\geq8$ and the sub-packetization $\alpha<r$. Moreover, we propose another piggybacking code by designing $n$ piggyback functions over some instances of an $(n,k)$ MDS code and adding these $n$ piggyback functions into $n$ newly created empty entries that hold no data symbols. We show that this code can significantly reduce the repair bandwidth for single-node failures at the cost of slightly more storage overhead. In addition, we show that it can recover from any $r+1$ node failures for some parameters. We also show that, for some parameters, it has lower repair bandwidth than locally repairable codes (LRCs) with the same fault tolerance and redundancy.
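The toy example below makes the piggybacking idea concrete. It follows the basic piggybacking construction, not the new designs proposed above. It assumes a hypothetical $(4,2)$ code with two instances (substripes) $a$ and $b$, and works over the integers for readability; practical codes use a finite field. The piggyback $a_1$ is added to the $b$-symbol of the last parity node. Repairing node 1 then downloads 3 symbols, instead of the 4 needed to read $k=2$ whole nodes.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # (symbol of instance a, symbol of instance b)


def encode(a: Tuple[int, int], b: Tuple[int, int]) -> List[Node]:
    """Hypothetical toy (4, 2) piggybacked code over the integers."""
    a1, a2 = a
    b1, b2 = b
    return [
        (a1, b1),                         # node 1: systematic
        (a2, b2),                         # node 2: systematic
        (a1 + a2, b1 + b2),               # node 3: parity, no piggyback
        (a1 + 2 * a2, b1 + 2 * b2 + a1),  # node 4: parity, b-symbol carries piggyback a1
    ]


def repair_node1(nodes: List[Node]) -> Tuple[Node, int]:
    """Rebuild node 1; return (node contents, number of symbols downloaded)."""
    b2 = nodes[1][1]          # download 1
    p3_b = nodes[2][1]        # download 2
    p4_b = nodes[3][1]        # download 3
    b1 = p3_b - b2            # recover b1 from the piggyback-free parity
    a1 = p4_b - b1 - 2 * b2   # strip the parity part to expose the piggyback a1
    return (a1, b1), 3


def decode_from_parities(nodes: List[Node]) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """MDS check: recover all data from nodes 3 and 4 alone."""
    (p3_a, p3_b), (p4_a, p4_b) = nodes[2], nodes[3]
    a2 = p4_a - p3_a          # instance a carries no piggybacks, so decode it first
    a1 = p3_a - a2
    b2 = (p4_b - a1) - p3_b   # subtract the now-known piggyback, then decode b
    b1 = p3_b - b2
    return (a1, a2), (b1, b2)


nodes = encode((7, 3), (5, 11))
print(repair_node1(nodes))          # ((7, 5), 3)
print(decode_from_parities(nodes))  # ((7, 3), (5, 11))
```

Decoding from any two nodes still works because instance $a$ is decoded first. Its piggyback values can then be subtracted before decoding $b$, so the piggyback preserves the MDS property.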

Preprint · 2021 · arXiv

Investigating the effect of expected travel distance on individual descent speed in the stairwell with super long distance

Super high-rise buildings are becoming increasingly common in cities, which brings the problem of emergency evacuation from such buildings to the fore. In an evacuation experiment our group carried out in Shanghai Tower, pedestrians evacuating from the 126th floor were always slower than those evacuating from the 117th floor. We therefore hypothesize that the expected evacuation distance affects pedestrians' movement speed. To test this conjecture, we conducted an experiment in a 12-story office building. Participants were told the evacuation distance in advance, so that we could study whether, and how, it influences their speed. The results show that pedestrians' movement speed decreases as the expected evacuation distance increases, which confirms our hypothesis. We also give the relation between the rate of increase in evacuation distance and the rate of decrease in speed. Furthermore, as the expected evacuation distance increases, the speed of males decreases at a greater rate than that of females. In addition, we study the effects of actual evacuation distance, gender, and BMI on evacuation speed. Finally, we obtain the correlation between heart rate and speed during evacuation. These results are useful for the study of pedestrian evacuation in super high-rise buildings.

Preprint · 2020 · arXiv

An iterative splitting method for pricing European options under the Heston model

In this paper, we propose an iterative splitting method to solve the partial differential equations in option pricing problems. We focus on the Heston stochastic volatility model and the two-dimensional partial differential equation (PDE) derived from it. Taking the European option as an example, we conduct numerical experiments with different boundary conditions. The iterative splitting method transforms the two-dimensional equation into two quasi-one-dimensional equations, each with the variable in the other dimension held fixed, which lowers the computational cost. Numerical results show that the iterative splitting method is the most accurate of the approaches compared, for both option prices and Greeks, when used with an artificial boundary condition (ABC) based on the method of Li and Huang (2019). The comparison is against the classic finite difference method with the commonly used boundary conditions of Heston (1993).
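The splitting idea can be illustrated on a generic linear system $u' = (A + B)u$. Here $A$ and $B$ stand for the discretized operators acting along the two coordinates (think asset price $S$ and variance $v$). The sketch assumes one standard form of iterative operator splitting with implicit Euler sub-steps. Each sweep solves a system implicit only in $A$, then one implicit only in $B$, reusing the latest iterate for the other operator. For simplicity it uses hypothetical pure 2D diffusion operators built with Kronecker products instead of the full Heston operator (no drift, mixed-derivative or boundary terms). It shows the mechanics, not the paper's scheme. At convergence the iterates reproduce an unsplit implicit Euler step.

```python
import numpy as np


def second_diff(m: int, h: float) -> np.ndarray:
    """1D second-difference matrix on m interior points (zero Dirichlet boundaries)."""
    off = np.ones(m - 1)
    return (np.diag(-2.0 * np.ones(m)) + np.diag(off, 1) + np.diag(off, -1)) / h**2


def iterative_splitting_step(A, B, u_n, dt, iters=20):
    """One implicit-Euler step of u' = (A + B) u by iterative operator splitting."""
    I = np.eye(len(u_n))
    u_prev = u_n.copy()  # initial iterate: the solution at the previous time level
    for _ in range(iters):
        # implicit in A, with B applied to the previous iterate
        u_a = np.linalg.solve(I - dt * A, u_n + dt * (B @ u_prev))
        # implicit in B, with A applied to the iterate just computed
        u_prev = np.linalg.solve(I - dt * B, u_n + dt * (A @ u_a))
    return u_prev


# Hypothetical test problem: 8 x 8 interior grid on the unit square.
m, h, dt = 8, 1.0 / 9, 1e-3
D = second_diff(m, h)
A = np.kron(np.eye(m), D)  # differences along the first grid direction (think S)
B = np.kron(D, np.eye(m))  # differences along the second grid direction (think v)
u0 = np.random.default_rng(0).random(m * m)

u_split = iterative_splitting_step(A, B, u0, dt)
u_full = np.linalg.solve(np.eye(m * m) - dt * (A + B), u0)  # unsplit implicit Euler
print(np.allclose(u_split, u_full))  # True
```

Each solve with `I - dt * A` or `I - dt * B` decouples into independent 1D tridiagonal systems, one per grid line. That is where the "quasi-one-dimensional" cost saving comes from; the dense solves above ignore this for brevity.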

Preprint · 2020 · arXiv

How many infections of COVID-19 there will be in the "Diamond Princess"-Predicted by a virus transmission model based on the simulation of crowd flow

Objectives: To simulate the transmission process of COVID-19 on a cruise ship, estimate how many of the 3711 people on the "Diamond Princess" would be infected, and analyze measures that could have prevented mass transmission. Methods: Based on a crowd flow model, we establish a rule for virus transmission between pedestrians. It is used to simulate the spread of the virus through close contact during pedestrians' daily activities on the cruise ship. Measurements and main results: Three types of simulation scenarios are designed. The Basic scenario focuses on the process of virus transmission caused by a virus carrier and the effect of personal protective measures against the virus. The Self-protection scenario considers the original virus carriers disembarking halfway and more and more people strengthening self-protection, which corresponds comparatively well to the actual situation on the "Diamond Princess" cruise. The Control scenario simulates the effect of taking recommended or mandatory measures on virus transmission. Conclusions: With high probability, 850 to 1009 people were infected with COVID-19 during the voyage of the "Diamond Princess". The crowd infection percentage could be controlled effectively if recommended or mandatory measures were taken immediately during the alert phase of a COVID-19 outbreak.
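The abstract does not spell out the transmission rule, so the following is only a hypothetical illustration of a close-contact rule of the kind described. At each time step, a susceptible pedestrian with k infected pedestrians within a contact radius is infected with probability 1 - (1 - p)^k. The per-contact probability p is reduced by a personal protection factor. The names `radius`, `p_contact` and `protection` are assumptions. The crowd-flow model that moves pedestrians between steps is left out.

```python
import numpy as np


def transmission_step(pos, infected, radius, p_contact, rng, protection=None):
    """Hypothetical close-contact infection update for one time step.

    pos: (n, 2) pedestrian positions; infected: (n,) boolean mask.
    protection: (n,) values in [0, 1] scaling down each person's per-contact risk.
    """
    n = len(pos)
    if protection is None:
        protection = np.zeros(n)  # assumption: nobody protected by default
    # Pairwise distances between all pedestrians.
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    # contacts[i, j] is True when infected pedestrian j is within radius of i.
    contacts = (dist <= radius) & infected[None, :]
    np.fill_diagonal(contacts, False)
    k = contacts.sum(axis=1)  # number of infected close contacts per pedestrian
    p_each = p_contact * (1.0 - protection)
    p_infect = 1.0 - (1.0 - p_each) ** k  # chance that at least one contact transmits
    newly = ~infected & (rng.random(n) < p_infect)
    return infected | newly


rng = np.random.default_rng(42)
pos = np.array([[0.0, 0.0], [0.5, 0.0], [5.0, 0.0]])
infected = np.array([True, False, False])
# With p_contact = 1, the pedestrian within the radius is infected; the distant one is not.
print(transmission_step(pos, infected, radius=1.0, p_contact=1.0, rng=rng))
```

Setting a pedestrian's `protection` to 1 removes their per-contact risk entirely in this sketch. Intermediate values give a crude stand-in for partial protective measures.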