Researcher profile

Yanqi Zhang

Yanqi Zhang contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
6 works
0 followers
8 topics
4 close collaborators

Published work

6 published items

preprint | 2026 | arXiv

Unifying Sparse Attention with Hierarchical Memory for Scalable Long-Context LLM Serving

Long-context LLM serving is bottlenecked by the cost of attending over ever-growing KV caches. Dynamic sparse attention promises relief by accessing only a small, query-dependent subset of the KV state per decoding step and extending the KV storage to CPU memory. In practice, however, these algorithmic savings rarely translate into end-to-end system-level gains because sparse methods typically operate at different granularities and thus rely on ad hoc, per-algorithm implementations. At the same time, hierarchical KV storage introduces a new systems bottleneck: retrieving fine-grained, irregular KV subsets across the GPU-CPU boundary can easily erase the benefits of sparsity. We present SPIN, a sparse-attention-aware inference framework that co-designs the execution pipeline with hierarchical KV storage through three techniques: (1) a unified partition abstraction that maps different sparsity granularities onto a shared page-based KV substrate; (2) a locality-aware KV cache manager that dynamically sizes per-request HBM budgets and uses a GPU-friendly bucketed LRU policy to cut PCIe round-trips; and (3) a two-level hierarchical metadata layout sized to the active working set rather than the worst-case address space. Built on vLLM with three representative sparse attention algorithms, SPIN delivers 1.66-5.66x higher end-to-end throughput and 7-9x lower TTFT than vLLM, and reduces TPOT by up to 58% over the original sparse-attention implementations.

preprint | 2024 | arXiv

Analytically-Driven Resource Management for Cloud-Native Microservices

Resource management for cloud-native microservices has attracted considerable recent attention. Previous work has shown that machine learning (ML)-driven approaches outperform traditional techniques, such as autoscaling, in terms of both SLA maintenance and resource efficiency. However, ML-driven approaches also face challenges, including lengthy data collection processes and limited scalability. We present Ursa, a lightweight resource management system for cloud-native microservices that addresses these challenges. Ursa uses an analytical model that decomposes the end-to-end SLA into per-service SLAs, and maps each per-service SLA to individual resource allocations per microservice tier. To speed up the exploration process and avoid prolonged SLA violations, Ursa explores each microservice individually, and swiftly stops exploration if latency exceeds its SLA. We evaluate Ursa on a set of representative end-to-end microservice topologies, including a social network, a media service, and a video processing pipeline, each consisting of multiple classes and priorities of requests with different SLAs, and compare it against two representative ML-driven systems, Sinan and Firm. Compared to these ML-driven approaches, Ursa provides significant advantages: it shortens the data collection process by more than 128x, and its control plane is 43x faster. At the same time, Ursa does not sacrifice resource efficiency or SLAs. During online deployment, Ursa reduces the SLA violation rate by 9.0% to 49.9%, and reduces CPU allocation by up to 86.2%, compared to ML-driven approaches.

preprint | 2022 | arXiv

Fiber-based two-wavelength heterodyne laser interferometer

Displacement measuring interferometry is a crucial component in metrology applications. In this paper, we propose a fiber-based two-wavelength heterodyne interferometer as a compact and highly sensitive displacement sensor that can be used in inertial sensing applications. In the proposed design, two individual heterodyne interferometers are constructed using two different wavelengths, 1064 nm and 1055 nm: one measures the target displacement and the other monitors the common-mode noise in the fiber system. A narrow-bandwidth spectral filter separates the beam paths of the two interferometers, which are highly common and provide a high rejection ratio to the environmental noise. The preliminary test shows a sensitivity floor of 7.5 pm/√Hz at 1 Hz when tested in an enclosed chamber. We also investigate the effects of periodic errors due to imperfect spectral separation on the displacement measurement and propose algorithms to mitigate these effects.

preprint | 2022 | arXiv

Investigation and mitigation of noise contributions in a compact heterodyne interferometer

We present a noise estimation and subtraction algorithm capable of increasing the sensitivity of heterodyne laser interferometers by one order of magnitude. The heterodyne interferometer is specially designed for dynamic measurements of a test mass in the application of sub-Hz inertial sensing. A noise floor of 3.31E-11 m/√Hz at 100 mHz is achieved after applying our noise subtraction algorithm to a benchtop prototype interferometer that showed a noise level of 2.76E-10 m/√Hz at 100 mHz when tested in vacuum at levels of 3E-5 Torr. Based on the previous results, we investigated noise estimation and subtraction techniques for non-linear optical pathlength noise, laser frequency noise, and temperature fluctuations in heterodyne laser interferometers. For each noise source, we identified its contribution and removed it from the measurement by linear fitting or a spectral analysis algorithm. The noise correction algorithm we present in this article can be generally applied to heterodyne laser interferometers.
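
The linear-fitting step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single auxiliary channel (for example a temperature record) sampled alongside the displacement signal, and the function name `fit_and_subtract` is hypothetical. The sketch fits the signal to the auxiliary channel by ordinary least squares and returns the residual with the fitted contribution removed.

```python
def fit_and_subtract(signal, reference):
    """Fit signal ~ a * reference + b by least squares; return (a, b, residual).

    Hypothetical helper for illustration only: `signal` is the measured
    displacement series and `reference` is one auxiliary noise channel.
    """
    n = len(signal)
    if n != len(reference) or n < 2:
        raise ValueError("signal and reference must have equal length >= 2")

    mean_r = sum(reference) / n
    mean_s = sum(signal) / n

    # Sum of squared deviations of the reference channel.
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_r == 0:
        raise ValueError("reference channel is constant; slope is undefined")

    # Sum of cross deviations between reference and signal.
    cov = sum((r - mean_r) * (s - mean_s) for r, s in zip(reference, signal))

    a = cov / var_r            # coupling coefficient (slope)
    b = mean_s - a * mean_r    # constant offset

    # Remove the fitted contribution of the reference channel.
    residual = [s - (a * r + b) for s, r in zip(signal, reference)]
    return a, b, residual


if __name__ == "__main__":
    temperature = [0.0, 1.0, 2.0, 3.0, 4.0]
    displacement = [2.0 * t + 1.0 for t in temperature]  # purely coupled noise
    slope, offset, cleaned = fit_and_subtract(displacement, temperature)
    print(slope, offset, cleaned)
```

A real pipeline would likely fit several channels at once and work on filtered or frequency-domain data; the single-channel time-domain fit here only shows the idea of estimating a coupling coefficient and subtracting it.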

preprint | 2022 | arXiv

Optomechanical accelerometers for geodesy

We present a novel optomechanical inertial sensor for low-frequency applications and corresponding acceleration measurements. This sensor has a resonant frequency of 4.7 Hz, a mechanical quality factor of 476,000, a test mass of 2.6 g, and a projected noise floor of approximately 5E-11 m s^-2/√Hz at 1 Hz. Such performance, together with its small size, low weight, reduced power consumption, and low susceptibility to environmental variables such as magnetic field or drag conditions, makes it an attractive technology for future geodesy missions. In this paper, we present an experimental demonstration of low-frequency ground seismic noise detection by direct comparison with a commercial seismometer, and data analysis algorithms for the identification, characterization, and correction of several noise sources.

preprint | 2020 | arXiv

A compact high-precision periodic-error-free heterodyne interferometer

We present the design, bench-top setup, and experimental results of a compact heterodyne interferometer that achieves picometer-level displacement sensitivities in air over frequencies above 100 mHz. The optical configuration with spatially separated beams prevents frequency and polarization mixing, and therefore eliminates periodic errors. The interferometer is designed to maximize common-mode optical laser beam paths to obtain high rejection of environmental disturbances, such as temperature fluctuations and acoustics. The results of our experiments demonstrate the short- and long-term stabilities of the system during stationary and dynamic measurements. In addition, we provide measurements that compare our interferometer prototype with a commercial system, verifying our higher sensitivity of 3 pm, higher thermal stability by a factor of two, and periodic-error-free performance.