Researcher profile

Dejun Luo

Dejun Luo contributes to research discovery and scholarly infrastructure.

Researcher. Affiliation not imported. Open to collaborate.

Trust snapshot

Quick read

Trust 19 - Unverified; Verification L1; Unclaimed author
5 works
0 followers
7 topics
4 close collaborators

Published work

5 published items

Preprint, 2026, arXiv

KVServe: Service-Aware KV Cache Compression for Communication-Efficient Disaggregated LLM Serving

LLMs are widely adopted in production, pushing inference systems to their limits. Disaggregated LLM serving (e.g., PD separation and KV state disaggregation) improves scalability and cost efficiency, but it also turns KV into an explicit payload crossing network and storage boundaries, making KV a dominant end-to-end bottleneck. Existing KV compression schemes are typically static runtime configurations, even though the production service context varies over time in workload mix, bandwidth, and SLO/quality budgets. As a result, a fixed choice can be suboptimal or even increase latency. We present KVServe, the first service-aware and adaptive KV communication compression framework for disaggregated LLM serving. KVServe (1) unifies KV compression into a modular strategy space with new components and cross-method recomposition; (2) introduces a Bayesian Profiling Engine that efficiently searches this space and distills a 3D Pareto candidate set, reducing offline search overhead by 50×; and (3) deploys a Service-Aware Online Controller that combines an analytical latency model with a lightweight bandit to select profiles under constraints and correct offline-to-online mismatch. Integrated into vLLM and evaluated across datasets, models, GPUs and networks, KVServe achieves up to 9.13× JCT speedup in PD-separated serving and up to 32.8× TTFT reduction in KV-disaggregated serving.

Preprint, 2026, arXiv

Structure preservation and emergent dissipation in stochastic wave equations with transport noise

We study nonlinear wave equations perturbed by transport noise acting either on the displacement or on the velocity. Such noise models random advection and, under a suitable scaling of the spatial covariance, may generate an effective dissipative term. We establish well-posedness in both cases and analyse the associated scaling limits. When the noise acts on the displacement, the system preserves its original structure and converges to the deterministic nonlinear wave equation; when it acts on the velocity, the rescaled dynamics produce an additional Laplacian damping term, leading to a stochastic derivation of a Westervelt-type acoustic model.

Preprint, 2019, arXiv

Energy conditional measures and 2D turbulence

We show that the invariant measure of point vortices, conditioned on the Hamiltonian lying in a finite interval, converges weakly to the enstrophy measure conditioned on the renormalized energy lying in the same interval. We also prove the existence of solutions to the 2D Euler equations having the energy conditional measure as invariant measure. Some heuristic discussions and numerical simulations are presented in the last section.