Researcher profile

Ke Zhang

Ke Zhang contributes to research discovery and scholarly infrastructure.

Trust snapshot

Quick read

Trust 21 (Emerging), Verification L1, Unclaimed author
6 works
0 followers
8 topics
4 close collaborators

Published work

6 published items

Preprint, 2026, arXiv

Fast Collaborative Inference via Distributed Speculative Decoding

Speculative decoding accelerates large language model (LLM) inference by allowing a small draft model to predict multiple future tokens for verification by a larger target model. In AI-native radio access networks (AI-RAN), this enables device-edge collaborative inference but introduces significant uplink overhead, as existing distributed speculative decoding schemes transmit full vocabulary logits at every step. We propose a sparsify-then-sample strategy, Truncated Sparse Logits Transmission (TSLT), which transmits only the logits and indices of a truncated candidate set. We provide theoretical guarantees showing that the acceptance rate is preserved under TSLT. TSLT is further extended to the multi-candidate case, where multiple draft candidates per step increase the acceptance probability. Experiments show that TSLT significantly reduces uplink communication while maintaining end-to-end inference latency and model quality, demonstrating its effectiveness for scalable, communication-efficient distributed LLM inference in future AI-RAN systems.
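
As a rough illustration of the sparsify-then-sample idea in this abstract, the sketch below keeps only a truncated candidate set of logits together with their indices, then samples from a softmax over that set. The abstract does not say how the candidate set is chosen, so the top-k rule here is an assumption, and the function names `truncate_logits` and `sample_from_sparse` are hypothetical rather than taken from the paper.

```python
import math
import random

def truncate_logits(logits, k):
    """Keep the k largest logits (top-k is an assumed truncation rule).

    Returns the kept vocabulary indices and their logit values, which is
    all a device would need to send uplink in this sketch.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    return top, [logits[i] for i in top]

def sample_from_sparse(indices, values, rng):
    """Softmax over the kept logits only, then sample one token index."""
    m = max(values)  # subtract the max for numerical stability
    weights = [math.exp(v - m) for v in values]
    return rng.choices(indices, weights=weights, k=1)[0]

# Toy vocabulary of six tokens; the values are illustrative only.
logits = [2.0, -1.0, 0.5, 3.0, -2.5, 1.0]
indices, values = truncate_logits(logits, k=3)
token = sample_from_sparse(indices, values, random.Random(0))
```

With k much smaller than the vocabulary size, the payload per step shrinks from one value per vocabulary entry to k index-value pairs. That is the kind of uplink saving the abstract describes, although the paper's actual scheme and its acceptance-rate guarantees may differ from this sketch.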

Preprint, 2026, arXiv

Multi-Agent Collaborative Reward Design for Enhancing Reasoning in Reinforcement Learning

We present CRM (Multi-Agent Collaborative Reward Model), a framework that replaces a single black-box reward model with a coordinated team of specialist evaluators to improve robustness and interpretability in RLHF. Conventional reward models struggle to jointly optimize multiple, sometimes conflicting, preference dimensions (e.g., factuality, helpfulness, safety) and offer limited transparency into why a score is assigned. CRM addresses these issues by decomposing preference evaluation into domain-specific agents that each produce partial signals, alongside global evaluators such as ranker-based and embedding-similarity rewards. A centralized aggregator fuses these signals at each timestep, balancing factors like step-wise correctness, multi-agent agreement, and repetition penalties, yielding a single training reward compatible with standard RL pipelines. The policy is optimized with advantage-based updates (e.g., GAE), while a value model regresses to the aggregated reward, enabling multi-perspective reward shaping without requiring additional human annotations beyond those used to train the evaluators. To support training and assessment, we introduce rewardBench, a benchmark and training suite aligned with the collaborative structure of CRM. Together, CRM and rewardBench provide a practical, modular path to more transparent reward modeling and more stable optimization.
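
To make the centralized aggregation step concrete, here is a minimal sketch that fuses per-evaluator scores into a single scalar reward. The abstract does not give the fusion rule, so the normalized weighted sum and the subtracted repetition penalty are assumptions; `aggregate_reward`, the signal names, and all numeric values are hypothetical.

```python
def aggregate_reward(signals, weights, repetition_penalty=0.0):
    """Fuse specialist scores into one reward (assumed weighted average).

    signals: evaluator name -> score for the current step
    weights: evaluator name -> non-negative importance weight
    repetition_penalty: amount subtracted from the fused score
    """
    total_weight = sum(weights[name] for name in signals)
    fused = sum(weights[name] * score for name, score in signals.items())
    return fused / total_weight - repetition_penalty

# Illustrative scores from three hypothetical specialist evaluators.
signals = {"factuality": 0.9, "helpfulness": 0.6, "safety": 1.0}
weights = {"factuality": 2.0, "helpfulness": 1.0, "safety": 1.0}
reward = aggregate_reward(signals, weights, repetition_penalty=0.1)
```

Because the output is one number per step, a reward of this shape could be passed to a standard advantage-based update unchanged, which matches the compatibility claim in the abstract.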

Preprint, 2026, arXiv

Multi-Scale Generative Modeling with Heat Dissipation Flow Matching

Diffusion models are widely used in image generation, with most relying on noise-based corruption and denoising. A distinct branch instead uses blur as the main corruption, preserving better color budgets and multi-scale detail by providing multi-scale priors. However, blur-based models remain in SDE-based frameworks and are not integrated into ODE-based frameworks, such as Flow Matching (FM). Meanwhile, in the blur-based formulation, the classical inverse heat-dissipation (IHD) process is ill-posed. Moreover, under the data-manifold assumption, regressing blurred images from high-dimensional noise (or velocity) space is also difficult. We propose Heat Dissipation Flow Matching (HDFM), which introduces a continuous blurred (heat-dissipation) process into FM to inject multi-scale priors. HDFM aligns an interpolated heat-dissipation path to address ill-posedness and adopts $x$-prediction to mitigate high-dimensional regression difficulty. Toy experiments and ablation studies show that HDFM consistently benefits from both blur and $x$-prediction. HDFM outperforms most baseline methods on all datasets.

Preprint, 2026, arXiv

Protoplanetary disk cavities with JWST-MIRI: a dichotomy in molecular emission

The evolution of planet-forming regions in protoplanetary disks is of fundamental importance to understanding planet formation. Disks with a central deficit in dust emission, a "cavity", have long attracted interest as potential evidence for advanced disk clearing by protoplanets and/or winds. Before JWST, infrared spectra showed that these disks typically lack the strong molecular emission observed in full disks. In this work, we combine a sample of 12 disks with millimeter cavities of a range of sizes ($\sim2$-70 au) and different levels of millimeter and infrared continuum deficits. We analyze their molecular spectra as observed with MIRI on JWST, homogeneously reduced with the new JDISCS pipeline. This analysis demonstrates a stark dichotomy in molecular emission where "molecule-rich" (MR) cavities follow global trends between water, CO, and OH luminosity and accretion luminosity as in full disks, while "molecule-poor" (MP) cavities are significantly sub-luminous in all molecules except sometimes OH. Disk cavities generally show sub-luminous organic emission and higher OH/H$_2$O ratios, which suggest a lower water column density. The sub-thermal excitation of CO and water vibrational lines suggests a decreased gas density in the emitting layer in all cavities, supporting model expectations for C$_2$H$_2$ photodissociation. We discover a bifurcation in infrared index (lower in MR cavities) suggesting that the molecular dichotomy is linked to residual $\mu$m-size dust within millimeter disk cavities. Put together, these results suggest a feedback process between dust depletion, gas density decrease, and molecule dissociation. Disk cavities may have a common evolutionary sequence where MR cavities switch into MP cavities over time.

Preprint, 2026, arXiv

Tracing Pebble Drift History in Two Protoplanetary Disks with CO Enhancement

Pebble drift is an important mechanism for supplying the materials needed to build planets in the inner region of protoplanetary disks. Thus, constraining the timescales and mass flux of pebble drift is essential to understanding planet formation history. Current pebble drift models suggest pebble fluxes can be constrained from the enhancement of gaseous volatile abundances when icy pebbles sublimate after drifting across key snowlines. In this work, we present ALMA observations of spatially resolved $^{13}$C$^{18}$O J=2-1 line emission inside the midplane CO snowline of the HD 163296 and MWC 480 protoplanetary disks. We use radiative transfer and thermochemical models to constrain the spatial distribution of CO gas column density. We find that both disks display centrally peaked CO abundance enhancement of up to ten times ISM abundance levels. For HD 163296 and MWC 480, the inferred enhancements require 250-350 and 480-660 Earth masses of pebbles to have drifted across their CO snowlines, respectively. These ranges fall within the cumulative pebble mass flux ranges needed to grow gas giants interior to the CO snowline. The centrally peaked CO enhancement is unexpected in current pebble drift models, which predict that CO enhancement either peaks at the CO snowline or is uniform inside the snowline. We propose two hypotheses to explain the centrally peaked CO enhancement: a large CO desorption distance and CO trapped in water ice. By testing both hypotheses with the 1D gas and dust evolution code chemcomp, we find that volatile trapping (about 30\%) best reproduces the centrally peaked CO enhancement observed.

Preprint, 2024, arXiv

DFabric: Scaling Out Data Parallel Applications with CXL-Ethernet Hybrid Interconnects

Emerging interconnects, such as CXL and NVLink, have been integrated into the intra-host topology to scale to more accelerators, such as GPUs, and facilitate efficient communication between them. To keep pace with the accelerators' growing computing throughput, the interconnect has seen substantial enhancement in link bandwidth, e.g., 256 GBps for CXL 3.0 links, which surpasses Ethernet and InfiniBand network links by an order of magnitude or more. Consequently, when data-intensive jobs, such as LLM training, scale across multiple hosts beyond the reach limit of the interconnect, performance is significantly hindered by the limited bandwidth of the network infrastructure. We address the problem by proposing DFabric, a two-tier interconnect architecture. First, DFabric disaggregates a rack's computing units with an interconnect fabric, i.e., a CXL fabric, which scales at rack level, so that they can enjoy efficient intra-rack interconnection. Second, DFabric disaggregates NICs from hosts and consolidates them to form a NIC pool with the CXL fabric. By providing sufficient aggregated capacity comparable to the interconnect bandwidth, the NIC pool bridges efficient communication across racks or beyond the reach limit of the interconnect fabric. However, local memory access becomes the bottleneck when enabling each host to utilize the NIC pool efficiently. To this end, DFabric builds a memory pool with sufficient bandwidth by disaggregating host local memory and adding more memory devices. We have implemented a prototype of DFabric that can run applications transparently. We validated its performance gain by running various microbenchmarks and compute-intensive applications such as DNN and graph workloads.