Researcher profile

Rui Qu

Rui Qu contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 13 - Unverified | Verification L1 | Unclaimed author
2 works
0 followers
3 topics
4 close collaborators




Published work

2 published items

Preprint | 2026 | arXiv

Agent.xpu: Efficient Scheduling of Agentic LLM Workloads on Heterogeneous SoC

Personal LLM agents increasingly combine foreground reactive interactions with background proactive monitoring, forming long-lived, stateful LLM flows that interleave prefill and token-by-token decode. While modern heterogeneous SoCs integrate CPUs, iGPUs, and NPUs to support on-device intelligence, existing LLM engines assume static, single-shot inference and lack mechanisms for flow-level concurrency, prioritization, and efficient accelerator coordination. As a result, commodity SoCs remain poorly matched to the dynamic, mixed-criticality execution patterns of personal agents. This paper presents Agent.xpu, the first LLM engine that orchestrates concurrent reactive and proactive LLM flows on commodity SoCs. Extensive profiling uncovers SoC-specific characteristics (operator-accelerator affinity, asymmetric DDR contention, and stage-divergent batching behaviors) that are distinct from cloud-serving assumptions. Agent.xpu introduces three key techniques: a heterogeneous execution graph (HEG) capturing NPU/iGPU affinity and elastic operator binding; flow-aware NPU-iGPU coordination with stage elasticity, decoupling prefill and decode to reduce bandwidth contention and enforce priorities; and fine-grained preemption with slack-aware piggybacking to guarantee reactive responsiveness without starving proactive work. Across realistic personal-agent workloads, Agent.xpu delivers 1.2-4.9$\times$ proactive throughput and reduces reactive latency by at least 91%, compared with both an industrial iGPU-only serving engine and NPU-iGPU static inference with optimal tensor-partitioning schemes. Agent.xpu also minimizes energy consumption and graphics interference via controlled iGPU usage.

Preprint | 2022 | arXiv

Retrieving High-Dimensional Quantum Steering From a Noisy Environment with N Measurement Settings

One of the most often implied benefits of high-dimensional (HD) quantum systems is that they lead to stronger forms of correlations, featuring increased robustness to noise. Here, we experimentally demonstrate the $n$-setting linear HD quantum steering criterion. We verify the large violation of the steering inequalities without full-state tomography. The lower bound of the violation is $2.24\pm0.01$ in 11 dimensions, exceeding the bound ($V<2$) of 2-setting criteria. Hence, a higher strength of steering has been revealed. Moreover, we demonstrate a method for enhancing noise robustness not by increasing the dimension but by increasing the number of measurement settings. Using entanglement in 11 dimensions, we experimentally retrieve steering nonlocality with a $63.4\pm1.4\%$ isotropic noise fraction, surpassing the $50\%$ limitation of 2-setting criteria. Our work offers the potential for practical one-sided device-independent quantum information processing that tolerates noisy environments and lossy detection and transcends the present transmission distance limitation.