Source author record

Rui Qu

Rui Qu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Distributed, Parallel, and Cluster Computing · Machine Learning · quant-ph

Catalog footprint

What is connected

2 works

3 topics

4 close collaborators

Preprint · 2026 · arXiv

Agent.xpu: Efficient Scheduling of Agentic LLM Workloads on Heterogeneous SoC

Personal LLM agents increasingly combine foreground reactive interactions with background proactive monitoring, forming long-lived, stateful LLM flows that interleave prefill and token-by-token decode. While modern heterogeneous SoCs integrate CPUs, iGPUs, and NPUs to support on-device intelligence, existing LLM engines assume static, single-shot inference and lack mechanisms for flow-level concurrency, prioritization, and efficient accelerator coordination. As a result, commodity SoCs remain poorly matched to the dynamic, mixed-criticality execution patterns of personal agents. This paper presents Agent.xpu, the first LLM engine that orchestrates concurrent reactive and proactive LLM flows on commodity SoCs. Extensive profiling uncovers SoC characteristics that differ from cloud-serving assumptions: operator-accelerator affinity, asymmetric DDR contention, and stage-divergent batching behaviors. Agent.xpu introduces three key techniques: a heterogeneous execution graph (HEG) capturing NPU/iGPU affinity and elastic operator binding; flow-aware NPU-iGPU coordination with stage elasticity, decoupling prefill and decode to reduce bandwidth contention and enforce priorities; and fine-grained preemption with slack-aware piggybacking to guarantee reactive responsiveness without starving proactive work. Across realistic personal-agent workloads, Agent.xpu delivers 1.2-4.9$\times$ proactive throughput and reduces reactive latency by at least 91%, compared with both an industrial iGPU-only serving engine and NPU-iGPU static inference with optimal tensor-partitioning schemes. Agent.xpu also minimizes energy consumption and graphics interference via controlled iGPU usage.

Preprint · 2022 · arXiv

Retrieving High-Dimensional Quantum Steering From a Noisy Environment with N Measurement Settings

One of the most often implied benefits of high-dimensional (HD) quantum systems is that they lead to stronger forms of correlations, featuring increased robustness to noise. Here, we experimentally demonstrate the $n$-setting linear HD quantum steering criterion. We verify the large violation of the steering inequalities without full-state tomography. The lower bound of the violation is $2.24\pm0.01$ in 11 dimensions, exceeding the bound ($V<2$) of 2-setting criteria. Hence, a higher strength of steering has been revealed. Moreover, we demonstrate a method for enhancing noise robustness without increasing the dimension, by instead increasing the number of measurement settings. Using the entanglement in 11 dimensions, we experimentally retrieve steering nonlocality with a $63.4\pm1.4\%$ isotropic noise fraction, surpassing the $50\%$ limitation of 2-setting criteria. Our work offers the potential for practical one-sided device-independent quantum information processing that tolerates noisy environments and lossy detection and transcends the present transmission distance limitation.