Researcher profile

Minrui Xu

Minrui Xu contributes to research discovery and scholarly infrastructure.

Researcher. Affiliation not imported. Open to collaborate.

Trust snapshot

Quick read

Trust 21 (Emerging), Verification L1, Unclaimed author
8 works
0 followers
8 topics
4 close collaborators

Published work

8 published items

Preprint, 2026, arXiv

EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL

Equipping LLMs with tool-use capabilities via Agentic Reinforcement Learning (Agentic RL) is bottlenecked by two challenges: the lack of scalable, robust execution environments and the scarcity of realistic training data that captures implicit human reasoning. Existing approaches depend on costly real-world APIs, hallucination-prone LLM simulators, or synthetic environments that are often single-turn or depend on pre-collected documents. Moreover, synthetic trajectories are frequently over-specified, resembling instruction sequences rather than natural human intents, which reduces their effectiveness for RL training. We introduce EnvFactory, a fully automated framework that addresses both challenges. EnvFactory autonomously explores and verifies stateful, executable tool environments from authentic resources, and synthesizes natural multi-turn trajectories through topology-aware sampling and calibrated refinement, producing grounded queries with implicit intents. Using only 85 verified environments across 7 domains, EnvFactory generates 2,575 SFT and RL trajectories. Despite using significantly fewer environments than prior work, which often uses five times as many, EnvFactory achieves superior training efficiency and downstream performance, improving Qwen3-series models by up to +15% on BFCLv3, +8.6% on MCP-Atlas, and +6% on conversational benchmarks including τ²-Bench and VitaBench. By fully automating both environment construction and trajectory synthesis, EnvFactory provides a scalable, extensible, and robust foundation for Agentic RL.

Preprint, 2026, arXiv

Prune-OPD: Efficient and Reliable On-Policy Distillation for Long-Horizon Reasoning

On-policy distillation (OPD) leverages dense teacher rewards to enhance reasoning models. However, scaling OPD to long-horizon tasks exposes a critical flaw: as the student's generated prefix inevitably diverges from the teacher's thought process, the teacher's dense reward loses local exploitability. Continuing to generate and evaluate tokens on these "drifted" trajectories not only degrades reward quality but also incurs massive computational waste. To address this, we introduce Prune-OPD, a framework that dynamically aligns training budgets with supervision quality. By continuously monitoring the local compatibility between student and teacher predictions (e.g., via top-k overlap), Prune-OPD detects prefix-drift events in real time. Upon detecting severe drift, it monotonically down-weights subsequent unreliable rewards and triggers dynamic rollout truncation. This allows the training process to halt futile generation and reallocate compute strictly to reliable teacher supervision. Across diverse teacher-student combinations, Prune-OPD consistently aligns computation with supervision reliability. When prefix drift makes dense teacher rewards unreliable, it reduces training time by 37.6% to 68.0% while preserving, and often improving, performance on challenging benchmarks (AMC, AIME, HMMT). When student-teacher compatibility remains high, it automatically preserves long-context supervision by expanding the training window. These results suggest that Prune-OPD improves OPD not by blindly shortening rollouts, but by reallocating computation toward locally exploitable teacher rewards.
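
To make the drift-monitoring idea in this abstract concrete, here is a minimal sketch and not the authors' implementation. It assumes per-token logits are available as plain lists, and the names `top_k_overlap` and `prune_weights`, the threshold, the decay factor, and the stop weight are all hypothetical choices made for illustration; the abstract does not specify them.

```python
def top_k_overlap(student_logits, teacher_logits, k=5):
    """Fraction of token ids shared by the student's and teacher's top-k sets."""
    def top_k(logits):
        order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        return set(order[:k])
    return len(top_k(student_logits) & top_k(teacher_logits)) / k


def prune_weights(overlaps, drift_threshold=0.4, decay=0.5, stop_weight=0.2):
    """Turn a sequence of per-token overlaps into reward weights and a cut point.

    Assumed rule (illustrative only): every token whose overlap falls below
    drift_threshold counts as a drift event and multiplies the running weight
    by decay, so weights never increase. Once the weight drops below
    stop_weight, the rollout is truncated at that token.
    """
    weights = []
    weight = 1.0
    for t, overlap in enumerate(overlaps):
        if overlap < drift_threshold:
            weight *= decay  # monotone down-weighting after drift
        if weight < stop_weight:
            return weights, t  # truncate: tokens from index t on are dropped
        weights.append(weight)
    return weights, len(overlaps)


# Example: two drift events halve the weight twice; a third triggers truncation.
overlaps = [1.0, 0.8, 0.2, 0.6, 0.2, 0.0, 1.0]
weights, cut = prune_weights(overlaps)
print(weights, cut)  # [1.0, 1.0, 0.5, 0.5, 0.25] 5
```

In this sketch the weights would multiply the teacher's per-token reward, and generation past index `cut` would simply not be requested, which is where the compute saving would come from.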

Preprint, 2023, arXiv

Generative AI-empowered Effective Physical-Virtual Synchronization in the Vehicular Metaverse

The Metaverse seamlessly blends the physical world and virtual space via ubiquitous communication and computing infrastructure. In transportation systems, the vehicular Metaverse can provide a fully immersive and hyperreal traveling experience (e.g., via augmented reality head-up displays, AR-HUDs) to drivers and users in autonomous vehicles (AVs) via roadside units (RSUs). However, provisioning real-time and immersive services necessitates effective physical-virtual synchronization between physical and virtual entities, i.e., AVs and Metaverse AR recommenders (MARs). In this paper, we propose a generative AI-empowered physical-virtual synchronization framework for the vehicular Metaverse. In physical-to-virtual synchronization, digital twin (DT) tasks generated by AVs are offloaded for execution at RSUs with future route generation. In virtual-to-physical synchronization, MARs customize diverse and personal AR recommendations via generative AI models based on user preferences. Furthermore, we propose a multi-task enhanced auction-based mechanism to match and price AVs and MARs for RSUs to provision real-time and effective services. Finally, property analysis and experimental results demonstrate that the proposed mechanism is strategy-proof and adverse-selection free while increasing social surplus by 50%.

Preprint, 2023, arXiv

Stochastic Qubit Resource Allocation for Quantum Cloud Computing

Quantum cloud computing is a promising paradigm for efficiently provisioning quantum resources (i.e., qubits) to users. In quantum cloud computing, quantum cloud providers provision quantum resources in reservation and on-demand plans for users. The cost of quantum resources in the reservation plan is expected to be lower than the cost of quantum resources in the on-demand plan. However, quantum resources in the reservation plan have to be reserved in advance, without information about the requirements of quantum circuits, and consequently the reserved resources may be insufficient, i.e., under-reservation. Hence, quantum resources in the on-demand plan can be used to compensate for the unsatisfied quantum resource requirements. To this end, we propose a quantum resource allocation scheme for the quantum cloud computing system in which quantum resources and the minimum waiting time of quantum circuits are jointly optimized. In particular, the objective is to minimize the total cost of quantum circuits under uncertainties regarding the qubit requirement and minimum waiting time of quantum circuits. In experiments, practical circuits of the quantum Fourier transform are applied to evaluate the proposed qubit resource allocation. The results illustrate that the proposed qubit resource allocation can achieve the optimal total cost.
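
The reservation versus on-demand trade-off described in this abstract can be illustrated with a small two-stage expected-cost calculation. This is a sketch under stated assumptions, not the paper's model: it ignores waiting time, uses a handful of made-up demand scenarios with made-up prices, and finds the best reservation by brute force instead of solving a stochastic program. The function names are hypothetical.

```python
def expected_cost(reserved, scenarios, reserve_price, on_demand_price):
    """Expected total cost of reserving `reserved` qubits in advance.

    scenarios: list of (qubit_demand, probability) pairs (illustrative data).
    First stage: pay for the reservation. Second stage (recourse): buy any
    shortfall at the on-demand price once the demand is known.
    """
    first_stage = reserve_price * reserved
    recourse = sum(
        prob * on_demand_price * max(0, demand - reserved)
        for demand, prob in scenarios
    )
    return first_stage + recourse


def best_reservation(scenarios, reserve_price, on_demand_price):
    """Brute-force search over integer reservation levels up to the max demand."""
    max_demand = max(demand for demand, _ in scenarios)
    return min(
        range(max_demand + 1),
        key=lambda r: expected_cost(r, scenarios, reserve_price, on_demand_price),
    )


# Hypothetical numbers: on-demand qubits cost three times the reserved price.
scenarios = [(4, 0.5), (8, 0.3), (12, 0.2)]
r = best_reservation(scenarios, reserve_price=1.0, on_demand_price=3.0)
print(r, round(expected_cost(r, scenarios, 1.0, 3.0), 2))  # 8 10.4
```

With these numbers it pays to reserve for the middle scenario and cover the rare peak on demand: reserving one more qubit costs 1.0, but saves 3.0 only with probability 0.2 once the reservation reaches 8.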

Preprint, 2022, arXiv

A Full Dive into Realizing the Edge-enabled Metaverse: Visions, Enabling Technologies, and Challenges

Dubbed "the successor to the mobile Internet", the concept of the Metaverse has grown in popularity. While there exist lite versions of the Metaverse today, they are still far from realizing the full vision of an immersive, embodied, and interoperable Metaverse. Without addressing the implementation issues from the communication and networking perspective, as well as the computation perspective, the Metaverse will struggle to succeed the Internet, especially in terms of its accessibility to billions of users today. In this survey, we focus on the edge-enabled Metaverse to realize its ultimate vision. We first provide readers with a succinct tutorial of the Metaverse, an introduction to the architecture, as well as current developments. To enable ubiquitous, seamless, and embodied access to the Metaverse, we discuss the communication and networking challenges and survey cutting-edge solutions and concepts that leverage next-generation communication systems so that users can immerse themselves as, and interact with, embodied avatars in the Metaverse. Moreover, given the high computation costs required, e.g., to render 3D virtual worlds and run data-hungry artificial intelligence-driven avatars, we discuss the computation challenges and cloud-edge-end computation framework-driven solutions to realize the Metaverse on resource-constrained edge devices. Next, we explore how blockchain technologies can aid in the interoperable development of the Metaverse, not just in terms of empowering the economic circulation of virtual user-generated content but also in managing physical edge resources in a decentralized, transparent, and immutable manner. Finally, we discuss the future research directions towards realizing the true vision of the edge-enabled Metaverse.

Preprint, 2022, arXiv

Adaptive Resource Allocation in Quantum Key Distribution (QKD) for Federated Learning

Increasing privacy and security concerns in intelligence-native 6G networks require quantum key distribution-secured federated learning (QKD-FL), in which data owners connected via quantum channels can train an FL global model collaboratively without exposing their local datasets. To facilitate QKD-FL, the architectural design and routing management framework are essential. However, effective implementation is still lacking. To this end, we propose a hierarchical architecture for QKD-FL systems in which QKD resources (i.e., wavelengths) and routing are jointly optimized for FL applications. In particular, we focus on adaptive QKD resource allocation and routing for FL workers to minimize the deployment cost of QKD nodes under various uncertainties, including security requirements. The experimental results show that the proposed architecture and the resource allocation and routing model can reduce the deployment cost by 7.72% compared to the CO-QBN algorithm.

Preprint, 2022, arXiv

Quantum-Secured Space-Air-Ground Integrated Networks: Concept, Framework, and Case Study

In the upcoming 6G era, existing terrestrial networks are evolving toward space-air-ground integrated networks (SAGIN), providing ultra-high data rates, seamless network coverage, and ubiquitous intelligence for applications and services. However, conventional communications in SAGIN still face data confidentiality issues. Fortunately, Quantum Key Distribution (QKD) over SAGIN can provide information-theoretic security for communications in SAGIN through quantum cryptography. Therefore, in this paper, we propose the quantum-secured SAGIN, which can feasibly achieve provably secure communications by using quantum mechanics to protect data channels between space, air, and ground nodes. Moreover, we propose a universal QKD service provisioning framework to minimize the cost of QKD services under the uncertainty and dynamics of communications in quantum-secured SAGIN. In this framework, fiber-based QKD services are deployed in passive optical networks with the advantages of low loss and high stability. In addition, satellite- and UAV-based QKD services, which offer wide coverage and flexibility, are provisioned as a supplement during the real-time data transmission phase. Finally, to examine the effectiveness of the proposed concept and framework, a case study of quantum-secured SAGIN in the Metaverse is conducted, in which the uncertain and dynamic factors of secure communications in Metaverse applications are effectively resolved by the proposed framework.

Preprint, 2022, arXiv

Resource Allocation in Quantum Key Distribution (QKD) for Space-Air-Ground Integrated Networks

Space-air-ground integrated networks (SAGIN) are one of the most promising advanced paradigms in sixth-generation (6G) communication. SAGIN can support high data rates, low latency, and seamless network coverage for interconnected applications and services. However, communications in SAGIN face tremendous security threats from the ever-increasing capacity of quantum computers. Fortunately, quantum key distribution (QKD) for establishing secure communications in SAGIN, i.e., QKD over SAGIN, can provide information-theoretic security. To minimize the QKD deployment cost in SAGIN with heterogeneous nodes, in this paper, we propose a resource allocation scheme for QKD over SAGIN. The proposed scheme is formulated via two-stage stochastic programming (SP) and considers uncertainties such as security requirements and weather conditions. Extensive experiments show that the proposed scheme can achieve the optimal deployment cost under various security requirements and unpredictable weather conditions.