Researcher profile

Xuan Xie

Xuan Xie contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 15 - Unverified
Verification L1
Unclaimed author
3 works
0 followers
4 topics
4 close collaborators

Published work

3 published items

Preprint, 2025, arXiv

DaGRPO: Rectifying Gradient Conflict in Reasoning via Distinctiveness-Aware Group Relative Policy Optimization

The evolution of Large Language Models (LLMs) has catalyzed a paradigm shift from superficial instruction following to rigorous long-horizon reasoning. While Group Relative Policy Optimization (GRPO) has emerged as a pivotal mechanism for eliciting such post-training reasoning capabilities due to its exceptional performance, it remains plagued by significant training instability and poor sample efficiency. We theoretically identify the root cause of these issues as the lack of distinctiveness within on-policy rollouts: for routine queries, highly homogeneous samples induce destructive gradient conflicts; whereas for hard queries, the scarcity of valid positive samples results in ineffective optimization. To bridge this gap, we propose Distinctiveness-aware Group Relative Policy Optimization (DaGRPO). DaGRPO incorporates two core mechanisms: (1) Sequence-level Gradient Rectification, which utilizes fine-grained scoring to dynamically mask sample pairs with low distinctiveness, thereby eradicating gradient conflicts at the source; and (2) Off-policy Data Augmentation, which introduces high-quality anchors to recover training signals for challenging tasks. Extensive experiments across 9 mathematical reasoning and out-of-distribution (OOD) generalization benchmarks demonstrate that DaGRPO significantly surpasses existing SFT, GRPO, and hybrid baselines, achieving new state-of-the-art performance (e.g., a +4.7% average accuracy gain on math benchmarks). Furthermore, in-depth analysis confirms that DaGRPO effectively mitigates gradient explosion and accelerates the emergence of long-chain reasoning capabilities.
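As a rough illustration of the ideas named in this abstract, the sketch below computes group-relative advantages in the usual GRPO style (rewards centered and scaled within one group of rollouts) and then builds a pairwise mask that drops sample pairs whose fine-grained scores are too close. The masking rule, the `threshold` parameter, the use of the population standard deviation, and all function names are assumptions made for illustration; the abstract does not specify how DaGRPO scores or masks pairs.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Center and scale rewards within one rollout group (GRPO-style).

    Assumption: population standard deviation; implementations may differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def distinct_pair_mask(scores, threshold):
    """Hypothetical distinctiveness rule, not taken from the paper:
    keep pair (i, j) only if the two scores differ by at least `threshold`."""
    n = len(scores)
    return [
        [i != j and abs(scores[i] - scores[j]) >= threshold for j in range(n)]
        for i in range(n)
    ]


# Toy group of four rollouts for one query (made-up numbers).
rewards = [1.0, 1.0, 0.0, 0.0]    # outcome rewards
scores = [0.9, 0.85, 0.2, 0.1]    # hypothetical fine-grained scores

advantages = group_relative_advantages(rewards)
mask = distinct_pair_mask(scores, threshold=0.5)

# Pairs that would still contribute a training signal under this toy rule.
kept_pairs = [
    (i, j)
    for i in range(len(scores))
    for j in range(i + 1, len(scores))
    if mask[i][j]
]
print(advantages)
print(kept_pairs)
```

In this toy group, the two near-identical correct rollouts and the two near-identical incorrect rollouts are masked against each other, while every correct/incorrect pair is kept.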

Preprint, 2022, arXiv

DeepGalaxy: Testing Neural Network Verifiers via Two-Dimensional Input Space Exploration

Deep neural networks (DNNs) are widely developed and applied in many areas, and the quality assurance of DNNs is critical. Neural network verification (NNV) aims to provide formal guarantees to DNN models. Similar to traditional software, neural network verifiers could also contain bugs, which would have a critical and serious impact, especially in safety-critical areas. However, little work exists on validating neural network verifiers. In this work, we propose DeepGalaxy, an automated approach based on differential testing to tackle this problem. Specifically, we (1) propose a line of mutation rules, including model-level mutation and specification-level mutation, to effectively explore the two-dimensional input space of neural network verifiers; and (2) propose heuristic strategies to select test cases. We leveraged our implementation of DeepGalaxy to test three state-of-the-art neural network verifiers, Marabou, Eran, and Neurify. The experimental results support the efficiency and effectiveness of DeepGalaxy. Moreover, five unique unknown bugs were discovered.
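The differential-testing idea in this abstract can be sketched generically: run the same query through several verifiers and flag any case where two of them return conflicting definite verdicts. The toy verifiers, the query encoding, and the verdict strings below are hypothetical stand-ins; they do not reflect the interfaces of Marabou, Eran, or Neurify, nor DeepGalaxy's mutation rules.

```python
def differential_test(verifiers, query):
    """Run every verifier on the same query and report disagreement.

    "unknown" verdicts are ignored, since a verifier giving up is not a conflict.
    """
    verdicts = {name: verify(query) for name, verify in verifiers.items()}
    decided = {v for v in verdicts.values() if v != "unknown"}
    return verdicts, len(decided) > 1


# Toy setting (an assumption): the "model" is f(x) = 2*x + 1, and a query
# (lo, hi, bound) asks whether f(x) <= bound for all x in [lo, hi].
def exact_verifier(query):
    lo, hi, bound = query
    # f is increasing, so the maximum over [lo, hi] is at x = hi.
    return "holds" if 2 * hi + 1 <= bound else "violated"


def buggy_verifier(query):
    lo, hi, bound = query
    # Deliberately injected bug: checks the lower end of the interval.
    return "holds" if 2 * lo + 1 <= bound else "violated"


def timid_verifier(query):
    # Always gives up; used to show that "unknown" does not count as a conflict.
    return "unknown"


verifiers = {
    "exact": exact_verifier,
    "buggy": buggy_verifier,
    "timid": timid_verifier,
}

verdicts_a, conflict_a = differential_test(verifiers, (0, 2, 3))
verdicts_b, conflict_b = differential_test(verifiers, (0, 1, 3))
print(verdicts_a, conflict_a)
print(verdicts_b, conflict_b)
```

A flagged conflict only says that at least one verifier is wrong on that query; deciding which one still requires inspection or a trusted reference.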

Preprint, 2022, arXiv

Neuro-Symbolic Verification of Deep Neural Networks

Formal verification has emerged as a powerful approach to ensure the safety and reliability of deep neural networks. However, current verification tools are limited to only a handful of properties that can be expressed as first-order constraints over the inputs and output of a network. While adversarial robustness and fairness fall under this category, many real-world properties (e.g., "an autonomous vehicle has to stop in front of a stop sign") remain outside the scope of existing verification technology. To mitigate this severe practical restriction, we introduce a novel framework for verifying neural networks, named neuro-symbolic verification. The key idea is to use neural networks as part of the otherwise logical specification, enabling the verification of a wide variety of complex, real-world properties, including the one above. Moreover, we demonstrate how neuro-symbolic verification can be implemented on top of existing verification infrastructure for neural networks, making our framework easily accessible to researchers and practitioners alike.