Researcher profile

Wei Xia

Wei Xia contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
8 works
0 followers
9 topics
4 close collaborators

Published work

8 published items

preprint, 2026, arXiv

AdvKT: An Adversarial Multi-Step Training Framework for Knowledge Tracing

Knowledge Tracing (KT) monitors students' knowledge states and simulates their responses to question sequences. Existing KT models typically follow a single-step training paradigm, which leads to discrepancies with the multi-step inference process required in real-world simulations, resulting in significant error accumulation. This accumulation of error, coupled with the issue of data sparsity, can substantially degrade the performance of recommendation models in intelligent tutoring systems. To address these challenges, we propose a novel Adversarial Multi-Step Training Framework for Knowledge Tracing (AdvKT), which, for the first time, focuses on the multi-step KT task. More specifically, AdvKT leverages an adversarial learning paradigm involving a generator and a discriminator. The generator mimics high-reward responses, effectively reducing error accumulation across multiple steps, while the discriminator provides feedback to generate synthetic data. Additionally, we design specialized data augmentation techniques to enrich the training data with realistic variations, ensuring that the model generalizes well even in scenarios with sparse data. Experiments conducted on four real-world datasets demonstrate the superiority of AdvKT over existing KT models, showcasing its ability to address both error accumulation and data sparsity issues effectively.

preprint, 2026, arXiv

AP2O-Coder: Adaptively Progressive Preference Optimization for Reducing Compilation and Runtime Errors in LLM-Generated Code

LLMs' code generation capabilities have yielded substantial improvements in the effectiveness of programming tasks. However, LLM-generated code still suffers from compilation and runtime errors. Existing offline preference optimization methods primarily focus on enhancing LLMs' coding abilities using pass/fail signals in the preference data, overlooking the deep-level error types in the failed code. To address this, we propose Adaptively Progressive Preference Optimization (AP2O) for coding (i.e., AP2O-Coder), a method that guides LLMs adaptively and methodically to reduce errors in code generation. Specifically, we construct an error notebook from failed code and progressively optimize the LLM to correct errors type by type. Furthermore, we adaptively replay error types to target the LLM's changing weaknesses throughout the training process. Through extensive experiments on both code and general LLMs (Llama, Qwen, and DeepSeek series) with parameters ranging from 0.5B to 34B, our AP2O-Coder improves code generation performance by up to 3% in pass@k while using less preference data. Code: https://github.com/TsingZ0/AP2O
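The abstract reports its gains in pass@k without defining the metric. As a minimal sketch, the widely used unbiased estimator computes, for a problem with n sampled programs of which c pass the tests, the probability that at least one of k randomly drawn samples passes. Whether AP2O-Coder uses exactly this estimator is an assumption; the function name and arguments below are illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of programs sampled for the problem
    c: number of those programs that pass all tests
    k: number of samples the user is assumed to draw
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples fail, any draw of k must contain a passing one.
    if n - c < k:
        return 1.0
    # 1 - P(all k drawn samples come from the n - c failing ones)
    return 1.0 - comb(n - c, k) / comb(n, k)
```

A benchmark score is then the mean of `pass_at_k` over all problems.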

preprint, 2026, arXiv

Characterization of strongly convex Kähler-Berwald metrics

Let $F: T^{1,0}M\rightarrow[0,+\infty)$ be a strongly convex complex Finsler metric on a complex manifold $M$ and $\pmb{J}$ the canonical complex structure on the complex manifold $T^{1,0}M$. We give a geometric characterization of strongly convex Kähler-Berwald metrics. In particular, we prove that $\pmb{J}$ is horizontally parallel with respect to the Cartan connection iff $F$ is a Kähler-Berwald metric. We also prove that the Cartan connection and the Chern-Finsler connection associated to $F$ coincide iff $\pmb{J}$ is both horizontally and vertically parallel with respect to the Cartan connection. Based on these results, we give a rigidity theorem for strongly convex Kähler-Berwald metrics with constant holomorphic sectional curvatures.

preprint, 2026, arXiv

GAPO: Robust Advantage Estimation for Real-World Code LLMs

Reinforcement learning (RL) is widely used for post-training large language models (LLMs) in code editing, where group-relative methods, such as GRPO, are popular due to their critic-free and normalized advantage estimation. However, in real-world code-editing scenarios, reward distributions are often skewed with unpredictable noise, leading to distorted advantage computation and increased rollout outliers. To address this issue, we propose Group Adaptive Policy Optimization (GAPO), which adaptively finds an interval with the highest signal-to-noise ratio (SNR) per prompt and uses the median of that interval as an adaptive Q to replace the group mean in advantage calculation to reduce noise further. This adaptive Q robustly handles rollout noise while remaining plug-and-play and efficient. We evaluate GAPO on nine instruction-tuned LLMs (3B-14B) using a large collected dataset of 51,844 real-world, history-aware code-editing tasks spanning 10 programming languages. GAPO yields up to 4.35 in-domain (ID) and 5.30 out-of-domain (OOD) exact-match improvements over GRPO and its variant DAPO, while achieving lower clipping ratios and higher GPU throughput. Code: https://github.com/TsingZ0/verl-GAPO.
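To make the idea of replacing the group mean concrete, here is a rough sketch that contrasts a GRPO-style advantage with a variant centered on a robust reference value Q. The abstract does not define how the highest-SNR interval is found, so this sketch swaps in a stand-in: the narrowest window holding half of the sorted rewards (the classical "shorth" idea), with Q as that window's median. The function names, the `frac` parameter, and the choice to keep dividing by the group standard deviation are all assumptions, not the GAPO implementation.

```python
import math
from statistics import mean, median, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: center on the group mean, scale by the std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def adaptive_q(rewards, frac=0.5):
    """Stand-in for an adaptive Q (assumption, not the paper's SNR rule).

    Picks the narrowest window containing `frac` of the sorted rewards
    and returns the median of that window.
    """
    xs = sorted(rewards)
    n = len(xs)
    m = max(1, math.ceil(frac * n))  # window size in samples
    start = min(range(n - m + 1), key=lambda i: xs[i + m - 1] - xs[i])
    return median(xs[start:start + m])


def robust_advantages(rewards, eps=1e-6):
    """Same scaling as above, but centered on the adaptive Q instead of the mean."""
    q = adaptive_q(rewards)
    sigma = pstdev(rewards)
    return [(r - q) / (sigma + eps) for r in rewards]


# Hypothetical rewards for one prompt: most rollouts near 0.1, two outliers.
rewards = [0.0, 0.1, 0.2, 0.1, 1.0, 0.1, 0.1, 5.0]
q = adaptive_q(rewards)  # 0.1, while the group mean is about 0.825
```

With the skewed group above, the mean is dragged toward the outlier at 5.0, so typical rollouts receive negative GRPO advantages; centering on Q leaves them at zero.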

preprint, 2026, arXiv

Multi-Constrained Evolutionary Molecular Design Framework: An Interpretable Drug Design Method Combining Rule-Based Evolution and Molecular Crossover

This study proposes MCEMOL (Multi-Constrained Evolutionary Molecular Design Framework), a molecular optimization approach integrating rule-based evolution with molecular crossover. MCEMOL employs dual-layer evolution: optimizing transformation rules at the rule level while applying crossover and mutation to molecular structures. Unlike deep learning methods requiring large datasets and extensive training, our algorithm evolves efficiently from minimal starting molecules with low computational overhead. The framework incorporates message-passing neural networks and comprehensive chemical constraints, ensuring efficient and interpretable molecular design. Experimental results demonstrate that MCEMOL provides transparent design pathways through its evolutionary mechanism while generating valid, diverse, target-compliant molecules. The framework achieves 100% molecular validity with high structural diversity and excellent drug-likeness compliance, showing strong performance in symmetry constraints, pharmacophore optimization, and stereochemical integrity. Unlike black-box methods, MCEMOL delivers dual value: interpretable transformation rules researchers can understand and trust, alongside high-quality molecular libraries for practical applications. This establishes a paradigm where interpretable AI-driven drug design and effective molecular generation are achieved simultaneously, bridging the gap between computational innovation and practical drug discovery needs.

preprint, 2026, arXiv

Priming: Hybrid State Space Models From Pre-trained Transformers

Hybrid State-Space models combine Attention with recurrent State-Space Model (SSM) layers, balancing eidetic memory from Attention with compressed fading memory from SSMs. This yields smaller Key-Value caches and faster decoding than Transformers, along with a richer architectural design space. Exploring that design space at scale has so far required training from scratch, a barrier that has kept most large-model Hybrid research within a narrow range of architectures. We introduce Priming, a method that turns Hybrid architecture design from a pre-training problem into a knowledge transfer one. Priming initializes a Hybrid model from a pre-trained Transformer and, through short alignment and post-training phases, recovers downstream quality using less than 0.5% of the source model's pre-training token budget. Priming is agnostic to the source Transformer family (e.g., Qwen, Llama, Mistral), model class (dense or Mixture-of-Experts), and model scale. Priming enables us to run the first controlled comparison of SSM layer types at scale under identical conditions. We evaluate Gated KalmaNet (GKA), Gated DeltaNet (GDN), and Mamba-2, and show that their expressiveness hierarchy, GKA>GDN>Mamba-2, directly predicts downstream performance on long-context reasoning tasks. We scale Priming to 8B/32B reasoning models with native 128K contexts. Our Hybrid GKA 32B improves over its source Qwen3-32B by +3.8 average reasoning points, while staying within 1% of a Transformer post-trained on the same data and enabling up to 2.3x higher decode throughput. To foster research on Hybrid architectures, we release a model zoo of primed Hybrid models for long-context reasoning and instruction following, together with the Priming training and inference code (Sequence Parallelism algorithms for long-context training, optimized GKA kernels, and vLLM serving plugin), all under the Apache 2.0 License.
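The claim about smaller Key-Value caches can be illustrated with back-of-the-envelope arithmetic: only Attention layers store keys and values for every past token, whereas an SSM layer keeps a fixed-size state regardless of sequence length. The sketch below uses a hypothetical 64-layer configuration and a hypothetical hybrid that keeps 16 Attention layers; none of these numbers are taken from the paper or from any released model.

```python
def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, batch=1):
    """Bytes needed to cache K and V across all Attention layers.

    The leading factor 2 counts the two cached tensors (keys and values);
    bytes_per_elem=2 corresponds to 16-bit storage.
    """
    return (2 * n_attn_layers * n_kv_heads * head_dim
            * seq_len * bytes_per_elem * batch)


GIB = 2 ** 30
SEQ_LEN = 128 * 1024  # a 128K-token context

# Hypothetical configuration: 8 KV heads of dimension 128 per layer.
full = kv_cache_bytes(n_attn_layers=64, n_kv_heads=8, head_dim=128, seq_len=SEQ_LEN)
hybrid = kv_cache_bytes(n_attn_layers=16, n_kv_heads=8, head_dim=128, seq_len=SEQ_LEN)

print(full / GIB)    # 32.0 (GiB per sequence, all 64 layers are Attention)
print(hybrid / GIB)  # 8.0  (GiB per sequence, only 16 Attention layers remain)
```

Under these assumed numbers the cache shrinks in proportion to the share of layers that remain Attention; the SSM layers add only a constant amount of state, which is ignored here.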

preprint, 2026, arXiv

Talk2Move: Reinforcement Learning for Text-Instructed Object-Level Geometric Transformation in Scenes

We introduce Talk2Move, a reinforcement learning (RL) based diffusion framework for text-instructed spatial transformation of objects within scenes. Spatially manipulating objects in a scene through natural language poses a challenge for multimodal generation systems. While existing text-based manipulation methods can adjust appearance or style, they struggle to perform object-level geometric transformations, such as translating, rotating, or resizing objects, due to scarce paired supervision and pixel-level optimization limits. Talk2Move employs Group Relative Policy Optimization (GRPO) to explore geometric actions through diverse rollouts generated from input images and lightweight textual variations, removing the need for costly paired data. A spatial-reward-guided model aligns geometric transformations with the linguistic description, while off-policy step evaluation and active step sampling improve learning efficiency by focusing on informative transformation stages. Furthermore, we design object-centric spatial rewards that evaluate displacement, rotation, and scaling behaviors directly, enabling interpretable and coherent transformations. Experiments on curated benchmarks demonstrate that Talk2Move achieves precise, consistent, and semantically faithful object transformations, outperforming existing text-guided editing approaches in both spatial accuracy and scene coherence.

preprint, 2025, arXiv

Ideology as a Problem: Lightweight Logit Steering for Annotator-Specific Alignment in Social Media Analysis

LLMs internally organize political ideology along low-dimensional structures that are partially, but not fully, aligned with human ideological space. This misalignment is systematic, model-specific, and measurable. We introduce a lightweight linear probe that both quantifies the misalignment and minimally corrects the output layer. This paper introduces a simple and efficient method for aligning models with specific user opinions. Instead of retraining the model, we calculate a bias score from its internal features and directly adjust the final output probabilities. This solution is practical and low-cost and preserves the original reasoning power of the model.
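One way to picture the described mechanism is a linear probe that maps internal features to a scalar bias score, followed by a shift of the output logits along a fixed direction before the softmax. The sketch below is only an illustration under that reading: the probe weights, the steering direction, the strength `alpha`, and all function names are hypothetical, and the paper's actual correction rule may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bias_score(hidden, probe_w, probe_b=0.0):
    """Linear probe: scalar bias estimate from internal features (hypothetical)."""
    return sum(h * w for h, w in zip(hidden, probe_w)) + probe_b


def steer_logits(logits, score, direction, alpha=1.0):
    """Shift logits against the estimated bias along a fixed direction.

    Model weights are untouched; only the final logits change.
    """
    return [z - alpha * score * d for z, d in zip(logits, direction)]


# Toy two-label example with made-up numbers.
logits = [2.0, 1.0]       # model initially prefers label 0
hidden = [1.0, -0.5]      # stand-in for internal features
probe_w = [0.8, 0.4]      # stand-in for learned probe weights
direction = [1.0, -1.0]   # which labels the bias pushes up or down

score = bias_score(hidden, probe_w)
before = softmax(logits)
after = softmax(steer_logits(logits, score, direction, alpha=1.0))
```

In this toy case the positive bias score lowers the logit of label 0 and raises that of label 1, so the preferred label flips. A smaller `alpha` would give a gentler correction.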