Source author record

Wei Xu

Wei Xu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher; Unclaimed source record

Computer Vision; Artificial Intelligence; eess.SP; Computation and Language; cond-mat.quant-gas; cond-mat.soft; cond-mat.str-el; Distributed, Parallel, and Cluster Computing; Machine Learning; physics.chem-ph; quant-ph; Robotics

Catalog footprint

What is connected

10 works

12 topics

4 close collaborators

Preprint, 2026, arXiv

A microscopic origin for the breakdown of the Stokes Einstein relation in ion transport

Ion transport underlies the operation of biological ion channels and governs the performance of electrochemical energy-storage devices. A long-standing anomaly is that smaller alkali metal ions, such as Li$^+$, migrate more slowly in water than larger ions, in apparent violation of the Stokes-Einstein relation. This breakdown is conventionally attributed to dielectric friction, a collective drag force arising from electrostatic interactions between a drifting ion and its surrounding solvent. Here, combining nanopore transport measurements over electric fields spanning several orders of magnitude with molecular dynamics simulations, we show that the time-averaged electrostatic force on a migrating ion is not a drag force but a net driving force. By contrasting charged ions with neutral particles, we reveal that ionic charge introduces additional Lorentzian peaks in the frequency-dependent friction coefficient. These peaks originate predominantly from short-range Lennard-Jones (LJ) interactions within the first hydration layer and represent additional channels for energy dissipation, strongest for Li$^+$ and progressively weaker for Na$^+$ and K$^+$. Our results demonstrate that electrostatic interactions primarily act to tighten the local hydration structure, thereby amplifying short-range LJ interactions rather than directly opposing ion motion. This microscopic mechanism provides a unified physical explanation for the breakdown of the Stokes-Einstein relation in aqueous ion transport.
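For reference, the Stokes-Einstein relation mentioned in this abstract is commonly written as follows for a sphere of hydrodynamic radius $r$ diffusing in a solvent of viscosity $\eta$ at temperature $T$. The symbols and the stick boundary condition (the factor 6) are assumptions of this note, not notation taken from the paper.

```latex
D = \frac{k_B T}{6 \pi \eta r}
```

Since $D \propto 1/r$, the relation by itself predicts that a smaller ion diffuses faster, which is the opposite of the trend the abstract describes for Li$^+$ in water.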

Preprint, 2026, arXiv

A Readiness-Driven Runtime for Pipeline-Parallel Training under Runtime Variability

Pipeline parallelism is a key technique for scaling large-model training, but modern workloads exhibit runtime variability in computation and communication. Existing pipeline systems typically consume static, profiled, or adaptively generated schedules as pre-committed execution orders. When realized task readiness diverges from the pre-committed order, stages may wait for not-yet-ready work even though other executable work is available, creating stage misalignment, idle bubbles, and reduced utilization. We present Runtime-Readiness-First Pipeline (RRFP), a readiness-driven runtime for pipeline-parallel training. RRFP changes how schedules are consumed at runtime: instead of treating a schedule as a sequence that stages must wait to follow, it treats the schedule as a non-binding hint order for ranking currently ready work. To support this model, RRFP combines message-driven asynchronous communication, lightweight tensor-parallel coordination for collective consistency, and ready-set arbitration for low-overhead dispatch. We implement RRFP in a Megatron-based training framework and evaluate it on language-only and multimodal workloads at up to 128 GPUs. RRFP improves over fixed-order pipeline baselines across all settings. Using the BFW hint, RRFP achieves up to 1.77$\times$ speedup on language-only workloads and up to 2.77$\times$ on multimodal workloads. In cross-framework comparisons, RRFP with the default BF hint outperforms the faster available external system by up to 1.84$\times$ while preserving training correctness.
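The difference between following a schedule as a fixed sequence and using it as a non-binding hint can be sketched in a few lines. This is only an illustration of the idea stated in the abstract, not RRFP's implementation: the function names, the task labels (`F0`, `B0`, ...) and the data structures are hypothetical.

```python
from typing import List, Optional, Set


def pick_next(ready: Set[str], hint_order: List[str]) -> Optional[str]:
    """Readiness-first: return the ready task ranked earliest by the hint order."""
    rank = {task: i for i, task in enumerate(hint_order)}
    candidates = [t for t in ready if t in rank]
    if not candidates:
        return None  # nothing is ready, so the stage really has to wait
    return min(candidates, key=rank.__getitem__)


def fixed_order_next(ready: Set[str], hint_order: List[str], cursor: int) -> Optional[str]:
    """Fixed-order baseline: only the task at the cursor is allowed to run."""
    task = hint_order[cursor]
    return task if task in ready else None


# Hypothetical stage state: F0's input has not arrived, but F1 and B0 are ready.
hint = ["F0", "F1", "B0", "F2", "B1"]
ready = {"F1", "B0"}

print(pick_next(ready, hint))            # F1
print(fixed_order_next(ready, hint, 0))  # None (the stage idles waiting for F0)
```

In this toy case the fixed-order stage stalls on `F0` while the readiness-first stage runs `F1`, which is the kind of idle bubble the abstract describes.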

Preprint, 2026, arXiv

CogRail: Benchmarking VLMs in Cognitive Intrusion Perception for Intelligent Railway Transportation Systems

Accurate and early perception of potential intrusion targets is essential for ensuring the safety of railway transportation systems. However, most existing systems focus narrowly on object classification within fixed visual scopes and apply rule-based heuristics to determine intrusion status, often overlooking targets that pose latent intrusion risks. Anticipating such risks requires the cognition of spatial context and temporal dynamics for the object of interest (OOI), which presents challenges for conventional visual models. To facilitate deep intrusion perception, we introduce a novel benchmark, CogRail, which integrates curated open-source datasets with cognitively driven question-answer annotations to support spatio-temporal reasoning and prediction. Building upon this benchmark, we conduct a systematic evaluation of state-of-the-art visual-language models (VLMs) using multimodal prompts to identify their strengths and limitations in this domain. Furthermore, we fine-tune VLMs for better performance and propose a joint fine-tuning framework that integrates three core tasks, position perception, movement prediction, and threat analysis, facilitating effective adaptation of general-purpose foundation models into specialized models tailored for cognitive intrusion perception. Extensive experiments reveal that current large-scale multimodal models struggle with the complex spatial-temporal reasoning required by the cognitive intrusion perception task, underscoring the limitations of existing foundation models in this safety-critical domain. In contrast, our proposed joint fine-tuning framework significantly enhances model performance by enabling targeted adaptation to domain-specific reasoning demands, highlighting the advantages of structured multi-task learning in improving both accuracy and interpretability. Code will be available at https://github.com/Hub-Tian/CogRail.

Preprint, 2026, arXiv

Gavel: Agent Meets Checklist for Evaluating LLMs on Long-Context Legal Summarization

Large language models (LLMs) now support contexts of up to 1M tokens, but their effectiveness on complex long-context tasks remains unclear. In this paper, we study multi-document legal case summarization, where a single case often spans many documents totaling 100K-500K tokens. We introduce Gavel-Ref, a reference-based evaluation framework with multi-value checklist evaluation over 26 items, as well as residual fact and writing-style evaluations. Using Gavel-Ref, we go beyond the single aggregate scores reported in prior work and systematically evaluate 12 frontier LLMs on 100 legal cases ranging from 32K to 512K tokens, primarily from 2025. Our results show that even the strongest model, Gemini 2.5 Pro, achieves only around 50 of $S_{\text{Gavel-Ref}}$, highlighting the difficulty of the task. Models perform well on simple checklist items (e.g., filing date) but struggle on multi-value or rare ones such as settlements and monitor reports. As LLMs continue to improve and may surpass human-written summaries -- making human references less reliable -- we develop Gavel-Agent, an efficient and autonomous agent scaffold that equips LLMs with six tools to navigate and extract checklists directly from case documents. With Qwen3, Gavel-Agent reduces token usage by 36% while resulting in only a 7% drop in $S_{\text{checklist}}$ compared to end-to-end extraction with GPT-4.1.

Preprint, 2026, arXiv

GR-Dexter Technical Report

Vision-language-action (VLA) models have enabled language-conditioned, long-horizon robot manipulation, but most existing systems are limited to grippers. Scaling VLA policies to bimanual robots with high degree-of-freedom (DoF) dexterous hands remains challenging due to the expanded action space, frequent hand-object occlusions, and the cost of collecting real-robot data. We present GR-Dexter, a holistic hardware-model-data framework for VLA-based generalist manipulation on a bimanual dexterous-hand robot. Our approach combines the design of a compact 21-DoF robotic hand, an intuitive bimanual teleoperation system for real-robot data collection, and a training recipe that leverages teleoperated robot trajectories together with large-scale vision-language and carefully curated cross-embodiment datasets. Across real-world evaluations spanning long-horizon everyday manipulation and generalizable pick-and-place, GR-Dexter achieves strong in-domain performance and improved robustness to unseen objects and unseen instructions. We hope GR-Dexter serves as a practical step toward generalist dexterous-hand robotic manipulation.

Preprint, 2026, arXiv

Human-inspired Global-to-Parallel Multi-scale Encoding for Lightweight Vision Models

Lightweight vision networks have witnessed remarkable progress in recent years, yet achieving a satisfactory balance among parameter scale, computational overhead, and task performance remains difficult. Although many existing lightweight models manage to reduce computation considerably, they often do so at the expense of a substantial increase in parameter count (e.g., LSNet, MobileMamba), which still poses obstacles for deployment on resource-limited devices. In parallel, some studies attempt to draw inspiration from human visual perception, but their modeling tends to oversimplify the visual process, making it hard to reflect how perception truly operates. Revisiting the cooperative mechanism of the human visual system, we propose GPM (Global-to-Parallel Multi-scale Encoding). GPM first employs a Global Insight Generator (GIG) to extract holistic cues, and subsequently processes features of different scales through parallel branches: LSAE emphasizes mid-/large-scale semantic relations, while IRB (Inverted Residual Block) preserves fine-grained texture information, jointly enabling coherent representation of global and local features. As such, GPM conforms to two characteristic behaviors of human vision: perceiving the whole before focusing on details, and maintaining broad contextual awareness even during local attention. Built upon GPM, we further develop the lightweight H-GPE network. Experiments on image classification, object detection, and semantic segmentation show that H-GPE achieves strong performance while maintaining a balanced footprint in both FLOPs and parameters, delivering a more favorable accuracy-efficiency trade-off compared with recent state-of-the-art lightweight models.

Preprint, 2026, arXiv

Interplay of Unidirectional Quantum Strings in Kagome Rydberg Atom Array

With the rapid development of quantum simulators, intriguing quantum string phenomena have been observed across various quantum simulation platforms. However, the complex interplay between quantum strings cannot be analyzed well because of the limited system size of real quantum simulators. Here, with the help of a newly developed quantum Monte Carlo method, we simulate a larger-scale Kagome Rydberg atom array, providing an ideal playground for studying quantum strings. By introducing a novel edge pinning method, the ends of a quantum string can be attached to edges, which makes flexible manipulation of the quantum string possible. Due to the geometric constraint, the quantum strings are unidirectional, which strongly complicates their interplay. To describe the quantum string quantitatively, we build a one-dimensional effective model. With both analytic and numerical methods, rich physics can be found, including "geometric breaking", a heart-like superposition state of quantum strings, and attractive inter-string interactions. This work can benefit the comprehension of quantum strings and may also shed light on the simulation of high-energy physics.

Preprint, 2026, arXiv

Semi-Supervised Facial Expression Recognition based on Dynamic Threshold and Negative Learning

Facial expression recognition is a key task in human-computer interaction and affective computing. However, acquiring a large amount of labeled facial expression data is often costly. Therefore, it is particularly important to design a semi-supervised facial expression recognition algorithm that makes full use of both labeled and unlabeled data. In this paper, we propose a semi-supervised facial expression recognition algorithm based on Dynamic Threshold Adjustment (DTA) and Selective Negative Learning (SNL). First, we design strategies for local attention enhancement and random dropout of feature maps during feature extraction, which strengthen the representation of local features while ensuring the model does not overfit to any specific local area. Furthermore, we introduce a dynamic thresholding method that adapts to the requirements of the semi-supervised learning framework for facial expression recognition, and a selective negative learning strategy that makes use of low-confidence unlabeled samples by mining useful expression information from complementary labels. We achieve state-of-the-art performance on the RAF-DB and AffectNet datasets. Our method surpasses fully supervised methods even without using the entire dataset, which demonstrates the effectiveness of our approach.
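One way to picture how a confidence threshold and complementary labels can work together on an unlabeled sample is sketched below. It is a generic illustration, not the paper's DTA or SNL: the function names, the fixed `neg_cutoff`, and the loss form $-\log(1 - p_k)$ (a common negative-learning loss) are assumptions, and the abstract does not say how the threshold is adjusted over training, so it is left as a plain parameter here.

```python
import math
from typing import List, Optional, Tuple


def split_labels(probs: List[float], threshold: float,
                 neg_cutoff: float) -> Tuple[Optional[int], List[int]]:
    """Return (pseudo_label, complementary_labels) for one unlabeled sample.

    A confident sample gets a positive pseudo-label. A low-confidence sample
    gets no positive label, only the classes it is very unlikely to belong to.
    """
    top = max(range(len(probs)), key=probs.__getitem__)
    if probs[top] >= threshold:
        return top, []
    negatives = [k for k, p in enumerate(probs) if p < neg_cutoff]
    return None, negatives


def negative_loss(probs: List[float], negatives: List[int]) -> float:
    """Mean of -log(1 - p_k) over the complementary labels k."""
    if not negatives:
        return 0.0
    return -sum(math.log(1.0 - probs[k]) for k in negatives) / len(negatives)


# Hypothetical softmax outputs over four expression classes.
uncertain = [0.50, 0.30, 0.15, 0.05]
label, negatives = split_labels(uncertain, threshold=0.8, neg_cutoff=0.1)
loss = negative_loss(uncertain, negatives)
```

Here the sample is too uncertain to receive a pseudo-label, yet it still contributes a training signal: the model is penalized for assigning probability to the class it has marked as unlikely.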

Preprint, 2026, arXiv

Variable-Length Wideband CSI Feedback via Loewner Interpolation and Deep Learning

In this paper, we propose a variable-length wideband channel state information (CSI) feedback scheme for Frequency Division Duplex (FDD) massive multiple-input multiple-output (MIMO) systems in the U6G band (6425 MHz-7125 MHz). Existing compressive sensing (CS)-based and deep learning (DL)-based schemes preprocess the channel by truncating it in the angular-delay domain. However, the energy leakage effect caused by the Discrete Fourier Transform (DFT) basis becomes more severe and leads to a bottleneck in recovery accuracy when applied to wideband channels such as those in U6G. To solve this problem, we introduce the Loewner Interpolation (LI) framework, which generates a set of dynamic bases based on the current CSI matrix, enabling highly efficient compression in the frequency domain. Then, the LI basis is further compressed in the spatial domain through a neural network. To achieve a flexible trade-off between feedback overhead and recovery accuracy, we design a rateless auto-encoder trained with tail dropout and a multi-objective learning schedule, supporting variable-length feedback with a single model. Meanwhile, the codewords are ranked by importance, ensuring that the base station (BS) can still maintain acceptable reconstruction performance under limited feedback with tail erasures. Furthermore, an adaptive quantization strategy is developed for the feedback framework to enhance robustness. Simulation results demonstrate that the proposed scheme can achieve higher CSI feedback accuracy with equal or lower feedback overhead, and improve spectral efficiency compared with baseline schemes.
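The tail dropout mentioned in this abstract can be illustrated with a minimal sketch: during training, a random number of trailing codeword entries is erased, so that a decoder has to rely most on the leading entries. The helper names, the uniform choice of feedback length, and the example codeword are assumptions made here for illustration; the paper's auto-encoder, learning schedule and quantizer are not modeled.

```python
import random
from typing import List


def tail_dropout(codeword: List[float], keep: int) -> List[float]:
    """Zero every entry after the first `keep` entries (a tail erasure)."""
    if not 0 <= keep <= len(codeword):
        raise ValueError("keep must be between 0 and len(codeword)")
    return codeword[:keep] + [0.0] * (len(codeword) - keep)


def sample_keep(length: int, rng: random.Random) -> int:
    """Draw a feedback length for one training step (uniform is an assumption)."""
    return rng.randint(1, length)


rng = random.Random(0)
codeword = [0.9, -0.4, 0.2, 0.05]  # hypothetical encoder output
keep = sample_keep(len(codeword), rng)
truncated = tail_dropout(codeword, keep)
```

At inference time the same truncation would stand in for sending only the first `keep` entries, which is how one model could serve several feedback lengths.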

Preprint, 2025, arXiv

Digitalizing Over-the-Air Computation via The Novel Complement Coded Modulation

To overcome inherent limitations of analog signals in over-the-air computation (AirComp), this letter proposes a two's complement-based coding scheme for AirComp implementation with compatible digital modulations. Specifically, quantized discrete values are encoded into binary sequences using the two's complement and transmitted over multiple subcarriers. At the receiver, we design a decoder that constructs a functional mapping between the superimposed digital modulation signals and the target computational results, theoretically ensuring asymptotically error-free computation with the minimal codeword length. To further mitigate the adverse effects of channel fading, we adopt a truncated inversion strategy for pre-processing. Benefiting from the unified symbol distribution after the proposed encoding, we derive the optimal linear minimum mean squared error (LMMSE) detector in closed form and propose a low-complexity algorithm that searches for the optimal truncation selection. Furthermore, the inherent importance differences among the coded outputs motivate an uneven power allocation strategy across subcarriers to improve computational accuracy. Numerical results validate the superiority of the proposed scheme over existing digital AirComp approaches, especially in low signal-to-noise ratio (SNR) regimes.
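The property that makes two's complement convenient for summation over a shared channel can be checked with a small noiseless example: the sum of the encoded values is a weighted sum of the per-position bit sums, with weight $-2^{n-1}$ on the sign bit. The sketch below assumes one bit position per subcarrier and an ideal, fading-free superposition; these are simplifications made here, and the letter's modulation, detector, truncated inversion and power allocation are not modeled.

```python
from typing import List


def encode(value: int, n_bits: int) -> List[int]:
    """Two's-complement bits of `value`, most significant bit first."""
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {n_bits} bits")
    unsigned = value & ((1 << n_bits) - 1)
    return [(unsigned >> (n_bits - 1 - i)) & 1 for i in range(n_bits)]


def decode_sum(bit_sums: List[int]) -> int:
    """Recover the sum of all values from the per-position sums of their bits."""
    n_bits = len(bit_sums)
    # The sign bit carries weight -2^(n-1); the remaining bits carry 2^(n-1-i).
    weights = [-(1 << (n_bits - 1))]
    weights += [1 << (n_bits - 1 - i) for i in range(1, n_bits)]
    return sum(w * s for w, s in zip(weights, bit_sums))


values = [3, -2, -4, 1]                       # hypothetical quantized device values
codes = [encode(v, 4) for v in values]
bit_sums = [sum(col) for col in zip(*codes)]  # idealized superposition per position
total = decode_sum(bit_sums)                  # equals sum(values), here -2
```

Because the weights are applied after superposition, the receiver never needs the individual values, only the count of ones at each bit position.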
