Researcher profile

Xin Liu

Xin Liu contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
14 works
0 followers
15 topics
4 close collaborators

Published work

14 published items

preprint · 2026 · arXiv

A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understanding

Non-verbal vocalizations (NVs), such as laughter and sighs, are vital for conveying emotion and intention in human speech, yet most existing speech systems neglect them, which severely compromises communicative richness and emotional intelligence. Existing methods for acquiring NVs are either costly and unscalable (relying on manual annotation or recording) or unnatural (relying on rule-based synthesis). To address these limitations, we propose a highly scalable automatic annotation framework to label non-verbal phenomena from natural speech, which is low-cost, easily extendable, and inherently diverse and natural. This framework leverages a unified detection model to accurately identify NVs in natural speech and integrates them with transcripts via a temporal-semantic alignment method. Using this framework, we created and released NonVerbalSpeech-38K, a diverse, real-world dataset featuring 38,718 samples across 10 NV categories collected from in-the-wild media. Experimental results demonstrate that our dataset provides superior controllability for NV generation and achieves comparable performance for NV understanding.

preprint · 2026 · arXiv

An Embarrassingly Simple Graph Heuristic Reveals Shortcut-Solvable Benchmarks for Sequential Recommendation

Sequential recommendation has increasingly shifted toward generative recommenders that combine sequential patterns with semantic item information. Yet these methods are often evaluated on a small set of widely used benchmarks, raising a key question: do these benchmarks actually require the advanced modeling capabilities that modern generative recommenders claim to provide? We conduct a benchmark audit with an intentionally simple graph heuristic. Starting from only the last one or two interacted items, it retrieves candidates from a few-hop item-transition graph and ranks them by item-feature similarity. Despite using no sequence encoder, generative objective, or training, this heuristic matches or outperforms many modern baselines, with relative NDCG@10 improvements of 38.10% and 44.18% over the best competing baseline on Amazon Review Sports and CDs. We show that this behavior reflects shortcut solvability rather than an artifact of one heuristic. We identify three shortcut structures that can make next-item prediction easier than expected: low-branching local transitions, feature-smooth transitions, and limited dependence on long user histories. These shortcuts need not appear together; even one or two strong signals can make simple local retrieval highly competitive, while weakening them makes the benefits of more sophisticated models clearer. Across 14 datasets, model rankings vary substantially with dataset properties, yet the heuristic remains competitive on 10 of them. Our findings suggest that strong performance on standard benchmarks does not always demonstrate advanced sequential, semantic, or generative modeling ability. We call for more careful dataset selection and dataset-level diagnostic analysis when using benchmarks to support claims about new recommendation models.

preprint · 2026 · arXiv

Anomaly-Preference Image Generation

Synthesizing realistic and diverse anomalous samples from limited data is vital for robust model generalization. However, existing methods struggle to reconcile fidelity and diversity, often hampered by distribution misalignment and overfitting, respectively. To mitigate this, we introduce Anomaly Preference Optimization, a novel paradigm that reformulates anomaly generation as a preference learning problem. Central to our approach is an implicit preference alignment mechanism that leverages real anomalies as positive references, deriving optimization signals directly from denoising trajectory deviations without requiring costly human annotation. Furthermore, we propose a Time-Aware Capacity Allocation module that dynamically distributes model capacity along the diffusion timeline, prioritizing structural diversity during high-noise phases while enhancing fine-grained fidelity in low-noise stages. During inference, a hierarchical sampling strategy modulates the coherence-alignment trade-off, enabling precise control over generation. Extensive experiments demonstrate that the proposed approach significantly outperforms existing baselines, achieving state-of-the-art performance in both realism and diversity.

preprint · 2026 · arXiv

Convergence of Decentralized Stochastic Subgradient-based Methods for Nonsmooth Nonconvex functions

In this paper, we focus on decentralized stochastic subgradient-based methods for minimizing nonsmooth nonconvex functions without Clarke regularity, especially in the decentralized training of nonsmooth neural networks. We propose a general framework that unifies various decentralized subgradient-based methods, such as decentralized stochastic subgradient descent (DSGD), DSGD with gradient-tracking technique (DSGD-T), and DSGD with momentum (DSGD-M). To establish the convergence properties of our proposed framework, we relate the discrete iterates to the trajectories of a continuous-time differential inclusion, which is assumed to have a coercive Lyapunov function with a stable set $\mathcal{A}$. We prove the asymptotic convergence of the iterates to the stable set $\mathcal{A}$ with sufficiently small and diminishing step-sizes. These results provide the first convergence guarantees for some well-recognized decentralized stochastic subgradient-based methods without Clarke regularity of the objective function. Preliminary numerical experiments demonstrate that our proposed framework yields highly efficient decentralized stochastic subgradient-based methods with convergence guarantees in the training of nonsmooth neural networks.

preprint · 2026 · arXiv

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

General reasoning represents a long-standing and formidable challenge in artificial intelligence. Recent breakthroughs, exemplified by large language models (LLMs) and chain-of-thought prompting, have achieved considerable success on foundational reasoning tasks. However, this success is heavily contingent upon extensive human-annotated demonstrations, and models' capabilities are still insufficient for more complex problems. Here we show that the reasoning abilities of LLMs can be incentivized through pure reinforcement learning (RL), obviating the need for human-labeled reasoning trajectories. The proposed RL framework facilitates the emergent development of advanced reasoning patterns, such as self-reflection, verification, and dynamic strategy adaptation. Consequently, the trained model achieves superior performance on verifiable tasks such as mathematics, coding competitions, and STEM fields, surpassing its counterparts trained via conventional supervised learning on human demonstrations. Moreover, the emergent reasoning patterns exhibited by these large-scale models can be systematically harnessed to guide and enhance the reasoning capabilities of smaller models.

preprint · 2026 · arXiv

Ergodic Estimates of One-Step Numerical Approximations for Superlinear SODEs

This paper establishes the first-order convergence rate for the ergodic error of numerical approximations to a class of stochastic ODEs (SODEs) with superlinear coefficients and multiplicative noise. By leveraging the generator approach to the Stein method, we derive a general error representation formula for one-step numerical schemes. Under suitable dissipativity and smoothness conditions, we prove that the error between the exact invariant measure $\pi$ and the numerical invariant measure $\pi_\tau$ is of order $\mathscr{O}(\tau)$, which is sharp. Our framework applies to several recently studied schemes, including the tamed Euler, projected Euler, and backward Euler methods.

preprint · 2026 · arXiv

High-Rate Free-Running Reference-Frame-Independent Measurement-Device-Independent Quantum Key Distribution with Classified Distillation

Reference-frame-independent measurement-device-independent quantum key distribution (RFI-MDI-QKD) eliminates detector side-channel attacks and avoids reference-frame calibration. While its feasibility has been widely demonstrated, existing implementations typically assume fixed or slowly drifting reference-frame misalignment, conditions rarely satisfied outside the laboratory. In realistic environments, rapid and free-running reference-frame variations can severely degrade both the key rate and transmission distance of conventional RFI-MDI-QKD. Here we propose a free-running RFI-MDI-QKD protocol that maintains high-rate key generation under rapid reference-frame variations. By introducing a classification-distillation method that reclassifies total detection events, secure keys can be extracted without modifying the experimental setup. Our protocol achieves a key rate more than nine times higher than the best previous RFI-MDI-QKD scheme and tolerates channel losses exceeding 24 dB, where earlier approaches fail. These results enable practical quantum key distribution on mobile platforms, including satellite-to-ground links and airborne nodes.

preprint · 2026 · arXiv

High-Ti induced planar-fault transformation toward superlattice extrinsic stacking faults and microtwins in crept CoNi-based superalloys

Controlling planar fault shearing mechanisms is key for improving the high-temperature creep performance of gamma prime-strengthened high-temperature superalloys. This work examines how the Ti concentration in L1$_2$-strengthened CoNi-based alloys affects planar fault formation during creep. Interrupted compressive creep tests were conducted at 1223 K in air with a constant load stress of 241 MPa. We found, for the first time, that high Ti additions shift the dominant gamma prime shearing mode from antiphase boundaries (APBs) in Ti-free and low-Ti alloys to superlattice extrinsic stacking faults (SESFs). Systematic ab initio calculations show that in high-Ti alloys, the elevated APB energy renders the APB-shearing mode unfavorable. Meanwhile, the SESF energy decreases relative to that in low-Ti compositions, and an increased ratio of complex intrinsic stacking fault (CISF) to SESF energy promotes the transformation of high-energy CISFs into lower-energy SESFs. Chemical analysis using scanning transmission electron microscopy combined with energy-dispersive X-ray spectroscopy further reveals that SESFs in high-Ti alloys are enriched in Ti, Mo and W, yet no grid-like ordering is observed. Together with the ab initio calculations, these observations suggest that Mo and W additions in high-Ti alloys could facilitate the transformation from the L1$_2$ structure to the low-energy D0$_{24}$ structure, indicating that Mo and W segregation along SESFs is energetically favourable. Furthermore, successive SESF thickening facilitates microtwinning in the absence of D0$_{24}$ ordering along SESFs, providing an additional major carrier of creep strain. These new findings clarify the role of Ti in controlling planar fault shearing mechanisms, providing new insights for optimizing the creep performance of next-generation CoNi-based superalloys.

preprint · 2026 · arXiv

Intention Knowledge Graph Construction for User Intention Relation Modeling

Understanding user intentions is challenging for online platforms. Recent work on intention knowledge graphs addresses this but often lacks focus on connecting intentions, which is crucial for modeling user behavior and predicting future actions. This paper introduces a framework to automatically generate an intention knowledge graph, capturing connections between user intentions. Using the Amazon m2 dataset, we construct an intention graph with 351 million edges, demonstrating high plausibility and acceptance. Our model effectively predicts new session intentions and enhances product recommendations, outperforming previous state-of-the-art methods and showcasing the approach's practical utility.

preprint · 2026 · arXiv

Mixture Prototype Flow Matching for Open-Set Supervised Anomaly Detection

Open-set supervised anomaly detection (OSAD) aims to identify unseen anomalies using limited anomalous supervision. However, existing prototype-based methods typically model normal data via a unimodal Gaussian prior, failing to capture inherent multi-modality and resulting in blurred decision boundaries. To address this, we propose Mixture Prototype Flow Matching (MPFM), a framework that learns a continuous transformation from normal feature distributions to a structured Gaussian mixture prototype space. Departing from traditional flow-based approaches that rely on a single velocity vector, MPFM explicitly models the velocity field as a Gaussian mixture prior where each component corresponds to a distinct normal class. This design facilitates mode-aware and semantically coherent distribution transport. Furthermore, we introduce a Mutual Information Maximization Regularizer (MIMR) to prevent prototype collapse and maximize normal-anomaly separability. Extensive experiments demonstrate that MPFM achieves state-of-the-art performance across diverse benchmarks under both single- and multi-anomaly settings.

preprint · 2026 · arXiv

Model-Driven GPR Inversion Network With Surrogate Forward Solver

Data-driven deep learning is considered a promising solution for ground-penetrating radar (GPR) full-waveform inversion (FWI), but its generalization ability is limited by its heavy reliance on abundant labeled samples. In contrast, deep unfolding networks (DUNs) usually exhibit better generalization by integrating model-driven and data-driven approaches, yet their application to GPR FWI remains challenging due to the high computational cost associated with forward simulations. In this paper, we integrate a deep learning-based (DL-based) forward solver within an unfolding framework to form a fully neural-network-based architecture, UA-Net, for GPR FWI. The forward solver rapidly predicts B-scans given permittivity and conductivity models and enables automatic differentiation to compute gradients for inversion. In the inversion stage, an optimization process based on the Alternating Direction Method of Multipliers (ADMM) is unfolded into a multi-stage network with three interconnected modules: data fitting, regularization, and multiplier update. Specifically, the regularization module is trained end-to-end for adaptive learning of sparse target features. Experimental results demonstrate that UA-Net outperforms classical FWI and data-driven methods in reconstruction accuracy. Moreover, by employing transfer learning to fine-tune the network, UA-Net can be effectively applied to field data and produce reliable results.

preprint · 2026 · arXiv

Plastic limit of a viscoplastic Burgers equation -- A toy model for sea-ice dynamics

We study the plastic Burgers equation in one space dimension, i.e., the Burgers equation featuring an additional term formally given by the p-Laplacian with p=1, or rather, by the multivalued subdifferential of the total variation functional. Our study highlights that the interplay of the advection term with the stresses given by the multivalued 1-Laplacian is a crucial feature of this model. Even though it is an interesting model in itself, it can also be regarded as a one-dimensional version of the momentum balance of Hibler's model for sea-ice dynamics. Therein, the stress tensor is given by a term with similar properties as the 1-Laplacian in order to account for plastic effects of the ice. For our analysis, we start out from a viscoplastic Burgers equation, i.e., a suitably regularized version of the plastic Burgers equation with a small regularization parameter $\varepsilon>0$. For the viscoplastic Burgers equation, we construct a global BV-solution. In the singular limit $\varepsilon\to0$, we deduce the existence of a BV-solution for the plastic Burgers equation. In addition, we show that the term arising as the limit of the regularized stresses is indeed related to an element of the subdifferential of the total variation functional.

preprint · 2026 · arXiv

UltraEval-Audio: A Unified Framework for Comprehensive Evaluation of Audio Foundation Models

The development of audio foundation models has accelerated rapidly since the emergence of GPT-4o. However, the lack of comprehensive evaluation has become a critical bottleneck for further progress in the field, particularly in audio generation. Current audio evaluation faces three major challenges: (1) audio evaluation lacks a unified framework, with datasets and code scattered across various sources, hindering fair and efficient cross-model comparison; (2) audio codecs, as a key component of audio foundation models, lack a widely accepted and holistic evaluation methodology; (3) existing speech benchmarks are heavily reliant on English, making it challenging to objectively assess models' performance on Chinese. To address the first issue, we introduce UltraEval-Audio, a unified evaluation framework for audio foundation models, specifically designed for both audio understanding and generation tasks. UltraEval-Audio features a modular architecture, supporting 10 languages and 14 core task categories, while seamlessly integrating 24 mainstream models and 36 authoritative benchmarks. To enhance research efficiency, the framework provides a one-command evaluation feature, accompanied by real-time public leaderboards. For the second challenge, UltraEval-Audio adopts a novel comprehensive evaluation scheme for audio codecs, evaluating performance across three key dimensions: semantic accuracy, timbre fidelity, and acoustic quality. To address the third issue, we propose two new Chinese benchmarks, SpeechCMMLU and SpeechHSK, designed to assess Chinese knowledge proficiency and language fluency. We hope that UltraEval-Audio will provide both academia and industry with a transparent, efficient, and fair platform for comparison of audio models. Our code, benchmarks, and leaderboards are available at https://github.com/OpenBMB/UltraEval-Audio.

preprint · 2023 · arXiv

Parity-protected superconducting qubit based on topological insulators

We propose a novel architecture that utilizes two 0-$\pi$ qubits based on topological Josephson junctions to implement a parity-protected superconducting qubit. The topological Josephson junctions provide protection against fabrication variations, which ensures the identical Josephson junctions required to implement the 0-$\pi$ qubit. By viewing the even and odd parity ground states of a 0-$\pi$ qubit as spin-$\frac{1}{2}$ states, we construct the logic qubit states using the total parity odd subspace of two 0-$\pi$ qubits. This parity-protected qubit exhibits robustness against charge noise, similar to a singlet-triplet qubit's immunity to global magnetic field fluctuations. Meanwhile, the flux noise cannot directly couple two states with the same total parity and therefore is greatly suppressed. Benefiting from the simultaneous protection from both charge and flux noise, we demonstrate a dramatic enhancement of both $T_1$ and $T_2$ coherence times. Our work presents a new approach to engineering symmetry-protected superconducting qubits.