Source author record

Xiaopeng Wang

Xiaopeng Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher

Unclaimed source record

Artificial Intelligence, Sound, hep-ph, Applications, astro-ph.HE, Computation and Language, Computer Vision, cond-mat.mtrl-sci, cond-mat.supr-con, eess.AS, Information Theory, math.IT, Multimedia, nucl-th, physics.app-ph

Catalog footprint

What is connected

10 works

15 topics

4 close collaborators


preprint, 2026, arXiv

Detect All-Type Deepfake Audio: Wavelet Prompt Tuning for Enhanced Auditory Perception

The rapid advancement of audio generation technologies has escalated the risks of malicious deepfake audio across speech, sound, singing voice, and music, threatening multimedia security and trust. While existing countermeasures (CMs) perform well in single-type audio deepfake detection (ADD), their performance declines in cross-type scenarios. This paper is dedicated to studying the all-type ADD task. We are the first to comprehensively establish an all-type ADD benchmark to evaluate current CMs, incorporating cross-type deepfake detection across speech, sound, singing voice, and music. Then, we introduce the prompt tuning self-supervised learning (PT-SSL) training paradigm, which optimizes the SSL front-end by learning specialized prompt tokens for ADD, requiring 458x fewer trainable parameters than fine-tuning (FT). Considering the auditory perception of different audio types, we propose the wavelet prompt tuning (WPT)-SSL method to capture type-invariant auditory deepfake information from the frequency domain without requiring additional training parameters, thereby enhancing performance over FT in the all-type ADD task. To achieve a universal CM, we utilize all types of deepfake audio for co-training. Experimental results demonstrate that WPT-XLSR-AASIST achieved the best performance, with an average EER of 3.58% across all evaluation sets.

preprint, 2026, arXiv

Interpretable All-Type Audio Deepfake Detection with Audio LLMs via Frequency-Time Reinforcement Learning

Recent advances in audio large language models (ALLMs) have made high-quality synthetic audio widely accessible, increasing the risk of malicious audio deepfakes across speech, environmental sounds, singing voice, and music. Real-world audio deepfake detection (ADD) therefore requires all-type detectors that generalize across heterogeneous audio and provide interpretable decisions. Given the strong multi-task generalization ability of ALLMs, we first investigate their performance on all-type ADD under both supervised fine-tuning (SFT) and reinforcement fine-tuning (RFT). However, SFT using only binary real/fake labels tends to reduce the model to a black-box classifier, sacrificing interpretability. Meanwhile, vanilla RFT under sparse supervision is prone to reward hacking and can produce hallucinated, ungrounded rationales. To address this, we propose an automatic annotation and polishing pipeline that constructs Frequency-Time structured chain-of-thought (CoT) rationales, producing ~340K cold-start demonstrations. Building on CoT data, we propose Frequency Time-Group Relative Policy Optimization (FT-GRPO), a two-stage training paradigm that cold-starts ALLMs with SFT and then applies GRPO under rule-based frequency-time constraints. Experiments demonstrate that FT-GRPO achieves state-of-the-art performance on all-type ADD while producing interpretable, FT-grounded rationales. The data and code are available online.

preprint, 2026, arXiv

MM-Sonate: Multimodal Controllable Audio-Video Generation with Zero-Shot Voice Cloning

Joint audio-video generation aims to synthesize synchronized multisensory content, yet current unified models struggle with fine-grained acoustic control, particularly for identity-preserving speech. Existing approaches either suffer from temporal misalignment due to cascaded generation or lack the capability to perform zero-shot voice cloning within a joint synthesis framework. In this work, we present MM-Sonate, a multimodal flow-matching framework that unifies controllable audio-video joint generation with zero-shot voice cloning capabilities. Unlike prior works that rely on coarse semantic descriptions, MM-Sonate utilizes a unified instruction-phoneme input to enforce strict linguistic and temporal alignment. To enable zero-shot voice cloning, we introduce a timbre injection mechanism that effectively decouples speaker identity from linguistic content. Furthermore, addressing the limitations of standard classifier-free guidance in multimodal settings, we propose a noise-based negative conditioning strategy that utilizes natural noise priors to significantly enhance acoustic fidelity. Empirical evaluations demonstrate that MM-Sonate establishes new state-of-the-art performance in joint generation benchmarks, significantly outperforming baselines in lip synchronization and speech intelligibility, while achieving voice cloning fidelity comparable to specialized Text-to-Speech systems.

preprint, 2023, arXiv

Analytic solution of Balitsky-Kovchegov equation with running coupling constant using homogeneous balance method

In this study, we employ the homogeneous balance method to obtain an analytical solution to the Balitsky-Kovchegov (BK) equation with running coupling. We utilize two distinct prescriptions of the running coupling scale, namely the saturation scale dependent running coupling and the dipole momentum dependent running coupling. By fitting the proton structure function experimental data, we determine the free parameters in the analytical solution. The resulting $\chi^{2}/\mathrm{d.o.f.}$ values are determined to be $1.07$ and $1.43$, respectively. With these definitive solutions, we are able to predict exclusive $J/\psi$ production, and demonstrate that analytical solutions with running coupling are in excellent agreement with the $J/\psi$ differential and total cross sections. Furthermore, our numerical results indicate that the analytical solution of the BK equation with running coupling can provide a reliable description of both the proton structure function and exclusive vector meson production.

preprint, 2022, arXiv

Exclusive vector meson productions with the analytical solution of Balitsky-Kovchegov Equation

Exclusive vector meson production is an excellent probe of the structure of the proton. In this paper, based on the dipole model, the differential cross sections, total cross sections and the ratios of the longitudinal to transverse cross sections of $J/\psi$ and $\rho^0$ production are calculated with the analytical solution of the Balitsky-Kovchegov (BK) equation. In addition, we also consider the influence of two meson wave function models on the results. Our predictions, which are only weakly sensitive to the meson wave functions, agree with the experimental data. The analytical solution of the BK equation is reliable for the description of exclusive vector meson production in a certain range of $Q^2$.

preprint, 2022, arXiv

Pasta phases in neutron stars under strong magnetic fields

In the present work, we consider nuclear matter in the innermost crust of neutron stars in the presence of a strong magnetic field within the framework of a relativistic mean-field description. Two models with a different slope of the symmetry energy are considered in order to discuss the effect of the density dependence of the equation of state on the crust structure. The non-homogeneous matter in $\beta$-equilibrium is described within the coexisting phases method, and the effect of including the anomalous magnetic moment is discussed. Five different geometries for the pasta structures are considered. It is shown that strong magnetic fields cause an extension of the inner crust of the neutron stars, with the occurrence of a series of disconnected non-homogeneous matter regions above the one existing for a null magnetic field. Moreover, we observed that in these disconnected regions, for some values of the magnetic field, all five different cluster geometrical shapes occur, and the gas density is close to the cluster density. Also, the pressure at the neutron star crust-core transition is much larger than the pressure obtained for a zero magnetic field. Another noticeable effect of the presence of strong magnetic fields is the increase of the proton fraction, favoring the appearance of protons in the gas background.

preprint, 2022, arXiv

SFE-AI at SemEval-2022 Task 11: Low-Resource Named Entity Recognition using Large Pre-trained Language Models

Large-scale pre-trained models have been widely used in named entity recognition (NER) tasks. However, model ensembling through parameter averaging or voting cannot fully exploit the distinct strengths of different models, especially in the open domain. This paper describes our NER system for SemEval-2022 Task 11: MultiCoNER. We propose an effective system that adaptively ensembles pre-trained language models with a Transformer layer. By assigning different weights to each model for different inputs, the Transformer layer integrates the advantages of diverse models effectively. Experimental results show that our method achieves superior performance in Farsi and Dutch.

preprint, 2021, arXiv

Superconductivity and Charge Density Wave of CuIr2Te4 by Iodine Doping

Here we report a systematic investigation of the evolution of the structural and physical properties, including the charge density wave (CDW) and superconductivity, of polycrystalline CuIr2Te4-xIx. X-ray diffraction results indicate that both the a and c lattice parameters increase linearly. The resistivity measurements indicate that the charge density wave is destabilized at small x but reappears when x is larger than 0.9. Meanwhile, the superconducting transition temperature increases as x rises and reaches a maximum value of around 2.95 K for the optimal composition CuIr2Te3.9I0.1, followed by a slight decrease with higher iodine doping content. The specific heat jump for the optimal composition CuIr2Te3.9I0.1 is approximately 1.46, which is close to the Bardeen-Cooper-Schrieffer value of 1.43, indicating that it is a bulk superconductor. The results of thermodynamic heat capacity measurements under different magnetic fields, magnetization and magneto-transport measurements further suggest that CuIr2Te4-xIx bulks are type II superconductors. Finally, an electronic phase diagram for this CuIr2Te4-xIx system has been constructed. The present study provides a suitable material platform for further investigation of the interplay of the CDW and superconductivity.

preprint, 2020, arXiv

Influence of Ln elements (Ln = La, Pr, Nd, Sm) on the structure and oxygen permeability of Ca-containing dual-phase membranes

Developing high-performance, low-cost oxygen permeable membranes for CO2 capture based on the oxy-fuel concept is greatly desirable but challenging. Despite tremendous efforts in exploring new CO2-stable dual-phase membranes, they are still far from meeting industrial requirements. Here we report a series of new Ca-containing CO2-resistant oxygen transporting membranes with composition 60 wt.% Ce0.9Ln0.1O2 - 40 wt.% Ln0.6Ca0.4FeO3 (CLnO-LnCFO; Ln = La, Pr, Nd, Sm) synthesized via a Pechini one-pot method. Our results indicate that all investigated compounds are composed of perovskite and fluorite phases, while the perovskite phases in the CNO-NCFO and CSO-SCFO membranes after sintering form two kinds of grains, Ca-rich and Ca-poor, with different morphologies; the small Ca-poor perovskite grains block the transport of oxygen ions and eventually result in poor oxygen permeability. Among our investigated CLnO-LnCFO membranes, CPO-PCFO exhibits the highest oxygen permeability and excellent CO2 stability, which are mainly associated with the improvement in crystal symmetry, the non-negligible electronic conductivity of the fluorite phase and the enhancement in electronic conductivity of the perovskite phase. Our results establish Ca-containing oxides as candidate material platforms for membrane engineering devices that combine CO2 capture and oxygen separation.

preprint, 2015, arXiv

Microwave Surveillance based on Ghost Imaging and Distributed Antennas

In this letter, we propose a microwave surveillance scheme based on ghost imaging (GI) and distributed antennas. By analyzing its imaging resolution and sampling requirement, we demonstrate the potential of microwave GI to achieve high-quality surveillance performance with low system complexity. The theoretical analysis and the effectiveness of the proposed microwave surveillance method are also validated via simulations.
