Source author record

Xiaoxue Gao

Xiaoxue Gao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Topics: eess.AS, Artificial Intelligence, Sound, cond-mat.mes-hall, cond-mat.mtrl-sci, eess.SP, Machine Learning, physics.optics

Catalog footprint

What is connected

5 works

8 topics

4 close collaborators


Preprint · 2026 · arXiv

Confidence-Based Self-Training for EMG-to-Speech: Leveraging Synthetic EMG for Robust Modeling

Voiced Electromyography (EMG)-to-Speech (V-ETS) models reconstruct speech from muscle activity signals, enabling applications such as neurolaryngologic diagnostics. Despite this potential, progress in V-ETS is hindered by the scarcity of paired EMG-speech data. To address this, we propose a novel Confidence-based Multi-Speaker Self-training (CoM2S) approach, along with a newly curated Libri-EMG dataset. The approach uses synthetic EMG data generated by a pre-trained model, filters it with a proposed mechanism based on phoneme-level confidence, and then improves the ETS model through the proposed self-training techniques. Experiments show that our method improves phoneme accuracy, reduces phonological confusion, and lowers word error rate, confirming the effectiveness of CoM2S for V-ETS. To support future research, we will release the code and the Libri-EMG dataset, an open-access, time-aligned, multi-speaker corpus of voiced EMG and speech recordings.
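The abstract says synthetic EMG samples are filtered by phoneme-level confidence before self-training, but it does not give the scoring rule. The sketch below is a minimal illustration under stated assumptions only: each synthetic utterance is assumed to carry one confidence score per phoneme in [0, 1], and an utterance is kept when its mean confidence reaches a threshold and no single phoneme falls below a floor. `SyntheticUtterance`, `filter_by_phoneme_confidence`, and both threshold values are hypothetical, not the CoM2S implementation.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SyntheticUtterance:
    """Hypothetical container for one synthetic EMG/speech sample."""
    utt_id: str
    # Assumed: one confidence score per phoneme, each in [0, 1].
    phoneme_confidences: list[float]


def filter_by_phoneme_confidence(utts, threshold=0.8, min_phoneme_conf=0.5):
    """Keep utterances whose mean phoneme confidence is >= threshold
    and whose least confident phoneme is >= min_phoneme_conf.
    Both thresholds are illustrative assumptions."""
    kept = []
    for u in utts:
        if not u.phoneme_confidences:
            continue  # nothing to score, so drop the utterance
        confs = u.phoneme_confidences
        if mean(confs) >= threshold and min(confs) >= min_phoneme_conf:
            kept.append(u)
    return kept


if __name__ == "__main__":
    # Toy confidence values, invented for illustration.
    pool = [
        SyntheticUtterance("a", [0.95, 0.9, 0.88]),
        SyntheticUtterance("b", [0.9, 0.3, 0.95]),
        SyntheticUtterance("c", [0.7, 0.75, 0.72]),
    ]
    print([u.utt_id for u in filter_by_phoneme_confidence(pool)])  # ['a']
```

The per-phoneme floor is included because an utterance-level average alone can hide a single badly generated phoneme, as with utterance "b" above; whether the paper uses such a floor is not stated.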

Preprint · 2026 · arXiv

MORE: Multi-Objective Adversarial Attacks on Speech Recognition

The emergence of large-scale automatic speech recognition (ASR) models such as Whisper has greatly expanded their adoption across diverse real-world applications. Ensuring robustness against even minor input perturbations is therefore critical for maintaining reliable performance in real-time environments. While prior work has mainly examined accuracy degradation under adversarial attacks, robustness with respect to efficiency remains largely unexplored. This narrow focus provides only a partial understanding of ASR model vulnerabilities. To address this gap, we conduct a comprehensive study of ASR robustness under multiple attack scenarios. We introduce MORE, a multi-objective repetitive doubling encouragement attack, which jointly degrades recognition accuracy and inference efficiency through a hierarchical staged repulsion-anchoring mechanism. Specifically, we reformulate multi-objective adversarial optimization as a hierarchical framework that achieves the two objectives sequentially. To further amplify effectiveness, we propose a novel repetitive encouragement doubling objective (REDO) that induces duplicative text generation while maintaining accuracy degradation and periodically doubling the predicted sequence length. Overall, a single adversarial input to MORE causes ASR models to produce incorrect transcriptions at a substantially higher computational cost. Experiments show that MORE consistently yields significantly longer transcriptions while maintaining high word error rates compared to existing baselines, underscoring its effectiveness as a multi-objective adversarial attack.

Preprint · 2023 · arXiv

Synergistic Photon Management and Strain-Induced Band Gap Engineering of Two-Dimensional MoS2 Using Semimetal Composite Nanostructures

2D MoS2 is attracting increasing attention for applications in flexible electronics and photonic devices. For 2D optoelectronic devices, light absorption in the molecularly thin 2D absorber is one of the key factors limiting device efficiency, and conventional photon management techniques are not necessarily compatible with such thin absorbers. In this paper, we present two semimetal composite nanostructures for synergistic photon management and strain-induced band gap engineering of 2D MoS2: (1) pseudo-periodic Sn nanodots and (2) conductive SnOx (x<1) core-shell nanoneedle structures. Both nanostructures are self-assembled by physical vapor deposition, without sophisticated nanolithography. 2D MoS2 achieves over 15x absorption enhancement at λ=650-950 nm under Sn nanodots and 20-30x enhancement at λ=700-900 nm under SnOx (x<1) nanoneedles, both spanning the visible to near-infrared regime. The enhanced absorption in MoS2 results from strong near-field enhancement and from a reduced MoS2 band gap caused by the tensile strain that the Sn nanostructures induce, as confirmed by Raman and photoluminescence spectroscopy. In particular, we demonstrate that up to 3.5% biaxial tensile strain is introduced into 2D MoS2 using conductive nanoneedle-structured SnOx (x<1), which reduces the band gap by ~0.35 eV and further enhances light absorption at longer wavelengths. To the best of our knowledge, this is the first demonstration of a synergistic triple-functional layer on 2D MoS2 that serves as photon management structure, stressor, and conductive electrode. This synergistic photon management and band gap engineering approach for extended spectral response can be applied to other 2D materials for future 2D photonic devices.

Preprint · 2022 · arXiv

Genre-conditioned Acoustic Models for Automatic Lyrics Transcription of Polyphonic Music

Lyrics transcription of polyphonic music is challenging not only because the singing vocals are corrupted by the background music, but also because the background music and the singing style vary across music genres, such as pop, metal, and hip hop, which affect the intelligibility of the lyrics in different ways. In this work, we propose to transcribe the lyrics of polyphonic music using a novel genre-conditioned network. The proposed network adopts pre-trained model parameters and incorporates genre adapters between layers to capture genre-specific characteristics from lyrics-genre pairs, so that only lightweight genre-specific parameters need to be trained. Our experiments show that the proposed genre-conditioned network outperforms existing lyrics transcription systems.
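The abstract states only that genre adapters sit between layers of a pre-trained network and that only genre-specific parameters are trained. As a rough illustration of that idea, the sketch below assumes a residual bottleneck adapter (a common adapter design, not necessarily the one used in the paper), keeps one adapter per genre, and routes the frozen layer's output through the adapter chosen by the genre label. All class and function names, the bottleneck size, and the initialisation are hypothetical; plain Python lists stand in for a tensor library so the example stays self-contained.

```python
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


class BottleneckAdapter:
    """Assumed residual bottleneck adapter: y = x + W_up @ relu(W_down @ x)."""

    def __init__(self, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        # Down-projection dim -> bottleneck, small random init.
        self.W_down = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(bottleneck)]
        # Up-projection bottleneck -> dim, zero init so the adapter starts as identity.
        self.W_up = [[0.0] * bottleneck for _ in range(dim)]

    def num_params(self):
        return sum(len(r) for r in self.W_down) + sum(len(r) for r in self.W_up)

    def __call__(self, x):
        delta = matvec(self.W_up, relu(matvec(self.W_down, x)))
        return [a + d for a, d in zip(x, delta)]


class GenreConditionedLayer:
    """A frozen pre-trained layer followed by one (hypothetical) adapter per genre."""

    def __init__(self, frozen_layer, genres, dim, bottleneck=4):
        self.frozen_layer = frozen_layer  # shared weights, not updated in training
        self.adapters = {g: BottleneckAdapter(dim, bottleneck, seed=i)
                         for i, g in enumerate(genres)}

    def __call__(self, x, genre):
        # Route the shared representation through this genre's adapter.
        return self.adapters[genre](self.frozen_layer(x))

    def trainable_params(self):
        # Only adapter weights count as trainable.
        return sum(a.num_params() for a in self.adapters.values())


if __name__ == "__main__":
    dim = 8
    # Stand-in for a pre-trained layer: doubles its input.
    layer = GenreConditionedLayer(lambda x: [2.0 * a for a in x],
                                  ["pop", "metal", "hiphop"], dim)
    x = [0.1 * i for i in range(dim)]
    print(layer(x, "pop") == layer(x, "metal"))  # True: adapters start as identity
    print(layer.trainable_params())  # 192 = 3 genres * (8*4 + 4*8)
```

Zero-initialising the up-projection makes each adapter start as the identity, so training begins from the pre-trained model's behaviour; the trainable parameter count grows with the number of genres and the bottleneck size, not with the size of the frozen network.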

Preprint · 2022 · arXiv

Music-robust Automatic Lyrics Transcription of Polyphonic Music

Lyrics transcription of polyphonic music is challenging because singing vocals are corrupted by the background music. To improve the robustness of lyrics transcription to the background music, we propose combining features that emphasize the singing vocals (music-removed features, extracted from the singing vocals) with features that capture both the singing vocals and the background music (music-present features). We show that these two sets of features complement each other and that their combination performs better than either set alone, thus improving the robustness of the acoustic model to the background music. In addition, interpolating a general-purpose language model with an in-domain lyrics-specific language model further improves transcription results. Our experiments show that the proposed strategy outperforms existing lyrics transcription systems for polyphonic music. Moreover, we find that the proposed music-robust features especially improve lyrics transcription performance on metal songs, where the background music is loud and dominant.
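The abstract mentions language model interpolation between a general-purpose model and a lyrics-specific model without giving the formula or the weight. A standard form is linear interpolation, P(w) = λ·P_lyrics(w) + (1 − λ)·P_general(w), with λ tuned on held-out text. The sketch below shows that technique on toy unigram distributions; the paper's n-gram order, toolkit, and chosen weight are not stated here, and the function names and probabilities are invented for illustration.

```python
import math


def interpolate(p_general, p_lyrics, lam):
    """Linear interpolation: P(w) = lam * P_lyrics(w) + (1 - lam) * P_general(w)."""
    vocab = set(p_general) | set(p_lyrics)
    return {w: lam * p_lyrics.get(w, 0.0) + (1.0 - lam) * p_general.get(w, 0.0)
            for w in vocab}


def perplexity(model, tokens):
    """Perplexity of a unigram model on a token sequence."""
    log_sum = 0.0
    for t in tokens:
        p = model.get(t, 0.0)
        if p <= 0.0:
            return math.inf  # unseen token: perplexity is unbounded
        log_sum += math.log(p)
    return math.exp(-log_sum / len(tokens))


def tune_lambda(p_general, p_lyrics, held_out, steps=10):
    """Grid-search the interpolation weight that minimises held-out perplexity."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda lam: perplexity(interpolate(p_general, p_lyrics, lam),
                                                held_out))


if __name__ == "__main__":
    # Toy unigram distributions, invented for illustration only.
    general = {"the": 0.5, "love": 0.1, "baby": 0.05, "data": 0.35}
    lyrics = {"the": 0.4, "love": 0.35, "baby": 0.25}
    held_out = ["love", "data", "baby"]
    print(tune_lambda(general, lyrics, held_out))  # 0.6
```

In this toy case the lyrics-only model assigns zero probability to "data", so pure in-domain modelling (λ = 1) has infinite perplexity, while the general model alone underweights lyric words; the interpolated model beats both, which is the motivation for mixing them.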
