Researcher profile

Xie Chen

Xie Chen contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
6 works
0 followers
8 topics
4 close collaborators

Published work

6 published items

preprint · 2026 · arXiv

X-Voice: Enabling Everyone to Speak 30 Languages via Zero-Shot Cross-Lingual Voice Cloning

In this paper, we present X-Voice, a 0.4B multilingual zero-shot voice cloning model that clones arbitrary voices and enables everyone to speak 30 languages. X-Voice is trained on a 420K-hour multilingual corpus using the International Phonetic Alphabet (IPA) as a unified representation. To eliminate the reliance on prompt text without complex preprocessing like forced alignment, we design a two-stage training paradigm. In Stage 1, we establish X-Voice$_{\text{s1}}$ through standard conditional flow-matching training and use it to synthesize 10K hours of speaker-consistent segments as audio prompts. In Stage 2, we fine-tune on these audio pairs with prompt text masked to derive X-Voice$_{\text{s2}}$, which enables zero-shot voice cloning without requiring transcripts of audio prompts. Architecturally, we extend F5-TTS by implementing a dual-level injection of language identifiers and decoupling and scheduling of Classifier-Free Guidance to facilitate multilingual speech synthesis. Subjective and objective evaluation results demonstrate that X-Voice outperforms existing flow-matching based multilingual systems like LEMAS-TTS and achieves zero-shot cross-lingual cloning capabilities comparable to billion-scale models such as Qwen3-TTS. To facilitate research transparency and community advancement, we open-source all related resources.

preprint · 2024 · arXiv

EAT: Self-Supervised Pre-Training with Efficient Audio Transformer

Audio self-supervised learning (SSL) pre-training, which aims to learn good representations from unlabeled audio, has made remarkable progress. However, the extensive computational demands during pre-training pose a significant barrier to the potential application and optimization of audio SSL models. In this paper, inspired by the success of data2vec 2.0 in image modality and Audio-MAE in audio modality, we introduce Efficient Audio Transformer (EAT) to further improve the effectiveness and efficiency in audio SSL. The proposed EAT adopts the bootstrap self-supervised training paradigm to the audio domain. A novel Utterance-Frame Objective (UFO) is designed to enhance the modeling capability of acoustic events. Furthermore, we reveal that the masking strategy is critical in audio SSL pre-training, and superior audio representations can be obtained with large inverse block masks. Experiment results demonstrate that EAT achieves state-of-the-art (SOTA) performance on a range of audio-related tasks, including AudioSet (AS-2M, AS-20K), ESC-50, and SPC-2, along with a significant pre-training speedup up to ~15x compared to existing audio SSL models.

preprint · 2021 · arXiv

Developing Real-time Streaming Transformer Transducer for Speech Recognition on Large-scale Dataset

Recently, Transformer-based end-to-end models have achieved great success in many areas, including speech recognition. However, compared to LSTM models, the heavy computational cost of the Transformer during inference is a key issue preventing their application. In this work, we explored the potential of Transformer Transducer (T-T) models for first-pass decoding with low latency and fast speed on a large-scale dataset. We combine the idea of Transformer-XL and chunk-wise streaming processing to design a streamable Transformer Transducer model. We demonstrate that T-T outperforms the hybrid model, RNN Transducer (RNN-T), and streamable Transformer attention-based encoder-decoder model in the streaming scenario. Furthermore, the runtime cost and latency can be optimized with a relatively small look-ahead.
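
As a rough illustration of the chunk-wise streaming idea mentioned in this abstract, the sketch below builds an attention mask in which each frame sees its own chunk plus a bounded number of earlier chunks. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name `chunk_attention_mask` and the parameters `chunk_size` and `left_chunks` are hypothetical, and the look-ahead here is simply whatever remains of the current chunk.

```python
from typing import List


def chunk_attention_mask(
    num_frames: int, chunk_size: int, left_chunks: int
) -> List[List[bool]]:
    """Return mask[i][j], True when frame i may attend to frame j.

    Assumption (illustrative only): a frame sees every frame in its own
    chunk and in up to `left_chunks` preceding chunks, nothing further.
    """
    if chunk_size <= 0 or left_chunks < 0:
        raise ValueError("chunk_size must be positive, left_chunks non-negative")
    mask = []
    for i in range(num_frames):
        chunk = i // chunk_size
        # First visible frame: start of the oldest chunk kept as history.
        start = max(0, (chunk - left_chunks) * chunk_size)
        # One past the last visible frame: end of the current chunk.
        end = min(num_frames, (chunk + 1) * chunk_size)
        mask.append([start <= j < end for j in range(num_frames)])
    return mask


mask = chunk_attention_mask(num_frames=6, chunk_size=2, left_chunks=1)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

In this sketch a smaller `chunk_size` shortens the look-ahead and so the latency, while `left_chunks` bounds how much history each frame has to process.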

preprint · 2020 · arXiv

Fracton Phases of Matter

Fractons are a new type of quasiparticle which are immobile in isolation, but can often move by forming bound states. Fractons are found in a variety of physical settings, such as spin liquids and elasticity theory, and exhibit unusual phenomenology, such as gravitational physics and localization. The past several years have seen a surge of interest in these exotic particles, which have come to the forefront of modern condensed matter theory. In this review, we provide a broad treatment of fractons, ranging from pedagogical introductory material to discussions of recent advances in the field. We begin by demonstrating how the fracton phenomenon naturally arises as a consequence of higher moment conservation laws, often accompanied by the emergence of tensor gauge theories. We then provide a survey of fracton phases in spin models, along with the various tools used to characterize them, such as the foliation framework. We discuss in detail the manifestation of fracton physics in elasticity theory, as well as the connections of fractons with localization and gravitation. Finally, we provide an overview of some recently proposed platforms for fracton physics, such as Majorana islands and hole-doped antiferromagnets. We conclude with some open questions and an outlook on the field.

preprint · 2020 · arXiv

Fractonic order in infinite-component Chern-Simons gauge theories

2+1D multi-component $U(1)$ gauge theories with a Chern-Simons (CS) term provide a simple and complete characterization of 2+1D Abelian topological orders. In this paper, we extend the theory by taking the number of component gauge fields to infinity and find that they can describe interesting types of 3+1D "fractonic" order. "Fractonic" describes the peculiar phenomena that point excitations in certain strongly interacting systems either cannot move at all or are only allowed to move in a lower dimensional sub-manifold. In the simplest cases of infinite-component CS gauge theory, different components do not couple to each other and the theory describes a decoupled stack of 2+1D fractional Quantum Hall systems with quasi-particles moving only in 2D planes -- hence a fractonic system. We find that when the component gauge fields do couple through the CS term, more varieties of fractonic orders are possible. For example, they may describe foliated fractonic systems for which increasing the system size requires insertion of nontrivial 2+1D topological states. Moreover, we find examples which lie beyond the foliation framework, characterized by 2D excitations of infinite order and braiding statistics that are not strictly local.
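
For orientation, the multi-component Chern-Simons theory referred to in this abstract is conventionally written with a symmetric integer matrix $K$. The form below is the standard textbook $K$-matrix Lagrangian, not an equation quoted from the paper; the diagonal choice of $K$ is an assumed illustration of the decoupled case.

```latex
% Standard K-matrix Chern-Simons Lagrangian for N gauge fields a^I_\mu
\mathcal{L} = \frac{1}{4\pi} \sum_{I,J=1}^{N} K_{IJ}\,
  \epsilon^{\mu\nu\lambda}\, a^{I}_{\mu}\, \partial_{\nu} a^{J}_{\lambda}

% Illustrative decoupled case: diagonal K, one 2+1D layer per component
K = \mathrm{diag}(m, m, \dots, m)
```

With a diagonal $K$ the components do not couple, which corresponds to the decoupled stack of 2+1D layers described above; off-diagonal entries of $K$ couple neighbouring components, and the abstract takes $N \to \infty$.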

preprint · 2019 · arXiv

Twisted foliated fracton phases

In the study of three-dimensional gapped models, two-dimensional gapped states should be considered as a free resource. This is the basic idea underlying the notion of 'foliated fracton order' proposed in Phys. Rev. X 8, 031051 (2018). We have found that many of the known type I fracton models, although they appear very different, have the same foliated fracton order, known as 'X-cube' order. In this paper, we identify three-dimensional fracton models with different kinds of foliated fracton order. Whereas the X-cube order corresponds to the gauge theory of a simple paramagnet with subsystem planar symmetry, the novel orders correspond to twisted versions of the gauge theory for which the system prior to gauging has nontrivial order protected by the planar subsystem symmetry. We present constructions of the twisted models and demonstrate that they possess nontrivial order by studying their fractional excitation contents.