Source author record

Kai Xie

Kai Xie appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Information Theory, math.IT, eess.AS, math.GT, Multimedia, nlin.CD, Sound

Catalog footprint

What is connected

8 works

7 topics

4 close collaborators

preprint, 2026, arXiv

LEMAS: A 150K-Hour Large-scale Extensible Multilingual Audio Suite with Generative Speech Models

We present the LEMAS-Dataset, which, to our knowledge, is currently the largest open-source multilingual speech corpus with word-level timestamps. Covering over 150,000 hours across 10 major languages, LEMAS-Dataset is constructed via an efficient data processing pipeline that ensures high-quality data and annotations. To validate the effectiveness of LEMAS-Dataset across diverse generative paradigms, we train two benchmark models with distinct architectures and task specializations on this dataset. LEMAS-TTS, built upon a non-autoregressive flow-matching framework, leverages the dataset's massive scale and linguistic diversity to achieve robust zero-shot multilingual synthesis. Our proposed accent-adversarial training and CTC loss mitigate cross-lingual accent issues, enhancing synthesis stability. Complementarily, LEMAS-Edit employs an autoregressive decoder-only architecture that formulates speech editing as a masked token infilling task. By exploiting precise word-level alignments to construct training masks and adopting adaptive decoding strategies, it achieves seamless, smooth-boundary speech editing with natural transitions. Experimental results demonstrate that models trained on LEMAS-Dataset deliver high-quality synthesis and editing performance, confirming the dataset's quality. We envision that this richly timestamp-annotated, fine-grained multilingual corpus will drive future advances in prompt-based speech generation systems.

preprint, 2015, arXiv

Analysis of Performance of Linear Analog Codes

In this paper we carefully study the MSE performance of linear analog codes. We derive lower bounds on the MSE performance under the maximum likelihood (ML) and linear minimum mean square error (LMMSE) decoding criteria, respectively. We prove that a class of linear analog codes called unitary codes can achieve both of these bounds simultaneously. We also compare the obtained MSE bounds for linear analog codes with the performance of some existing nonlinear codes. The results show that the performance of linear analog codes is not very satisfactory, which suggests that more attention should be directed to the nonlinear class in order to find powerful analog codes.

preprint, 2013, arXiv

Analog Turbo Codes: Turning Chaos to Reliability

Analog error correction codes, by relaxing the source space and the codeword space from discrete fields to continuous fields, present a generalization of digital codes. While linear codes are sufficient for digital codes, they are not for analog codes, and hence nonlinear mappings must be employed to fully harness the power of analog codes. This paper demonstrates new ways of building effective (nonlinear) analog codes from a special class of nonlinear, fast-diverging functions known as chaotic functions. It is shown that the "butterfly effect" of chaotic functions matches elegantly with the distance expansion condition required for error correction, and that the useful idea in digital turbo codes can be exploited to construct efficient turbo-like chaotic analog codes. Simulations show that the new analog codes can perform on par with, or better than, their digital counterparts when transmitting analog sources.

preprint, 2013, arXiv

Iterative Decoding and Turbo Equalization: The Z-Crease Phenomenon

Iterative probabilistic inference, popularly dubbed the soft-iterative paradigm, has found great use in a wide range of communication applications, including turbo decoding and turbo equalization. The classic approach to analyzing iterative processing inevitably uses statistical and information-theoretical tools that bear ensemble-average flavors. This paper considers the per-block error rate performance, and analyzes it using nonlinear dynamical theory. By modeling the iterative processor as a nonlinear dynamical system, we report a universal "Z-crease phenomenon:" the zig-zag or up-and-down fluctuation -- rather than the monotonic decrease -- of the per-block errors, as the number of iterations increases. Using the turbo decoder as an example, we also report several interesting motion phenomena which were not previously reported, and which appear to correspond well with the notion of "pseudo codewords" and "stopping/trapping sets." We further propose a heuristic stopping criterion to control Z-crease and identify the best iteration. Our stopping criterion is most useful for controlling the worst-case per-block errors, and helps to significantly reduce the average number of iterations.

preprint, 2013, arXiv

Recursive-Cube-of-Rings (RCR) Revisited: Properties and Enhancement

We study recursive-cube-of-rings (RCR), a class of scalable graphs that can potentially provide rich interconnection network topology for the emerging distributed and parallel computing infrastructure. Through rigorous proof and validating examples, we have corrected previous misunderstandings on the topological properties of these graphs, including node degree, symmetry, diameter and bisection width. To fully harness the potential of structural regularity through RCR construction, new edge connecting rules are proposed. The modified graphs, referred to as Class-II RCR, are shown to possess uniform node degrees, better connectivity and better network symmetry, and hence will find better application in parallel computing.

preprint, 2011, arXiv

Efficient Image Transmission Through Analog Error Correction

This paper presents a new paradigm for image transmission through analog error correction codes. Conventional schemes rely on digitizing images through quantization (which inevitably causes significant bandwidth expansion) and transmitting binary bit-streams through digital error correction codes (which do not automatically differentiate the different levels of significance among the bits). To strike a better overall performance in terms of transmission efficiency and quality, we propose to use a single analog error correction code in lieu of digital quantization, digital code and digital modulation. The key is to get analog coding right. We show that this can be achieved by cleverly exploiting an elegant "butterfly" property of chaotic systems. Specifically, we demonstrate a tail-biting triple-branch baker's map code and its maximum-likelihood decoding algorithm. Simulations show that the proposed analog code can actually outperform digital turbo codes, among the best codes known to date. The results and findings discussed in this paper speak volumes for the promising potential of analog codes, in spite of their rather short history.

preprint, 2011, arXiv

Linear Analog Codes: The Good and The Bad

This paper studies the theory of linear analog error correction coding. Since classical concepts of minimum Hamming distance and minimum Euclidean distance fail in the analog context, a new metric, termed the "minimum (squared Euclidean) distance ratio," is defined. It is shown that linear analog codes that achieve the largest possible value of minimum distance ratio also achieve the smallest possible mean square error (MSE). Based on this achievability, a concept of "maximum distance ratio expansible (MDRE)" is established, in a spirit similar to maximum distance separable (MDS). Existing codes are evaluated, and it is shown that MDRE and MDS can be simultaneously achieved through careful design.

preprint, 2010, arXiv

Precoded Turbo Equalizer for Power Line Communication Systems

Power line communication (PLC) continues to draw increasing interest by promising a wide range of applications, including a cost-free last-mile communication solution. However, signals transmitted through power lines deteriorate badly due to the presence of severe inter-symbol interference (ISI) and harsh random pulse noise. This work proposes a new precoded turbo equalization scheme specifically designed for PLC channels. By introducing useful precoding to reshape ISI, optimizing maximum a posteriori (MAP) detection to address the non-Gaussian pulse noise, and performing soft iterative decision refinement, the new equalizer demonstrates a gain significantly better than that of existing turbo equalizers.
