Source author record

Meng Cai

Meng Cai appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher

Unclaimed source record

eess.AS, math.NA, Numerical Analysis, Sound, Artificial Intelligence, Computation and Language, math.PR

Catalog footprint

What is connected

4 works

7 topics

4 close collaborators

Preprint, 2022, arXiv

Improving End-to-End Contextual Speech Recognition with Fine-Grained Contextual Knowledge Selection

Nowadays, most methods in end-to-end contextual speech recognition bias the recognition process towards contextual knowledge. Since all-neural contextual biasing methods rely on phrase-level contextual modeling and attention-based relevance modeling, they may encounter confusion between similar context-specific phrases, which hurts predictions at the token level. In this work, we focus on mitigating confusion problems with fine-grained contextual knowledge selection (FineCoS). In FineCoS, we introduce fine-grained knowledge to reduce the uncertainty of token predictions. Specifically, we first apply phrase selection to narrow the range of phrase candidates, and then conduct token attention on the tokens in the selected phrase candidates. Moreover, we re-normalize the attention weights of the most relevant phrases during inference to obtain more focused phrase-level contextual representations, and inject position information to better discriminate phrases or tokens. On LibriSpeech and an in-house 160,000-hour dataset, we explore the proposed methods based on a controllable all-neural biasing method, collaborative decoding (ColDec). The proposed methods provide up to 6.1% relative word error rate reduction on LibriSpeech and 16.4% relative character error rate reduction on the in-house dataset over ColDec.
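The two-stage selection described in this abstract (phrase selection, then token attention over the selected phrases, with re-normalized phrase weights) can be illustrated with a minimal sketch. It is not the authors' implementation: the dot-product scoring, the array shapes, the value of `k`, and the function name `select_and_attend` are all assumptions made for illustration, and position information is omitted.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def select_and_attend(query, phrase_emb, token_emb, token_phrase_id, k=2):
    """Hypothetical two-stage selection: top-k phrases, then token attention.

    query:           (d,) decoder state
    phrase_emb:      (P, d) one embedding per context phrase
    token_emb:       (T, d) one embedding per context token
    token_phrase_id: (T,) index of the phrase each token belongs to
    """
    # Stage 1: phrase-level relevance via dot-product attention (an assumption).
    phrase_w = softmax(phrase_emb @ query)
    top = np.argsort(phrase_w)[::-1][:k]  # indices of the k most relevant phrases

    # Re-normalize the attention weights over the selected phrases only.
    renorm = np.zeros_like(phrase_w)
    renorm[top] = phrase_w[top] / phrase_w[top].sum()
    phrase_ctx = renorm @ phrase_emb

    # Stage 2: token attention restricted to tokens of the selected phrases.
    mask = np.isin(token_phrase_id, top)
    scores = np.where(mask, token_emb @ query, -np.inf)
    token_w = softmax(scores)
    token_ctx = token_w @ token_emb
    return top, renorm, phrase_ctx, token_w, token_ctx
```

Tokens belonging to unselected phrases receive exactly zero weight here, which is one simple way to keep similar but irrelevant phrases from influencing the token-level context.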

Preprint, 2022, arXiv

Sequence-level Speaker Change Detection with Difference-based Continuous Integrate-and-fire

Speaker change detection is an important task in multi-party interactions such as meetings and conversations. In this paper, we address the speaker change detection task from the perspective of sequence transduction. Specifically, we propose a novel encoder-decoder framework that directly converts the input feature sequence to the speaker identity sequence. The difference-based continuous integrate-and-fire mechanism is designed to support this framework. It detects speaker changes by integrating the speaker difference between the encoder outputs frame-by-frame and transfers encoder outputs to segment-level speaker embeddings according to the detected speaker changes. The whole framework is supervised by the speaker identity sequence, a weaker label than the precise speaker change points. The experiments on the AMI and DIHARD-I corpora show that our sequence-level method consistently outperforms a strong frame-level baseline that uses the precise speaker change labels.
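The idea of integrating frame-to-frame differences and firing at a speaker change can be sketched as a toy example. The abstract does not give the mechanism's formulas, so everything here is an assumption: the Euclidean distance as the difference measure, the threshold in the same units, mean pooling for the segment embeddings, and the function name `diff_cif_segments`.

```python
import numpy as np

def diff_cif_segments(frames, threshold=1.0):
    """Toy integrate-and-fire over frame-to-frame differences.

    frames: (T, d) encoder outputs. Returns (boundaries, segment_embeddings),
    where each boundary is the index of the first frame of a new segment.
    """
    boundaries, segments = [], []
    acc = 0.0    # difference integrated since the last firing
    start = 0    # first frame of the current segment
    for t in range(1, len(frames)):
        # Assumed difference measure: Euclidean distance between adjacent frames.
        acc += np.linalg.norm(frames[t] - frames[t - 1])
        if acc >= threshold:
            # Fire: close the current segment and declare a speaker change at t.
            segments.append(frames[start:t].mean(axis=0))
            boundaries.append(t)
            start, acc = t, 0.0
    segments.append(frames[start:].mean(axis=0))  # close the final segment
    return boundaries, np.stack(segments)
```

The output is one embedding per detected segment, the kind of sequence that a decoder supervised only by speaker identities could consume.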

Preprint, 2022, arXiv

Strong convergence rates of a fully discrete scheme for the Cahn-Hilliard-Cook equation

The first aim of this paper is to examine existence, uniqueness and regularity for the Cahn-Hilliard-Cook (CHC) equation in space dimension $d\leq 3$. By applying a spectral Galerkin method to the infinite dimensional equation, we elaborate the well-posedness and regularity of the finite dimensional approximate problem. The key idea lies in transforming the stochastic problem with additive noise into an equivalent random equation. The regularity of the solution to the equivalent random equation is obtained, in one dimension, with the aid of the Gagliardo-Nirenberg inequality and, in two and three dimensions, by an energy argument. Further, the approximate solution is shown to be strongly convergent to the unique mild solution of the original CHC equation, whose spatio-temporal regularity can be attained by similar arguments. In addition, a fully discrete approximation of the problem is investigated, performed by the spectral Galerkin method in space and the backward Euler method in time. The previously obtained regularity results of the problem help us to identify strong convergence rates of the fully discrete scheme.
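For orientation, the CHC equation and a spectral Galerkin, backward Euler discretization are often written in the abstract form below. This is a sketch in commonly used notation, not a quotation from the paper; the symbols, the choice of nonlinearity and the boundary conditions are assumptions listed in the comments.

```latex
% Assumed notation: A = -\Delta with homogeneous Neumann boundary conditions,
% F(u) = u^3 - u, W a Q-Wiener process, P_N the projection onto the first N
% eigenfunctions of A, \tau the time step, \Delta W_m = W(t_m) - W(t_{m-1}).
\begin{align*}
  \mathrm{d}X(t) + A\bigl(AX(t) + F(X(t))\bigr)\,\mathrm{d}t &= \mathrm{d}W(t), \qquad X(0) = X_0, \\
  X^N_m - X^N_{m-1} + \tau A\bigl(A X^N_m + P_N F(X^N_m)\bigr) &= P_N \Delta W_m, \qquad X^N_0 = P_N X_0 .
\end{align*}
```

The second line is implicit in $X^N_m$, so each time step requires solving a nonlinear system in the $N$ retained modes.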

Preprint, 2021, arXiv

Weak convergence rates for an explicit full-discretization of stochastic Allen-Cahn equation with additive noise

We discretize the stochastic Allen-Cahn equation with additive noise by means of a spectral Galerkin method in space and a tamed version of the exponential Euler method in time. The resulting error bounds are analyzed for the spatio-temporal full discretization in both strong and weak senses. Unlike existing works, we develop a new and direct approach for the weak error analysis, which does not rely on the use of the associated Kolmogorov equation or Itô's formula and is therefore non-Markovian in nature. Such an approach thus has the potential to be applied to non-Markovian equations such as stochastic Volterra equations or other types of fractional SPDEs, which suffer from the lack of Kolmogorov equations. It turns out that the obtained weak convergence rates are, in both the spatial and temporal directions, essentially twice as high as the strong convergence rates. Also, it is revealed how the weak convergence rates depend on the regularity of the noise. Numerical experiments are finally reported to confirm the theoretical conclusion.
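A spectral Galerkin discretization combined with a tamed exponential Euler step can be sketched in one space dimension as follows. The abstract does not state the scheme's formula, so this uses one common taming choice (dividing the drift increment by 1 + tau * norm) and should be read as an illustration only. The domain (0, 1), the Dirichlet boundary conditions, the nonlinearity X - X^3, the grid-based evaluation of the nonlinearity, the noise eigenvalues `q`, and the function name `tamed_exp_euler` are all assumptions.

```python
import numpy as np

def tamed_exp_euler(c0, tau, steps, q, rng):
    """Hypothetical tamed exponential Euler for dX = (X'' + X - X^3) dt + dW
    on (0, 1) with Dirichlet boundary conditions, in sine-series coefficients.

    c0: (N,) initial coefficients; q: (N,) assumed noise eigenvalues.
    """
    n = len(c0)
    k = np.arange(1, n + 1)
    lam = (np.pi * k) ** 2                  # eigenvalues of -d^2/dx^2
    decay = np.exp(-lam * tau)              # semigroup exp(-lam * tau) per mode
    x = k / (n + 1)                         # interior grid points
    phi = np.sqrt(2.0) * np.sin(np.pi * np.outer(x, k))  # phi[j, i] = e_i(x_j)

    c = c0.copy()
    for _ in range(steps):
        u = phi @ c                         # Galerkin solution on the grid
        # Approximate sine coefficients of the nonlinearity (grid quadrature).
        f = phi.T @ (u - u ** 3) / (n + 1)
        # Taming keeps the norm of the drift increment below 1.
        tamed = tau * f / (1.0 + tau * np.linalg.norm(f))
        dw = np.sqrt(tau * q) * rng.standard_normal(n)  # Wiener increment per mode
        c = decay * (c + tamed + dw)
    return c
```

A call such as `tamed_exp_euler(np.zeros(16), 1e-3, 100, np.arange(1, 17) ** -2.0, np.random.default_rng(0))` returns the coefficients after 100 steps. Because the scheme is explicit, no nonlinear solve is needed per step, and the taming factor keeps the cubic term from blowing up the iterates.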
