Researcher profile

Meng Cai

Meng Cai contributes to research discovery and scholarly infrastructure.

Trust snapshot

Quick read

Trust 17 - Unverified, Verification L1, Unclaimed author
4 works
0 followers
7 topics
4 close collaborators

Published work

4 published items

Preprint, 2022, arXiv

Improving End-to-End Contextual Speech Recognition with Fine-Grained Contextual Knowledge Selection

Nowadays, most methods in end-to-end contextual speech recognition bias the recognition process towards contextual knowledge. Since all-neural contextual biasing methods rely on phrase-level contextual modeling and attention-based relevance modeling, they may encounter confusion between similar context-specific phrases, which hurts predictions at the token level. In this work, we focus on mitigating confusion problems with fine-grained contextual knowledge selection (FineCoS). In FineCoS, we introduce fine-grained knowledge to reduce the uncertainty of token predictions. Specifically, we first apply phrase selection to narrow the range of phrase candidates, and then conduct token attention on the tokens in the selected phrase candidates. Moreover, we re-normalize the attention weights of most relevant phrases in inference to obtain more focused phrase-level contextual representations, and inject position information to better discriminate phrases or tokens. On LibriSpeech and an in-house 160,000-hour dataset, we explore the proposed methods based on a controllable all-neural biasing method, collaborative decoding (ColDec). The proposed methods provide at most 6.1% relative word error rate reduction on LibriSpeech and 16.4% relative character error rate reduction on the in-house dataset over ColDec.
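
As a rough illustration of the inference-time re-normalization mentioned in the abstract above, the following sketch keeps the attention weights of the k most relevant phrases and rescales them so they sum to one. The abstract does not specify the exact procedure, so the top-k rule, the function names `softmax` and `renormalize_top_k`, and the parameter `k` are assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(scores):
    """Turn raw relevance scores into attention weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def renormalize_top_k(weights, k):
    """Keep the k largest weights, zero the rest, and rescale to sum to 1.

    Hypothetical helper: the top-k selection rule is an assumption.
    """
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    top = set(order[:k])
    kept = sum(weights[i] for i in top)
    return [weights[i] / kept if i in top else 0.0 for i in range(len(weights))]


# Example: four candidate phrases, keep the two most relevant ones.
phrase_weights = softmax([2.0, 1.5, 0.1, -1.0])
focused = renormalize_top_k(phrase_weights, k=2)
```

After re-normalization, all attention mass sits on the selected phrases, which is one simple way to obtain a more focused phrase-level representation.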

Preprint, 2022, arXiv

Sequence-level Speaker Change Detection with Difference-based Continuous Integrate-and-fire

Speaker change detection is an important task in multi-party interactions such as meetings and conversations. In this paper, we address the speaker change detection task from the perspective of sequence transduction. Specifically, we propose a novel encoder-decoder framework that directly converts the input feature sequence to the speaker identity sequence. The difference-based continuous integrate-and-fire mechanism is designed to support this framework. It detects speaker changes by integrating the speaker difference between the encoder outputs frame-by-frame and transfers encoder outputs to segment-level speaker embeddings according to the detected speaker changes. The whole framework is supervised by the speaker identity sequence, a weaker label than the precise speaker change points. The experiments on the AMI and DIHARD-I corpora show that our sequence-level method consistently outperforms a strong frame-level baseline that uses the precise speaker change labels.
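
The integrate-and-fire idea in the abstract above can be sketched in a few lines: per-frame difference weights are accumulated, and a speaker change is emitted whenever the running sum reaches a threshold. This is a simplified sketch under stated assumptions. The function name `fire_boundaries`, the threshold of 1.0, and the assumption that each weight lies in [0, 1] are illustrative choices, and the step that builds segment-level speaker embeddings is omitted.

```python
def fire_boundaries(diffs, threshold=1.0):
    """Return frame indices where the accumulated difference reaches the threshold.

    diffs: per-frame speaker-difference weights, assumed to lie in [0, 1],
           so at most one boundary can fire per frame.
    """
    boundaries = []
    acc = 0.0
    for t, d in enumerate(diffs):
        acc += d
        if acc >= threshold:
            boundaries.append(t)
            acc -= threshold  # carry the remainder into the next segment
    return boundaries


# Example: two frames with large differences trigger two detected changes.
changes = fire_boundaries([0.1, 0.2, 0.9, 0.1, 0.1, 0.95])
```

Because the accumulator carries its remainder forward, small differences spread over several frames can still add up to a detected change.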

Preprint, 2022, arXiv

Strong convergence rates of a fully discrete scheme for the Cahn-Hilliard-Cook equation

The first aim of this paper is to examine existence, uniqueness and regularity for the Cahn-Hilliard-Cook (CHC) equation in space dimension $d\leq 3$. By applying a spectral Galerkin method to the infinite dimensional equation, we elaborate the well-posedness and regularity of the finite dimensional approximate problem. The key idea lies in transforming the stochastic problem with additive noise into an equivalent random equation. The regularity of the solution to the equivalent random equation is obtained in one dimension with the aid of the Gagliardo-Nirenberg inequality, and in two and three dimensions by the energy argument. Further, the approximate solution is shown to be strongly convergent to the unique mild solution of the original CHC equation, whose spatio-temporal regularity can be attained by similar arguments. In addition, a fully discrete approximation of the problem is investigated, performed by the spectral Galerkin method in space and the backward Euler method in time. The previously obtained regularity results of the problem help us to identify strong convergence rates of the fully discrete scheme.

Preprint, 2021, arXiv

Weak convergence rates for an explicit full-discretization of stochastic Allen-Cahn equation with additive noise

We discretize the stochastic Allen-Cahn equation with additive noise by means of a spectral Galerkin method in space and a tamed version of the exponential Euler method in time. The resulting error bounds are analyzed for the spatio-temporal full discretization in both strong and weak senses. Different from existing works, we develop a new and direct approach for the weak error analysis, which does not rely on the use of the associated Kolmogorov equation or Itô's formula and is therefore non-Markovian in nature. Such an approach thus has a potential to be applied to non-Markovian equations such as stochastic Volterra equations or other types of fractional SPDEs, which suffer from the lack of Kolmogorov equations. It turns out that the obtained weak convergence rates are, in both spatial and temporal direction, essentially twice as high as the strong convergence rates. Also, it is revealed how the weak convergence rates depend on the regularity of the noise. Numerical experiments are finally reported to confirm the theoretical conclusion.
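
To give a feel for what a tamed exponential Euler step looks like, the sketch below applies one to a scalar toy equation, dX = (-lam*X + X - X^3) dt + sigma dW, which loosely mimics a single spectral mode. This is not the scheme analyzed in the paper: the scalar reduction, the taming factor 1 / (1 + tau*|f|), and all names (`tamed_exp_euler`, `lam`, `sigma`, `tau`) are assumptions chosen for illustration.

```python
import math
import random


def tamed_exp_euler(x0, lam, sigma, tau, steps, seed=0):
    """Simulate a scalar toy SDE with a tamed exponential Euler scheme.

    Toy model (an assumption, not the SPDE from the abstract):
        dX = (-lam * X + X - X**3) dt + sigma dW,  with lam > 0.
    """
    rng = random.Random(seed)
    decay = math.exp(-lam * tau)          # exact linear propagator over one step
    drift_weight = (1.0 - decay) / lam    # integral of the propagator over one step
    # Standard deviation of the stochastic convolution over one step.
    noise_std = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * lam))
    x = x0
    path = [x]
    for _ in range(steps):
        f = x - x ** 3                    # cubic nonlinearity
        tamed = f / (1.0 + tau * abs(f))  # taming bounds the explicit increment
        x = decay * x + drift_weight * tamed + noise_std * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


# Example: a large initial value with a coarse step size.
path = tamed_exp_euler(x0=10.0, lam=1.0, sigma=0.1, tau=0.5, steps=100)
```

The taming factor keeps the explicit treatment of the cubic term from overshooting when the state is large, which is the usual motivation for tamed schemes with superlinearly growing nonlinearities.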