Masked Pre-trained Encoder base on Joint CTC-Transformer

This study addresses semi-supervised acoustic modeling, i.e. learning high-level representations from unlabeled audio data and then fine-tuning the parameters of the pre-trained model with labeled data. The proposed approach adopts a two-stage training framework consisting of a masked pre-trained encoder (MPE) and a Joint CTC-Transformer (JCT). In the MPE stage, part of the input frames are masked and then reconstructed from the encoder output, using a large amount of unlabeled data. In the JCT stage, the model differs from the original Transformer in that acoustic features are used as input instead of plain text. A CTC loss serves as the training objective on top of the encoder, and the decoder blocks remain unchanged. The paper compares the two-stage training method with a fully supervised JCT and also investigates the approach's robustness to different volumes of training data. In the experiments, two-stage training performs much better than the fully supervised model. With only 30% of the WSJ labeled data, two-stage training achieves a 17% reduction in word error rate (WER) compared with a model trained on 50% of the WSJ data in a fully supervised way. Moreover, increasing the unlabeled data for MPE from WSJ (81h) to Librispeech (960h) yields about a 22% WER reduction. This work was carried out during an internship at Tencent AI Lab.
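The masking step of the MPE stage can be illustrated with a minimal sketch. The abstract only states that part of the input frames are masked and reconstructed after the encoder, so everything specific here is an assumption made for illustration: the function names `mask_frames` and `reconstruction_loss`, the masking probability, replacing masked frames with zeros, and scoring the reconstruction with an L1 error on the masked frames only. An identity function stands in for the encoder.

```python
import random


def mask_frames(frames, mask_prob=0.15, seed=None):
    """Return a copy of `frames` with a random subset zeroed out, plus the mask.

    `frames` is a list of feature vectors (one list of floats per frame).
    Zero-filling and the default mask_prob are illustrative assumptions.
    """
    rng = random.Random(seed)
    mask = [rng.random() < mask_prob for _ in frames]
    masked = [[0.0] * len(f) if m else list(f) for f, m in zip(frames, mask)]
    return masked, mask


def reconstruction_loss(predicted, target, mask):
    """Mean absolute error over masked frames only (an assumed choice of loss)."""
    errors = [
        abs(p - t)
        for pred_frame, tgt_frame, m in zip(predicted, target, mask)
        if m
        for p, t in zip(pred_frame, tgt_frame)
    ]
    return sum(errors) / len(errors) if errors else 0.0


# Toy input: 50 frames of 4-dimensional features.
frames = [[float(t + d) for d in range(4)] for t in range(50)]
masked, mask = mask_frames(frames, mask_prob=0.3, seed=0)

# Identity stand-in for the encoder: its "reconstruction" is the masked input.
loss = reconstruction_loss(masked, frames, mask)
```

In a real system the identity stand-in would be replaced by the Transformer encoder, and the loss would be minimized over unlabeled audio before the JCT fine-tuning stage.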

Preprint, 2020, arXiv, Open access
