Researcher profile

Ruixiong Zhang

Ruixiong Zhang contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 15 - UnverifiedVerification L1Unclaimed author
3works
0followers
3topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

3 published item(s)

preprint2021arXiv

DiDiSpeech: A Large Scale Mandarin Speech Corpus

This paper introduces a new open-sourced Mandarin speech corpus, called DiDiSpeech. It consists of about 800 hours of speech data at 48kHz sampling rate from 6000 speakers and the corresponding texts. All speech data in the corpus is recorded in quiet environment and is suitable for various speech processing tasks, such as voice conversion, multi-speaker text-to-speech and automatic speech recognition. We conduct experiments with multiple speech tasks and evaluate the performance, showing that it is promising to use the corpus for both academic research and practical application. The corpus is available at https://outreach.didichuxing.com/research/opendata/.

preprint2021arXiv

Transformer based unsupervised pre-training for acoustic representation learning

Recently, a variety of acoustic tasks and related applications arised. For many acoustic tasks, the labeled data size may be limited. To handle this problem, we propose an unsupervised pre-training method using Transformer based encoder to learn a general and robust high-level representation for all acoustic tasks. Experiments have been conducted on three kinds of acoustic tasks: speech emotion recognition, sound event detection and speech translation. All the experiments have shown that pre-training using its own training data can significantly improve the performance. With a larger pre-training data combining MuST-C, Librispeech and ESC-US datasets, for speech emotion recognition, the UAR can further improve absolutely 4.3% on IEMOCAP dataset. For sound event detection, the F1 score can further improve absolutely 1.5% on DCASE2018 task5 development set and 2.1% on evaluation set. For speech translation, the BLEU score can further improve relatively 12.2% on En-De dataset and 8.4% on En-Fr dataset.

preprint2020arXiv

A Further Study of Unsupervised Pre-training for Transformer Based Speech Recognition

Building a good speech recognition system usually requires large amounts of transcribed data, which is expensive to collect. To tackle this problem, many unsupervised pre-training methods have been proposed. Among these methods, Masked Predictive Coding achieved significant improvements on various speech recognition datasets with BERT-like Masked Reconstruction loss and Transformer backbone. However, many aspects of MPC have not been fully investigated. In this paper, we conduct a further study on MPC and focus on three important aspects: the effect of pre-training data speaking style, its extension on streaming model, and how to better transfer learned knowledge from pre-training stage to downstream tasks. Experiments reveled that pre-training data with matching speaking style is more useful on downstream recognition tasks. A unified training objective with APC and MPC provided 8.46% relative error reduction on streaming model trained on HKUST. Also, the combination of target data adaption and layer-wise discriminative training helped the knowledge transfer of MPC, which achieved 3.99% relative error reduction on AISHELL over a strong baseline.