Researcher profile

Xin Qi

Xin Qi contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 13 - Unverified
Verification L1
Unclaimed author
2 works
0 followers
5 topics
4 close collaborators

Published work

2 published items

Preprint | 2022 | arXiv

Cross-Modal ASR Post-Processing System for Error Correction and Utterance Rejection

Although modern automatic speech recognition (ASR) systems can achieve high performance, they may produce errors that weaken readers' experience and harm downstream tasks. To improve the accuracy and reliability of ASR hypotheses, we propose a cross-modal post-processing system for speech recognizers, which 1) fuses acoustic and textual features from different modalities, 2) jointly trains a confidence estimator and an error corrector in a multi-task learning fashion, and 3) unifies the error correction and utterance rejection modules. Compared with single-modal or single-task models, our proposed system is shown to be more effective and efficient. Experimental results show that our post-processing system leads to more than 10% relative reduction in character error rate (CER) for both single-speaker and multi-speaker speech on our industrial ASR system, with about 1.7 ms latency per token, which ensures that the extra latency introduced by post-processing is acceptable in streaming speech recognition.

Preprint | 2020 | arXiv

AE-OT-GAN: Training GANs from data specific latent distribution

Though generative adversarial networks (GANs) are prominent models to generate realistic and crisp images, they often encounter the mode collapse problem and are hard to train, which comes from approximating the intrinsic discontinuous distribution transform map with continuous DNNs. The recently proposed AE-OT model addresses this problem by explicitly computing the discontinuous distribution transform map through solving a semi-discrete optimal transport (OT) map in the latent space of the autoencoder. However, the generated images are blurry. In this paper, we propose the AE-OT-GAN model to utilize the advantages of both models: generate high quality images and at the same time overcome the mode collapse/mixture problems. Specifically, we first faithfully embed the low dimensional image manifold into the latent space by training an autoencoder (AE). Then we compute the optimal transport (OT) map that pushes forward the uniform distribution to the latent distribution supported on the latent manifold. Finally, our GAN model is trained to generate high quality images from the latent distribution, the distribution transform map from which to the empirical data distribution will be continuous. The paired data between the latent code and the real images gives us further constriction about the generator. Experiments on the simple MNIST dataset and complex datasets like Cifar-10 and CelebA show the efficacy and efficiency of our proposed method.