Source author record

Zhiping Zeng

Zhiping Zeng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computation and Language, eess.AS, Machine Learning, physics.optics, Sound

Catalog footprint

What is connected

4 works

5 topics

4 close collaborators


preprint · 2020 · arXiv

Leveraging Text Data Using Hybrid Transformer-LSTM Based End-to-End ASR in Transfer Learning

In this work, we study how to leverage extra text data to improve low-resource end-to-end ASR in a cross-lingual transfer learning setting. To this end, we extend our prior work [1] and propose a hybrid Transformer-LSTM based architecture. This architecture not only takes advantage of the highly effective encoding capacity of the Transformer network but also benefits from extra text data thanks to its independent LSTM-based language model network. We conduct experiments on our in-house Malay corpus, which contains limited labeled data and a large amount of extra text. Results show that the proposed architecture outperforms the previous LSTM-based architecture [1] by 24.2% relative word error rate (WER) when both are trained using limited labeled data. Starting from this, we obtain a further 25.4% relative WER reduction by transfer learning from another resource-rich language. Moreover, we obtain an additional 13.6% relative WER reduction by boosting the LSTM decoder of the transferred model with the extra text data. Overall, our best model outperforms the vanilla Transformer ASR by 11.9% relative WER. Last but not least, the proposed hybrid architecture offers much faster inference than both the LSTM and Transformer architectures.
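The relative WER figures quoted in this abstract can be read with a small helper. The sketch below is illustrative only: the function names and the WER values are hypothetical, and treating successive reductions as compounding (each measured against the previous step's model) is an assumption, not something the abstract states.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction in percent: the share of baseline error removed."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer * 100.0


def compound_relative_reductions(reductions_percent):
    """Combine successive relative reductions, assuming each one is
    measured against the result of the previous step."""
    remaining = 1.0
    for r in reductions_percent:
        remaining *= 1.0 - r / 100.0
    return (1.0 - remaining) * 100.0


# Hypothetical WER values, not taken from the paper.
print(relative_wer_reduction(20.0, 15.0))          # 25.0
print(compound_relative_reductions([50.0, 50.0]))  # 75.0
```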

preprint · 2019 · arXiv

Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data

The lack of code-switching training data is one of the major concerns in the development of end-to-end code-switching automatic speech recognition (ASR) models. In this work, we propose a method to train an improved end-to-end code-switching ASR model using only monolingual data. Our method encourages the distributions of the output token embeddings of the monolingual languages to be similar, and hence helps the ASR model code-switch easily between languages. Specifically, we propose to use constraints based on Jensen-Shannon divergence and cosine distance. The former enforces output embeddings of the monolingual languages to possess similar distributions, while the latter simply brings the centroids of the two distributions close to each other. Experimental results demonstrate the high effectiveness of the proposed method, yielding up to 4.5% absolute mixed error rate improvement on a Mandarin-English code-switching ASR task.
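The two quantities named in this abstract can be sketched with their textbook definitions. This is a minimal illustration, not the paper's formulation: the abstract does not say how the embedding distributions are modeled, so the discrete-distribution form of the Jensen-Shannon divergence and all function names here are assumptions.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, and 0 when p and q are identical."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def centroid(vectors):
    """Component-wise mean of a list of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine_distance(u, v):
    """1 - cosine similarity; 0 when the vectors point in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical 2-d output embeddings for two languages.
lang_a = [[1.0, 0.0], [3.0, 0.0]]
lang_b = [[0.0, 2.0], [0.0, 4.0]]
print(cosine_distance(centroid(lang_a), centroid(lang_b)))
print(js_divergence([0.5, 0.5], [0.5, 0.5]))
```

In a training setup, terms like these would typically be added to the ASR loss as penalties; the weighting between them is not given in the abstract.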

preprint · 2019 · arXiv

Enriching Rare Word Representations in Neural Language Models by Embedding Matrix Augmentation

Neural language models (NLMs) achieve strong generalization capability by learning dense representations of words and using them to estimate probability distributions. However, learning the representations of rare words is a challenging problem that causes the NLM to produce unreliable probability estimates. To address this problem, we propose a method to enrich the representations of rare words in a pre-trained NLM and consequently improve its probability estimation performance. The proposed method augments the word embedding matrices of the pre-trained NLM while keeping other parameters unchanged. Specifically, our method updates the embedding vectors of rare words using the embedding vectors of other semantically and syntactically similar words. To evaluate the proposed method, we enrich the rare street names in the pre-trained NLM and use it to rescore the 100-best hypotheses output by the Singapore English speech recognition system. The enriched NLM reduces the word error rate by 6% relative and improves the recognition accuracy of the rare words by 16% absolute compared to the baseline NLM.
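The idea of updating a rare word's embedding from similar words can be sketched as follows. The abstract does not give the update rule, so the unweighted average used here, the function name, and the toy vocabulary are all assumptions made for illustration.

```python
def enrich_embedding(embeddings, rare_word, similar_words):
    """Return a new vector for rare_word: the unweighted mean of its own
    embedding and those of the given similar words (an assumed update rule)."""
    vectors = [embeddings[rare_word]] + [embeddings[w] for w in similar_words]
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


# Hypothetical 2-d embeddings; only the rare word's row is changed,
# mirroring the abstract's point that other parameters stay fixed.
embeddings = {
    "rare_street": [0.0, 0.0],
    "street_a": [3.0, 3.0],
    "street_b": [6.0, 0.0],
}
embeddings["rare_street"] = enrich_embedding(
    embeddings, "rare_street", ["street_a", "street_b"]
)
print(embeddings["rare_street"])  # [3.0, 1.0]
```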

preprint · 2015 · arXiv

Fast Super-Resolution Imaging with Ultra-High Labeling Density Achieved by Joint Tagging Super-Resolution Optical Fluctuation Imaging (JT-SOFI)

Previous stochastic localization-based super-resolution techniques are largely limited by the labeling density and the fidelity to the morphology of the specimen. We report on an optical super-resolution imaging scheme implementing joint tagging using multiple fluorescent blinking dyes associated with super-resolution optical fluctuation imaging (JT-SOFI), achieving ultra-high labeling density super-resolution imaging. To demonstrate the feasibility of JT-SOFI, quantum dots with different emission spectra were jointly labeled to the tubulin in COS7 cells, creating ultra-high density labeling. After analyzing and combining the fluorescence intermittency images emanating from spectrally resolved quantum dots, the microtubule networks can be investigated with high fidelity and remarkably enhanced contrast at sub-diffraction resolution. The spectral separation also significantly decreased the number of frames required for SOFI, enabling fast super-resolution microscopy through simultaneous data acquisition. As the joint-tagging scheme can decrease the labeling density in each spectral channel, we can faithfully reflect the continuous microtubule structure with high resolution through collection of only 100 frames per channel. The improved continuity of the microtubule structure is quantitatively validated with image skeletonization, thus demonstrating the advantage of JT-SOFI over other localization-based super-resolution methods.