Source author record

Thai-Son Nguyen

Thai-Son Nguyen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

eess.AS Sound Computation and Language Computer Vision Machine Learning cond-mat.mes-hall cond-mat.mtrl-sci physics.app-ph

Catalog footprint

What is connected

6works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Shubnikov-de Haas oscillations of two-dimensional electron gases in AlYN/GaN and AlScN/GaN heterostructures

AlYN and AlScN have recently emerged as promising nitride materials that can be integrated with GaN to form two-dimensional electron gases (2DEGs) at heterojunctions. Electron transport properties in these heterostructures have been enhanced through careful design and optimization of epitaxial growth conditions. In this work, we report for the first time Shubnikov-de Haas (SdH) oscillations of 2DEGs in AlYN/GaN and AlScN/GaN heterostructures, grown by metal-organic chemical vapor deposition. SdH oscillations provide direct access to key 2DEG parameters at the Fermi level: (1) carrier density, (2) electron effective mass (m* ~ 0.24 me for AlYN/GaN and m* ~ 0.25 me for AlScN/GaN), and (3) quantum scattering time (~ 68 fs for AlYN/GaN and ~ 70 fs for AlScN/GaN). These measurements of fundamental transport properties provide critical insights for advancing emerging nitride semiconductors for future high-frequency and power electronics.

preprint2020arXiv

ELITR Non-Native Speech Translation at IWSLT 2020

This paper is an ELITR system submission for the non-native speech translation task at IWSLT 2020. We describe systems for offline ASR, real-time ASR, and our cascaded approach to offline SLT and real-time SLT. We select our primary candidates from a pool of pre-existing systems, develop a new end-to-end general ASR system, and a hybrid ASR trained on non-native speech. The provided small validation set prevents us from carrying out a complex validation, but we submit all the unselected candidates for contrastive evaluation on the test set.

preprint2020arXiv

High Performance Sequence-to-Sequence Model for Streaming Speech Recognition

Recently sequence-to-sequence models have started to achieve state-of-the-art performance on standard speech recognition tasks when processing audio data in batch mode, i.e., the complete audio data is available when starting processing. However, when it comes to performing run-on recognition on an input stream of audio data while producing recognition results in real-time and with low word-based latency, these models face several challenges. For many techniques, the whole audio sequence to be decoded needs to be available at the start of the processing, e.g., for the attention mechanism or the bidirectional LSTM (BLSTM). In this paper, we propose several techniques to mitigate these problems. We introduce an additional loss function controlling the uncertainty of the attention mechanism, a modified beam search identifying partial, stable hypotheses, ways of working with BLSTM in the encoder, and the use of chunked BLSTM. Our experiments show that with the right combination of these techniques, it is possible to perform run-on speech recognition with low word-based latency without sacrificing in word error rate performance.

preprint2020arXiv

Improving sequence-to-sequence speech recognition training with on-the-fly data augmentation

Sequence-to-Sequence (S2S) models recently started to show state-of-the-art performance for automatic speech recognition (ASR). With these large and deep models overfitting remains the largest problem, outweighing performance improvements that can be obtained from better architectures. One solution to the overfitting problem is increasing the amount of available training data and the variety exhibited by the training data with the help of data augmentation. In this paper we examine the influence of three data augmentation methods on the performance of two S2S model architectures. One of the data augmentation method comes from literature, while two other methods are our own development - a time perturbation in the frequency domain and sub-sequence sampling. Our experiments on Switchboard and Fisher data show state-of-the-art performance for S2S models that are trained solely on the speech training data and do not use additional text data.

preprint2020arXiv

Relative Positional Encoding for Speech Recognition and Direct Translation

Transformer models are powerful sequence-to-sequence architectures that are capable of directly mapping speech inputs to transcriptions or translations. However, the mechanism for modeling positions in this model was tailored for text modeling, and thus is less ideal for acoustic inputs. In this work, we adapt the relative position encoding scheme to the Speech Transformer, where the key addition is relative distance between input states in the self-attention network. As a result, the network can better adapt to the variable distributions present in speech data. Our experiments show that our resulting model achieves the best recognition result on the Switchboard benchmark in the non-augmentation condition, and the best published result in the MuST-C speech translation benchmark. We also show that this model is able to better utilize synthetic data than the Transformer, and adapts better to variable sentence segmentation quality for speech translation.

preprint2020arXiv

Toward Cross-Domain Speech Recognition with End-to-End Models

In the area of multi-domain speech recognition, research in the past focused on hybrid acoustic models to build cross-domain and domain-invariant speech recognition systems. In this paper, we empirically examine the difference in behavior between hybrid acoustic models and neural end-to-end systems when mixing acoustic training data from several domains. For these experiments we composed a multi-domain dataset from public sources, with the different domains in the corpus covering a wide variety of topics and acoustic conditions such as telephone conversations, lectures, read speech and broadcast news. We show that for the hybrid models, supplying additional training data from other domains with mismatched acoustic conditions does not increase the performance on specific domains. However, our end-to-end models optimized with sequence-based criterion generalize better than the hybrid models on diverse domains. In term of word-error-rate performance, our experimental acoustic-to-word and attention-based models trained on multi-domain dataset reach the performance of domain-specific long short-term memory (LSTM) hybrid models, thus resulting in multi-domain speech recognition systems that do not suffer in performance over domain specific ones. Moreover, the use of neural end-to-end models eliminates the need of domain-adapted language models during recognition, which is a great advantage when the input domain is unknown.

Thai-Son Nguyen

What is connected

Connect this record

See the researcher in context

Building this map preview

6 published item(s)

Shubnikov-de Haas oscillations of two-dimensional electron gases in AlYN/GaN and AlScN/GaN heterostructures

ELITR Non-Native Speech Translation at IWSLT 2020

High Performance Sequence-to-Sequence Model for Streaming Speech Recognition

Improving sequence-to-sequence speech recognition training with on-the-fly data augmentation

Relative Positional Encoding for Speech Recognition and Direct Translation

Toward Cross-Domain Speech Recognition with End-to-End Models