Source author record

Xiaodong Cui

Xiaodong Cui appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

cond-mat.mes-hall; Machine Learning; eess.AS; Sound; cond-mat.mtrl-sci; Distributed, Parallel, and Cluster Computing; Computation and Language; Computer Vision; cond-mat.other; cond-mat.str-el; eess.IV

Catalog footprint

20 works, 11 topics, 4 close collaborators

preprint, 2022, arXiv

Accelerating Inference and Language Model Fusion of Recurrent Neural Network Transducers via End-to-End 4-bit Quantization

We report on aggressive quantization strategies that greatly accelerate inference of Recurrent Neural Network Transducers (RNN-T). We use a 4-bit integer representation for both weights and activations and apply Quantization Aware Training (QAT) to retrain the full model (acoustic encoder and language model) and achieve near-iso-accuracy. We show that customized quantization schemes that are tailored to the local properties of the network are essential to achieve good performance while limiting the computational overhead of QAT. Density ratio Language Model fusion has shown remarkable accuracy gains on RNN-T workloads but it severely increases the computational cost of inference. We show that our quantization strategies enable using large beam widths for hypothesis search while achieving streaming-compatible runtimes and a full model compression ratio of 7.6$\times$ compared to the full precision model. Via hardware simulations, we estimate a 3.4$\times$ acceleration from FP16 to INT4 for the end-to-end quantized RNN-T inclusive of LM fusion, resulting in a Real Time Factor (RTF) of 0.06. On the NIST Hub5 2000, Hub5 2001, and RT-03 test sets, we retain most of the gains associated with LM fusion, improving the average WER by $>$1.5%.
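As a rough illustration of what a 4-bit integer representation involves, the sketch below applies generic symmetric per-tensor quantization to a list of values and computes a real time factor. It is not the paper's method: the abstract describes customized schemes tailored to the network, whereas the single shared scale here, along with the names `quantize_int4` and `real_time_factor` and the sample numbers, are assumptions made for illustration.

```python
def quantize_int4(values):
    """Symmetric per-tensor quantization to signed 4-bit codes in [-8, 7]."""
    qmin, qmax = -8, 7
    max_abs = max(abs(v) for v in values)
    # One scale shared by the whole tensor (an assumption for this sketch;
    # the abstract describes customized schemes instead).
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the 4-bit range.
    codes = [min(qmax, max(qmin, round(v / scale))) for v in values]
    # Dequantize to see the values the network would effectively compute with.
    dequantized = [c * scale for c in codes]
    return codes, scale, dequantized


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent processing / duration of the audio processed."""
    return processing_seconds / audio_seconds


# Hypothetical weights and timings, chosen only to exercise the functions.
codes, scale, approx = quantize_int4([0.70, -0.33, 0.12, 0.0])
print(codes, approx)
print(real_time_factor(6.0, 100.0))
```

An RTF below 1 means the audio is processed faster than it plays, which is the usual requirement for streaming use.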

preprint, 2022, arXiv

Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing

We introduce two techniques, length perturbation and n-best based label smoothing, to improve generalization of deep neural network (DNN) acoustic models for automatic speech recognition (ASR). Length perturbation is a data augmentation algorithm that randomly drops and inserts frames of an utterance to alter the length of the speech feature sequence. N-best based label smoothing randomly injects noise into ground truth labels during training in order to avoid overfitting, where the noisy labels are generated from n-best hypotheses. We evaluate these two techniques extensively on the 300-hour Switchboard (SWB300) dataset and an in-house 500-hour Japanese (JPN500) dataset using recurrent neural network transducer (RNNT) acoustic models for ASR. We show that both techniques improve the generalization of RNNT models individually and that they can also be complementary. In particular, they yield good improvements over a strong SWB300 baseline and give state-of-the-art performance on SWB300 using RNNT models.
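The two techniques can be sketched in a few lines, under stated assumptions. The abstract does not say what is inserted, how probabilities are chosen, or how n-best hypotheses are sampled, so the choices below (inserting a copy of the current frame, independent per-frame probabilities, uniform sampling from the n-best list) and the names `length_perturb` and `noisy_label` are hypothetical.

```python
import random


def length_perturb(frames, drop_prob=0.1, insert_prob=0.1, rng=None):
    """Randomly drop and insert frames to change the sequence length."""
    if rng is None:
        rng = random.Random()
    perturbed = []
    for frame in frames:
        if rng.random() < drop_prob:
            continue  # drop this frame
        perturbed.append(frame)
        if rng.random() < insert_prob:
            # Assumption: the inserted frame is a copy of the current one.
            perturbed.append(list(frame))
    return perturbed


def noisy_label(reference, nbest, noise_prob=0.1, rng=None):
    """With probability noise_prob, swap the reference for an n-best hypothesis."""
    if rng is None:
        rng = random.Random()
    if nbest and rng.random() < noise_prob:
        # Assumption: hypotheses are sampled uniformly from the n-best list.
        return rng.choice(nbest)
    return reference


# Hypothetical two-dimensional feature frames and transcripts.
frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
print(len(length_perturb(frames, rng=random.Random(0))))
print(noisy_label("a b c", ["a b d", "a c"], noise_prob=0.5, rng=random.Random(0)))
```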

preprint, 2021, arXiv

Federated Acoustic Modeling For Automatic Speech Recognition

Data privacy and protection is a crucial issue for any automatic speech recognition (ASR) service provider when dealing with clients. In this paper, we investigate federated acoustic modeling using data from multiple clients. A client's data is stored on a local data server and the clients communicate only model parameters with a central server, and not their data. The communication happens infrequently to reduce the communication cost. To mitigate the non-iid issue, client adaptive federated training (CAFT) is proposed to canonicalize data across clients. The experiments are carried out on 1,150 hours of speech data from multiple domains. Hybrid LSTM acoustic models are trained via federated learning and their performance is compared to traditional centralized acoustic model training. The experimental results demonstrate the effectiveness of the proposed federated acoustic modeling strategy. We also show that CAFT can further improve the performance of the federated acoustic model.
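The server-side step of such a setup, where only model parameters are exchanged, might look like the sketch below. The abstract does not name the aggregation rule, so weighting each client by its amount of data (in the style of federated averaging) is an assumption, as are the function name `federated_average` and the toy numbers. CAFT itself is not sketched here.

```python
def federated_average(client_params, client_sizes):
    """Average client parameter vectors, weighted by each client's data size.

    client_params: list of equal-length lists of floats, one per client.
    client_sizes:  amount of local data per client (e.g. hours of speech).
    """
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


# Two hypothetical clients; the second holds three times as much data.
global_params = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_params)
```

Only the parameter vectors reach the server in this sketch; the raw data never leaves the clients, which is the property the abstract emphasizes.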

preprint, 2021, arXiv

Speech Emotion Recognition with Multiscale Area Attention and Data Augmentation

In Speech Emotion Recognition (SER), emotional characteristics often appear in diverse forms of energy patterns in spectrograms. Typical attention neural network classifiers for SER are usually optimized on a fixed attention granularity. In this paper, we apply multiscale area attention in a deep convolutional neural network to attend to emotional characteristics at varied granularities, so that the classifier can benefit from an ensemble of attentions with different scales. To deal with data sparsity, we conduct data augmentation with vocal tract length perturbation (VTLP) to improve the generalization capability of the classifier. Experiments are carried out on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset. We achieved 79.34% weighted accuracy (WA) and 77.54% unweighted accuracy (UA), which, to the best of our knowledge, is the state of the art on this dataset.

preprint, 2020, arXiv

Distributed Training of Deep Neural Network Acoustic Models for Automatic Speech Recognition

The past decade has witnessed great progress in Automatic Speech Recognition (ASR) due to advances in deep learning. The improvements in performance can be attributed to both improved models and large-scale training data. Key to training such models is the employment of efficient distributed learning techniques. In this article, we provide an overview of distributed training techniques for deep neural network acoustic models for ASR. Starting with the fundamentals of data parallel stochastic gradient descent (SGD) and ASR acoustic modeling, we will investigate various distributed training strategies and their realizations in high performance computing (HPC) environments with an emphasis on striking the balance between communication and computation. Experiments are carried out on a popular public benchmark to study the convergence, speedup and recognition performance of the investigated strategies.

preprint, 2020, arXiv

Improving Efficiency in Large-Scale Decentralized Distributed Training

Decentralized Parallel SGD (D-PSGD) and its asynchronous variant, Asynchronous Decentralized Parallel SGD (AD-PSGD), form a family of distributed learning algorithms that have been demonstrated to perform well for large-scale deep learning tasks. One drawback of (A)D-PSGD is that the spectral gap of the mixing matrix decreases when the number of learners in the system increases, which hampers convergence. In this paper, we investigate techniques to accelerate (A)D-PSGD based training by improving the spectral gap while minimizing the communication cost. We demonstrate the effectiveness of our proposed techniques by running experiments on the 2000-hour Switchboard speech recognition task and the ImageNet computer vision task. On an IBM P9 supercomputer, our system is able to train an LSTM acoustic model in 2.28 hours with 7.5% WER on the Hub5-2000 Switchboard (SWB) test set and 13.3% WER on the CallHome (CH) test set using 64 V100 GPUs, and in 1.98 hours with 7.7% WER on SWB and 13.3% WER on CH using 128 V100 GPUs, the fastest training time reported to date.
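The shrinking spectral gap can be seen in a small sketch. It assumes a ring topology in which each learner averages itself with its two neighbours using equal weights of 1/3; the abstract does not specify the topology or weights, and the names `ring_mixing_matrix`, `ring_spectral_gap` and `mix` are hypothetical. For this circulant matrix the eigenvalues are $(1 + 2\cos(2\pi k/n))/3$, so the gap (one minus the second-largest eigenvalue magnitude) can be computed in closed form.

```python
import math


def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring of n >= 3 learners."""
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            matrix[i][j % n] += 1.0 / 3.0  # self and both neighbours
    return matrix


def ring_spectral_gap(n):
    """1 minus the second-largest eigenvalue magnitude of the ring matrix."""
    second = max(
        abs((1.0 + 2.0 * math.cos(2.0 * math.pi * k / n)) / 3.0)
        for k in range(1, n)
    )
    return 1.0 - second


def mix(matrix, params):
    """One averaging round: each learner takes a weighted mean of its neighbours."""
    return [sum(w * p for w, p in zip(row, params)) for row in matrix]


# The gap shrinks as learners are added, so averaging spreads more slowly.
for n in (4, 8, 64, 128):
    print(n, round(ring_spectral_gap(n), 5))

# One mixing round on hypothetical scalar parameters keeps the global mean.
print(mix(ring_mixing_matrix(4), [1.0, 2.0, 3.0, 4.0]))
```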

preprint, 2020, arXiv

Many-body effect in optical properties of monolayer molybdenum diselenide

Excitons in monolayer transition metal dichalcogenides (TMDs) provide a paradigm of composite bosons in a 2D system. This letter reports a photoluminescence and reflectance study of excitons in monolayer molybdenum diselenide (MoSe2) with electrostatic gating. We observe the repulsive and attractive Fermi polaron modes of the band-edge exciton, its excited state and the spin-off excitons. Our data quantitatively validate the polaronic behavior of excitonic states in the system, which the simple three-particle trion model is insufficient to explain.

preprint, 2020, arXiv

Map Generation from Large Scale Incomplete and Inaccurate Data Labels

Accurately and globally mapping human infrastructure is an important and challenging task with applications in routing, regulation compliance monitoring, natural disaster response management and more. In this paper we present progress in developing an algorithmic pipeline and distributed compute system that automates the process of map creation using high resolution aerial images. Unlike previous studies, most of which use datasets that are available only in a few cities across the world, we utilize publicly available imagery and map data, both of which cover the contiguous United States (CONUS). We approach the technical challenge of inaccurate and incomplete training data by adopting state-of-the-art convolutional neural network architectures such as the U-Net and the CycleGAN to incrementally generate maps with increasingly more accurate and more complete labels of man-made infrastructure such as roads and houses. Since scaling the mapping task to CONUS calls for parallelization, we then adopt an asynchronous distributed stochastic parallel gradient descent training scheme to distribute the computational workload onto a cluster of GPUs with nearly linear speed-up.

preprint, 2020, arXiv

Task-Based Learning via Task-Oriented Prediction Network with Applications in Finance

Real-world applications often involve domain-specific and task-based performance objectives that are not captured by the standard machine learning losses, but are critical for decision making. A key challenge for direct integration of more meaningful domain and task-based evaluation criteria into an end-to-end gradient-based training process is the fact that often such performance objectives are not necessarily differentiable and may even require additional decision-making optimization processing. We propose the Task-Oriented Prediction Network (TOPNet), an end-to-end learning scheme that automatically integrates task-based evaluation criteria into the learning process via a learnable surrogate loss function, which directly guides the model towards the task-based goal. A major benefit of the proposed TOPNet learning scheme lies in its capability of automatically integrating non-differentiable evaluation criteria, which makes it particularly suitable for diversified and customized task-based evaluation criteria in real-world tasks. We validate the performance of TOPNet on two real-world financial prediction tasks, revenue surprise forecasting and credit risk modeling. The experimental results demonstrate that TOPNet significantly outperforms both traditional modeling with standard losses and modeling with hand-crafted heuristic differentiable surrogate losses.

preprint, 2015, arXiv

Manipulating Spin-polarized Photocurrent in 2D Transition Metal Dichalcogenides

Manipulating spin polarization of electrons in nonmagnetic semiconductors by means of electric fields or optical fields is an essential theme of the conceptual nonmagnetic semiconductor-based spintronics. Here we experimentally demonstrate a method of generating spin polarization in monolayer transition metal dichalcogenides (TMD) by circularly polarized optical pumping. The fully spin-polarized photocurrent is achieved through the valley-dependent optical selection rules and the spin-valley locking in monolayer WS2, and electrically detected by a lateral spin-valve structure with ferromagnetic contacts. The demonstrated long spin lifetime, the unique valley-contrasted physics and the spin-valley locking make monolayer WS2 an unprecedented candidate for semiconductor-based spintronics.

preprint, 2015, arXiv

Valley excitons in two-dimensional semiconductors

Monolayer group-VIB transition metal dichalcogenides have recently emerged as a new class of semiconductors in the two-dimensional limit. The attractive properties include: the visible range direct band gap ideal for exploring optoelectronic applications; the intriguing physics associated with spin and valley pseudospin of carriers which implies potentials for novel electronics based on these internal degrees of freedom; the exceptionally strong Coulomb interaction due to the two-dimensional geometry and the large effective masses. The physics of excitons, the bound states of electrons and holes, has been one of the most actively studied topics on these two-dimensional semiconductors, where the excitons exhibit remarkably new features due to the strong Coulomb binding, the valley degeneracy of the band edges, and the valley dependent optical selection rules for interband transitions. Here we give a brief overview of the experimental and theoretical findings on excitons in two-dimensional transition metal dichalcogenides, with focus on the novel properties associated with their valley degrees of freedom.

preprint, 2014, arXiv

Electronic Raman Scattering On Individual Single Walled Carbon Nanotubes

We report experimental measurements of electronic Raman scattering under resonant conditions by electrons in individual single-walled carbon nanotubes (SWNTs). The inelastic Raman scattering in the low-frequency range reveals a single-particle excitation feature, and the dispersion of the electronic structure around the center of the Brillouin zone of a semiconducting (14, 13) SWNT is extracted.

preprint, 2014, arXiv

Exciton Binding Energy of Monolayer WS2

The optical properties of monolayer transition metal dichalcogenides (TMDC) feature a prominent excitonic nature. Here we report an experimental approach toward measuring the exciton binding energy of monolayer WS2 with linear differential transmission spectroscopy and two-photon photoluminescence excitation spectroscopy (TP-PLE). TP-PLE measurements show an exciton binding energy of 0.71 eV around the K valley in the Brillouin zone. A trion binding energy of 34 meV, a two-photon absorption cross section of $4\times10^{4}$ cm$^{2}$W$^{-2}$S$^{-1}$ at 780 nm and an exciton-exciton annihilation rate of around 0.5 cm$^{2}$/s are experimentally obtained.

preprint, 2013, arXiv

Low-Frequency Raman Modes and Electronic Excitations In Atomically Thin MoS2 Crystals

Atomically thin MoS$_{2}$ crystals have been recognized as a quasi-2D semiconductor with remarkable physical properties. This letter reports our Raman scattering measurements on multilayer and monolayer MoS$_{2}$, especially in the low-frequency range ($<$50 cm$^{-1}$). We find two low-frequency Raman modes with contrasting thickness dependence. As the number of MoS$_{2}$ layers increases, one shows a significant increase in frequency while the other decreases following a 1/N trend (N denotes the number of layers). With the aid of first-principles calculations we assign the former as the shear mode $E_{2g}^{2}$ and the latter as the compression vibrational mode. The opposite evolution of the two modes with thickness demonstrates novel vibrational modes in atomically thin crystals as well as a new and more precise way to characterize the thickness of atomically thin MoS$_{2}$ films. In addition, we observe a broad feature around 38 cm$^{-1}$ (~5 meV) which is visible only under near-resonance excitation and pinned at a fixed energy independent of thickness. We interpret the feature as electronic Raman scattering associated with the spin-orbit coupling induced splitting in the conduction band at the K points of the Brillouin zone.
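The conversion between the Raman shift quoted in wavenumbers and the energy quoted in meV follows from $E = hc\tilde{\nu}$. The short sketch below works it through with the SI-defined values of $h$ and $c$; the function name `wavenumber_to_mev` is hypothetical.

```python
PLANCK_EV_S = 4.135667696e-15      # Planck constant in eV*s
LIGHT_SPEED_CM_S = 2.99792458e10   # speed of light in cm/s


def wavenumber_to_mev(wavenumber_per_cm):
    """Convert a wavenumber in cm^-1 to a photon energy in meV via E = h*c*nu."""
    energy_ev = PLANCK_EV_S * LIGHT_SPEED_CM_S * wavenumber_per_cm
    return energy_ev * 1e3


# The broad feature at 38 cm^-1 and the 50 cm^-1 edge of the low-frequency range.
print(round(wavenumber_to_mev(38.0), 2))
print(round(wavenumber_to_mev(50.0), 2))
```

The 38 cm$^{-1}$ feature comes out at roughly 4.7 meV, consistent with the rounded value of about 5 meV.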

preprint, 2013, arXiv

Magnetoelectric effects and valley controlled spin quantum gates in transition metal dichalcogenide bilayers

In monolayer group-VI transition metal dichalcogenides (TMDC), charge carriers have spin and valley degrees of freedom, both associated with magnetic moments. On the other hand, the layer degree of freedom in multilayers is associated with electrical polarization. Here, we show that TMDC bilayers offer an unprecedented platform to realize a strong coupling between the spin, layer pseudospin, and valley degrees of freedom of holes. Such coupling not only gives rise to the spin Hall effect and spin circular dichroism in inversion symmetric bilayer, but also leads to a variety of magnetoelectric effects permitting quantum manipulation of these electronic degrees of freedom. Oscillating electric and magnetic fields can both drive the hole spin resonance where the two fields have valley-dependent interference, making possible a prototype interplay between the spin and valley as information carriers for potential valley-spintronic applications. We show how to realize quantum gates on the spin qubit controlled by the valley bit.

preprint, 2012, arXiv

Optical signature of symmetry variations and spin-valley coupling in atomically thin tungsten dichalcogenides

Motivated by the triumph and limitation of graphene for electronic applications, atomically thin layers of group VI transition metal dichalcogenides are attracting extensive interest as a class of graphene-like semiconductors with a desired band-gap in the visible frequency range. The monolayers feature a valence band spin splitting with opposite sign in the two valleys located at corners of the first Brillouin zone. This spin-valley coupling, particularly pronounced in tungsten dichalcogenides, can benefit potential spintronics and valleytronics with the important consequences of spin-valley interplay and the suppression of spin and valley relaxations. Here we report the first optical studies of WS2 and WSe2 monolayers and multilayers. The efficiency of second harmonic generation shows a dramatic even-odd oscillation with the number of layers, consistent with the presence (absence) of inversion symmetry in even-layer (odd-layer) samples. Photoluminescence (PL) measurements show the crossover from an indirect band gap semiconductor at multilayers to a direct-gap one at monolayers. The PL spectra and first-principles calculations consistently reveal a spin-valley coupling of 0.4 eV which suppresses interlayer hopping and manifests as a thickness-independent splitting pattern at the valence band edge near the K points. This giant spin-valley coupling, together with the valley-dependent physical properties, may lead to rich possibilities for manipulating spin and valley degrees of freedom in these atomically thin 2D materials.

preprint, 2012, arXiv

Valley polarization in MoS2 monolayers by optical pumping

We report experimental evidence of selective occupation of the degenerate valleys in MoS2 monolayers by circularly polarized optical pumping. Over 30% valley polarization has been observed at the K and K' valleys via the polarization-resolved luminescence spectra on pristine MoS2 monolayers. It demonstrates one viable way to generate and detect valley polarization towards the conceptual valleytronics applications with information carried by the valley index.
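A polarization figure of this kind is commonly obtained from polarization-resolved luminescence as the degree of circular polarization, $P = (I_{\sigma^+} - I_{\sigma^-})/(I_{\sigma^+} + I_{\sigma^-})$. The abstract does not state the formula used, so treating it as the measure here is an assumption, and the function name and intensities below are hypothetical placeholders rather than measured data.

```python
def circular_polarization_degree(intensity_same, intensity_opposite):
    """Degree of circular polarization P = (I+ - I-) / (I+ + I-).

    intensity_same:     luminescence with the same helicity as the pump.
    intensity_opposite: luminescence with the opposite helicity.
    """
    total = intensity_same + intensity_opposite
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return (intensity_same - intensity_opposite) / total


# Hypothetical intensities (arbitrary units), not values from the paper.
print(circular_polarization_degree(65.0, 35.0))
```

With these placeholder intensities the function returns 0.3, that is, 30% polarization; equal intensities would give zero.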

preprint, 2010, arXiv

Magneto-electric photocurrent generated by direct inter-band transitions in InGaAs/InAlAs two-dimensional electron gas

We report observation of magneto-electric photocurrent generated via direct inter-band transitions in an InGaAs/InAlAs two-dimensional electron gas excited by linearly polarized incident light. The electric current is proportional to the in-plane magnetic field, which unbalances the velocities of the photoexcited carriers with opposite spins and consequently generates electric current from a spin photocurrent. The observed light polarization dependence of the electric current is explained microscopically by taking into account the anisotropy of the photoexcited carrier density in wave vector space. The spin photocurrent can be extracted from the measured current and the conversion coefficient of spin photocurrent to electric current is estimated to be $10^{-3}$$\sim$$10^{-2}$ per Tesla.

preprint, 2009, arXiv

Determination of the Sign of g factors for Conduction Electrons Using Time-resolved Kerr Rotation

Knowledge of the electron g factor is essential for spin manipulation in the fields of spintronics and quantum computing. However, there are technical difficulties in determining the sign of the g factor in semiconductors by the established magneto-optical spectroscopic methods. We develop a time-resolved Kerr rotation technique to precisely measure the sign and the amplitude of the electron g factor in semiconductors.

preprint, 2009, arXiv

Spin relaxation in sub-monolayer and monolayer InAs structures grown in GaAs matrix

Electron spin dynamics in InAs/GaAs heterostructures consisting of a single layer of InAs (1/3$\sim$1 monolayer) embedded in (001) and (311)A GaAs matrix was studied by means of time-resolved Kerr rotation spectroscopy. The spin relaxation time of the sub-monolayer InAs samples is significantly enhanced compared with that of the monolayer InAs sample. We attribute the slowing of the spin relaxation to the dimensionally constrained D'yakonov-Perel' mechanism in the motional narrowing regime. The electron spin relaxation time and the effective g-factor in sub-monolayer samples were found to be strongly dependent on the photon-generated carrier density. The contributions from both the D'yakonov-Perel' mechanism and the Bir-Aronov-Pikus mechanism are discussed to interpret the temperature dependence of spin decoherence at various carrier densities.
