Source author record

Sungsoo Kim

Sungsoo Kim appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher / Unclaimed source record

astro-ph.GA, eess.AS, Sound, Machine Learning

Catalog footprint

What is connected

4 works

4 topics

4 close collaborators

preprint, 2022, arXiv

Two-Pass End-to-End ASR Model Compression

Speech recognition on smart devices is challenging because of their small memory footprint, so small ASR models are desirable. With popular transducer-based models, it has become practical to deploy streaming speech recognition models on small devices [1]. Recently, the two-pass model [2], which combines RNN-T and LAS modules, has shown exceptional performance for streaming on-device speech recognition. In this work, we propose a simple and effective approach to reduce the size of the two-pass model for memory-constrained devices. We employ a popular knowledge distillation approach in three stages using the Teacher-Student training technique. In the first stage, we use a trained RNN-T model as the teacher and perform knowledge distillation to train the student RNN-T model. The second stage uses the shared encoder and trains a LAS rescorer for the student model using the trained RNN-T+LAS teacher model. Finally, we perform deep fine-tuning of the student model with a shared RNN-T encoder, RNN-T decoder, and LAS rescorer. Our experimental results on the standard LibriSpeech dataset show that our system can achieve a high compression rate of 55% without significant degradation in WER compared to the two-pass teacher model.
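The Teacher-Student idea in this abstract can be illustrated with a generic distillation loss. The sketch below is not the paper's RNN-T or LAS objective, which the abstract does not specify; it assumes the common formulation in which the student is trained to match the teacher's temperature-softened output distribution. The function names and the temperature value are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions.

    Assumption: a generic knowledge distillation loss, used here only
    to illustrate teacher-student training in general terms.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical per-token logits over a three-symbol vocabulary.
teacher = [4.0, 1.0, 0.5]
student = [2.0, 1.5, 0.5]
loss = distillation_loss(student, teacher)
```

The loss is zero when the student reproduces the teacher's logits and grows as the two distributions diverge, which is the signal each distillation stage would minimize under this assumption.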

preprint, 2020, arXiv

Attention based on-device streaming speech recognition with large speech corpus

In this paper, we present a new on-device automatic speech recognition (ASR) system based on monotonic chunk-wise attention (MoChA) models trained on a large (> 10K hours) corpus. We attained a word recognition rate of around 90% for the general domain, mainly by using joint training of connectionist temporal classification (CTC) and cross entropy (CE) losses, minimum word error rate (MWER) training, layer-wise pre-training, and data augmentation methods. In addition, we made our models more than 3.4 times smaller using an iterative hyper low-rank approximation (LRA) method while minimizing the degradation in recognition accuracy. The memory footprint was further reduced with 8-bit quantization, bringing the final model size below 39 MB. For on-demand adaptation, we fused the MoChA models with statistical n-gram models and achieved a 36% relative improvement on average in word error rate (WER) for target domains, including the general domain.
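The two size reductions mentioned in this abstract can be illustrated with simple arithmetic. The sketch below is not the paper's iterative hyper LRA method, which the abstract does not describe; it assumes a plain rank-r factorization of a dense weight matrix and a uniform 8-bit affine quantizer. All function names and matrix sizes are hypothetical.

```python
def low_rank_param_ratio(rows, cols, rank):
    """Dense parameter count divided by the count after factoring
    a rows x cols matrix into (rows x rank) and (rank x cols)."""
    return (rows * cols) / (rank * (rows + cols))

def quantize_8bit(values):
    """Uniform affine quantization to integer codes in [0, 255].

    Returns (codes, scale, minimum) so the values can be restored.
    """
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 if all values are equal
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo

def dequantize(codes, scale, lo):
    """Map 8-bit codes back to approximate floating-point values."""
    return [c * scale + lo for c in codes]

# Hypothetical 1024 x 1024 layer factored at rank 128.
ratio = low_rank_param_ratio(1024, 1024, 128)

# Hypothetical weights quantized and restored.
weights = [-1.0, -0.25, 0.1, 0.7, 1.0]
codes, scale, lo = quantize_8bit(weights)
restored = dequantize(codes, scale, lo)
```

Under these assumptions, the factorization stores rank * (rows + cols) parameters instead of rows * cols, and each quantized weight occupies one byte with a reconstruction error of at most half a quantization step.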

preprint, 2014, arXiv

Formation of Warped Disks by Galactic Fly-by Encounters. I. Stellar Disks

Warped disks are almost ubiquitous among spiral galaxies. Here we revisit and test the 'fly-by scenario' of warp formation, in which impulsive encounters between galaxies are responsible for warped disks. Based on N-body simulations, we investigate the morphological and kinematical evolution of the stellar component of disks when galaxies undergo fly-by interactions with adjacent dark matter halos. We find that the so-called 'S'-shaped warps can be excited by fly-bys and sustained for up to a few billion years, and that this scenario provides a cohesive explanation for several key observations. We show that disk warp properties are governed primarily by the following three parameters: (1) the impact parameter, i.e., the minimum distance between two halos, (2) the mass ratio between two halos, and (3) the incident angle of the fly-by perturber. The warp angle depends on all three parameters, yet the warp lifetime is particularly sensitive to the incident angle of the perturber. Interestingly, the modeled S-shaped warps are often non-symmetric, depending on the incident angle. We speculate that the puzzling U- and L-shaped warps are geometrically superimposed S-types produced by successive fly-bys with different incident angles, including multiple interactions with a satellite on a highly elongated orbit.

preprint, 2012, arXiv

Initial Size Distribution of the Galactic Globular Cluster System

Despite the importance of size evolution for understanding the dynamical evolution of the globular clusters (GCs) of the Milky Way, few studies focus specifically on this issue. Based on the advanced, realistic Fokker-Planck (FP) approach, we predict theoretically the initial size distribution (SD) of the Galactic GCs along with their initial mass function and radial distribution. Over one thousand FP calculations in a wide parameter space have pinpointed the best-fit initial conditions for the SD, mass function, and radial distribution. Our best-fit model shows that the initial SD of the Galactic GCs has a larger dispersion than today's SD, and that the typical projected half-light radius of the initial GCs is ~4.6 pc, which is 1.8 times larger than that of the present-day GCs (~2.5 pc). Their large size signifies greater susceptibility to the Galactic tides: the total mass of destroyed GCs reaches (3-5) × 10^8 M_sun, several times larger than previous estimates. Our result challenges a recent view that the Milky Way GCs were born compact on the sub-pc scale, and rather implies that (1) the initial GCs are generally larger than the typical size of the present-day GCs, (2) the initially large GCs mostly shrink and/or disrupt as a result of the Galactic tides, and (3) the initially small GCs expand by two-body relaxation and later shrink by the Galactic tides.