Source author record

Chanwoo Kim

Chanwoo Kim appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


math.AP, eess.AS, Sound, Machine Learning, math-ph, math.MP, Computation and Language, eess.SP

Catalog footprint

18 works, 8 topics, 4 close collaborators


preprint, 2026, arXiv

HiKE: Hierarchical Evaluation Framework for Korean-English Code-Switching Speech Recognition

Despite advances in multilingual automatic speech recognition (ASR), code-switching (CS), the mixing of languages within an utterance common in daily speech, remains a severely underexplored challenge. In this paper, we introduce HiKE: the Hierarchical Korean-English code-switching benchmark, the first globally accessible non-synthetic evaluation framework for Korean-English CS, aiming to provide a means for the precise evaluation of multilingual ASR models and to foster research in the field. The proposed framework not only consists of high-quality, natural CS data across various topics, but also provides meticulous loanword labels and a hierarchical CS-level labeling scheme (word, phrase, and sentence) that together enable a systematic evaluation of a model's ability to handle each distinct level of code-switching. Through evaluations of diverse multilingual ASR models and fine-tuning experiments, this paper demonstrates that although most multilingual ASR models initially exhibit inadequate CS-ASR performance, this capability can be enabled through fine-tuning with synthetic CS data. HiKE is available at https://github.com/ThetaOne-AI/HiKE.

preprint, 2024, arXiv

Macroscopic estimate of the linear Boltzmann and Landau equations with Specular reflection boundary

In this short note, we prove an $L^6$-control of the macroscopic part of the linear Boltzmann and Landau equations. This result extends the test function method of Esposito-Guo-Kim-Marra \cite{EGKM}\cite{EGKM2} to the specular reflection boundary condition, where we crucially use Korn's inequality \cite{DV2} and the system of symmetric Poisson equations \cite{Bernou}.

preprint, 2022, arXiv

Decision Attentive Regularization to Improve Simultaneous Speech Translation Systems

Simultaneous translation systems start producing the output while processing the partial source sentence in the incoming input stream. These systems need to decide when to read more input and when to write the output. These decisions depend on the structure of the source/target language and the information contained in the partial input sequence. Hence, the read/write decision policy remains the same across different input modalities, i.e., speech and text. This motivates us to leverage the text transcripts corresponding to the speech input for improving simultaneous speech-to-text translation (SimulST). We propose Decision Attentive Regularization (DAR) to improve the decision policy of SimulST systems by using the simultaneous text-to-text translation (SimulMT) task. We also extend several techniques from the offline speech translation domain to explore the role of the SimulMT task in improving SimulST performance. Overall, we achieve a 34.66% / 4.5 BLEU improvement over the baseline model across different latency regimes for the MuST-C English-German (EnDe) SimulST task.

preprint, 2022, arXiv

Exponential Mixing of Vlasov equations under the effect of Gravity and Boundary

In this paper, we study exponentially fast mixing induced/enhanced by gravity and stochastic boundary in the kinetic theory of Vlasov equations. We consider the Vlasov equations with and without a vertical magnetic field inside a horizontally-periodic 3D half-space equipped with a non-isothermal diffusive reflection boundary condition of bounded continuous boundary temperature at the bottom. We construct both stationary solutions and global-in-time dynamical solutions in $L^\infty$. We prove that moments of a dynamical fluctuation around the steady solutions decay exponentially fast in $L^\infty$. As a key of this proof, we establish a uniform bound of so-called residual measures independently of the bouncing number of stochastic characteristics, by constructing a continuous stationary outgoing boundary flux which is strictly positive almost everywhere.

preprint, 2022, arXiv

Macro-block dropout for improved regularization in training end-to-end speech recognition models

This paper proposes a new regularization algorithm referred to as macro-block dropout. Overfitting has been a difficult problem in training large neural network models. The dropout technique has proven to be simple yet very effective for regularization by preventing complex co-adaptations during training. In our work, we define a macro-block that contains a large number of units from the input to a Recurrent Neural Network (RNN). Rather than applying dropout to each unit, we apply random dropout to each macro-block. This algorithm has the effect of applying different dropout rates to each layer even if we keep a constant average dropout rate, which has better regularization effects. In our experiments using a Recurrent Neural Network-Transducer (RNN-T), this algorithm shows relative Word Error Rate (WER) improvements of 4.30% and 6.13% over conventional dropout on LibriSpeech test-clean and test-other. With an Attention-based Encoder-Decoder (AED) model, this algorithm shows relative WER improvements of 4.36% and 5.85% over conventional dropout on the same test sets.
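The block-wise idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the contiguous partitioning into fixed-size blocks, and the inverted-dropout rescaling by 1 / (1 - drop_rate) are all assumptions made for illustration, since the abstract does not specify them.

```python
import random


def macro_block_dropout(units, block_size, drop_rate, rng=None):
    """Drop whole contiguous blocks of units instead of single units.

    Hypothetical sketch: block layout and rescaling are assumptions,
    not details taken from the paper.
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    rng = rng or random.Random()
    # Inverted-dropout scaling keeps the expected activation unchanged.
    keep_scale = 1.0 / (1.0 - drop_rate)
    out = []
    for start in range(0, len(units), block_size):
        block = units[start:start + block_size]
        if rng.random() < drop_rate:
            # One random draw decides the fate of the whole macro-block.
            out.extend(0.0 for _ in block)
        else:
            out.extend(u * keep_scale for u in block)
    return out


if __name__ == "__main__":
    print(macro_block_dropout([1.0] * 12, block_size=4, drop_rate=0.5,
                              rng=random.Random(1)))
```

Because one draw is shared by every unit in a block, the fraction of units actually dropped in a given vector varies more than with per-unit dropout, even though the average rate is the same.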

preprint, 2022, arXiv

Two-Pass End-to-End ASR Model Compression

Speech recognition on smart devices is challenging owing to their small memory footprint. Hence, small ASR models are desirable. With the use of popular transducer-based models, it has become possible to practically deploy streaming speech recognition models on small devices [1]. Recently, the two-pass model [2] combining RNN-T and LAS modules has shown exceptional performance for streaming on-device speech recognition. In this work, we propose a simple and effective approach to reduce the size of the two-pass model for memory-constrained devices. We employ a popular knowledge distillation approach in three stages using the Teacher-Student training technique. In the first stage, we use a trained RNN-T model as a teacher model and perform knowledge distillation to train the student RNN-T model. The second stage uses the shared encoder and trains a LAS rescorer for the student model using the trained RNN-T+LAS teacher model. Finally, we perform deep-finetuning for the student model with a shared RNN-T encoder, RNN-T decoder, and LAS rescorer. Our experimental results on the standard LibriSpeech dataset show that our system can achieve a high compression rate of 55% without significant degradation in the WER compared to the two-pass teacher model.

preprint, 2022, arXiv

Vorticity convergence from Boltzmann to 2D incompressible Euler equations below Yudovich class

It is challenging to perform a multiscale analysis of mesoscopic systems exhibiting singularities at the macroscopic scale. In this paper, we study the hydrodynamic limit of the Boltzmann equations $$\mathrm{St} \partial_t F + v\cdot \nabla_x F = \frac{1}{\mathrm{Kn}} Q(F ,F ) $$ toward the singular solutions of 2D incompressible Euler equations whose vorticity is unbounded $$\partial_t u + u \cdot \nabla_x u + \nabla_x p = 0,\text{div }u =0.$$ We obtain a microscopic description of the singularity through the so-called kinetic vorticity and understand its behavior in the vicinity of the macroscopic singularity. As a consequence of our new analysis, we settle affirmatively an open problem of the hydrodynamic limit toward Lagrangian solutions of the 2D incompressible Euler equation whose vorticity is unbounded ($ω\in L^\mathfrak{p}$ for any fixed $1 \leq \mathfrak{p} < \infty$). Moreover, we prove the convergence of kinetic vorticities toward the vorticity of the Lagrangian solution of the Euler equation. In particular, we obtain the rate of convergence when the vorticity blows up moderately in $L^\mathfrak{p}$ as $\mathfrak{p} \rightarrow \infty$ (localized Yudovich class).

preprint, 2020, arXiv

Attention based on-device streaming speech recognition with large speech corpus

In this paper, we present a new on-device automatic speech recognition (ASR) system based on monotonic chunk-wise attention (MoChA) models trained with a large (> 10K hours) corpus. We attained a word recognition rate of around 90% for the general domain, mainly by using joint training of connectionist temporal classifier (CTC) and cross entropy (CE) losses, minimum word error rate (MWER) training, layer-wise pre-training and data augmentation methods. In addition, we made our models more than 3.4 times smaller using an iterative hyper low-rank approximation (LRA) method while minimizing the degradation in recognition accuracy. The memory footprint was further reduced with 8-bit quantization to bring the final model size below 39 MB. For on-demand adaptation, we fused the MoChA models with statistical n-gram models, achieving a relative improvement of 36% on average in word error rate (WER) for target domains including the general domain.
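To make the size arithmetic behind low-rank approximation and 8-bit quantization concrete, here is a small sketch. It uses plain rank-r factorization of a single weight matrix, which is a generic stand-in and not the iterative hyper LRA method named in the abstract; the matrix dimensions, rank, and parameter count below are hypothetical.

```python
def low_rank_param_count(rows, cols, rank):
    """Parameters in a rank-r factorization W ~ U @ V, U: rows x r, V: r x cols."""
    return rank * (rows + cols)


def compression_ratio(rows, cols, rank):
    """How many times smaller the factorized matrix is than the dense one."""
    return (rows * cols) / low_rank_param_count(rows, cols, rank)


def size_megabytes(n_params, bits_per_param):
    """Storage in MB (10^6 bytes) for n_params stored at the given bit width."""
    return n_params * bits_per_param / 8 / 1e6


if __name__ == "__main__":
    # Hypothetical 1024 x 1024 layer factorized at rank 128.
    print(compression_ratio(1024, 1024, 128))
    # Hypothetical 10M-parameter model at 32-bit versus 8-bit precision.
    print(size_megabytes(10_000_000, 32), size_megabytes(10_000_000, 8))
```

Under these assumptions the two reductions multiply: factorization shrinks the parameter count, and moving from 32-bit to 8-bit storage cuts the bytes per parameter by a factor of four.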

preprint, 2020, arXiv

Data Efficient Direct Speech-to-Text Translation with Modality Agnostic Meta-Learning

End-to-end Speech Translation (ST) models have several advantages such as lower latency, smaller model size, and less error compounding over conventional pipelines that combine Automatic Speech Recognition (ASR) and text Machine Translation (MT) models. However, collecting large amounts of parallel data for the ST task is more difficult than for the ASR and MT tasks. Previous studies have proposed the use of transfer learning approaches to overcome this difficulty. These approaches benefit from weakly supervised training data, such as ASR speech-to-transcript or MT text-to-text translation pairs. However, the parameters in these models are updated independently of each task, which may lead to sub-optimal solutions. In this work, we adopt a meta-learning algorithm to train a modality agnostic multi-task model that transfers knowledge from the source tasks (ASR and MT) to the target task (ST), where the ST task severely lacks data. In the meta-learning phase, the parameters of the model are exposed to vast amounts of speech transcripts (e.g., English ASR) and text translations (e.g., English-German MT). During this phase, parameters are updated so that they capture speech and text representations and the relation between them, and also act as a good initialization point for the target ST task. We evaluate the proposed meta-learning approach for ST tasks on English-German (En-De) and English-French (En-Fr) language pairs from the Multilingual Speech Translation Corpus (MuST-C). Our method outperforms the previous transfer learning approaches and sets new state-of-the-art results for En-De and En-Fr ST tasks by obtaining 9.18 and 11.76 BLEU point improvements, respectively.

preprint, 2020, arXiv

Incompressible Euler limit from Boltzmann equation with Diffuse Boundary Condition for Analytic data

A rigorous derivation of the incompressible Euler equations with the no-penetration boundary condition from the Boltzmann equation with the diffuse reflection boundary condition has been a challenging open problem. We settle this open question in the affirmative when the initial data of fluid are well-prepared in a real analytic space, in 3D half space. As a key of this advance we capture the Navier-Stokes equations of $$\textit{viscosity} \sim \frac{\textit{Knudsen number}}{\textit{Mach number}}$$ satisfying the no-slip boundary condition, as an $\textit{intermediary}$ approximation of the Euler equations through a new Hilbert-type expansion of Boltzmann equation with the diffuse reflection boundary condition. Aiming to justify the approximation we establish a novel quantitative $L^p$-$L^\infty$ estimate of the Boltzmann perturbation around a local Maxwellian of such viscous approximation, along with the commutator estimates and the integrability gain of the hydrodynamic part in various spaces; we also establish direct estimates of the Navier-Stokes equations in higher regularity with the aid of the initial-boundary and boundary layer weights using a recent Green's function approach. The incompressible Euler limit follows as a byproduct of our framework.

preprint, 2020, arXiv

Small energy masking for improved neural network training for end-to-end speech recognition

In this paper, we present a Small Energy Masking (SEM) algorithm, which masks inputs having values below a certain threshold. More specifically, a time-frequency bin is masked if the filterbank energy in this bin is less than a certain energy threshold. A uniform distribution is employed to randomly generate the ratio of this energy threshold to the peak filterbank energy of each utterance in decibels. The unmasked feature elements are scaled so that the total sum of the feature values remains the same through this masking procedure. This very simple algorithm shows relative Word Error Rate (WER) improvements of 11.2% and 13.5% on the standard LibriSpeech test-clean and test-other sets over the baseline end-to-end speech recognition system. Additionally, compared to the input dropout algorithm, the SEM algorithm shows relative improvements of 7.7% and 11.6% on the same LibriSpeech test-clean and test-other sets. With a modified shallow-fusion technique with a Transformer LM, we obtained a 2.62% WER on the LibriSpeech test-clean set and a 7.87% WER on the LibriSpeech test-other set.
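The masking procedure described in this abstract can be sketched in a few lines. This is an illustrative reading, not the authors' code: the function name, the default decibel range for the uniform draw, and the use of linear (non-negative) filterbank energies with a 10 * log10 power convention are assumptions that the abstract does not state.

```python
import random


def small_energy_mask(features, min_ratio_db=-80.0, max_ratio_db=0.0, rng=None):
    """Mask time-frequency bins whose energy falls below a random threshold.

    features: list of frames, each a list of non-negative filterbank energies.
    The dB range defaults are hypothetical, chosen only for illustration.
    """
    rng = rng or random.Random()
    # Peak filterbank energy over the whole utterance.
    peak = max(max(frame) for frame in features)
    # Draw the threshold-to-peak ratio uniformly in decibels.
    ratio_db = rng.uniform(min_ratio_db, max_ratio_db)
    threshold = peak * 10.0 ** (ratio_db / 10.0)
    total_before = sum(sum(frame) for frame in features)
    masked = [[e if e >= threshold else 0.0 for e in frame] for frame in features]
    total_after = sum(sum(frame) for frame in masked)
    # Rescale the surviving bins so the total feature sum is preserved.
    scale = total_before / total_after if total_after > 0 else 1.0
    return [[e * scale for e in frame] for frame in masked]


if __name__ == "__main__":
    utterance = [[1.0, 0.001, 0.5], [0.0001, 2.0, 0.3]]
    print(small_energy_mask(utterance, rng=random.Random(0)))
```

With a ratio at or below 0 dB the peak bin always survives, so the rescaling step never divides by zero for a non-silent utterance.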

preprint, 2019, arXiv

Improved Multi-Stage Training of Online Attention-based Encoder-Decoder Models

In this paper, we propose a refined multi-stage multi-task training strategy to improve the performance of online attention-based encoder-decoder (AED) models. A three-stage training based on three levels of architectural granularity namely, character encoder, byte pair encoding (BPE) based encoder, and attention decoder, is proposed. Also, multi-task learning based on two-levels of linguistic granularity namely, character and BPE, is used. We explore different pre-training strategies for the encoders including transfer learning from a bidirectional encoder. Our encoder-decoder models with online attention show 35% and 10% relative improvement over their baselines for smaller and bigger models, respectively. Our models achieve a word error rate (WER) of 5.04% and 4.48% on the Librispeech test-clean data for the smaller and bigger models respectively after fusion with long short-term memory (LSTM) based external language model (LM).

preprint, 2016, arXiv

Dynamics and stability of surfactant-driven surface waves

In this paper we consider a layer of incompressible viscous fluid lying above a flat periodic surface in a uniform gravitational field. The upper boundary of the fluid is free and evolves in time. We assume that a mass of surfactants resides on the free surface and evolves in time with the fluid. The surfactant dynamics couple to the fluid dynamics by adjusting the surface tension coefficient on the interface and also through tangential Marangoni stresses caused by gradients in surfactant concentration. We prove that small perturbations of equilibria give rise to global-in-time solutions in an appropriate functional space, and we prove that the solutions return to equilibrium exponentially fast. In particular this proves the asymptotic stability of equilibria.

preprint, 2012, arXiv

Fourier Law and Non-Isothermal Boundary in the Boltzmann Theory

In the study of heat transfer in the Boltzmann theory, the basic problem is to construct solutions to the steady problem for the Boltzmann equation in a general bounded domain with diffuse reflection boundary conditions corresponding to a non-isothermal temperature of the wall. Denoting by δ the size of the temperature oscillations on the boundary, we develop a theory to characterize such a solution mathematically. We construct a unique solution F_s to the Boltzmann equation, which is dynamically asymptotically stable with exponential decay rate. Moreover, if the domain is convex and the temperature of the wall is continuous, we show that F_s is continuous away from the grazing set. If the domain is non-convex, discontinuities can form and then propagate along the forward characteristics. We show that they actually form for a suitable smooth temperature profile. We remark that this solution differs from a local equilibrium Maxwellian, hence it is a genuine non-equilibrium stationary solution. Our analysis is based on recent studies of the boundary value problems for the Boltzmann equation but with new constructive coercivity estimates for both steady and dynamic cases. A natural question in this setup is to determine if the general Fourier law, stating that the heat conduction vector q is proportional to the temperature gradient, is valid. As an application of our result we establish an expansion in δ for F_s whose first order term F_1 satisfies a linear, parameter-free equation. Consequently, we discover that if the Fourier law were valid for F_s, then the temperature of F_1 must be linear in a slab. Such a necessary condition contradicts available numerical simulations, leading to the prediction of break-down of the Fourier law in the kinetic regime.

preprint, 2011, arXiv

Boltzmann Equation with a Large Potential in a Periodic Box

The stability of the Maxwellian of the Boltzmann equation with a large amplitude external potential $Φ$ has been an important open problem. In this paper, we resolve this problem with a large $C^3$-potential in a periodic box $\mathbb{T}^d$, $d \geq 3$. We use [1] in the $L^p$-$L^{\infty}$ framework to establish the well-posedness and the $L^{\infty}$-stability of the Maxwellian $μ_E(x,v)=\exp\{-\frac{|v|^2}{2}-Φ(x)\}$.

preprint, 2011, arXiv

The Boltzmann equation near a rotational local Maxwellian

In rotationally symmetric domains, the Boltzmann equation with specular reflection boundary condition has a special type of equilibrium state called the rotational local Maxwellian which, unlike the uniform Maxwellian, has an additional term related to the angular momentum of the gas. In this paper, we consider the initial boundary value problem of the Boltzmann equation near the rotational local Maxwellian. Based on the $L^2$-$L^\infty$ framework of [12], we establish the global well-posedness and the convergence toward such equilibrium states.

preprint, 2011, arXiv

The viscous surface-internal wave problem: global well-posedness and decay

We consider the free boundary problem for two layers of immiscible, viscous, incompressible fluid in a uniform gravitational field, lying above a general rigid bottom in a three-dimensional horizontally periodic setting. We establish the global well-posedness of the problem both with and without surface tension. We prove that without surface tension the solution decays to the equilibrium state at an almost exponential rate; with surface tension, we show that the solution decays at an exponential rate. Our results include the case in which a heavier fluid lies above a lighter one, provided that the surface tension at the free internal interface is above a critical value, which we identify. This means that sufficiently large surface tension stabilizes the Rayleigh-Taylor instability in the nonlinear setting. As a part of our analysis, we establish elliptic estimates for the two-phase stationary Stokes problem.

preprint, 2010, arXiv

Formation and Propagation of Discontinuity for Boltzmann Equation in Non-Convex Domains

The formation and propagation of singularities for the Boltzmann equation in bounded domains has been an important question in numerical studies as well as in theoretical studies. Consider the nonlinear Boltzmann solution near Maxwellians under in-flow, diffuse, or bounce-back boundary conditions. We demonstrate that a discontinuity is created at the non-convex part of the grazing boundary, then propagates only along the forward characteristics inside the domain before it hits the boundary again.
