Researcher profile

Chanwoo Kim

Chanwoo Kim contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
12 works
0 followers
8 topics
4 close collaborators

Published work

12 published items

preprint, 2026, arXiv

HiKE: Hierarchical Evaluation Framework for Korean-English Code-Switching Speech Recognition

Despite advances in multilingual automatic speech recognition (ASR), code-switching (CS), the mixing of languages within an utterance common in daily speech, remains a severely underexplored challenge. In this paper, we introduce HiKE: the Hierarchical Korean-English code-switching benchmark, the first globally accessible non-synthetic evaluation framework for Korean-English CS, aiming to provide a means for the precise evaluation of multilingual ASR models and to foster research in the field. The proposed framework not only consists of high-quality, natural CS data across various topics, but also provides meticulous loanword labels and a hierarchical CS-level labeling scheme (word, phrase, and sentence) that together enable a systematic evaluation of a model's ability to handle each distinct level of code-switching. Through evaluations of diverse multilingual ASR models and fine-tuning experiments, this paper demonstrates that although most multilingual ASR models initially exhibit inadequate CS-ASR performance, this capability can be enabled through fine-tuning with synthetic CS data. HiKE is available at https://github.com/ThetaOne-AI/HiKE.

preprint, 2024, arXiv

Macroscopic estimate of the linear Boltzmann and Landau equations with Specular reflection boundary

In this short note, we prove an $L^6$-control of the macroscopic part of the linear Boltzmann and Landau equations. This result extends the test function method of Esposito-Guo-Kim-Marra \cite{EGKM}\cite{EGKM2} to the specular reflection boundary condition, in which we crucially use Korn's inequality \cite{DV2} and the system of symmetric Poisson equations \cite{Bernou}.

preprint, 2022, arXiv

Decision Attentive Regularization to Improve Simultaneous Speech Translation Systems

Simultaneous translation systems start producing the output while processing the partial source sentence in the incoming input stream. These systems need to decide when to read more input and when to write the output. These decisions depend on the structure of the source and target languages and the information contained in the partial input sequence. Hence, the read/write decision policy remains the same across different input modalities, i.e., speech and text. This motivates us to leverage the text transcripts corresponding to the speech input for improving simultaneous speech-to-text translation (SimulST). We propose Decision Attentive Regularization (DAR) to improve the decision policy of SimulST systems by using the simultaneous text-to-text translation (SimulMT) task. We also extend several techniques from the offline speech translation domain to explore the role of the SimulMT task in improving SimulST performance. Overall, we achieve a 34.66% / 4.5 BLEU improvement over the baseline model across different latency regimes for the MuST-C English-German (EnDe) SimulST task.

preprint, 2022, arXiv

Exponential Mixing of Vlasov equations under the effect of Gravity and Boundary

In this paper, we study exponentially fast mixing induced/enhanced by gravity and stochastic boundary in the kinetic theory of Vlasov equations. We consider the Vlasov equations with and without a vertical magnetic field inside a horizontally-periodic 3D half-space equipped with a non-isothermal diffusive reflection boundary condition of bounded continuous boundary temperature at the bottom. We construct both stationary solutions and global-in-time dynamical solutions in $L^\infty$. We prove that moments of a dynamical fluctuation around the steady solutions decay exponentially fast in $L^\infty$. As a key step of this proof, we establish a uniform bound of so-called residual measures independently of the bouncing number of stochastic characteristics, by constructing a continuous stationary outgoing boundary flux which is strictly positive almost everywhere.

preprint, 2022, arXiv

Macro-block dropout for improved regularization in training end-to-end speech recognition models

This paper proposes a new regularization algorithm referred to as macro-block dropout. Overfitting has been a difficult problem in training large neural network models. The dropout technique has proven to be simple yet very effective for regularization by preventing complex co-adaptations during training. In our work, we define a macro-block that contains a large number of units from the input to a Recurrent Neural Network (RNN). Rather than applying dropout to each unit, we apply random dropout to each macro-block. This algorithm has the effect of applying different dropout rates for each layer even if we keep a constant average dropout rate, which has better regularization effects. In our experiments using a Recurrent Neural Network-Transducer (RNN-T), this algorithm shows relative Word Error Rate (WER) improvements of 4.30% and 6.13% over conventional dropout on LibriSpeech test-clean and test-other. With an Attention-based Encoder-Decoder (AED) model, this algorithm shows relative WER improvements of 4.36% and 5.85% over conventional dropout on the same test sets.
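
The block-wise dropout idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `macro_block_dropout`, the equal-size partition into `num_blocks` contiguous blocks, and the inverted-dropout rescaling by 1 / (1 - drop_rate) are all assumptions made for illustration, since the abstract does not specify them.

```python
import random


def macro_block_dropout(x, num_blocks, drop_rate, rng=None):
    """Drop whole contiguous blocks of units instead of individual units.

    Hypothetical sketch: block partitioning and rescaling are assumptions,
    not details taken from the paper.
    """
    if num_blocks < 1:
        raise ValueError("num_blocks must be at least 1")
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    if not x:
        return []
    rng = rng or random.Random()
    block_size = -(-len(x) // num_blocks)  # ceiling division
    keep_scale = 1.0 / (1.0 - drop_rate)   # assumed inverted-dropout scaling
    out = []
    for start in range(0, len(x), block_size):
        # One random decision per macro-block, shared by all of its units.
        keep = rng.random() >= drop_rate
        factor = keep_scale if keep else 0.0
        out.extend(v * factor for v in x[start:start + block_size])
    return out
```

Calling `macro_block_dropout([1.0] * 8, num_blocks=4, drop_rate=0.5)` zeroes or rescales the input in pairs, so the fraction of units actually dropped varies from call to call while the average rate stays at `drop_rate`.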

preprint, 2022, arXiv

Two-Pass End-to-End ASR Model Compression

Speech recognition on smart devices is challenging owing to the small memory footprint. Hence, small-size ASR models are desirable. With the use of popular transducer-based models, it has become possible to practically deploy streaming speech recognition models on small devices [1]. Recently, the two-pass model [2] combining RNN-T and LAS modules has shown exceptional performance for streaming on-device speech recognition. In this work, we propose a simple and effective approach to reduce the size of the two-pass model for memory-constrained devices. We employ a popular knowledge distillation approach in three stages using the Teacher-Student training technique. In the first stage, we use a trained RNN-T model as a teacher model and perform knowledge distillation to train the student RNN-T model. The second stage uses the shared encoder and trains a LAS rescorer for the student model using the trained RNN-T+LAS teacher model. Finally, we perform deep-finetuning for the student model with a shared RNN-T encoder, RNN-T decoder, and LAS rescorer. Our experimental results on the standard LibriSpeech dataset show that our system can achieve a high compression rate of 55% without significant degradation in the WER compared to the two-pass teacher model.

preprint, 2022, arXiv

Vorticity convergence from Boltzmann to 2D incompressible Euler equations below Yudovich class

It is challenging to perform a multiscale analysis of mesoscopic systems exhibiting singularities at the macroscopic scale. In this paper, we study the hydrodynamic limit of the Boltzmann equations $$\mathrm{St} \partial_t F + v\cdot \nabla_x F = \frac{1}{\mathrm{Kn}} Q(F ,F ) $$ toward the singular solutions of 2D incompressible Euler equations whose vorticity is unbounded $$\partial_t u + u \cdot \nabla_x u + \nabla_x p = 0,\text{div }u =0.$$ We obtain a microscopic description of the singularity through the so-called kinetic vorticity and understand its behavior in the vicinity of the macroscopic singularity. As a consequence of our new analysis, we settle affirmatively an open problem of the hydrodynamic limit toward Lagrangian solutions of the 2D incompressible Euler equation whose vorticity is unbounded ($\omega \in L^\mathfrak{p}$ for any fixed $1 \leq \mathfrak{p} < \infty$). Moreover, we prove the convergence of kinetic vorticities toward the vorticity of the Lagrangian solution of the Euler equation. In particular, we obtain the rate of convergence when the vorticity blows up moderately in $L^\mathfrak{p}$ as $\mathfrak{p} \rightarrow \infty$ (localized Yudovich class).

preprint, 2020, arXiv

Attention based on-device streaming speech recognition with large speech corpus

In this paper, we present a new on-device automatic speech recognition (ASR) system based on monotonic chunk-wise attention (MoChA) models trained with a large (> 10K hours) corpus. We attained a word recognition rate of around 90% for the general domain, mainly by using joint training of connectionist temporal classification (CTC) and cross entropy (CE) losses, minimum word error rate (MWER) training, layer-wise pre-training and data augmentation methods. In addition, we made our models more than 3.4 times smaller using an iterative hyper low-rank approximation (LRA) method while minimizing the degradation in recognition accuracy. The memory footprint was further reduced with 8-bit quantization to bring the final model size below 39 MB. For on-demand adaptation, we fused the MoChA models with statistical n-gram models, and we could achieve a 36% relative improvement on average in word error rate (WER) for target domains including the general domain.

preprint, 2020, arXiv

Data Efficient Direct Speech-to-Text Translation with Modality Agnostic Meta-Learning

End-to-end Speech Translation (ST) models have several advantages such as lower latency, smaller model size, and less error compounding over conventional pipelines that combine Automatic Speech Recognition (ASR) and text Machine Translation (MT) models. However, collecting large amounts of parallel data for the ST task is more difficult than for the ASR and MT tasks. Previous studies have proposed the use of transfer learning approaches to overcome this difficulty. These approaches benefit from weakly supervised training data, such as ASR speech-to-transcript or MT text-to-text translation pairs. However, the parameters in these models are updated independently of each task, which may lead to sub-optimal solutions. In this work, we adopt a meta-learning algorithm to train a modality agnostic multi-task model that transfers knowledge from the source tasks (ASR and MT) to the target task (ST), where the ST task severely lacks data. In the meta-learning phase, the parameters of the model are exposed to vast amounts of speech transcripts (e.g., English ASR) and text translations (e.g., English-German MT). During this phase, parameters are updated in such a way as to understand speech, text representations, and the relation between them, as well as to act as a good initialization point for the target ST task. We evaluate the proposed meta-learning approach for ST tasks on English-German (En-De) and English-French (En-Fr) language pairs from the Multilingual Speech Translation Corpus (MuST-C). Our method outperforms the previous transfer learning approaches and sets new state-of-the-art results for En-De and En-Fr ST tasks by obtaining 9.18 and 11.76 BLEU point improvements, respectively.

preprint, 2020, arXiv

Incompressible Euler limit from Boltzmann equation with Diffuse Boundary Condition for Analytic data

A rigorous derivation of the incompressible Euler equations with the no-penetration boundary condition from the Boltzmann equation with the diffuse reflection boundary condition has been a challenging open problem. We settle this open question in the affirmative when the initial data of fluid are well-prepared in a real analytic space, in 3D half space. As a key of this advance we capture the Navier-Stokes equations of $$\textit{viscosity} \sim \frac{\textit{Knudsen number}}{\textit{Mach number}}$$ satisfying the no-slip boundary condition, as an $\textit{intermediary}$ approximation of the Euler equations through a new Hilbert-type expansion of Boltzmann equation with the diffuse reflection boundary condition. Aiming to justify the approximation we establish a novel quantitative $L^p$-$L^\infty$ estimate of the Boltzmann perturbation around a local Maxwellian of such viscous approximation, along with the commutator estimates and the integrability gain of the hydrodynamic part in various spaces; we also establish direct estimates of the Navier-Stokes equations in higher regularity with the aid of the initial-boundary and boundary layer weights using a recent Green's function approach. The incompressible Euler limit follows as a byproduct of our framework.

preprint, 2020, arXiv

Small energy masking for improved neural network training for end-to-end speech recognition

In this paper, we present a Small Energy Masking (SEM) algorithm, which masks inputs having values below a certain threshold. More specifically, a time-frequency bin is masked if the filterbank energy in this bin is less than a certain energy threshold. A uniform distribution is employed to randomly generate the ratio of this energy threshold to the peak filterbank energy of each utterance in decibels. The unmasked feature elements are scaled so that the total sum of the feature values remains the same through this masking procedure. This very simple algorithm shows relative Word Error Rate (WER) improvements of 11.2% and 13.5% on the standard LibriSpeech test-clean and test-other sets over the baseline end-to-end speech recognition system. Additionally, compared to the input dropout algorithm, the SEM algorithm shows relative improvements of 7.7% and 11.6% on the same LibriSpeech test-clean and test-other sets. With a modified shallow-fusion technique with a Transformer LM, we obtained a 2.62% WER on the LibriSpeech test-clean set and a 7.87% WER on the LibriSpeech test-other set.
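
The masking steps described in this abstract can be sketched roughly as follows. This is an illustrative reading, not the authors' code: the function name `small_energy_mask`, the default range of -80 dB to 0 dB for the uniformly drawn ratio, and the choice to rescale linear-domain energies are assumptions, because the abstract gives neither the range nor the feature domain.

```python
import random


def small_energy_mask(energies, min_db=-80.0, max_db=0.0, rng=None):
    """Mask low-energy time-frequency bins and rescale the survivors.

    energies: list of frames, each a list of non-negative filterbank energies.
    min_db and max_db are hypothetical bounds for the threshold-to-peak ratio.
    """
    rng = rng or random.Random()
    peak = max(max(frame) for frame in energies)
    if peak <= 0.0:
        return [list(frame) for frame in energies]
    # Draw the threshold-to-peak ratio in decibels from a uniform distribution.
    ratio_db = rng.uniform(min_db, max_db)
    threshold = peak * 10.0 ** (ratio_db / 10.0)
    total_before = sum(sum(frame) for frame in energies)
    # A bin is masked (set to zero) if its energy is below the threshold.
    masked = [[e if e >= threshold else 0.0 for e in frame] for frame in energies]
    total_after = sum(sum(frame) for frame in masked)
    # Rescale unmasked bins so the total sum of feature values is preserved.
    scale = total_before / total_after if total_after > 0.0 else 1.0
    return [[e * scale for e in frame] for frame in masked]
```

With a ratio of at most 0 dB the peak bin always survives, so the rescaling factor is well defined for any utterance with nonzero energy.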

preprint, 2019, arXiv

Improved Multi-Stage Training of Online Attention-based Encoder-Decoder Models

In this paper, we propose a refined multi-stage multi-task training strategy to improve the performance of online attention-based encoder-decoder (AED) models. A three-stage training based on three levels of architectural granularity, namely character encoder, byte pair encoding (BPE) based encoder, and attention decoder, is proposed. Also, multi-task learning based on two levels of linguistic granularity, namely character and BPE, is used. We explore different pre-training strategies for the encoders including transfer learning from a bidirectional encoder. Our encoder-decoder models with online attention show 35% and 10% relative improvement over their baselines for smaller and bigger models, respectively. Our models achieve a word error rate (WER) of 5.04% and 4.48% on the LibriSpeech test-clean data for the smaller and bigger models, respectively, after fusion with a long short-term memory (LSTM) based external language model (LM).