Source author record

Jaeyoung Kim

Jaeyoung Kim appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

eess.AS, Sound, Artificial Intelligence, Computation and Language, Computer Vision, cond-mat.mtrl-sci, cond-mat.str-el, eess.IV, Machine Learning, math-ph, math.MP, quant-ph, Quantitative Methods

Catalog footprint

What is connected

7 works

13 topics

4 close collaborators


preprint, 2026, arXiv

Relevance to Utility: Process-Supervised Rewrite for RAG

Retrieval-augmented generation (RAG) systems often suffer from a gap between optimizing retrieval relevance and optimizing generative utility. Because of this gap, retrieved documents may be topically relevant yet still lack the content needed for effective reasoning during generation. While existing bridge modules attempt to rewrite the retrieved text for better generation, we show how they fail to capture "document utility". In this work, we propose R2U, whose key distinction is approximating true utility through joint observation of rewriting and answering in the reasoning process. For distillation, R2U scales such supervision to enhance reliability. We further construct utility-improvement supervision by measuring the generator's gain on the answer under the rewritten context, yielding signals for fine-tuning and preference optimization. We evaluate our method across multiple open-domain question-answering benchmarks. The empirical results demonstrate consistent improvements over strong bridging baselines.
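The utility-improvement signal described in this abstract can be illustrated with a small sketch. It assumes, purely for illustration, that "gain" means the difference in the generator's log-likelihood of the gold answer with and without the rewrite; the abstract does not specify the measure, and the function names `utility_gain` and `build_preference_pairs` are hypothetical.

```python
from typing import List, Tuple


def utility_gain(logp_rewritten: float, logp_original: float) -> float:
    """Assumed gain: answer log-likelihood under the rewritten context
    minus answer log-likelihood under the original retrieved context."""
    return logp_rewritten - logp_original


def build_preference_pairs(
    candidates: List[Tuple[str, float]],
    logp_original: float,
    margin: float = 0.0,
) -> List[Tuple[str, str]]:
    """Turn scored rewrites into (chosen, rejected) pairs.

    candidates: (rewrite_text, answer log-likelihood under that rewrite).
    A pair is emitted when the chosen rewrite's gain exceeds the rejected
    rewrite's gain by more than `margin` (a hypothetical threshold).
    """
    scored = [(text, utility_gain(lp, logp_original)) for text, lp in candidates]
    pairs = []
    for chosen, gain_chosen in scored:
        for rejected, gain_rejected in scored:
            if gain_chosen - gain_rejected > margin:
                pairs.append((chosen, rejected))
    return pairs
```

Pairs of this shape are what a preference-optimization objective would typically consume; whether R2U builds them this way is not stated in the abstract.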

preprint, 2022, arXiv

AI-based automated Meibomian gland segmentation, classification and reflection correction in infrared Meibography

Purpose: Develop a deep learning-based automated method to segment meibomian glands (MG) and eyelids, quantitatively analyze the MG area and MG ratio, estimate the meiboscore, and remove specular reflections from infrared images. Methods: A total of 1600 meibography images were captured in a clinical setting. 1000 images were precisely annotated with multiple revisions by investigators and graded 6 times by meibomian gland dysfunction (MGD) experts. Two deep learning (DL) models were trained separately to segment areas of the MG and eyelid. Those segmentations were used to estimate MG ratio and meiboscores using a classification-based DL model. A generative adversarial network was implemented to remove specular reflections from original images. Results: The mean MG ratios calculated from investigator annotation and DL segmentation were consistent: 26.23% vs. 25.12% in the upper eyelids and 32.34% vs. 32.29% in the lower eyelids, respectively. Our DL model achieved 73.01% accuracy for meiboscore classification on the validation set and 59.17% accuracy when tested on images from an independent center, compared to 53.44% validation accuracy by MGD experts. The DL-based approach successfully removes reflections from the original MG images without affecting meiboscore grading. Conclusions: DL with infrared meibography provides a fully automated, fast quantitative evaluation of MG morphology (MG segmentation, MG area, MG ratio, and meiboscore) that is sufficiently accurate for diagnosing dry eye disease. Also, the DL removes specular reflections from images so that ophthalmologists can use them for distraction-free assessment.
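As a rough illustration of the MG ratio reported above, the sketch below computes the percentage of eyelid pixels that are also gland pixels from two binary masks. The definition (gland area divided by eyelid area) is an assumption, since the abstract does not spell it out, and `mg_ratio` is a hypothetical helper, not the authors' code.

```python
def mg_ratio(gland_mask, eyelid_mask):
    """Return the MG ratio in percent from two same-shaped binary masks.

    Both masks are lists of rows of 0/1 values. Gland pixels outside the
    eyelid mask are ignored (an assumption of this sketch).
    """
    eyelid_area = 0
    gland_area = 0
    for gland_row, eyelid_row in zip(gland_mask, eyelid_mask):
        for gland_px, eyelid_px in zip(gland_row, eyelid_row):
            if eyelid_px:
                eyelid_area += 1
                if gland_px:
                    gland_area += 1
    if eyelid_area == 0:
        raise ValueError("eyelid mask is empty")
    return 100.0 * gland_area / eyelid_area
```

In practice the masks would come from the two segmentation models; here they are plain nested lists so the example stays dependency-free.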

preprint, 2022, arXiv

Contrastive Siamese Network for Semi-supervised Speech Recognition

This paper introduces the contrastive siamese (c-siam) network, an architecture for leveraging unlabeled acoustic data in speech recognition. c-siam is the first network that extracts high-level linguistic information from speech by matching outputs of two identical transformer encoders. It contains augmented and target branches which are trained by: (1) masking inputs and matching outputs with a contrastive loss, (2) incorporating a stop gradient operation on the target branch, (3) using an extra learnable transformation on the augmented branch, (4) introducing new temporal augmentation functions to prevent the shortcut learning problem. We use the Libri-light 60k unsupervised data and the LibriSpeech 100hrs/960hrs supervised data to compare c-siam with other best-performing systems. Our experiments show that c-siam provides 20% relative word error rate improvement over wav2vec baselines. A c-siam network with 450M parameters achieves competitive results compared to the state-of-the-art networks with 600M parameters.
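The idea of matching branch outputs with a contrastive loss can be sketched as follows. This is a generic InfoNCE-style loss over time steps, assumed here for illustration; the abstract does not give the exact loss, and the temperature value and function names are hypothetical. The stop-gradient on the target branch has no counterpart in plain Python and is only noted in a comment.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(augmented, targets, temperature=0.1):
    """Mean InfoNCE-style loss: each augmented-branch output at step t
    should match the target-branch output at the same step t, with the
    other steps acting as negatives. In a real framework the targets
    would be wrapped in a stop-gradient so only the augmented branch
    receives gradients (assumption based on the abstract)."""
    total = 0.0
    for t, a in enumerate(augmented):
        logits = [cosine(a, z) / temperature for z in targets]
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denom - logits[t]
    return total / len(augmented)
```

The loss is small when each augmented output is closest to its own target and large when it is closer to another time step.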

preprint, 2021, arXiv

Irreducibly $SU(2)$-covariant quantum channels of low rank

We investigate information theoretic properties of low rank (less than or equal to 3) quantum channels with $SU(2)$-symmetry, for which we have a complete description. We prove that the PPT property coincides with the entanglement-breaking property and that degradability seldom holds in this class. In connection with these results we will demonstrate how we can compute the Holevo and coherent information of those channels. In particular, we exhibit a strong form of additivity violation of coherent information, which resembles the superactivation of coherent information of depolarizing channels.
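For readers unfamiliar with the quantities named in this abstract, the standard textbook definitions are collected below. The notation is an assumption of this note rather than the paper's: $\pi_A, \pi_B$ denote irreducible unitary representations of $SU(2)$ on the input and output spaces, $S$ is the von Neumann entropy, and $\Phi^{c}$ is a complementary channel of $\Phi$.

```latex
% SU(2)-covariance of a channel \Phi with respect to (\pi_A, \pi_B)
\Phi\bigl(\pi_A(g)\,\rho\,\pi_A(g)^{*}\bigr)
  = \pi_B(g)\,\Phi(\rho)\,\pi_B(g)^{*}
  \qquad \text{for all } g \in SU(2)

% Coherent information of \Phi at an input state \rho
I_c(\rho, \Phi) = S\bigl(\Phi(\rho)\bigr) - S\bigl(\Phi^{c}(\rho)\bigr)

% Holevo information of \Phi, maximized over input ensembles \{p_i, \rho_i\}
\chi(\Phi) = \max_{\{p_i, \rho_i\}}
  \Bigl[ S\Bigl(\Phi\Bigl(\sum_i p_i \rho_i\Bigr)\Bigr)
         - \sum_i p_i\, S\bigl(\Phi(\rho_i)\bigr) \Bigr]
```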

preprint, 2020, arXiv

End-to-End Multi-Task Denoising for the Joint Optimization of Perceptual Speech Metrics

Although supervised learning based on a deep neural network has recently achieved substantial improvement on speech enhancement, the existing schemes suffer from one of two critical issues: spectrum or metric mismatches. The spectrum mismatch is a well-known issue: in general, a spectrum modified after the short-time Fourier transform (STFT) cannot be fully recovered after the inverse short-time Fourier transform (ISTFT). The metric mismatch is that a conventional mean square error (MSE) loss function is typically sub-optimal for maximizing perceptual speech measures such as signal-to-distortion ratio (SDR), perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI). This paper presents a new end-to-end denoising framework. First, the network optimization is performed on the time-domain signals after ISTFT to avoid the spectrum mismatch. Second, three loss functions based on SDR, PESQ and STOI are proposed to minimize the metric mismatch. The experimental results showed that the proposed denoising scheme significantly improved SDR, PESQ and STOI performance over the existing methods. Moreover, the proposed scheme also provided good generalization performance over generative denoising models on the perceptual speech metrics not used as a loss function during training.
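To make the metric-based loss idea concrete, here is a minimal sketch of an SDR-style loss computed on time-domain samples. It uses the simple energy-ratio definition of SDR, which is an assumption; the paper's exact SDR, PESQ and STOI losses are not given in the abstract, and `sdr_db` and `sdr_loss` are hypothetical names.

```python
import math


def sdr_db(clean, estimate, eps=1e-12):
    """Signal-to-distortion ratio in dB:
    10 * log10(||clean||^2 / ||clean - estimate||^2).
    `eps` guards against division by zero and log of zero."""
    if len(clean) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in clean)
    distortion_energy = sum((s - e) ** 2 for s, e in zip(clean, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (distortion_energy + eps))


def sdr_loss(clean, estimate):
    """Loss to minimize: the negative SDR, so a higher SDR gives a lower loss."""
    return -sdr_db(clean, estimate)
```

Because the loss is evaluated on waveform samples rather than on a modified spectrum, it is the kind of objective that can be applied after the ISTFT, as the abstract describes.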

preprint, 2020, arXiv

Honeycomb-Lattice Mott insulator on Tantalum Disulphide

Effects of electron many-body interactions amplify in an electronic system with a narrow bandwidth, opening a way to exotic physics. A narrow band in a two-dimensional (2D) honeycomb lattice is particularly intriguing when combined with Dirac bands and topological properties, but the material realization of a strongly interacting honeycomb lattice described by the Kane-Mele-Hubbard model has not been identified. Here we report a novel approach to realize a 2D honeycomb-lattice narrow-band system with strongly interacting 5$d$ electrons. We engineer a well-known triangular lattice 2D Mott insulator 1T-TaS$_2$ into a honeycomb lattice utilizing an adsorbate superstructure. Potassium (K) adatoms at an optimum coverage deplete one-third of the unpaired $d$ electrons and the remaining electrons form a honeycomb lattice with a very small hopping. Ab initio calculations show extremely narrow Z$_2$ topological bands mimicking the Kane-Mele model. Electron spectroscopy detects a charge gap an order of magnitude larger, indicating substantial electron correlation, as confirmed by dynamical mean field theory. It could be the first artificial Mott insulator with a finite spin Chern number.

preprint, 2020, arXiv

T-GSA: Transformer with Gaussian-weighted self-attention for speech enhancement

Transformer neural networks (TNN) demonstrated state-of-the-art performance on many natural language processing (NLP) tasks, replacing recurrent neural networks (RNNs), such as LSTMs or GRUs. However, TNNs did not perform well in speech enhancement, whose contextual nature is different from that of NLP tasks, like machine translation. Self-attention is a core building block of the Transformer, which not only enables parallelization of sequence computation, but also provides a constant path length between symbols that is essential to learning long-range dependencies. In this paper, we propose a Transformer with Gaussian-weighted self-attention (T-GSA), whose attention weights are attenuated according to the distance between target and context symbols. The experimental results show that the proposed T-GSA has significantly improved speech-enhancement performance, compared to the Transformer and RNNs.
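The distance-based attenuation described above can be sketched as follows. This version subtracts a squared-distance penalty from the raw attention scores before the softmax, which is equivalent to multiplying the unnormalized attention weights by a Gaussian in the distance between positions. That formulation, the fixed `sigma`, and the function names are assumptions for illustration; the abstract does not state how T-GSA applies or parameterizes the weighting.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_weighted_attention(scores, sigma):
    """Attenuate attention by distance between target i and context j.

    scores: square matrix (list of rows) of raw attention scores.
    Each score is penalized by (i - j)^2 / sigma^2 in the log domain,
    so the resulting weights carry a factor exp(-(i - j)^2 / sigma^2).
    """
    n = len(scores)
    weights = []
    for i in range(n):
        penalized = [
            scores[i][j] - ((i - j) ** 2) / (sigma ** 2) for j in range(n)
        ]
        weights.append(softmax(penalized))
    return weights
```

With uniform raw scores, each position attends most strongly to itself and progressively less to more distant positions; a larger `sigma` widens the effective context.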
