Source author record

Daeyoung Kim

Daeyoung Kim appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision, Machine Learning, Artificial Intelligence, eess.IV, Computation and Language, Human-Computer Interaction, Methodology

Catalog footprint

What is connected

14 works

7 topics

4 close collaborators


arXiv preprint, 2026

FIBER: A Differentially Private Optimizer with Filter-Aware Innovation Bias Correction

Differentially private (DP) training protects individual examples by adding noise to gradients, but the injected noise interacts nontrivially with adaptive optimizers. Recent DP methods temporally filter privatized gradients to reduce variance; however, filtering also changes the DP noise statistics seen by AdamW's second-moment accumulator. As a result, bias corrections derived for unfiltered DP noise, such as subtracting $\sigma_w^2$, can become miscalibrated when filtering is present. We propose FiBeR, a DP optimizer designed for temporally filtered privatized gradients. FiBeR (i) performs denoising in innovation space by filtering the residual stream and integrating it to form the filtered gradient estimate, (ii) decouples the two-point observation geometry from the innovation gain to enable independent tuning, and (iii) introduces a filter-aware second-moment calibration that subtracts the attenuated DP noise contribution $A(\omega)\sigma_w^2$, where $A(\omega)$ is derived in closed form for the innovation filter and can be computed for general stable linear filters. Across vision and language benchmarks, FiBeR consistently and substantially improves the performance of DP optimizers, surpassing state-of-the-art results under equivalent privacy constraints on multiple tasks.
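The abstract does not give the closed form of $A(\omega)$. As a hedged illustration of the general principle behind it, the sketch below assumes the DP noise is white with variance $\sigma_w^2$ and passes through a stable linear time-invariant filter with impulse response $h_k$; the filtered noise variance is then $\sigma_w^2 \sum_k h_k^2$, so the attenuation factor is $\sum_k h_k^2$. An exponential moving average (EMA) stands in for FiBeR's innovation filter, and all function names are hypothetical.

```python
import numpy as np


def ema_impulse_response(beta: float, n_taps: int) -> np.ndarray:
    """Truncated impulse response h_k = (1 - beta) * beta**k of the EMA
    y_t = beta * y_{t-1} + (1 - beta) * x_t (a stand-in, not FiBeR's filter)."""
    k = np.arange(n_taps)
    return (1.0 - beta) * beta**k


def noise_attenuation(h: np.ndarray) -> float:
    """Variance gain of white noise through a stable LTI filter: sum_k h_k^2."""
    return float(np.sum(h**2))


def calibrated_second_moment(v: np.ndarray, sigma_w: float,
                             attenuation: float, eps: float = 1e-12) -> np.ndarray:
    """Subtract the attenuated DP-noise contribution A * sigma_w^2 from an
    AdamW-style second-moment estimate, clipping at eps to keep it positive."""
    return np.maximum(v - attenuation * sigma_w**2, eps)


beta = 0.9
a_numeric = noise_attenuation(ema_impulse_response(beta, n_taps=500))
a_closed = (1.0 - beta) / (1.0 + beta)  # closed form for this EMA filter
print(a_numeric, a_closed)
```

For this stand-in filter the attenuation is $0.1/1.9 \approx 0.053$, so subtracting the unfiltered correction $\sigma_w^2$ would over-correct by a factor of about 19, which is the kind of miscalibration the abstract describes.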

arXiv preprint, 2026

TB-AVA: Text as a Semantic Bridge for Audio-Visual Parameter Efficient Finetuning

Audio-visual understanding requires effective alignment between heterogeneous modalities, yet establishing cross-modal correspondence remains challenging when temporally aligned audio and visual signals lack a clear semantic relationship. We propose to use text as a semantic anchor for audio-visual representation learning. To this end, we introduce a parameter-efficient adaptation framework built on frozen audio and visual encoders, centered on the Text-Bridged Audio-Visual Adapter (TB-AVA), which enables text-mediated interaction between the audio and visual streams. At the core of TB-AVA, Gated Semantic Modulation (GSM) selectively modulates feature channels based on text-inferred semantic relevance. We evaluate the proposed approach on multiple benchmarks, including AVE, AVS, and AVVP, where the proposed framework achieves state-of-the-art performance, demonstrating that text is an effective semantic anchor for parameter-efficient fine-tuning (PEFT) in audio-visual learning.
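The abstract describes Gated Semantic Modulation only as channel-wise modulation driven by text-inferred relevance. A minimal sketch of that idea, assuming a sigmoid gate computed from a text embedding by a linear projection (squeeze-and-excitation-style gating swapped in for illustration, not the paper's exact design); `text_gated_modulation`, `w_gate` and `b_gate` are hypothetical names.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def text_gated_modulation(features: np.ndarray, text_emb: np.ndarray,
                          w_gate: np.ndarray, b_gate: np.ndarray) -> np.ndarray:
    """Scale each feature channel by a text-conditioned gate in (0, 1).

    features: (T, C) audio or visual tokens from a frozen encoder
    text_emb: (D,) text embedding
    w_gate:   (D, C) projection from text space to per-channel gate logits
    b_gate:   (C,) gate bias
    """
    gate = sigmoid(text_emb @ w_gate + b_gate)  # (C,) relevance per channel
    return features * gate[None, :]             # broadcast over tokens


rng = np.random.default_rng(0)
T, C, D = 8, 16, 32
audio = rng.standard_normal((T, C))
text = rng.standard_normal(D)
w = rng.standard_normal((D, C)) * 0.1
b = np.zeros(C)
modulated = text_gated_modulation(audio, text, w, b)
```

Only `w_gate` and `b_gate` would be trained while the encoders stay frozen, which is what makes a module of this shape fit the parameter-efficient setting.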

arXiv preprint, 2022

The effectiveness of feature attribution methods and its correlation with automatic evaluation scores

Explaining the decisions of an Artificial Intelligence (AI) model is increasingly critical in many real-world, high-stakes applications. Hundreds of papers have proposed new feature attribution methods or discussed or harnessed these tools in their work. However, although humans are the target end-users, most attribution methods have been evaluated only on proxy automatic-evaluation metrics (Zhang et al. 2018; Zhou et al. 2016; Petsiuk et al. 2018). In this paper, we conduct the first user study to measure how effectively attribution maps assist humans in ImageNet classification and Stanford Dogs fine-grained classification, for both natural and adversarial images (i.e., images containing adversarial perturbations). Overall, feature attribution is, surprisingly, no more effective than showing humans the nearest training-set examples. On the harder task of fine-grained dog categorization, presenting attribution maps to humans does not help but instead hurts the performance of human-AI teams compared to AI alone. Importantly, we found that automatic attribution-map evaluation measures correlate poorly with actual human-AI team performance. Our findings encourage the community to rigorously test their methods on downstream human-in-the-loop applications and to rethink the existing evaluation metrics.
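The paper's key quantitative question is how well an automatic score tracks human-AI team accuracy. One common way to measure that kind of agreement is a rank correlation; the sketch below implements Spearman's $\rho$ in NumPy for inputs without ties. It illustrates the measurement only; it is not the paper's analysis code, and `spearman_rho` is a hypothetical name.

```python
import numpy as np


def spearman_rho(x, y) -> float:
    """Spearman rank correlation, assuming no ties in x or y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rx = np.argsort(np.argsort(x))  # 0-based rank of each element
    ry = np.argsort(np.argsort(y))
    n = len(x)
    d = rx - ry
    return float(1.0 - 6.0 * np.sum(d**2) / (n * (n**2 - 1)))


# x would hold one automatic-metric score per attribution method and y the
# matching human-AI team accuracy; synthetic inputs are used here.
print(spearman_rho([1, 2, 3, 4], [10, 20, 30, 40]))
```

A value near zero would indicate, as the abstract reports, that ranking methods by the automatic score says little about their ranking by human-AI team performance.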

arXiv preprint, 2022

Uncertainty-Aware Text-to-Program for Question Answering on Structured Electronic Health Records

Question Answering on Electronic Health Records (EHR-QA) has a significant impact on the healthcare domain and is being actively studied. Previous research on structured EHR-QA focuses on converting natural language queries into a query language such as SQL or SPARQL (NLQ2Query), so the problem scope is limited to the data types predefined by the specific query language. To extend EHR-QA beyond this limitation, so that it can handle multi-modal medical data and solve complex inference in the future, a more primitive systemic language is needed. In this paper, we design a program-based model (NLQ2Program) for EHR-QA as a first step in this direction. We tackle MIMICSPARQL*, the graph-based EHR-QA dataset, via a program-based approach in a semi-supervised manner to overcome the absence of gold programs. Without gold programs, our proposed model shows performance comparable to the previous state-of-the-art model, which is an NLQ2Query model (0.9% gain). In addition, to make the EHR-QA model reliable, we apply an uncertainty decomposition method to measure the ambiguity in the input question. We empirically confirmed that data uncertainty is the most indicative of ambiguity in the input question.
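The abstract does not say which decomposition is used. One widely used form, assumed here, takes $M$ stochastic forward passes (e.g. MC dropout or an ensemble) and splits the entropy of the mean prediction into the expected entropy (data uncertainty) plus the mutual information (model uncertainty): $H[\bar p] = \mathbb{E}_m H[p_m] + \mathrm{MI}$. The function names are hypothetical.

```python
import numpy as np


def entropy(p: np.ndarray, axis: int = -1) -> np.ndarray:
    """Shannon entropy in nats; terms with p = 0 contribute 0."""
    safe = np.where(p > 0, p, 1.0)  # log(1) = 0 for the zero entries
    return -np.sum(p * np.log(safe), axis=axis)


def decompose_uncertainty(probs: np.ndarray):
    """probs: (M, K) class probabilities from M stochastic forward passes.

    Returns (total, data, model):
      total = H[mean_m p_m]   predictive entropy
      data  = mean_m H[p_m]   expected entropy (aleatoric)
      model = total - data    mutual information (epistemic)
    """
    total = float(entropy(probs.mean(axis=0)))
    data = float(entropy(probs, axis=-1).mean())
    return total, data, total - data


# Every pass is equally unsure: all uncertainty is data uncertainty.
print(decompose_uncertainty(np.full((5, 4), 0.25)))
# Passes are individually confident but disagree: all of it is model uncertainty.
print(decompose_uncertainty(np.array([[1.0, 0.0], [0.0, 1.0]])))
```

Under this reading, an ambiguous question is one where each pass spreads probability over several programs, which raises the data term rather than the model term.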

arXiv preprint, 2021

Simple Multi-Resolution Representation Learning for Human Pose Estimation

Human pose estimation, the process of recognizing human keypoints in a given image, is one of the most important tasks in computer vision and has a wide range of applications, including movement diagnostics, surveillance, and self-driving vehicles. The accuracy of human keypoint prediction has steadily improved thanks to the rapid development of deep learning. Most existing methods solve human pose estimation by generating heatmaps in which the ith heatmap indicates the location confidence of the ith keypoint. In this paper, we introduce novel network structures, referred to as multi-resolution representation learning, for human keypoint prediction. At different resolutions in the learning process, our networks branch off and use extra layers to learn heatmap generation. Our first approach generates the multi-resolution heatmaps after obtaining the lowest-resolution feature maps. Our second approach learns during feature extraction, generating heatmaps at each resolution of the feature extractor. The first and second approaches are referred to as multi-resolution heatmap learning and multi-resolution feature map learning, respectively. Our architectures are simple yet effective, achieving good performance. We conducted experiments on two common benchmarks for human pose estimation: the MSCOCO and MPII datasets. The code is made publicly available at https://github.com/tqtrunghnvn/SimMRPose.
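In heatmap-based methods, the $i$-th output channel is trained to peak at the $i$-th keypoint. A minimal sketch of encoding one keypoint as Gaussian heatmap targets at several resolutions and decoding by argmax; the Gaussian width `sigma`, the strides, and the function names are assumptions for illustration, not taken from the SimMRPose code.

```python
import numpy as np


def gaussian_heatmap(h: int, w: int, cx: float, cy: float, sigma: float = 2.0) -> np.ndarray:
    """Heatmap of shape (h, w) peaking (value 1) at column cx, row cy."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma**2))


def decode(heatmap: np.ndarray) -> tuple:
    """Return the (x, y) location of the maximum response."""
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(x), int(y)


# One keypoint at (x=40, y=24) in a 64 x 48 (w x h) map, plus the matching
# targets at coarser resolutions, as multi-resolution supervision would need.
keypoint = (40, 24)
targets = {}
for stride in (1, 2, 4):
    h, w = 48 // stride, 64 // stride
    targets[stride] = gaussian_heatmap(h, w, keypoint[0] / stride, keypoint[1] / stride)
```

Each branch at a given resolution would be supervised with the target of matching size, so the loss compares like-sized maps.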

arXiv preprint, 2020

Adversarial Optimal Transport Through The Convolution Of Kernels With Evolving Measures

A novel algorithm is proposed to solve the sample-based optimal transport problem. An adversarial formulation of the push-forward condition uses a test function built as a convolution between an adaptive kernel and an evolving probability distribution $\nu$ over a latent variable $b$. Approximating this convolution by its simulation over evolving samples $b^i(t)$ of $\nu$ reduces the parameterization of the test function to determining the flow of these samples. This flow, discretized over time steps $t_n$, is built from the composition of elementary maps. The optimal transport also follows a flow that, by duality, must follow the gradient of the test function. Representing the test function as the Monte Carlo simulation of a distribution makes the algorithm robust to dimensionality, and its evolution under a memory-less flow produces rich, complex maps from simple parametric transformations. The algorithm is illustrated with numerical examples.
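The test function above is a convolution $f(x) = \int K(x, b)\,d\nu(b)$, approximated by averaging over samples $b^i$ of $\nu$. A sketch assuming a Gaussian kernel with a fixed bandwidth $h$ (the paper's kernel is adaptive), together with the gradient $\nabla f(x) = -\frac{1}{N h^2}\sum_i (x - b^i)\,K(x, b^i)$ that the transport flow follows. Function names are hypothetical.

```python
import numpy as np


def mc_test_function(x: np.ndarray, b: np.ndarray, h: float) -> np.ndarray:
    """Monte Carlo estimate of f(x) = E_{b~nu}[K(x, b)] with a Gaussian kernel.

    x: (n, d) evaluation points; b: (N, d) samples b^i of nu."""
    diff = x[:, None, :] - b[None, :, :]                 # (n, N, d)
    k = np.exp(-np.sum(diff**2, axis=-1) / (2 * h**2))   # (n, N)
    return k.mean(axis=1)


def grad_mc_test_function(x: np.ndarray, b: np.ndarray, h: float) -> np.ndarray:
    """Analytic gradient of mc_test_function with respect to x, shape (n, d)."""
    diff = x[:, None, :] - b[None, :, :]
    k = np.exp(-np.sum(diff**2, axis=-1) / (2 * h**2))
    return -(k[:, :, None] * diff).mean(axis=1) / h**2


rng = np.random.default_rng(1)
x = rng.standard_normal((3, 2))
b = rng.standard_normal((50, 2))
g = grad_mc_test_function(x, b, 0.7)
```

A transport step would then move particles along $\nabla f$, for example $x \leftarrow x + \eta\,\nabla f(x)$; the sign and step size $\eta$ depend on the adversarial objective and are assumptions here.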

arXiv preprint, 2020

Applying Tensor Decomposition to image for Robustness against Adversarial Attack

Deep learning is advancing rapidly and shows dramatic performance in computer vision. However, deep learning models turn out to be highly vulnerable to adversarial attacks: small perturbations added to the input can easily fool them. Meanwhile, tensor decomposition is widely used for compressing tensor data, such as data matrices and images. In this paper, we propose combining tensor decomposition with the model to defend it against adversarial examples. We verify that this idea is simple and effective in resisting adversarial attacks. In addition, the method rarely degrades the original performance on clean data. We experiment on MNIST, CIFAR10, and ImageNet and show that our method is robust against state-of-the-art attack methods.
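A minimal sketch of low-rank purification in its simplest, matrix form: each image channel is replaced by its truncated-SVD reconstruction, discarding small singular components on the assumption that much of the adversarial perturbation lives there. Truncated SVD is a stand-in; the abstract does not state which tensor decomposition (e.g. CP or Tucker) or rank the paper uses, and `low_rank_purify` is a hypothetical name.

```python
import numpy as np


def low_rank_purify(image: np.ndarray, rank: int) -> np.ndarray:
    """Rank-`rank` reconstruction of each channel of an (H, W, C) image."""
    out = np.empty_like(image, dtype=float)
    for c in range(image.shape[2]):
        u, s, vt = np.linalg.svd(image[:, :, c], full_matrices=False)
        # Keep only the leading `rank` singular triplets.
        out[:, :, c] = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
    return out


rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))
purified = low_rank_purify(img, rank=4)
```

In a defense pipeline the classifier would see `low_rank_purify(x, rank)` instead of `x`; the rank trades off how much perturbation is removed against how much clean detail is lost.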

arXiv preprint, 2020

ContCap: A scalable framework for continual image captioning

Image captioning systems increasingly describe images coherently and accurately, while recent progress in continual learning allows deep learning models to avoid catastrophic forgetting. However, image captioning combined with continual learning has not yet been explored. We define the task that combines continual learning and image captioning as continual image captioning. In this work, we propose ContCap, a framework that generates captions over a series of incoming tasks, seamlessly integrating continual learning into image captioning while addressing catastrophic forgetting. After demonstrating forgetting in image captioning, we propose various techniques to overcome it, taking a simple fine-tuning scheme as the baseline. We split the MS-COCO 2014 dataset to perform experiments in class-incremental settings without revisiting the data of previous tasks. Experiments show remarkable improvements in performance on the old tasks, while results on the new tasks surprisingly surpass fine-tuning. Our framework also offers a scalable solution for continual image or video captioning.
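In a class-incremental setting, each task introduces new classes and earlier tasks' data are never revisited. A sketch of one way to build such a split from multi-label annotations, assuming an image goes to a task only when all of its labels belong to that task's classes, and images spanning several tasks are dropped to avoid leakage. The rule and the name `class_incremental_split` are assumptions, not ContCap's exact protocol.

```python
def class_incremental_split(annotations: dict, tasks: list) -> list:
    """annotations: image_id -> set of class labels.
    tasks: disjoint class sets, one per task, in training order.
    Returns one list of image ids per task."""
    splits = [[] for _ in tasks]
    for image_id, labels in annotations.items():
        for t, classes in enumerate(tasks):
            if labels and labels <= classes:  # every label belongs to task t
                splits[t].append(image_id)
                break
    return splits


annotations = {
    "img1": {"dog"},
    "img2": {"dog", "cat"},
    "img3": {"bus"},
    "img4": {"dog", "bus"},  # spans two tasks, so it is dropped
}
tasks = [{"dog", "cat"}, {"bus", "train"}]
splits = class_incremental_split(annotations, tasks)
print(splits)  # [['img1', 'img2'], ['img3']]
```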

arXiv preprint, 2020

DAPAS : Denoising Autoencoder to Prevent Adversarial attack in Semantic Segmentation

Deep learning techniques show dramatic performance in computer vision and can even outperform humans. However, they are also vulnerable to small perturbations known as adversarial attacks. This is a safety problem for artificial intelligence and has recently received considerable study. Such attacks have been shown to fool models for image classification, semantic segmentation, and object detection. We propose defending against this attack with a denoising autoencoder, which removes the perturbation and restores the original image. We experiment with various noise distributions and verify the effect of the denoising autoencoder against adversarial attacks in semantic segmentation.
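A denoising autoencoder is trained on (corrupted, clean) pairs and learns to map the former back to the latter. The sketch below builds such pairs under several noise distributions; the specific distributions, the `scale` parameter, and the function names are assumptions, and the autoencoder itself is omitted.

```python
import numpy as np


def corrupt(images: np.ndarray, kind: str, rng: np.random.Generator,
            scale: float = 0.1) -> np.ndarray:
    """Return a noisy copy of images whose values lie in [0, 1]."""
    if kind == "gaussian":
        noisy = images + rng.normal(0.0, scale, images.shape)
    elif kind == "uniform":
        noisy = images + rng.uniform(-scale, scale, images.shape)
    elif kind == "salt_pepper":
        noisy = images.copy()
        mask = rng.random(images.shape)
        noisy[mask < scale / 2] = 0.0        # pepper
        noisy[mask > 1.0 - scale / 2] = 1.0  # salt
    else:
        raise ValueError(f"unknown noise kind: {kind}")
    return np.clip(noisy, 0.0, 1.0)


def make_training_pairs(clean: np.ndarray, kinds, rng: np.random.Generator):
    """Stack (noisy, clean) pairs for every noise kind; the DAE learns noisy -> clean."""
    noisy = np.concatenate([corrupt(clean, k, rng) for k in kinds])
    target = np.concatenate([clean] * len(kinds))
    return noisy, target


rng = np.random.default_rng(0)
clean = rng.random((4, 8, 8, 3))
noisy, target = make_training_pairs(clean, ["gaussian", "uniform", "salt_pepper"], rng)
```

At test time the trained autoencoder would sit in front of the segmentation network, so a perturbed input is denoised before it is segmented.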

arXiv preprint, 2020

Dissecting Catastrophic Forgetting in Continual Learning by Deep Visualization

Interpreting the behavior of Deep Neural Networks (usually considered black boxes) is critical, especially now that they are widely adopted across diverse aspects of human life. Building on advances in Explainable Artificial Intelligence, this paper proposes a novel technique called Auto DeepVis to dissect catastrophic forgetting in continual learning. A new method for dealing with catastrophic forgetting, named critical freezing, is also introduced based on the investigation with Auto DeepVis. Experiments on a captioning model show in detail how catastrophic forgetting happens, in particular which components are forgetting or changing. We then assess the effectiveness of our technique; more precisely, critical freezing achieves the best performance on both previous and new tasks over the baselines, demonstrating the value of the investigation. Our techniques are explainable and could also supplement existing solutions aimed at completely eradicating catastrophic forgetting in lifelong learning.
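The abstract does not specify how components are ranked. A sketch of the freezing step alone, assuming a criticality score per parameter group has already been obtained (for example from the visualization analysis), the top-scoring groups are frozen, and the rest are updated on the new task with plain SGD. Group names, scores, and function names are hypothetical.

```python
import numpy as np


def select_frozen(scores: dict, k: int) -> set:
    """Freeze the k parameter groups with the highest criticality scores."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def sgd_step_with_freezing(params: dict, grads: dict, frozen: set, lr: float) -> dict:
    """Plain SGD update that leaves frozen groups unchanged."""
    return {
        name: value if name in frozen else value - lr * grads[name]
        for name, value in params.items()
    }


params = {"encoder": np.ones(3), "attention": np.ones(3), "decoder": np.ones(3)}
grads = {name: np.full(3, 0.5) for name in params}
scores = {"encoder": 0.9, "attention": 0.2, "decoder": 0.7}  # hypothetical scores
frozen = select_frozen(scores, k=2)
new_params = sgd_step_with_freezing(params, grads, frozen, lr=0.1)
```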

arXiv preprint, 2020

T-Net: Nested encoder-decoder architecture for the main vessel segmentation in coronary angiography

In this paper, we propose T-Net, which contains a small encoder-decoder inside the encoder-decoder structure (EDiED). T-Net overcomes the limitation that U-Net can have only a single set of concatenation layers between the encoder and decoder blocks. More precisely, U-Net forms the concatenation layers symmetrically, so the low-level features of the encoder are connected to the latter part of the decoder, and the high-level features are connected to the beginning of the decoder. T-Net arranges pooling and up-sampling appropriately during encoding, and likewise during decoding, so that feature maps of various sizes are obtained in a single block. As a result, all features extracted by the encoder, from low-level to high-level, are delivered to the decoder from its beginning to predict a more accurate mask. We evaluated T-Net on the problem of segmenting the three main vessels in coronary angiography images. The experiment consisted of a comparison of U-Net and T-Net under the same conditions, and an optimized T-Net for main vessel segmentation. T-Net recorded a Dice Similarity Coefficient (DSC) of 0.815, 0.095 higher than that of U-Net, and the optimized T-Net recorded a DSC of 0.890, 0.170 higher than that of U-Net. In addition, we visualized the weight activations of the convolutional layers of T-Net and U-Net to show that T-Net actually predicts the mask from earlier decoders. We therefore expect that T-Net can be effectively applied to other similar medical image segmentation problems.
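The Dice Similarity Coefficient reported above is $\mathrm{DSC} = 2|P \cap G| / (|P| + |G|)$ for a predicted mask $P$ and ground-truth mask $G$. A minimal NumPy version for binary masks; returning 1.0 when both masks are empty is a convention chosen here.

```python
import numpy as np


def dice(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(2.0 * np.logical_and(pred, target).sum() / denom)
```

The two reported gaps are mutually consistent: $0.815 - 0.095 = 0.890 - 0.170 = 0.720$, so both imply the same U-Net DSC of 0.720.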

arXiv preprint, 2020

Unbalanced GANs: Pre-training the Generator of Generative Adversarial Network using Variational Autoencoder

We propose Unbalanced GANs, which pre-train the generator of a generative adversarial network (GAN) using a variational autoencoder (VAE). We guarantee stable training of the generator by preventing the discriminator from converging faster at early epochs. Furthermore, we balance the generator and the discriminator at early epochs and thus maintain stable GAN training. We apply Unbalanced GANs to well-known public datasets and find that Unbalanced GANs reduce mode collapse. We also show that Unbalanced GANs outperform ordinary GANs in terms of stable learning, faster convergence, and better image quality at early epochs.

arXiv preprint, 2018

Automated detection of vulnerable plaque in intravascular ultrasound images

Acute Coronary Syndrome (ACS) is a syndrome caused by a decrease in blood flow in the coronary arteries. ACS is usually related to coronary thrombosis and is primarily caused by plaque rupture, followed by plaque erosion and calcified nodules. Thin-cap fibroatheroma (TCFA) is known to be the lesion morphologically most similar to a plaque rupture. In this paper, we propose methods to classify TCFA using various machine learning classifiers, including a Feed-forward Neural Network (FNN), K-Nearest Neighbor (KNN), Random Forest (RF), and a Convolutional Neural Network (CNN), to identify the classifier with the best TCFA classification accuracy. In addition, we propose a pixel-range-based feature extraction method that extracts the ratio of pixels in different regions of interest to reflect the physician's TCFA discrimination criteria. A total of 12,325 IVUS images were labeled with corresponding OCT images to train and evaluate the classifiers. The FNN, KNN, RF, and CNN classifiers achieved an Area Under the ROC Curve (AUC) of 0.884, 0.890, 0.878, and 0.933, respectively. The CNN classifier thus performed best, and the top 10 features of the feature-based classifiers (FNN, KNN, RF) were found to be similar to the physician's TCFA diagnostic criteria.
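The pixel-range features are described only at a high level. A sketch assuming a grayscale IVUS frame, a boolean region-of-interest mask, and hypothetical intensity bin edges: each feature is the fraction of ROI pixels whose intensity falls in a given range. The actual ranges and regions used in the paper are not given in the abstract.

```python
import numpy as np


def pixel_range_features(image: np.ndarray, roi: np.ndarray, edges) -> np.ndarray:
    """Fraction of ROI pixels in each intensity range [edges[i], edges[i+1]).

    The last range also includes its right edge (numpy.histogram convention)."""
    values = image[roi.astype(bool)]
    counts, _ = np.histogram(values, bins=edges)
    return counts / max(values.size, 1)


# Hypothetical 8-bit ranges; concatenating the vectors from several ROIs would
# give the input for the feature-based classifiers (FNN, KNN, RF).
edges_8bit = [0, 64, 128, 192, 256]
image = np.arange(16).reshape(4, 4)
features = pixel_range_features(image, np.ones((4, 4), dtype=bool), [0, 4, 8, 12, 16])
```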

arXiv preprint, 2016

Parallel Computing for Copula Parameter Estimation with Big Data: A Simulation Study

Copula-based modeling has advanced rapidly in recent years. However, in big data applications, the lengthy computation time for estimating copula parameters is a major difficulty. Here, we develop a novel method that reduces the computation time of copula parameter estimation using communication-free parallel computing. Our procedure partitions the full data set into disjoint, independent subsets, performs copula parameter estimation on each subset, and combines the results to approximate the full-data copula parameter. Simulation studies show that our method greatly reduces computation time for three well-known one-parameter bivariate copulas from the elliptical and Archimedean families: Gaussian, Frank, and Gumbel. In addition, our simulation studies find small estimated bias, estimated mean squared error, and estimated relative L1 and L2 errors for our method, compared with the full-data parameter estimates.
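The partition, estimate, combine procedure can be sketched for the Gaussian copula, whose parameter follows from Kendall's $\tau$ via $\rho = \sin(\pi\tau/2)$. Two assumptions: inverting Kendall's $\tau$ is the per-subset estimator, and subset estimates are combined by a simple average; the paper may use other estimators or weights. Subsets are processed independently, so the map needs no communication between workers; a thread pool keeps the sketch portable, and a process pool could be swapped in.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def kendall_tau(x: np.ndarray, y: np.ndarray) -> float:
    """Kendall's tau-a from all pairwise sign comparisons (O(n^2), assumes no ties)."""
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    n = len(x)
    return float((dx * dy).sum() / (n * (n - 1)))


def gaussian_copula_rho(xy: np.ndarray) -> float:
    """Estimate the Gaussian copula parameter of one subset by inverting Kendall's tau."""
    return float(np.sin(np.pi * kendall_tau(xy[:, 0], xy[:, 1]) / 2.0))


def parallel_copula_estimate(data: np.ndarray, n_subsets: int, workers: int = 4) -> float:
    """Split rows into disjoint subsets, estimate each independently, then average."""
    subsets = np.array_split(data, n_subsets)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        estimates = list(pool.map(gaussian_copula_rho, subsets))
    return float(np.mean(estimates))


rng = np.random.default_rng(0)
true_rho = 0.5
data = rng.multivariate_normal([0.0, 0.0], [[1.0, true_rho], [true_rho, 1.0]], size=4000)
estimate = parallel_copula_estimate(data, n_subsets=8)
```

Because Kendall's $\tau$ costs $O(n^2)$ per subset, splitting $n$ rows into $m$ subsets cuts the total pairwise work from about $n^2$ to $n^2/m$ even before any parallel speedup.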

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.


Source provenance

Where this author record came from

arXiv, confidence 95%

external id: arxiv:2605.03425:author:5:daeyoung-kim

Imported May 20, 2026; synced May 21, 2026

arXiv, confidence 95%

external id: arxiv:2605.11572:author:6:daeyoung-kim

Imported May 20, 2026; synced May 20, 2026

Close collaborators

Tae Joon Jun, researcher (7 works)

Giang Nguyen, researcher (3 works)

Jihoon Kweon, researcher (2 works)

Seungju Cho, researcher (2 works)
