Researcher profile

Daeyoung Kim

Daeyoung Kim contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
13works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

13 published item(s)

preprint2026arXiv

FIBER: A Differentially Private Optimizer with Filter-Aware Innovation Bias Correction

Differentially private (DP) training protects individual examples by adding noise to gradients, but the injected noise interacts nontrivially with adaptive optimizers. Recent DP methods temporally filter privatized gradients to reduce variance; however, filtering also changes the DP noise statistics seen by AdamW's second-moment accumulator. As a result, bias corrections derived for unfiltered DP noise, such as subtracting sigma_w squared, can become miscalibrated when filtering is present. We propose FiBeR, a DP optimizer designed for temporally filtered privatized gradients. FiBeR (i) performs denoising in innovation space by filtering the residual stream and integrating it to form the filtered gradient estimate, (ii) decouples the two-point observation geometry from the innovation gain to enable independent tuning, and (iii) introduces a filter-aware second-moment calibration that subtracts the attenuated DP noise contribution A(omega) sigma_w squared, where A(omega) is derived in closed form for the innovation filter and can be computed for general stable linear filters. Across vision and language benchmarks, FiBeR consistently demonstrates substantial improvements in the performance of DP optimizers, surpassing state-of-the-art results under equivalent privacy constraints on multiple tasks.

preprint2026arXiv

TB-AVA: Text as a Semantic Bridge for Audio-Visual Parameter Efficient Finetuning

Audio-visual understanding requires effective alignment between heterogeneous modalities, yet cross-modal correspondence remains challenging when temporally aligned audio and visual signals lack clear semantic correspondence. We propose to use text as a semantic anchor for audio-visual representation learning. To this end, we introduce a parameter-efficient adaptation framework built on frozen audio and visual encoders, centered on Text-Bridged Audio-Visual Adapter (TB-AVA), which enables text-mediated interaction between audio and visual streams. At the core of TB-AVA, Gated Semantic Modulation (GSM) selectively modulates feature channels based on text-inferred semantic relevance. We evaluate the proposed approach on multiple benchmarks, including AVE, AVS, and AVVP, where the proposed framework achieves state-of-the-art performance, demonstrating text as an effective semantic anchor for parameter-efficient fine-tuning (PEFT) in audio-visual learning.

preprint2022arXiv

The effectiveness of feature attribution methods and its correlation with automatic evaluation scores

Explaining the decisions of an Artificial Intelligence (AI) model is increasingly critical in many real-world, high-stake applications. Hundreds of papers have either proposed new feature attribution methods, discussed or harnessed these tools in their work. However, despite humans being the target end-users, most attribution methods were only evaluated on proxy automatic-evaluation metrics (Zhang et al. 2018; Zhou et al. 2016; Petsiuk et al. 2018). In this paper, we conduct the first user study to measure attribution map effectiveness in assisting humans in ImageNet classification and Stanford Dogs fine-grained classification, and when an image is natural or adversarial (i.e., contains adversarial perturbations). Overall, feature attribution is surprisingly not more effective than showing humans nearest training-set examples. On a harder task of fine-grained dog categorization, presenting attribution maps to humans does not help, but instead hurts the performance of human-AI teams compared to AI alone. Importantly, we found automatic attribution-map evaluation measures to correlate poorly with the actual human-AI team performance. Our findings encourage the community to rigorously test their methods on the downstream human-in-the-loop applications and to rethink the existing evaluation metrics.

preprint2022arXiv

Uncertainty-Aware Text-to-Program for Question Answering on Structured Electronic Health Records

Question Answering on Electronic Health Records (EHR-QA) has a significant impact on the healthcare domain, and it is being actively studied. Previous research on structured EHR-QA focuses on converting natural language queries into query language such as SQL or SPARQL (NLQ2Query), so the problem scope is limited to pre-defined data types by the specific query language. In order to expand the EHR-QA task beyond this limitation to handle multi-modal medical data and solve complex inference in the future, more primitive systemic language is needed. In this paper, we design the program-based model (NLQ2Program) for EHR-QA as the first step towards the future direction. We tackle MIMICSPARQL*, the graph-based EHR-QA dataset, via a program-based approach in a semi-supervised manner in order to overcome the absence of gold programs. Without the gold program, our proposed model shows comparable performance to the previous state-of-the-art model, which is an NLQ2Query model (0.9% gain). In addition, for a reliable EHR-QA model, we apply the uncertainty decomposition method to measure the ambiguity in the input question. We empirically confirmed data uncertainty is most indicative of the ambiguity in the input question.

preprint2021arXiv

Simple Multi-Resolution Representation Learning for Human Pose Estimation

Human pose estimation - the process of recognizing human keypoints in a given image - is one of the most important tasks in computer vision and has a wide range of applications including movement diagnostics, surveillance, or self-driving vehicle. The accuracy of human keypoint prediction is increasingly improved thanks to the burgeoning development of deep learning. Most existing methods solved human pose estimation by generating heatmaps in which the ith heatmap indicates the location confidence of the ith keypoint. In this paper, we introduce novel network structures referred to as multi-resolution representation learning for human keypoint prediction. At different resolutions in the learning process, our networks branch off and use extra layers to learn heatmap generation. We firstly consider the architectures for generating the multi-resolution heatmaps after obtaining the lowest-resolution feature maps. Our second approach allows learning during the process of feature extraction in which the heatmaps are generated at each resolution of the feature extractor. The first and second approaches are referred to as multi-resolution heatmap learning and multi-resolution feature map learning respectively. Our architectures are simple yet effective, achieving good performance. We conducted experiments on two common benchmarks for human pose estimation: MSCOCO and MPII dataset. The code is made publicly available at https://github.com/tqtrunghnvn/SimMRPose.

preprint2020arXiv

Adversarial Optimal Transport Through The Convolution Of Kernels With Evolving Measures

A novel algorithm is proposed to solve the sample-based optimal transport problem. An adversarial formulation of the push-forward condition uses a test function built as a convolution between an adaptive kernel and an evolving probability distribution $ν$ over a latent variable $b$. Approximating this convolution by its simulation over evolving samples $b^i(t)$ of $ν$, the parameterization of the test function reduces to determining the flow of these samples. This flow, discretized over discrete time steps $t_n$, is built from the composition of elementary maps. The optimal transport also follows a flow that, by duality, must follow the gradient of the test function. The representation of the test function as the Monte Carlo simulation of a distribution makes the algorithm robust to dimensionality, and its evolution under a memory-less flow produces rich, complex maps from simple parametric transformations. The algorithm is illustrated with numerical examples.

preprint2020arXiv

Applying Tensor Decomposition to image for Robustness against Adversarial Attack

Nowadays the deep learning technology is growing faster and shows dramatic performance in computer vision areas. However, it turns out a deep learning based model is highly vulnerable to some small perturbation called an adversarial attack. It can easily fool the deep learning model by adding small perturbations. On the other hand, tensor decomposition method widely uses for compressing the tensor data, including data matrix, image, etc. In this paper, we suggest combining tensor decomposition for defending the model against adversarial example. We verify this idea is simple and effective to resist adversarial attack. In addition, this method rarely degrades the original performance of clean data. We experiment on MNIST, CIFAR10 and ImageNet data and show our method robust on state-of-the-art attack methods.

preprint2020arXiv

ContCap: A scalable framework for continual image captioning

While advanced image captioning systems are increasingly describing images coherently and exactly, recent progress in continual learning allows deep learning models to avoid catastrophic forgetting. However, the domain where image captioning working with continual learning has not yet been explored. We define the task in which we consolidate continual learning and image captioning as continual image captioning. In this work, we propose ContCap, a framework generating captions over a series of new tasks coming, seamlessly integrating continual learning into image captioning besides addressing catastrophic forgetting. After proving forgetting in image captioning, we propose various techniques to overcome the forgetting dilemma by taking a simple fine-tuning schema as the baseline. We split MS-COCO 2014 dataset to perform experiments in class-incremental settings without revisiting dataset of previously provided tasks. Experiments show remarkable improvements in the performance on the old tasks while the figures for the new surprisingly surpass fine-tuning. Our framework also offers a scalable solution for continual image or video captioning.

preprint2020arXiv

DAPAS : Denoising Autoencoder to Prevent Adversarial attack in Semantic Segmentation

Nowadays, Deep learning techniques show dramatic performance on computer vision area, and they even outperform human. But it is also vulnerable to some small perturbation called an adversarial attack. This is a problem combined with the safety of artificial intelligence, which has recently been studied a lot. These attacks have shown that they can fool models of image classification, semantic segmentation, and object detection. We point out this attack can be protected by denoise autoencoder, which is used for denoising the perturbation and restoring the original images. We experiment with various noise distributions and verify the effect of denoise autoencoder against adversarial attack in semantic segmentation.

preprint2020arXiv

Dissecting Catastrophic Forgetting in Continual Learning by Deep Visualization

Interpreting the behaviors of Deep Neural Networks (usually considered as a black box) is critical especially when they are now being widely adopted over diverse aspects of human life. Taking the advancements from Explainable Artificial Intelligent, this paper proposes a novel technique called Auto DeepVis to dissect catastrophic forgetting in continual learning. A new method to deal with catastrophic forgetting named critical freezing is also introduced upon investigating the dilemma by Auto DeepVis. Experiments on a captioning model meticulously present how catastrophic forgetting happens, particularly showing which components are forgetting or changing. The effectiveness of our technique is then assessed; and more precisely, critical freezing claims the best performance on both previous and coming tasks over baselines, proving the capability of the investigation. Our techniques could not only be supplementary to existing solutions for completely eradicating catastrophic forgetting for life-long learning but also explainable.

preprint2020arXiv

T-Net: Nested encoder-decoder architecture for the main vessel segmentation in coronary angiography

In this paper, we proposed T-Net containing a small encoder-decoder inside the encoder-decoder structure (EDiED). T-Net overcomes the limitation that U-Net can only have a single set of the concatenate layer between encoder and decoder block. To be more precise, the U-Net symmetrically forms the concatenate layers, so the low-level feature of the encoder is connected to the latter part of the decoder, and the high-level feature is connected to the beginning of the decoder. T-Net arranges the pooling and up-sampling appropriately during the encoder process, and likewise during the decoding process so that feature-maps of various sizes are obtained in a single block. As a result, all features from the low-level to the high-level extracted from the encoder are delivered from the beginning of the decoder to predict a more accurate mask. We evaluated T-Net for the problem of segmenting three main vessels in coronary angiography images. The experiment consisted of a comparison of U-Net and T-Nets under the same conditions, and an optimized T-Net for the main vessel segmentation. As a result, T-Net recorded a Dice Similarity Coefficient score (DSC) of 0.815, 0.095 higher than that of U-Net, and the optimized T-Net recorded a DSC of 0.890 which was 0.170 higher than that of U-Net. In addition, we visualized the weight activation of the convolutional layer of T-Net and U-Net to show that T-Net actually predicts the mask from earlier decoders. Therefore, we expect that T-Net can be effectively applied to other similar medical image segmentation problems.

preprint2020arXiv

Unbalanced GANs: Pre-training the Generator of Generative Adversarial Network using Variational Autoencoder

We propose Unbalanced GANs, which pre-trains the generator of the generative adversarial network (GAN) using variational autoencoder (VAE). We guarantee the stable training of the generator by preventing the faster convergence of the discriminator at early epochs. Furthermore, we balance between the generator and the discriminator at early epochs and thus maintain the stabilized training of GANs. We apply Unbalanced GANs to well known public datasets and find that Unbalanced GANs reduce mode collapses. We also show that Unbalanced GANs outperform ordinary GANs in terms of stabilized learning, faster convergence and better image quality at early epochs.

preprint2018arXiv

Automated detection of vulnerable plaque in intravascular ultrasound images

Acute Coronary Syndrome (ACS) is a syndrome caused by a decrease in blood flow in the coronary arteries. The ACS is usually related to coronary thrombosis and is primarily caused by plaque rupture followed by plaque erosion and calcified nodule. Thin-cap fibroatheroma (TCFA) is known to be the most similar lesion morphologically to a plaque rupture. In this paper, we propose methods to classify TCFA using various machine learning classifiers including Feed-forward Neural Network (FNN), K-Nearest Neighbor (KNN), Random Forest (RF) and Convolutional Neural Network (CNN) to figure out a classifier that shows optimal TCFA classification accuracy. In addition, we suggest pixel range based feature extraction method to extract the ratio of pixels in the different region of interests to reflect the physician's TCFA discrimination criteria. A total of 12,325 IVUS images were labeled with corresponding OCT images to train and evaluate the classifiers. We achieved 0.884, 0.890, 0.878 and 0.933 Area Under the ROC Curve (AUC) in the order of using FNN, KNN, RF and CNN classifier. As a result, the CNN classifier performed best and the top 10 features of the feature-based classifiers (FNN, KNN, RF) were found to be similar to the physician's TCFA diagnostic criteria.