Researcher profile

Hyunwoo Kim

Hyunwoo Kim contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
9works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

9 published item(s)

preprint2026arXiv

DeltaPrompts: Escaping the Zero-Delta Trap in Multimodal Distillation

Distillation enables compact Vision-Language Models (VLMs) to obtain strong reasoning capabilities, yet the prompts driving this process are typically chosen via simple heuristics or aggregated from off-the-shelf datasets. We reveal a critical inefficiency in this approach: up to 69% of the prompts in standard chart / document reasoning datasets are effectively zero-delta, meaning the teacher and student already induce the exact same answer distribution. Training on these prompts provides minimal learning signal, causing student improvement to rapidly saturate regardless of data scale. To escape the zero-delta trap, we return to first principles: distillation fundamentally minimizes distributional divergence, and thus a prompt is valuable only if it exposes a functional capability gap between the teacher and student. We quantify this gap through answer divergence ($Δ$), demonstrating that non-zero divergence is critical for effective scaling. Building on this insight, we propose a staged synthesis pipeline that repurposes existing datasets as seeds, actively targeting student failure modes to produce better prompts. The result is DeltaPrompts, a diverse dataset of 200k synthetic, high-divergence reasoning problems. We evaluate DeltaPrompts across three distinct settings: on-policy distillation with the target teacher-student pair, transfer to a novel model family without regenerating the data, and off-policy fine-tuning of a non-reasoning model. Across all scenarios, DeltaPrompts drives substantial gains, yielding up to 15% relative improvement even on top of a highly-optimized reasoning model (e.g., Qwen3-VL-8B-Thinking) -- averaged over 10 benchmarks spanning chart, document and perception-centric reasoning.

preprint2022arXiv

Bridging the Gap between Classification and Localization for Weakly Supervised Object Localization

Weakly supervised object localization aims to find a target object region in a given image with only weak supervision, such as image-level labels. Most existing methods use a class activation map (CAM) to generate a localization map; however, a CAM identifies only the most discriminative parts of a target object rather than the entire object region. In this work, we find the gap between classification and localization in terms of the misalignment of the directions between an input feature and a class-specific weight. We demonstrate that the misalignment suppresses the activation of CAM in areas that are less discriminative but belong to the target object. To bridge the gap, we propose a method to align feature directions with a class-specific weight. The proposed method achieves a state-of-the-art localization performance on the CUB-200-2011 and ImageNet-1K benchmarks.

preprint2022arXiv

Conformational heterogeneity of molecules physisorbed on a gold surface at room temperature

A quantitative single-molecule tip-enhanced Raman spectroscopy (TERS) study at room temperature remained a challenge due to the rapid structural dynamics of molecules exposed to air. Here, we demonstrate the hyperspectral TERS imaging of single or a few brilliant cresyl blue (BCB) molecules at room temperature, along with quantitative spectral analyses. Robust chemical imaging is enabled by the freeze-frame approach using a thin Al$_{2}$O$_{3}$ capping layer, which suppresses spectral diffusions and inhibits chemical reactions and contaminations in air. For the molecules resolved spatially in the TERS image, a clear Raman peak variation up to 7.5 cm$^{-1}$ is observed, which cannot be found in molecular ensembles. From density functional theory-based quantitative analyses of the varied TERS peaks, we reveal the conformational heterogeneity at the single-molecule level. This work provides a facile way to investigate the single-molecule properties in interacting media, expanding the scope of single-molecule vibrational spectroscopy studies.

preprint2022arXiv

Freeze-frame approach for robust single-molecule tip-enhnaced Raman spectroscopy at room temperature

A quantitative single-molecule tip-enhanced Raman spectroscopy (TERS) study at room temperature remained a challenge due to the rapid structural dynamics of molecules exposed to air. Here, we demonstrate the single-molecule level hyperspectral TERS imaging of brilliant cresyl blue (BCB) at room temperature for the first time, along with quantitative spectral analyses. Freeze-frame approach using a thin Al2O3 capping layer, which suppresses spectral diffusions and inhibits chemical reactions and contaminations in air, enabled reliable and robust chemical imaging. For the molecules resolved spatially in the TERS image, a clear Raman peak variation up to 7.5 cm-1 is observed, which cannot be found in molecular ensembles. From density functional theory-based quantitative analyses of the varied TERS peaks, we reveal the conformational heterogeneity at the single-molecule level. This work provides a facile way to investigate the single-molecule properties in interacting media, expanding the scope of single-molecule vibrational spectroscopy.

preprint2022arXiv

Perception Prioritized Training of Diffusion Models

Diffusion models learn to restore noisy data, which is corrupted with different levels of noise, by optimizing the weighted sum of the corresponding loss terms, i.e., denoising score matching loss. In this paper, we show that restoring data corrupted with certain noise levels offers a proper pretext task for the model to learn rich visual concepts. We propose to prioritize such noise levels over other levels during training, by redesigning the weighting scheme of the objective function. We show that our simple redesign of the weighting scheme significantly improves the performance of diffusion models regardless of the datasets, architectures, and sampling strategies.

preprint2022arXiv

VISOLO: Grid-Based Space-Time Aggregation for Efficient Online Video Instance Segmentation

For online video instance segmentation (VIS), fully utilizing the information from previous frames in an efficient manner is essential for real-time applications. Most previous methods follow a two-stage approach requiring additional computations such as RPN and RoIAlign, and do not fully exploit the available information in the video for all subtasks in VIS. In this paper, we propose a novel single-stage framework for online VIS built based on the grid structured feature representation. The grid-based features allow us to employ fully convolutional networks for real-time processing, and also to easily reuse and share features within different components. We also introduce cooperatively operating modules that aggregate information from available frames, in order to enrich the features for all subtasks in VIS. Our design fully takes advantage of previous information in a grid form for all tasks in VIS in an efficient way, and we achieved the new state-of-the-art accuracy (38.6 AP and 36.9 AP) and speed (40.0 FPS) on YouTube-VIS 2019 and 2021 datasets among online VIS methods. The code is available at https://github.com/SuHoHan95/VISOLO.

preprint2021arXiv

Ada-SISE: Adaptive Semantic Input Sampling for Efficient Explanation of Convolutional Neural Networks

Explainable AI (XAI) is an active research area to interpret a neural network's decision by ensuring transparency and trust in the task-specified learned models. Recently, perturbation-based model analysis has shown better interpretation, but backpropagation techniques are still prevailing because of their computational efficiency. In this work, we combine both approaches as a hybrid visual explanation algorithm and propose an efficient interpretation method for convolutional neural networks. Our method adaptively selects the most critical features that mainly contribute towards a prediction to probe the model by finding the activated features. Experimental results show that the proposed method can reduce the execution time up to 30% while enhancing competitive interpretability without compromising the quality of explanation generated.

preprint2021arXiv

Integrated Grad-CAM: Sensitivity-Aware Visual Explanation of Deep Convolutional Networks via Integrated Gradient-Based Scoring

Visualizing the features captured by Convolutional Neural Networks (CNNs) is one of the conventional approaches to interpret the predictions made by these models in numerous image recognition applications. Grad-CAM is a popular solution that provides such a visualization by combining the activation maps obtained from the model. However, the average gradient-based terms deployed in this method underestimates the contribution of the representations discovered by the model to its predictions. Addressing this problem, we introduce a solution to tackle this issue by computing the path integral of the gradient-based terms in Grad-CAM. We conduct a thorough analysis to demonstrate the improvement achieved by our method in measuring the importance of the extracted representations for the CNN's predictions, which yields to our method's administration in object localization and model interpretation.

preprint2020arXiv

Batch-level Experience Replay with Review for Continual Learning

Continual learning is a branch of deep learning that seeks to strike a balance between learning stability and plasticity. The CVPR 2020 CLVision Continual Learning for Computer Vision challenge is dedicated to evaluating and advancing the current state-of-the-art continual learning methods using the CORe50 dataset with three different continual learning scenarios. This paper presents our approach, called Batch-level Experience Replay with Review, to this challenge. Our team achieved the 1'st place in all three scenarios out of 79 participated teams. The codebase of our implementation is publicly available at https://github.com/RaptorMai/CVPR20_CLVision_challenge