Source author record

Hyunwoo Kim

Hyunwoo Kim appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Computer Vision, Machine Learning, Artificial Intelligence, physics.chem-ph, Computation and Language, hep-ex, physics.optics

Catalog footprint

What is connected

10 works

7 topics

4 close collaborators

preprint, 2026, arXiv

DeltaPrompts: Escaping the Zero-Delta Trap in Multimodal Distillation

Distillation enables compact Vision-Language Models (VLMs) to obtain strong reasoning capabilities, yet the prompts driving this process are typically chosen via simple heuristics or aggregated from off-the-shelf datasets. We reveal a critical inefficiency in this approach: up to 69% of the prompts in standard chart/document reasoning datasets are effectively zero-delta, meaning the teacher and student already induce the exact same answer distribution. Training on these prompts provides minimal learning signal, causing student improvement to rapidly saturate regardless of data scale. To escape the zero-delta trap, we return to first principles: distillation fundamentally minimizes distributional divergence, and thus a prompt is valuable only if it exposes a functional capability gap between the teacher and student. We quantify this gap through answer divergence ($\Delta$), demonstrating that non-zero divergence is critical for effective scaling. Building on this insight, we propose a staged synthesis pipeline that repurposes existing datasets as seeds, actively targeting student failure modes to produce better prompts. The result is DeltaPrompts, a diverse dataset of 200k synthetic, high-divergence reasoning problems. We evaluate DeltaPrompts across three distinct settings: on-policy distillation with the target teacher-student pair, transfer to a novel model family without regenerating the data, and off-policy fine-tuning of a non-reasoning model. Across all scenarios, DeltaPrompts drives substantial gains, yielding up to 15% relative improvement even on top of a highly optimized reasoning model (e.g., Qwen3-VL-8B-Thinking), averaged over 10 benchmarks spanning chart, document and perception-centric reasoning.
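As a rough illustration of the zero-delta idea, the sketch below estimates answer divergence for a single prompt from sampled final answers. The abstract does not define $\Delta$ precisely, so the total variation distance used here, and the names `answer_distribution`, `answer_divergence`, and `is_zero_delta`, are assumptions for illustration rather than the paper's method.

```python
from collections import Counter


def answer_distribution(answers):
    """Empirical distribution over final answers sampled for one prompt."""
    counts = Counter(answers)
    total = sum(counts.values())
    return {answer: n / total for answer, n in counts.items()}


def answer_divergence(teacher_answers, student_answers):
    """Total variation distance between the two empirical answer distributions.

    Assumption: TV distance stands in for the paper's divergence measure.
    """
    p = answer_distribution(teacher_answers)
    q = answer_distribution(student_answers)
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in support)


def is_zero_delta(teacher_answers, student_answers, tol=1e-9):
    """Treat a prompt as zero-delta when the divergence is (numerically) zero."""
    return answer_divergence(teacher_answers, student_answers) <= tol


# Hypothetical samples: the student already matches the teacher on the first prompt.
same = answer_divergence(["42", "42", "41", "42"], ["42", "41", "42", "42"])
# On the second prompt the student mostly answers differently.
gap = answer_divergence(["42", "42", "42", "42"], ["17", "42", "17", "17"])
```

Under this assumed measure, a prompt with zero divergence would be filtered out or used as a seed for synthesis, while a prompt with a large gap would be kept for distillation.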

preprint, 2022, arXiv

Bridging the Gap between Classification and Localization for Weakly Supervised Object Localization

Weakly supervised object localization aims to find a target object region in a given image with only weak supervision, such as image-level labels. Most existing methods use a class activation map (CAM) to generate a localization map; however, a CAM identifies only the most discriminative parts of a target object rather than the entire object region. In this work, we find the gap between classification and localization in terms of the misalignment of the directions between an input feature and a class-specific weight. We demonstrate that the misalignment suppresses the activation of CAM in areas that are less discriminative but belong to the target object. To bridge the gap, we propose a method to align feature directions with a class-specific weight. The proposed method achieves a state-of-the-art localization performance on the CUB-200-2011 and ImageNet-1K benchmarks.
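The misalignment argument can be made concrete with a toy calculation. A CAM activation at one location is the dot product of the feature vector and the class-specific weight, so it depends on both the feature norm and the cosine of the angle between them. The sketch below uses hypothetical two-dimensional vectors and helper names (`cam_score`, `cosine`) chosen for illustration; it is not the paper's alignment method.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def cam_score(feature, class_weight):
    """CAM activation at one spatial location: dot(feature, class weight)."""
    return dot(feature, class_weight)


def cosine(feature, class_weight):
    """Direction alignment between the feature and the class weight."""
    return dot(feature, class_weight) / (norm(feature) * norm(class_weight))


class_weight = [1.0, 0.0]          # hypothetical class-specific weight
discriminative = [2.0, 0.0]        # feature aligned with the weight
less_discriminative = [0.0, 2.0]   # same norm, but orthogonal direction

aligned_score = cam_score(discriminative, class_weight)
misaligned_score = cam_score(less_discriminative, class_weight)
```

Both features have the same norm, yet the misaligned one receives zero activation, which is the kind of suppression the abstract attributes to direction misalignment.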

preprint, 2022, arXiv

Conformational heterogeneity of molecules physisorbed on a gold surface at room temperature

A quantitative single-molecule tip-enhanced Raman spectroscopy (TERS) study at room temperature has remained a challenge due to the rapid structural dynamics of molecules exposed to air. Here, we demonstrate hyperspectral TERS imaging of single or a few brilliant cresyl blue (BCB) molecules at room temperature, along with quantitative spectral analyses. Robust chemical imaging is enabled by the freeze-frame approach using a thin Al$_{2}$O$_{3}$ capping layer, which suppresses spectral diffusion and inhibits chemical reactions and contamination in air. For the molecules resolved spatially in the TERS image, a clear Raman peak variation of up to 7.5 cm$^{-1}$ is observed, which cannot be found in molecular ensembles. From density functional theory-based quantitative analyses of the varied TERS peaks, we reveal the conformational heterogeneity at the single-molecule level. This work provides a facile way to investigate single-molecule properties in interacting media, expanding the scope of single-molecule vibrational spectroscopy studies.

preprint, 2022, arXiv

Freeze-frame approach for robust single-molecule tip-enhanced Raman spectroscopy at room temperature

A quantitative single-molecule tip-enhanced Raman spectroscopy (TERS) study at room temperature has remained a challenge due to the rapid structural dynamics of molecules exposed to air. Here, we demonstrate single-molecule-level hyperspectral TERS imaging of brilliant cresyl blue (BCB) at room temperature for the first time, along with quantitative spectral analyses. The freeze-frame approach using a thin Al$_{2}$O$_{3}$ capping layer, which suppresses spectral diffusion and inhibits chemical reactions and contamination in air, enables reliable and robust chemical imaging. For the molecules resolved spatially in the TERS image, a clear Raman peak variation of up to 7.5 cm$^{-1}$ is observed, which cannot be found in molecular ensembles. From density functional theory-based quantitative analyses of the varied TERS peaks, we reveal the conformational heterogeneity at the single-molecule level. This work provides a facile way to investigate single-molecule properties in interacting media, expanding the scope of single-molecule vibrational spectroscopy.

preprint, 2022, arXiv

Perception Prioritized Training of Diffusion Models

Diffusion models learn to restore noisy data, which is corrupted with different levels of noise, by optimizing the weighted sum of the corresponding loss terms, i.e., denoising score matching loss. In this paper, we show that restoring data corrupted with certain noise levels offers a proper pretext task for the model to learn rich visual concepts. We propose to prioritize such noise levels over other levels during training, by redesigning the weighting scheme of the objective function. We show that our simple redesign of the weighting scheme significantly improves the performance of diffusion models regardless of the datasets, architectures, and sampling strategies.
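The training objective described here is a weighted sum of per-noise-level losses, so changing the weights changes which noise levels dominate training. The sketch below shows only that generic structure. The loss values, the weight vectors, and the function name `weighted_objective` are hypothetical, and the abstract does not state which levels the paper prioritizes or what weighting it uses.

```python
def weighted_objective(per_level_losses, weights):
    """Weighted sum of denoising losses, one term per noise level."""
    if len(per_level_losses) != len(weights):
        raise ValueError("need exactly one weight per noise level")
    return sum(w * loss for w, loss in zip(weights, per_level_losses))


# Hypothetical per-level losses for three noise levels.
losses = [0.2, 0.8, 0.4]

# Uniform weighting treats every noise level equally.
uniform_total = weighted_objective(losses, [1.0, 1.0, 1.0])

# A reweighted scheme (illustrative only) emphasizes the middle level.
prioritized_total = weighted_objective(losses, [0.5, 2.0, 0.5])
```

With the illustrative weights, the middle level contributes 1.6 of the 1.9 total, so its gradient would dominate an update step.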

preprint, 2022, arXiv

VISOLO: Grid-Based Space-Time Aggregation for Efficient Online Video Instance Segmentation

For online video instance segmentation (VIS), fully utilizing the information from previous frames in an efficient manner is essential for real-time applications. Most previous methods follow a two-stage approach requiring additional computations such as RPN and RoIAlign, and do not fully exploit the available information in the video for all subtasks in VIS. In this paper, we propose a novel single-stage framework for online VIS built on a grid-structured feature representation. The grid-based features allow us to employ fully convolutional networks for real-time processing, and also to easily reuse and share features across different components. We also introduce cooperatively operating modules that aggregate information from available frames, in order to enrich the features for all subtasks in VIS. Our design efficiently takes full advantage of previous information in grid form for all tasks in VIS, and we achieve new state-of-the-art accuracy (38.6 AP and 36.9 AP) and speed (40.0 FPS) on the YouTube-VIS 2019 and 2021 datasets among online VIS methods. The code is available at https://github.com/SuHoHan95/VISOLO.

preprint, 2021, arXiv

Ada-SISE: Adaptive Semantic Input Sampling for Efficient Explanation of Convolutional Neural Networks

Explainable AI (XAI) is an active research area that aims to interpret a neural network's decisions, ensuring transparency and trust in task-specific learned models. Recently, perturbation-based model analysis has shown better interpretation, but backpropagation techniques still prevail because of their computational efficiency. In this work, we combine both approaches into a hybrid visual explanation algorithm and propose an efficient interpretation method for convolutional neural networks. Our method adaptively selects the most critical features that mainly contribute to a prediction and uses these activated features to probe the model. Experimental results show that the proposed method can reduce execution time by up to 30% while maintaining competitive interpretability without compromising the quality of the explanations generated.

preprint, 2021, arXiv

Integrated Grad-CAM: Sensitivity-Aware Visual Explanation of Deep Convolutional Networks via Integrated Gradient-Based Scoring

Visualizing the features captured by Convolutional Neural Networks (CNNs) is one of the conventional approaches to interpreting the predictions made by these models in numerous image recognition applications. Grad-CAM is a popular solution that provides such a visualization by combining the activation maps obtained from the model. However, the average gradient-based terms deployed in this method underestimate the contribution of the representations discovered by the model to its predictions. To address this problem, we compute the path integral of the gradient-based terms in Grad-CAM. We conduct a thorough analysis to demonstrate the improvement achieved by our method in measuring the importance of the extracted representations for the CNN's predictions, which supports our method's application to object localization and model interpretation.
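The path integral of gradients mentioned above can be approximated numerically with a Riemann sum. The sketch below applies plain integrated gradients to a toy scalar function, which is only the underlying idea and not Integrated Grad-CAM itself, since that method operates on CNN activation maps. The function `f`, its gradient `grad_f`, the zero baseline, and the step count are all assumptions for illustration.

```python
def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate the path integral of gradients along the straight line
    from `baseline` to `x`, using the midpoint rule with `steps` samples."""
    dim = len(x)
    avg_grad = [0.0] * dim
    for i in range(steps):
        alpha = (i + 0.5) / steps  # midpoint of the i-th sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = grad_fn(point)
        for j in range(dim):
            avg_grad[j] += grad[j] / steps
    # Scale the averaged gradient by the displacement from the baseline.
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


def f(v):
    """Toy model output (hypothetical): quadratic in v[0], linear in v[1]."""
    return v[0] ** 2 + 3.0 * v[1]


def grad_f(v):
    """Analytic gradient of f."""
    return [2.0 * v[0], 3.0]


x = [2.0, 1.0]
baseline = [0.0, 0.0]
attributions = integrated_gradients(grad_f, x, baseline)
```

A useful sanity check is completeness: the attributions should sum to `f(x) - f(baseline)`, which a single-point gradient times the input generally does not satisfy for nonlinear functions.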

preprint, 2020, arXiv

Batch-level Experience Replay with Review for Continual Learning

Continual learning is a branch of deep learning that seeks to strike a balance between learning stability and plasticity. The CVPR 2020 CLVision Continual Learning for Computer Vision challenge is dedicated to evaluating and advancing the current state-of-the-art continual learning methods using the CORe50 dataset with three different continual learning scenarios. This paper presents our approach to this challenge, called Batch-level Experience Replay with Review. Our team achieved 1st place in all three scenarios out of 79 participating teams. The codebase of our implementation is publicly available at https://github.com/RaptorMai/CVPR20_CLVision_challenge

preprint, 2008, arXiv

Heavy Flavor Physics through e-Science

Heavy flavor physics is an important element in understanding the nature of physics. Accurate knowledge of the properties of heavy flavor physics plays an essential role in determining the Cabibbo-Kobayashi-Maskawa (CKM) matrix. The asymmetric-energy e+e- B factories (BaBar and Belle) are in operation, and a B factory upgrade to Super Belle is planned, which will dramatically increase the size of the available B meson samples. The data size of the Tevatron experiments (CDF, D0) is also on the order of petabytes. We therefore use a new concept of e-Science for heavy flavor physics: studying heavy flavor physics anytime and anywhere, even when we are not on site at the accelerator laboratories and the data size is immense. The components of this concept are data production, data processing, and data analysis, each available anytime and anywhere. We apply this concept to the current CDF experiment at the Tevatron and will extend it to the Super Belle and LHC (Large Hadron Collider) experiments, which are expected to deliver more accurate measurements in the coming decades.
