Source author record

Yujin Kim

Yujin Kim appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Machine Learning Artificial Intelligence Computation and Language cs.CY hep-th

Catalog footprint

What is connected

5works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

EggHand: A Multimodal Foundation Model for Egocentric Hand Pose Forecasting

Forecasting future 3D hand pose sequences from egocentric video is essential for understanding human intention and enabling embodied applications such as AR/VR assistance and human-robot interaction. However, this task remains a highly challenging problem because egocentric hand motion is driven by complex human intent, exhibits highly dexterous articulations, and is observed under drastic viewpoint shifts induced by ego-motion. In this work, we introduce EggHand, a foundation-model-based framework for egocentric hand pose forecasting that unifies multimodal semantic reasoning with dynamic motion modeling. Our approach couples an action decoder from a Vision-Language-Action (VLA) model, which captures the structured temporal dynamics of hand motion, with an egocentric video-text encoder that provides viewpoint-aware contextual information learned from large-scale first-person video. Together, these components overcome the brittleness of generic visual encoders under ego-motion and enable joint reasoning over motion, context, and high-level intent-without relying on body pose or external tracking. Experiments on the EgoExo4D dataset show that EggHand sets a new state of the art in forecasting accuracy, remains robust under severe ego-motion, and further enables controllable prediction via language-based task prompts. Project page: https://jyoun9.github.io/EggHand

preprint2022arXiv

How to Fine-tune Models with Few Samples: Update, Data Augmentation, and Test-time Augmentation

Most of the recent few-shot learning (FSL) algorithms are based on transfer learning, where a model is pre-trained using a large amount of source data, and the pre-trained model is fine-tuned using a small amount of target data. In transfer learning-based FSL, sophisticated pre-training methods have been widely studied for universal representation. Therefore, it has become more important to utilize the universal representation for downstream tasks, but there are few studies on fine-tuning in FSL. In this paper, we focus on how to transfer pre-trained models to few-shot downstream tasks from the three perspectives: update, data augmentation, and test-time augmentation. First, we compare the two popular update methods, full fine-tuning (i.e., updating the entire network, FT) and linear probing (i.e., updating only a linear classifier, LP). We find that LP is better than FT with extremely few samples, whereas FT outperforms LP as training samples increase. Next, we show that data augmentation cannot guarantee few-shot performance improvement and investigate the effectiveness of data augmentation based on the intensity of augmentation. Finally, we adopt augmentation to both a support set for update (i.e., data augmentation) as well as a query set for prediction (i.e., test-time augmentation), considering support-query distribution shifts, and improve few-shot performance. The code is available at https://github.com/kimyuji/updating_FSL.

preprint2022arXiv

MonaCoBERT: Monotonic attention based ConvBERT for Knowledge Tracing

Knowledge tracing (KT) is a field of study that predicts the future performance of students based on prior performance datasets collected from educational applications such as intelligent tutoring systems, learning management systems, and online courses. Some previous studies on KT have concentrated only on the interpretability of the model, whereas others have focused on enhancing the performance. Models that consider both interpretability and the performance improvement have been insufficient. Moreover, models that focus on performance improvements have not shown an overwhelming performance compared with existing models. In this study, we propose MonaCoBERT, which achieves the best performance on most benchmark datasets and has significant interpretability. MonaCoBERT uses a BERT-based architecture with monotonic convolutional multihead attention, which reflects forgetting behavior of the students and increases the representation power of the model. We can also increase the performance and interpretability using a classical test-theory-based (CTT-based) embedding strategy that considers the difficulty of the question. To determine why MonaCoBERT achieved the best performance and interpret the results quantitatively, we conducted ablation studies and additional analyses using Grad-CAM, UMAP, and various visualization techniques. The analysis results demonstrate that both attention components complement one another and that CTT-based embedding represents information on both global and local difficulties. We also demonstrate that our model represents the relationship between concepts.

preprint2022arXiv

There is no rose without a thorn: Finding weaknesses on BlenderBot 2.0 in terms of Model, Data and User-Centric Approach

BlenderBot 2.0 is a dialogue model that represents open-domain chatbots by reflecting real-time information and remembering user information for an extended period using an internet search module and multi-session. Nonetheless, the model still has room for improvement. To this end, we examine BlenderBot 2.0 limitations and errors from three perspectives: model, data, and user. From the data point of view, we highlight the unclear guidelines provided to workers during the crowdsourcing process, as well as a lack of a process for refining hate speech in the collected data and verifying the accuracy of internet-based information. From a user perspective, we identify nine types of limitations of BlenderBot 2.0, and their causes are thoroughly investigated. Furthermore, for each point of view, we propose practical improvement methods and discuss several potential future research directions.

preprint2021arXiv

Unfolding Conformal Geometry

Conformal geometry is studied using the unfolded formulation à la Vasiliev. Analyzing the first-order consistency of the unfolded equations, we identify the content of zero-forms as the spin-two off-shell Fradkin-Tseytlin module of $\mathfrak{so}(2,d)$. We sketch the nonlinear structure of the equations and explain how Weyl invariant densities, which Type-B Weyl anomaly consist of, could be systematically computed within the unfolded formulation. The unfolded equation for conformal geometry is also shown to be reduced to various on-shell gravitational systems by requiring additional algebraic constraints.

Yujin Kim

What is connected

Connect this record

See the researcher in context

Building this map preview

5 published item(s)

EggHand: A Multimodal Foundation Model for Egocentric Hand Pose Forecasting

How to Fine-tune Models with Few Samples: Update, Data Augmentation, and Test-time Augmentation

MonaCoBERT: Monotonic attention based ConvBERT for Knowledge Tracing

There is no rose without a thorn: Finding weaknesses on BlenderBot 2.0 in terms of Model, Data and User-Centric Approach

Unfolding Conformal Geometry