Source author record

Taekyung Ki

Taekyung Ki appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Machine Learning Computation and Language Computer Vision Human-Computer Interaction math.NA Multimedia Numerical Analysis

Catalog footprint

What is connected

3works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation

Talking head generation creates lifelike avatars from static portraits for virtual communication and content creation. However, current models do not yet convey the feeling of truly interactive communication, often generating one-way responses that lack emotional engagement. We identify two key challenges toward truly interactive avatars: generating motion in real-time under causal constraints and learning expressive, vibrant reactions without additional labeled data. To address these challenges, we propose Avatar Forcing, a new framework for interactive head avatar generation that models real-time user-avatar interactions through diffusion forcing. This design allows the avatar to process real-time multimodal inputs, including the user's audio and motion, with low latency for instant reactions to both verbal and non-verbal cues such as speech, nods, and laughter. Furthermore, we introduce a direct preference optimization method that leverages synthetic losing samples constructed by dropping user conditions, enabling label-free learning of expressive interaction. Experimental results demonstrate that our framework enables real-time interaction with low latency (approximately 500ms), achieving 6.8X speedup compared to the baseline, and produces reactive and expressive avatar motion, which is preferred over 80% against the baseline.

preprint2026arXiv

HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents

Training long-horizon LLM agents with reinforcement learning is challenging because sparse outcome rewards reveal whether a task succeeds, but not which intermediate actions caused the outcome or how they should be corrected. Recent methods alleviate this issue by generating rewards or textual hints from turn-level action-output signals, or by using feedback-conditioned self-distillation. However, generating feedback at every turn is inefficient when many intermediate turns are already successful or neutral, and applying feedback at a fixed or misaligned turn often fails to supervise the actions that contributed to the failure. To bridge this gap, we propose HINT-SD, a targeted self-distillation framework that uses full-trajectory hindsight to select failure-relevant actions and applies feedback-conditioned distillation only on targeted action spans. Experiments on BFCL v3 and AppWorld show that our method improves over the dense per-turn feedback baseline by up to 18.80 percent while achieving 2.26$\times$ lower time per training step, suggesting that selecting where to distill is a key factor for both effective and efficient long-horizon agent training.

preprint2021arXiv

Deep Scattering Network with Max-pooling

Scattering network is a convolutional network, consisting of cascading convolutions using pre-defined wavelets followed by the modulus operator. Since its introduction in 2012, the scattering network is used as one of few mathematical tools explaining components of the convolutional neural networks (CNNs). However, a pooling operator, which is one of main components of conventional CNNs, is not considered in the original scattering network. In this paper, we propose a new network, called scattering-maxp network, integrating the scattering network with the max-pooling operator. We model continuous max-pooling, apply it to the scattering network, and get the scattering-maxp network. We show that the scattering-maxp network shares many useful properties of the scattering network including translation invariance, but with much smaller number of parameters. Numerical experiments show that the use of scattering-maxp does not degrade the performance too much and is much faster than the original one, in image classification tasks.