Researcher profile

Ji-Hoon Kim

Ji-Hoon Kim contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) | Verification L1 | Unclaimed author
9 works
0 followers
11 topics
4 close collaborators

Published work

9 published items

Preprint, 2026, arXiv

Probing Cross-modal Information Hubs in Audio-Visual LLMs

Audio-visual large language models (AVLLMs) have recently emerged as a powerful architecture capable of jointly reasoning over audio, visual, and textual modalities. In AVLLMs, the bidirectional interaction between audio and video modalities introduces intricate processing dynamics, necessitating a deeper understanding of their internal mechanisms. However, unlike extensively studied text-only or large vision language models, the internal workings of AVLLMs remain largely unexplored. In this paper, we focus on cross-modal information flow between audio and visual modalities in AVLLMs, investigating where information derived from one modality is encoded within the token representations of the other modality. Through an analysis of multiple recent AVLLMs, we uncover two common findings. First, AVLLMs primarily encode integrated audio-visual information in sink tokens. Second, sink tokens do not uniformly hold cross-modal information. Instead, a distinct subset of sink tokens, which we term cross-modal sink tokens, specializes in storing such information. Based on these findings, we further propose a simple training-free hallucination mitigation method by encouraging reliance on integrated cross-modal information within cross-modal sink tokens. Our code is available at https://github.com/kaistmm/crossmodal-hub.

Preprint, 2024, arXiv

Let There Be Sound: Reconstructing High Quality Speech from Silent Videos

The goal of this work is to reconstruct high quality speech from lip motions alone, a task also known as lip-to-speech. A key challenge of lip-to-speech systems is the one-to-many mapping caused by (1) the existence of homophenes and (2) multiple speech variations, resulting in mispronounced and over-smoothed speech. In this paper, we propose a novel lip-to-speech system that significantly improves the generation quality by alleviating the one-to-many mapping problem from multiple perspectives. Specifically, we incorporate (1) self-supervised speech representations to disambiguate homophenes, and (2) acoustic variance information to model diverse speech styles. Additionally, to better solve the aforementioned problem, we employ a flow-based post-net which captures and refines the details of the generated speech. We perform extensive experiments on two datasets, and demonstrate that our method achieves generation quality close to that of real human utterances, outperforming existing methods in terms of speech naturalness and intelligibility by a large margin. Synthesised samples are available at our demo page: https://mm.kaist.ac.kr/projects/LTBS.

Preprint, 2022, arXiv

Accelerating Large-Scale Graph-based Nearest Neighbor Search on a Computational Storage Platform

K-nearest neighbor search is one of the fundamental tasks in various applications, and the hierarchical navigable small world (HNSW) algorithm has recently drawn attention in large-scale cloud services, as it easily scales up the database while offering fast search. On the other hand, the computational storage device (CSD), which combines programmable logic and storage modules on a single board, has become popular as a way to address the data bandwidth bottleneck of modern computing systems. In this paper, we propose a computational storage platform that can accelerate a large-scale graph-based nearest neighbor search algorithm based on the SmartSSD CSD. To this end, we modify the algorithm to make it more amenable to hardware and implement two types of accelerators using HLS- and RTL-based methodologies with various optimization methods. In addition, we scale up the proposed platform to 4 SmartSSDs and apply graph parallelism to boost the system performance further. As a result, the proposed computational storage platform achieves a throughput of 75.59 queries per second for the SIFT1B dataset at 258.66 W power dissipation, which is 12.83x and 17.91x faster and 10.43x and 24.33x more energy efficient than the conventional CPU-based and GPU-based server platforms, respectively. With multi-terabyte storage and custom acceleration capability, we believe that the proposed computational storage platform is a promising solution for cost-sensitive cloud datacenters.
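
The figures quoted in this abstract can be related to one another with a little arithmetic. The sketch below is illustrative only: it assumes each ratio is (proposed platform) / (baseline) and that "energy efficient" means throughput per watt, neither of which the abstract states explicitly. The function and constant names are hypothetical.

```python
# Figures quoted in the abstract above.
THROUGHPUT_QPS = 75.59   # queries per second on SIFT1B
POWER_W = 258.66         # reported power dissipation in watts
SPEEDUP = {"cpu": 12.83, "gpu": 17.91}          # throughput ratio vs. baseline
EFFICIENCY_GAIN = {"cpu": 10.43, "gpu": 24.33}  # energy-efficiency ratio vs. baseline


def queries_per_joule(qps: float, watts: float) -> float:
    """Energy efficiency as throughput per watt, i.e. queries per joule."""
    return qps / watts


def implied_baseline(platform: str) -> dict:
    """Back out the baseline figures implied by the quoted ratios.

    Assumption (not stated in the abstract): each ratio is proposed / baseline
    and efficiency is throughput divided by power.
    """
    base_qps = THROUGHPUT_QPS / SPEEDUP[platform]
    base_eff = queries_per_joule(THROUGHPUT_QPS, POWER_W) / EFFICIENCY_GAIN[platform]
    return {
        "qps": base_qps,
        "queries_per_joule": base_eff,
        "watts": base_qps / base_eff,  # power implied by the two ratios
    }


if __name__ == "__main__":
    print(f"proposed: {queries_per_joule(THROUGHPUT_QPS, POWER_W):.4f} queries/J")
    for name in ("cpu", "gpu"):
        print(name, implied_baseline(name))
```

Under these assumptions the implied baseline power is simply the platform power scaled by the efficiency gain divided by the speedup, so the two ratios together determine how much power each baseline server would draw.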

Preprint, 2022, arXiv

Demystifying the Neural Tangent Kernel from a Practical Perspective: Can it be trusted for Neural Architecture Search without training?

In Neural Architecture Search (NAS), reducing the cost of architecture evaluation remains one of the most crucial challenges. Among a plethora of efforts to bypass training of each candidate architecture to convergence for evaluation, the Neural Tangent Kernel (NTK) is emerging as a promising theoretical framework that can be utilized to estimate the performance of a neural architecture at initialization. In this work, we revisit several at-initialization metrics that can be derived from the NTK and reveal their key shortcomings. Then, through the empirical analysis of the time evolution of the NTK, we deduce that modern neural architectures exhibit highly non-linear characteristics, making the NTK-based metrics incapable of reliably estimating the performance of an architecture without some amount of training. To take such non-linear characteristics into account, we introduce Label-Gradient Alignment (LGA), a novel NTK-based metric whose inherent formulation allows it to capture the large amount of non-linear advantage present in modern neural architectures. With a minimal amount of training, LGA obtains a meaningful level of rank correlation with the post-training test accuracy of an architecture. Lastly, we demonstrate that LGA, complemented with a few epochs of training, successfully guides existing search algorithms to achieve competitive search performance with significantly less search cost. The code is available at: https://github.com/nutellamok/DemystifyingNTK.
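
The "rank correlation" between a proxy metric and post-training test accuracy can be illustrated with a small example. The sketch below uses Kendall's tau as one common choice; the abstract does not say which rank correlation the authors report, and the function name and toy numbers are hypothetical. It does not compute LGA itself.

```python
from itertools import combinations


def kendall_tau(scores, accuracies):
    """Kendall's tau-a between two equally long sequences.

    Counts concordant minus discordant pairs and divides by the total number
    of pairs. Tied pairs count as neither concordant nor discordant.
    """
    n = len(scores)
    if n != len(accuracies) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        product = (scores[i] - scores[j]) * (accuracies[i] - accuracies[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


if __name__ == "__main__":
    # Toy values: a proxy score and a test accuracy for four architectures.
    proxy = [0.1, 0.4, 0.3, 0.9]
    accuracy = [71.0, 74.5, 75.0, 78.2]
    print(kendall_tau(proxy, accuracy))
```

A value near 1 means the proxy orders architectures almost exactly as their final accuracy does, which is the property a training-free or low-training metric needs in order to guide a search algorithm.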

Preprint, 2022, arXiv

Two-Step Question Retrieval for Open-Domain QA

The retriever-reader pipeline has shown promising performance in open-domain QA but suffers from a very slow inference speed. Recently proposed question retrieval models tackle this problem by indexing question-answer pairs and searching for similar questions. These models have shown a significant increase in inference speed, but at the cost of lower QA performance compared to the retriever-reader models. This paper proposes a two-step question retrieval model, SQuID (Sequential Question-Indexed Dense retrieval), and distant supervision for training. SQuID uses two bi-encoders for question retrieval. The first-step retriever selects top-k similar questions, and the second-step retriever finds the most similar question from the top-k questions. We evaluate the performance and the computational efficiency of SQuID. The results show that SQuID significantly increases the performance of existing question retrieval models with a negligible loss in inference speed.
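
The two-step flow described in this abstract can be sketched as follows. This is a minimal illustration, not the SQuID implementation: it assumes the two bi-encoders have already produced embeddings, scores similarity with a plain dot product, and uses hypothetical function names and toy vectors.

```python
def dot(a, b):
    """Dot-product similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vec, index_vecs, k):
    """Return the indices of the k indexed questions most similar to the query."""
    order = sorted(
        range(len(index_vecs)),
        key=lambda i: dot(query_vec, index_vecs[i]),
        reverse=True,
    )
    return order[:k]


def two_step_retrieve(query_vec_1, query_vec_2, index_1, index_2, answers, k):
    """First step: shortlist k candidates using the first encoder's embeddings.
    Second step: pick the best candidate using the second encoder's embeddings.
    """
    candidates = top_k(query_vec_1, index_1, k)
    best = max(candidates, key=lambda i: dot(query_vec_2, index_2[i]))
    return best, answers[best]


if __name__ == "__main__":
    # Toy embeddings for three indexed questions under each (hypothetical) encoder.
    index_1 = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    index_2 = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    answers = ["a", "b", "c"]
    print(two_step_retrieve([1.0, 0.0], [1.0, 0.0], index_1, index_2, answers, k=2))
```

Because the second step only scores the k shortlisted questions, its cost is independent of the index size, which is consistent with the abstract's claim that the extra step adds little to inference time.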

Preprint, 2020, arXiv

Dark Matter Deficient Galaxies Produced Via High-velocity Galaxy Collisions In High-resolution Numerical Simulations

The recent discovery of diffuse dwarf galaxies that are deficient in dark matter appears to challenge the current paradigm of structure formation in our Universe. We describe numerical experiments to determine if the so-called dark matter deficient galaxies (DMDGs) could be produced when two gas-rich, dwarf-sized galaxies collide with a high relative velocity of $\sim 300\,{\rm km\,s^{-1}}$. Using idealized high-resolution simulations with both mesh-based and particle-based gravito-hydrodynamics codes, we find that DMDGs can form as high-velocity galaxy collisions separate dark matter from the warm disk gas, which subsequently is compressed by shock and tidal interaction to form stars. Then, using the large simulated universe IllustrisTNG, we discover a number of high-velocity galaxy collision events in which DMDGs are expected to form. However, we did not find evidence that these types of collisions actually produced DMDGs in the TNG100-1 run. We argue that the resolution of the numerical experiment is critical to realize the "collision-induced" DMDG formation scenario. Our results demonstrate one of many routes in which galaxies could form with unconventional dark matter fractions.

Preprint, 2020, arXiv

Self-consistent proto-globular cluster formation in cosmological simulations of high-redshift galaxies

We report the formation of bound star clusters in a sample of high-resolution cosmological zoom-in simulations of z>5 galaxies from the FIRE project. We find that bound clusters preferentially form in high-pressure clouds with gas surface densities over 10^4 Msun pc^-2, where the cloud-scale star formation efficiency is near unity and young stars born in these regions are gravitationally bound at birth. These high-pressure clouds are compressed by feedback-driven winds and/or collisions of smaller clouds/gas streams in highly gas-rich, turbulent environments. The newly formed clusters follow a power-law mass function of dN/dM~M^-2. The cluster formation efficiency is similar across galaxies with stellar masses of ~10^7-10^10 Msun at z>5. The age spread of cluster stars is typically a few Myrs and increases with cluster mass. The metallicity dispersion of cluster members is ~0.08 dex in [Z/H] and does not depend on cluster mass significantly. Our findings support the scenario that present-day old globular clusters (GCs) were formed during relatively normal star formation in high-redshift galaxies. Simulations with a stricter/looser star formation model form a factor of a few more/fewer bound clusters per stellar mass formed, while the shape of the mass function is unchanged. Simulations with a lower local star formation efficiency form more stars in bound clusters. The simulated clusters are larger than observed GCs due to finite resolution. Our simulations are among the first cosmological simulations that form bound clusters self-consistently in a wide range of high-redshift galaxies.
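
To give a feel for what a mass function of dN/dM ~ M^-2 implies, the sketch below draws cluster masses from that distribution by inverse-transform sampling. It is an illustration only: the mass limits of 10^3 and 10^6 Msun, the seed, and the function names are hypothetical choices, not values from the paper.

```python
import random


def inverse_cdf(u, m_min, m_max):
    """Inverse CDF of dN/dM proportional to M^-2, truncated to [m_min, m_max].

    The CDF is F(M) = (1/m_min - 1/M) / (1/m_min - 1/m_max), so solving
    F(M) = u gives M = 1 / (1/m_min - u * (1/m_min - 1/m_max)).
    """
    span = 1.0 / m_min - 1.0 / m_max
    return 1.0 / (1.0 / m_min - u * span)


def sample_masses(n, m_min, m_max, seed=0):
    """Draw n masses from the truncated M^-2 power law."""
    rng = random.Random(seed)
    return [inverse_cdf(rng.random(), m_min, m_max) for _ in range(n)]


if __name__ == "__main__":
    # Hypothetical mass range in solar masses.
    masses = sample_masses(10000, 1e3, 1e6)
    low = sum(1 for m in masses if m < 1e4)
    mid = sum(1 for m in masses if 1e4 <= m < 1e5)
    high = sum(1 for m in masses if m >= 1e5)
    print(low, mid, high)  # counts per decade of mass
```

For this slope the number of clusters per decade of mass falls by roughly a factor of ten from one decade to the next, while the total mass contained in each decade stays about the same.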

Preprint, 2020, arXiv

The AGORA high-resolution galaxy simulations comparison project: Public data release

As part of the AGORA High-resolution Galaxy Simulations Comparison Project (Kim et al. 2014, 2016) we have generated a suite of isolated Milky Way-mass galaxy simulations using 9 state-of-the-art gravito-hydrodynamics codes widely used in the numerical galaxy formation community. In these simulations we adopted identical galactic disk initial conditions, and common physics models (e.g., radiative cooling and ultraviolet background by a standardized package). Subgrid physics models such as Jeans pressure floor, star formation, supernova feedback energy, and metal production were carefully constrained. Here we release the simulation data to be freely used by the community. In this release we include the disk snapshots at 0 and 500 Myr of evolution for each code as used in Kim et al. (2016), from simulations with and without star formation and feedback. We encourage any member of the numerical galaxy formation community to make use of these resources for their research, for example to compare their own simulations with the AGORA galaxies using the common yt analysis scripts used to obtain the plots shown in our papers, which are also available in this release.

Preprint, 2019, arXiv

High-redshift Galaxy Formation with Self-consistently Modeled Stars and Massive Black Holes: Stellar Feedback and Quasar Growth

As the computational resolution of modern cosmological simulations comes ever closer to resolving individual star-forming clumps in a galaxy, the need for "resolution-appropriate" physics in galaxy-scale simulations has never been greater. To this end, we introduce a self-consistent numerical framework that includes explicit treatments of feedback from star-forming molecular clouds (SFMCs) and massive black holes (MBHs). In addition to the thermal supernovae feedback from SFMC particles, photoionizing radiation from both SFMCs and MBHs is tracked through full 3-dimensional ray tracing. A mechanical feedback channel from MBHs is also considered. Using our framework, we perform a state-of-the-art cosmological simulation of a quasar-host galaxy at z~7.5 for ~25 Myr with all relevant galactic components such as dark matter, gas, SFMCs, and an embedded MBH seed of ≳ 10^6 Msun. We find that feedback from SFMCs and an accreting MBH suppresses runaway star formation locally in the galactic core region. Newly included radiation feedback from SFMCs, combined with feedback from the MBH, helps the MBH grow faster by retaining gas that eventually accretes on to the MBH. Our experiment demonstrates that previously undiscussed types of interplay between gas, SFMCs, and an MBH may hold important clues about the growth and feedback of quasars and their host galaxies in the high-redshift Universe.