Source author record

Young Jin Kim

Young Jin Kim appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Machine Learning, Computation and Language, physics.soc-ph, Computer Science and Game Theory, cond-mat.mes-hall, cond-mat.mtrl-sci, nucl-ex, physics.geo-ph, physics.ins-det

Catalog footprint

What is connected

8 works

9 topics

4 close collaborators

preprint, 2026, arXiv

WaveDiffusion: Joint Latent Diffusion for Physically Consistent Seismic and Velocity Generation

Full Waveform Inversion (FWI) is a critical technique in subsurface imaging, aiming to reconstruct high-resolution subsurface properties from surface measurements. Acoustic FWI involves two physical modalities, seismic waveforms and velocity maps, which are governed by the acoustic wave equation. Prior works primarily focus on the inverse problem, modeling the relationship between seismic and velocity as an image-to-image translation task. In this work, we study their relationship from a generative perspective. Our aim is to explore and characterize the latent space structure, and identify latent vectors that generate seismic-velocity pairs consistent with the governing partial differential equation (PDE). Specifically, we model seismic and velocity data jointly from a shared latent space via a diffusion process. In experiments, we find that diffusion progressively refines arbitrary latent vectors into ones that yield approximately physics-consistent seismic-velocity pairs, even without explicit physics constraints. This provides empirical evidence of PDE-consistency in latent diffusion, where sampling is biased toward PDE-valid solutions. In latent space, satisfying the acoustic wave equation can be approximated through sampling and gradient descent. We formalize this physics-consistent latent modeling task and quantify it through extensive experiments. On large-scale OpenFWI benchmarks, our approach produces high-fidelity, diverse, and physically consistent seismic-velocity pairs, demonstrating the potential of a data-driven latent diffusion for physically consistent generation in a complex scientific domain.

preprint, 2022, arXiv

Fast Vocabulary Projection Method via Clustering for Multilingual Machine Translation on GPU

Multilingual Neural Machine Translation has been showing great success using transformer models. Deploying these models is challenging because they usually require large vocabulary (vocab) sizes for various languages. This limits the speed of predicting the output tokens in the last vocab projection layer. To alleviate these challenges, this paper proposes a fast vocabulary projection method via clustering which can be used for multilingual transformers on GPUs. First, we split the vocab search space offline into disjoint clusters given the hidden context vector of the decoder output, which results in much smaller vocab columns for vocab projection. Second, at inference time, the proposed method predicts the clusters and candidate active tokens for hidden context vectors at the vocab projection. This paper also includes analysis of different ways of building these clusters in multilingual settings. Our results show end-to-end speed gains in float16 GPU inference of up to 25% while maintaining the BLEU score and slightly increasing memory cost. The proposed method speeds up the vocab projection step itself by up to 2.6x. We also conduct an extensive human evaluation to verify the proposed method preserves the quality of the translations from the original model.
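
The two stages described in this abstract (offline clustering, then cluster prediction and a restricted projection at inference time) can be illustrated with a minimal sketch. It is not the paper's implementation: the nearest-centroid rule, the function names, and the toy weights are all assumptions chosen for illustration, and the clusters are taken as already built.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def nearest_centroid(hidden, centroids):
    """Index of the centroid closest (squared Euclidean distance) to hidden."""
    def sq_dist(c):
        return sum((h - x) ** 2 for h, x in zip(hidden, c))
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i]))

def full_projection(hidden, vocab_weights):
    """Baseline: score every token in the vocabulary and return the best id."""
    scores = [dot(hidden, w) for w in vocab_weights]
    return max(range(len(scores)), key=scores.__getitem__)

def clustered_projection(hidden, vocab_weights, centroids, cluster_tokens):
    """Score only the candidate tokens attached to the predicted cluster."""
    cluster = nearest_centroid(hidden, centroids)
    candidates = cluster_tokens[cluster]
    return max(candidates, key=lambda t: dot(hidden, vocab_weights[t]))

# Toy setup (hypothetical values): 6 tokens, 2-d hidden states, 2 clusters.
vocab_weights = [[1.0, 0.0], [0.9, 0.1], [0.8, -0.1],
                 [0.0, 1.0], [0.1, 0.9], [-0.1, 0.8]]
centroids = [[1.0, 0.0], [0.0, 1.0]]          # assumed to come from offline clustering
cluster_tokens = {0: [0, 1, 2], 1: [3, 4, 5]}  # candidate tokens per cluster

hidden = [0.2, 1.1]
fast = clustered_projection(hidden, vocab_weights, centroids, cluster_tokens)
slow = full_projection(hidden, vocab_weights)
```

In this toy case the restricted projection scores three tokens instead of six and still agrees with the full projection. In general the result matches only when the predicted cluster's candidate set contains the true best token.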

preprint, 2022, arXiv

Gating Dropout: Communication-efficient Regularization for Sparsely Activated Transformers

Sparsely activated transformers, such as Mixture of Experts (MoE), have received great interest due to their outrageous scaling capability which enables dramatic increases in model size without significant increases in computational cost. To achieve this, MoE models replace the feedforward sub-layer with a Mixture-of-Experts sub-layer in transformers and use a gating network to route each token to its assigned experts. Since the common practice for efficient training of such models requires distributing experts and tokens across different machines, this routing strategy often incurs huge cross-machine communication cost because tokens and their assigned experts likely reside on different machines. In this paper, we propose Gating Dropout, which allows tokens to ignore the gating network and stay at their local machines, thus reducing the cross-machine communication. Similar to traditional dropout, we also show that Gating Dropout has a regularization effect during training, resulting in improved generalization performance. We validate the effectiveness of Gating Dropout on multilingual machine translation tasks. Our results demonstrate that Gating Dropout improves a state-of-the-art MoE model with faster wall-clock time convergence rates and better BLEU scores for a variety of model sizes and datasets.
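
The routing idea in this abstract can be sketched as follows. This is an illustration only: the function names, the toy gate, and the choice to make the skip decision once per step (rather than per token) are assumptions that the abstract does not confirm.

```python
import random

def route_step(tokens, local_expert, gate, drop_prob, rng):
    """Return (token, expert_id) assignments for one training step.

    With probability drop_prob the tokens ignore the gate and stay on the
    local expert (no cross-machine traffic); otherwise the gate decides.
    """
    if rng.random() < drop_prob:
        return [(tok, local_expert) for tok in tokens]
    return [(tok, gate(tok)) for tok in tokens]

def cross_machine_count(assignments, local_expert):
    """Number of tokens that would have to be sent to another machine."""
    return sum(1 for _, expert in assignments if expert != local_expert)

def toy_gate(token):
    # Hypothetical stand-in for a learned gating network: 4 experts.
    return token % 4

rng = random.Random(0)
tokens = list(range(8))
always_dropped = route_step(tokens, local_expert=0, gate=toy_gate,
                            drop_prob=1.0, rng=rng)
never_dropped = route_step(tokens, local_expert=0, gate=toy_gate,
                           drop_prob=0.0, rng=rng)
```

Setting `drop_prob=0.0` recovers ordinary gated routing, which is how a dropout-style technique of this kind would typically be run at inference time.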

preprint, 2022, arXiv

Taming Sparsely Activated Transformer with Stochastic Experts

Sparsely activated models (SAMs), such as Mixture-of-Experts (MoE), can easily scale to have outrageously large amounts of parameters without significant increase in computational cost. However, SAMs are reported to be parameter inefficient such that larger models do not always lead to better performance. While most ongoing research focuses on improving SAMs by exploring methods of routing inputs to experts, our analysis reveals that such research might not lead to the solution we expect, i.e., the commonly-used routing methods based on gating mechanisms do not work better than randomly routing inputs to experts. In this paper, we propose a new expert-based model, THOR (Transformer witH StOchastic ExpeRts). Unlike classic expert-based models, such as the Switch Transformer, experts in THOR are randomly activated for each input during training and inference. THOR models are trained using a consistency regularized loss, where experts learn not only from training data but also from other experts as teachers, such that all the experts make consistent predictions. We validate the effectiveness of THOR on machine translation tasks. Results show that THOR models are more parameter efficient in that they significantly outperform the Transformer and MoE models across various settings. For example, in multilingual translation, THOR outperforms the Switch Transformer by 2 BLEU points, and obtains the same BLEU score as that of a state-of-the-art MoE model that is 18 times larger. Our code is publicly available at: https://github.com/microsoft/Stochastic-Mixture-of-Experts.
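
A consistency-regularized loss of the kind this abstract describes can be sketched as below. The abstract does not give the loss formula, so the specific form here (cross-entropy for two randomly chosen experts plus a KL term pulling each toward their mean prediction), the weight `alpha`, and all function names are assumptions for illustration.

```python
import math
import random

def cross_entropy(probs, label):
    """Negative log-likelihood of the true label under one expert's prediction."""
    return -math.log(probs[label])

def kl(p, q):
    """KL divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def consistency_loss(p1, p2, label, alpha=1.0):
    """Task loss for both experts plus a penalty for disagreeing with each other."""
    mean = [(a + b) / 2 for a, b in zip(p1, p2)]
    task = cross_entropy(p1, label) + cross_entropy(p2, label)
    agree = 0.5 * (kl(p1, mean) + kl(p2, mean))
    return task + alpha * agree

def pick_experts(num_experts, rng, k=2):
    """Randomly activate k distinct experts for one input (no learned gate)."""
    return rng.sample(range(num_experts), k)

rng = random.Random(0)
chosen = pick_experts(num_experts=8, rng=rng)

same = consistency_loss([0.7, 0.2, 0.1], [0.7, 0.2, 0.1], label=0)
differ = consistency_loss([0.7, 0.2, 0.1], [0.7, 0.1, 0.2], label=0)
```

When the two experts agree exactly, the penalty vanishes and only the task loss remains. When they disagree, the extra term is positive even if both assign the same probability to the correct label.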

preprint, 2015, arXiv

Dynamic Motifs of Strategies in Prisoner's Dilemma Games

We investigate the win-lose relations between strategies of iterated prisoner's dilemma games by using a directed network concept to display the replicator dynamics results. In the giant strongly-connected component of the win/lose network, we find win-lose circulations similar to rock-paper-scissors and analyze the fixed point and its stability. Applying the network motif concept, we introduce dynamic motifs, which describe the population dynamics relations among the three strategies. Through exact enumeration, we find 22 dynamic motifs and display their phase portraits. Visualization using directed networks and motif analysis is a useful method to make complex dynamic behavior simple in order to understand it more intuitively. Dynamic motifs can be building blocks for dynamic behavior among strategies when they are applied to other types of games.
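
For reference, the standard replicator equation and a textbook rock-paper-scissors example are given below. The zero-sum payoff matrix is an assumed illustration and is not taken from the paper.

```latex
\dot{x}_i = x_i \left[ (A\mathbf{x})_i - \mathbf{x}^{\top} A \mathbf{x} \right],
\qquad \sum_i x_i = 1,
\qquad
A_{\mathrm{RPS}} =
\begin{pmatrix} 0 & -1 & 1 \\ 1 & 0 & -1 \\ -1 & 1 & 0 \end{pmatrix},
\qquad
\mathbf{x}^{*} = \left(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}\right).
```

Here $x_i$ is the population share of strategy $i$ and $A$ is the payoff matrix. At $\mathbf{x}^{*}$ every row of $A\mathbf{x}^{*}$ is zero, so $\mathbf{x}^{*}$ is a fixed point. For this particular zero-sum matrix the fixed point is neutrally stable, with trajectories cycling around it.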

preprint, 2014, arXiv

Network Structures between Strategies in Iterated Prisoners' Dilemma Games

We use replicator dynamics to study an iterated prisoners' dilemma game with memory. In this study, we investigate the characteristics of all 32 possible strategies with a single-step memory by observing the results when each strategy encounters another one. Based on these results, we define similarity measures between the 32 strategies and perform a network analysis of the relationship between the strategies by constructing a strategies network. Interestingly, we find that a win-lose circulation, like rock-paper-scissors, exists between strategies and that the circulation results from one unusual strategy.
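
The count of 32 follows from a first move plus one reply for each of the four possible previous-round outcomes (2 x 2^4). A minimal sketch of enumerating these strategies and playing them against each other is given below. The payoff values, the round count, and the direct pairwise comparison are assumptions for illustration. The paper itself uses replicator dynamics, which this sketch does not implement.

```python
from itertools import product

# Assumed payoff values with T > R > P > S; the abstract does not state them.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
OUTCOMES = [("C", "C"), ("C", "D"), ("D", "C"), ("D", "D")]

def all_strategies():
    """Enumerate (first_move, reply_CC, reply_CD, reply_DC, reply_DD)."""
    return list(product("CD", repeat=5))

def next_move(strategy, my_last, their_last):
    """Look up the reply to the previous round's (own move, opponent move)."""
    return strategy[1 + OUTCOMES.index((my_last, their_last))]

def play(a, b, rounds=200):
    """Return the average per-round payoffs of strategies a and b."""
    move_a, move_b = a[0], b[0]
    total_a = total_b = 0
    for _ in range(rounds):
        total_a += PAYOFF[(move_a, move_b)]
        total_b += PAYOFF[(move_b, move_a)]
        move_a, move_b = (next_move(a, move_a, move_b),
                          next_move(b, move_b, move_a))
    return total_a / rounds, total_b / rounds

strategies = all_strategies()
tit_for_tat = ("C", "C", "D", "C", "D")    # cooperate first, then copy the opponent
always_defect = ("D", "D", "D", "D", "D")

tft_score, alld_score = play(tit_for_tat, always_defect)
```

Running `play` over all pairs in `strategies` gives a 32 x 32 payoff table, from which a directed win-lose network could be drawn by adding an edge from the higher-scoring strategy to the lower-scoring one.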

preprint, 2011, arXiv

A high dynamic range data acquisition system for a solid-state electron Electric Dipole Moment experiment

We have built a high precision (24-bit) data acquisition (DAQ) system with eight simultaneously sampling input channels for the measurement of the electric dipole moment (EDM) of the electron. The DAQ system consists of two main components, a master board and eight individual analog-to-digital converter (ADC) boards. This custom DAQ system provides galvanic isolation, with fiber optic communication, between the master board and each ADC board to reduce the possibility of ground loop pickups. In addition, each ADC board is enclosed in its own heavy-duty radio frequency shielding enclosure and powered by DC batteries, to attain the ultimate low levels of channel cross-talk. In this paper, we describe the implementation of the DAQ system and scrutinize its performance.

preprint, 2010, arXiv

30 inch Roll-Based Production of High-Quality Graphene Films for Flexible Transparent Electrodes

We report that 30-inch scale multiple roll-to-roll transfer and wet chemical doping considerably enhance the electrical properties of the graphene films grown on roll-type Cu substrates by chemical vapor deposition. The resulting graphene films show a sheet resistance as low as ~30 Ohm/sq at ~90% transparency, which is superior to commercial transparent electrodes such as indium tin oxides (ITO). The monolayer of graphene shows sheet resistances as low as ~125 Ohm/sq with 97.4% optical transmittance and half-integer quantum Hall effect, indicating the high quality of these graphene films. As a practical application, we also fabricated a touch screen panel device based on the graphene transparent electrodes, showing extraordinary mechanical and electrical performances.
