Source author record

Minjung Kim

Minjung Kim appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Computer Vision, cond-mat.mes-hall, Artificial Intelligence, Machine Learning, cond-mat.mtrl-sci, Robotics

Catalog footprint

What is connected

10 works

6 topics

4 close collaborators

Preprint, 2026, arXiv

OTT-Vid: Optimal Transport Temporal Token Compression for Video Large Language Models

As Video Large Language Models (Video-LLMs) scale to longer and more complex videos, their inference cost grows rapidly due to the large volume of visual tokens accumulated across frames. Training-free token compression has emerged as a practical solution to this bottleneck. However, existing temporal compression methods rely primarily on cross-frame token similarity or segmentation heuristics, overlooking each token's semantic role within its frame and failing to adapt compression strength to the compressibility of each frame pair. In this work, we propose OTT-Vid, a transport-derived allocation framework for temporal token compression. Our approach consists of two stages: spatial pruning identifies representative content within each frame, and optimal transport (OT) is then solved between neighboring frames to estimate temporal compressibility. We formulate this OT with non-uniform token mass, which protects semantically important tokens from aggressive compression, and a locality-aware cost that captures both feature and spatial disparities. The resulting transport plan jointly balances token importance and matching cost, while its total cost defines the transport difficulty of each frame pair, which we use to allocate compression budgets dynamically. Experiments on six benchmarks spanning video question answering and temporal grounding show that OTT-Vid preserves 95.8% of VQA and 73.9% of VTG performance while retaining only 10% of tokens, consistently outperforming existing state-of-the-art training-free compression methods.
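The transport step described in this abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes an entropic (Sinkhorn) solver, a cost built from cosine feature distance plus a weighted spatial distance, and random stand-in importance scores. The names `sinkhorn_plan`, `locality_aware_cost`, `lam`, and `eps` are hypothetical. It only shows how non-uniform token mass and a locality-aware cost might yield a transport plan and a scalar "difficulty" for one frame pair.

```python
import numpy as np

def sinkhorn_plan(mass_a, mass_b, cost, eps=0.5, n_iters=1000):
    """Entropic OT plan between two token sets with non-uniform masses (assumed solver)."""
    kernel = np.exp(-cost / eps)              # Gibbs kernel of the cost matrix
    u = np.ones_like(mass_a)
    v = np.ones_like(mass_b)
    for _ in range(n_iters):
        u = mass_a / (kernel @ v)             # rescale to match row marginals
        v = mass_b / (kernel.T @ u)           # rescale to match column marginals
    return u[:, None] * kernel * v[None, :]

def locality_aware_cost(feat_a, feat_b, pos_a, pos_b, lam=0.5):
    """Hypothetical cost: cosine feature distance + lam * Euclidean position distance."""
    fa = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    fb = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    feature_cost = 1.0 - fa @ fb.T
    spatial_cost = np.linalg.norm(pos_a[:, None, :] - pos_b[None, :, :], axis=-1)
    return feature_cost + lam * spatial_cost

# Toy data standing in for spatially pruned tokens of two neighboring frames.
rng = np.random.default_rng(0)
n_tokens, dim = 6, 8
feat_prev = rng.normal(size=(n_tokens, dim))
feat_next = feat_prev + 0.1 * rng.normal(size=(n_tokens, dim))
pos = rng.uniform(size=(n_tokens, 2))         # normalized (x, y) token positions

# Non-uniform mass: random importance scores (an assumption), normalized to sum to 1.
importance_prev = rng.uniform(0.5, 1.5, size=n_tokens)
importance_next = rng.uniform(0.5, 1.5, size=n_tokens)
mass_prev = importance_prev / importance_prev.sum()
mass_next = importance_next / importance_next.sum()

cost = locality_aware_cost(feat_prev, feat_next, pos, pos)
plan = sinkhorn_plan(mass_prev, mass_next, cost)

# Total transport cost, used here as the frame pair's "transport difficulty".
difficulty = float((plan * cost).sum())
print(f"transport difficulty: {difficulty:.4f}")
```

Under these assumptions, a frame pair with low difficulty would be a candidate for stronger compression; the abstract does not spell out the allocation rule, so none is sketched here.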

Preprint, 2023, arXiv

Regular Time-series Generation using SGM

Score-based generative models (SGMs) are generative models that have recently attracted considerable attention. Time-series data occur frequently in daily life, e.g., stock data and climate data, and time-series forecasting and classification are popular research topics in machine learning. SGMs are also known for outperforming other generative models. We therefore apply SGMs to synthesize time-series data by learning conditional score functions. We propose a conditional score network for the time-series generation domain. Furthermore, we derive the loss function between the score matching and the denoising score matching in the time-series generation domain. Finally, we achieve state-of-the-art results on real-world datasets in terms of sampling diversity and quality.

Preprint, 2022, arXiv

3D-GIF: 3D-Controllable Object Generation via Implicit Factorized Representations

While NeRF-based 3D-aware image generation methods enable viewpoint control, limitations remain that hinder their adoption in various 3D applications. Due to their view-dependent and light-entangled volume representation, the 3D geometry is of unrealistic quality and the color must be re-rendered for every desired viewpoint. To broaden the 3D applicability from 3D-aware image generation to 3D-controllable object generation, we propose factorized representations that are view-independent and light-disentangled, along with training schemes that use randomly sampled light conditions. We demonstrate the superiority of our method by visualizing factorized representations, re-lighted images, and albedo-textured meshes. In addition, we show that our approach improves the quality of the generated geometry via visualization and quantitative comparison. To the best of our knowledge, this is the first work that extracts albedo-textured meshes from unposed 2D images without any additional labels or assumptions.

Preprint, 2022, arXiv

SOS: Score-based Oversampling for Tabular Data

Score-based generative models (SGMs) are a recent breakthrough in generating fake images. SGMs are known to surpass other generative models, e.g., generative adversarial networks (GANs) and variational autoencoders (VAEs). Inspired by their success, in this work we fully customize them for generating fake tabular data. In particular, we are interested in oversampling minor classes, since imbalanced classes frequently lead to sub-optimal training outcomes. To our knowledge, we are the first to present a score-based tabular data oversampling method. Firstly, we re-design our own score network, since we have to process tabular data. Secondly, we propose two options for our generation method: the former is equivalent to a style transfer for tabular data and the latter uses the standard generative policy of SGMs. Lastly, we define a fine-tuning method, which further enhances the oversampling quality. In our experiments with 6 datasets and 10 baselines, our method outperforms other oversampling methods in all cases.

Preprint, 2022, arXiv

Tackling Background Distraction in Video Object Segmentation

Semi-supervised video object segmentation (VOS) aims to densely track certain designated objects in videos. One of the main challenges in this task is the existence of background distractors that appear similar to the target objects. We propose three novel strategies to suppress such distractors: 1) a spatio-temporally diversified template construction scheme to obtain generalized properties of the target objects; 2) a learnable distance-scoring function to exclude spatially-distant distractors by exploiting the temporal consistency between two consecutive frames; 3) swap-and-attach augmentation to force each object to have unique features by providing training samples containing entangled objects. On all public benchmark datasets, our model achieves a comparable performance to contemporary state-of-the-art approaches, even with real-time performance. Qualitative results also demonstrate the superiority of our approach over existing methods. We believe our approach will be widely used for future VOS research.

Preprint, 2022, arXiv

X-MAS: Extremely Large-Scale Multi-Modal Sensor Dataset for Outdoor Surveillance in Real Environments

In robotics and computer vision communities, extensive studies have been widely conducted regarding surveillance tasks, including human detection, tracking, and motion recognition with a camera. Additionally, deep learning algorithms are widely utilized in the aforementioned tasks as in other computer vision tasks. Existing public datasets are insufficient to develop learning-based methods that handle various surveillance for outdoor and extreme situations such as harsh weather and low illuminance conditions. Therefore, we introduce a new large-scale outdoor surveillance dataset named eXtremely large-scale Multi-modAl Sensor dataset (X-MAS) containing more than 500,000 image pairs and the first-person view data annotated by well-trained annotators. Moreover, a single pair contains multi-modal data (e.g. an IR image, an RGB image, a thermal image, a depth image, and a LiDAR scan). This is the first large-scale first-person view outdoor multi-modal dataset focusing on surveillance tasks to the best of our knowledge. We present an overview of the proposed dataset with statistics and present methods of exploiting our dataset with deep learning-based algorithms. The latest information on the dataset and our study are available at https://github.com/lge-robot-navi, and the dataset will be available for download through a server.

Preprint, 2018, arXiv

Imaginary time, shredded propagator method for large-scale GW calculations

The GW method is a many-body approach capable of providing quasiparticle bands for realistic systems spanning physics, chemistry, and materials science. Despite its power, GW is not routinely applied to large complex materials due to its computational expense. We perform an exact recasting of the GW polarizability and the self-energy as Laplace integrals over imaginary time propagators. We then "shred" the propagators (via energy windowing) and approximate them in a controlled manner by using Gauss-Laguerre quadrature and discrete variable methods to treat the imaginary time propagators in real space. The resulting cubic scaling GW method has a sufficiently small prefactor to outperform standard quartic scaling methods on small systems (>=10 atoms) and also represents a substantial improvement over several other cubic methods tested. This approach is useful for evaluating quantum mechanical response function involving large sums containing energy (difference) denominators.
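The role of the Laplace integral and of energy windowing in this abstract can be illustrated numerically. The sketch below is a toy example, not the paper's method: it uses the identity 1/x = ∫₀^∞ exp(-x τ) dτ for x > 0 to approximate an energy denominator with Gauss-Laguerre quadrature, and compares a narrow energy window against a wide one. The function name `inverse_via_laplace` and the parameters `window_min` and `n_points` are hypothetical.

```python
import numpy as np
from numpy.polynomial.laguerre import laggauss

def inverse_via_laplace(x, window_min, n_points=10):
    """Approximate 1/x for x >= window_min > 0 with Gauss-Laguerre quadrature.

    Substituting tau = t / window_min into 1/x = int_0^inf exp(-x tau) dtau gives
    1/x = (1/window_min) * int_0^inf exp(-t) * exp(-(x/window_min - 1) t) dt,
    which matches the Gauss-Laguerre weight function exp(-t).
    """
    nodes, weights = laggauss(n_points)
    x = np.asarray(x, dtype=float)
    decay = np.exp(-np.outer(x / window_min - 1.0, nodes))
    return (decay @ weights) / window_min

# Narrow window: energy differences span a factor of 2.
narrow = np.linspace(1.0, 2.0, 5)
narrow_err = float(np.max(np.abs(inverse_via_laplace(narrow, 1.0) - 1.0 / narrow)))

# Wide window: energy differences span a factor of 20, same number of points.
wide = np.linspace(1.0, 20.0, 5)
wide_err = float(np.max(np.abs(inverse_via_laplace(wide, 1.0) - 1.0 / wide)))

print(f"max error, narrow window: {narrow_err:.2e}")
print(f"max error, wide window:   {wide_err:.2e}")
```

With a fixed number of quadrature points, the narrow window is approximated far more accurately than the wide one, which is one way to see why splitting the energy range into windows can keep the quadrature cheap.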

Preprint, 2016, arXiv

Determination of the thickness and orientation of few-layer tungsten ditelluride using polarized Raman spectroscopy

Orthorhombic tungsten ditelluride (WTe2), with a distorted 1T structure, exhibits a large magnetoresistance that depends on the orientation, and its electrical characteristics change from semimetallic to insulating as the thickness decreases. Through polarized Raman spectroscopy in combination with transmission electron diffraction, we establish a reliable method to determine the thickness and crystallographic orientation of few-layer WTe2. The Raman spectrum shows a pronounced dependence on the polarization of the excitation laser. We found that the separation between two Raman peaks at ~90 cm-1 and at 80-86 cm-1, depending on thickness, is a reliable fingerprint for determination of the thickness. For determination of the crystallographic orientation, the polarization dependence of the A1 modes, measured with the 632.8-nm excitation, turns out to be the most reliable. We also discovered that the polarization behaviors of some of the Raman peaks depend on the excitation wavelength as well as thickness, indicating a close interplay between the band structure and anisotropic Raman scattering cross section.

Preprint, 2015, arXiv

Photocurrent generation at ABA/ABC lateral junction in tri-layer graphene photodetector

Metal-graphene-metal photodetectors utilize photocurrent generated near the graphene/metal junctions and have many advantages including high speed and broad-band operation. Here, we report on photocurrent generation at ABA/ABC stacking domain junctions in tri-layer graphene with a responsivity of 0.18 A/W. Unlike usual metal-graphene-metal devices, the photocurrent is generated in the middle of the graphene channel, not confined to the vicinity of the metal electrodes. The magnitude and the direction of the photocurrent depend on the back-gate bias. Theoretical calculations show that there is a built-in band offset between the two stacking domains, and the dominant mechanism of the photocurrent is the photo-thermoelectric effect due to the Seebeck coefficient difference.

Preprint, 2012, arXiv

Polarization dependence of photocurrent in a metal-graphene-metal device

The dependence of the photocurrent generated in a Pd/graphene/Ti junction device on the incident photon polarization is studied. Spatially resolved photocurrent images were obtained as the incident photon polarization is varied. The photocurrent is maximum when the polarization direction is perpendicular to the graphene channel direction and minimum when the two directions are parallel. This polarization dependence can be explained as being due to the anisotropic electron-photon interaction of Dirac electrons in graphene.
