Researcher profile

Sunwoo Kim

Sunwoo Kim contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
12 works
0 followers
14 topics
4 close collaborators

Published work

12 published items

preprint · 2026 · arXiv

Large Multimodal Model-Aided Scheduling for 6G Autonomous Communications

Recently, large language models (LLMs) have gained significant attention for their ability to generate fast and accurate answers to a given query. These models have evolved into large multimodal models (LMMs), which can interpret and analyze multimodal inputs such as images and text. With the exponential growth of AI functionalities in autonomous devices, the central unit (CU), a digital processing unit performing AI inference, needs to handle LMMs to effectively control these devices. To ensure seamless command delivery to devices, the CU must perform scheduling, which involves resource block (RB) allocation for data transmission and modulation and coding scheme (MCS) index selection based on the channel conditions. This task is challenging in many practical 6G environments, where even small user movements can cause abrupt channel changes. In this paper, we propose a novel LMM-based scheduling technique to address this challenge. Our key idea is to leverage an LMM to predict future channel parameters (e.g., distance, angles, and path gain) by analyzing the visual sensing information as well as pilot signals. By exploiting LMMs to predict the presence of a reliable path and the geometric information of users from the visual sensing information, and then combining these with past channel states from pilot signals, we can accurately predict future channel parameters. Using these predictions, we can preemptively make channel-aware scheduling decisions. From the numerical evaluations, we show that the proposed technique achieves more than 30% throughput gain over conventional scheduling techniques.

preprint · 2026 · arXiv

System-Level Comparison of Multimodal and In-Band mmWave Sensing for Beam Prediction in 6G ISAC

Integrated sensing and communication (ISAC) can reduce beam-training overhead in mmWave vehicle-to-infrastructure (V2I) links by enabling in-band sensing-based beam prediction, while exteroceptive sensors can further enhance the prediction accuracy. This work develops a system-level framework that evaluates camera, LiDAR, radar, GPS, and in-band mmWave power, both individually and in multimodal fusion, using the DeepSense-6G Scenario-33 dataset. A latency-aware neural network composed of lightweight convolutional (CNN) and multilayer-perceptron (MLP) encoders predicts a 64-beam index. We assess performance using Top-k accuracy alongside spectral-efficiency (SE) gap, signal-to-noise-ratio (SNR) gap, rate loss, and end-to-end latency. Results show that the mmWave power vector is a strong standalone predictor, and fusing exteroceptive sensors with it preserves high performance: mmWave alone and mmWave+LiDAR/GPS/Radar achieve 98% Top-5 accuracy, while mmWave+camera achieves 94% Top-5 accuracy. The proposed framework establishes calibrated baselines for 6G ISAC-assisted beam prediction in V2I systems.
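
The Top-k accuracy metric named in this abstract can be illustrated with a minimal sketch. The function name, the toy scores, and the 4-beam codebook below are assumptions made for illustration; they are not taken from the paper, which predicts over 64 beams.

```python
def top_k_accuracy(scores, true_beams, k):
    """Fraction of samples whose true beam index is among the k highest-scoring beams."""
    hits = 0
    for sample_scores, true_beam in zip(scores, true_beams):
        # Rank beam indices from highest to lowest predicted score.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda b: sample_scores[b], reverse=True)
        if true_beam in ranked[:k]:
            hits += 1
    return hits / len(true_beams)


# Toy data (hypothetical): three samples, four candidate beams each.
scores = [
    [0.10, 0.70, 0.15, 0.05],  # ranking: beam 1, then beam 2
    [0.40, 0.30, 0.20, 0.10],  # ranking: beam 0, then beam 1
    [0.05, 0.15, 0.20, 0.60],  # ranking: beam 3, then beam 2
]
true_beams = [1, 1, 0]

top1 = top_k_accuracy(scores, true_beams, k=1)
top2 = top_k_accuracy(scores, true_beams, k=2)
top4 = top_k_accuracy(scores, true_beams, k=4)
```

With k equal to the codebook size, every sample counts as a hit, which is why Top-k results are only informative for k well below the number of beams.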

preprint · 2022 · arXiv

BLOOM-Net: Blockwise Optimization for Masking Networks Toward Scalable and Efficient Speech Enhancement

In this paper, we present a blockwise optimization method for masking-based networks (BLOOM-Net) for training scalable speech enhancement networks. Here, we design our network with a residual learning scheme and train the internal separator blocks sequentially to obtain a scalable masking-based deep neural network for speech enhancement. Its scalability lets it dynamically adjust the run-time complexity depending on the test-time environment. To this end, we modularize our models so that they can flexibly accommodate varying needs for enhancement performance and constraints on the resources, incurring minimal memory or training overhead due to the added scalability. Our experiments on speech enhancement demonstrate that the proposed blockwise optimization method achieves the desired scalability with only a slight performance degradation compared to corresponding models trained end-to-end.

preprint · 2022 · arXiv

Deep Translation Prior: Test-time Training for Photorealistic Style Transfer

Recent techniques to solve photorealistic style transfer within deep convolutional neural networks (CNNs) generally require intensive training from large-scale datasets, thus having limited applicability and poor generalization ability to unseen images or styles. To overcome this, we propose a novel framework, dubbed Deep Translation Prior (DTP), to accomplish photorealistic style transfer through test-time training on a given input image pair with untrained networks, which learns an image pair-specific translation prior and thus yields better performance and generalization. Tailored for such test-time training for style transfer, we present novel network architectures, with two sub-modules of correspondence and generation modules, and loss functions consisting of contrastive content, style, and cycle consistency losses. Our framework does not require an offline training phase for style transfer, which has been one of the main challenges in existing methods; instead, the networks are learned solely at test time. Experimental results prove that our framework has a better generalization ability to unseen image pairs and even outperforms the state-of-the-art methods.

preprint · 2022 · arXiv

Human Motion Control of Quadrupedal Robots using Deep Reinforcement Learning

A motion-based control interface promises flexible robot operations in dangerous environments by combining user intuitions with the robot's motor capabilities. However, designing a motion interface for non-humanoid robots, such as quadrupeds or hexapods, is not straightforward because different dynamics and control strategies govern their movements. We propose a novel motion control system that allows a human user to operate various motor tasks seamlessly on a quadrupedal robot. We first retarget the captured human motion into the corresponding robot motion with proper semantics using supervised learning and post-processing techniques. Then we apply motion imitation learning with curriculum learning to develop a control policy that can track the given retargeted reference. We further improve the performance of both motion retargeting and motion imitation by training a set of experts. As we demonstrate, a user can execute various motor tasks using our system, including standing, sitting, tilting, manipulating, walking, and turning, on simulated and real quadrupeds. We also conduct a set of studies to analyze the performance gain induced by each component.

preprint · 2022 · arXiv

Meta Distribution of SIR in the Internet of Things Modelled as a Euclidean Matching

The Poisson bipolar model considers user-base station pairs distributed at random on a flat domain, similar to matchsticks scattered onto a table. Though this is a simple and tractable setting in which to study dense networks, it doesn't properly characterise the stochastic geometry of user-base station interactions in some dense deployment scenarios, which may involve short and long range links, with some paired very nearby optimally, and others sub-optimally due to local crowding. Since the users will pair one-to-one with base stations, we can consider using the popular bipartite Euclidean matching (BEM) from spatial combinatorics, and study the corresponding (meta) distribution of the signal-to-interference-ratio (SIR). This provides detailed information about the proportion of links in the network meeting a target reliability constraint. We can then observe via comparison the impact of taking into account the variable/correlated short-range distances between the transmitter-receiver pairs on the communication statistics. We illustrate and quantify how the widely-accepted bipolar model fails to capture the network-wide reliability of communication in a typical ultra-dense setting based on a binomial point process. We also show how assuming a Gamma distribution for link distances may be a simple improvement on the bipolar model. Overall, BEMs provide good grounds for understanding more sophisticated pairing features in ultra-dense networks.

preprint · 2022 · arXiv

PMBM-based SLAM Filters in 5G mmWave Vehicular Networks

Radio-based vehicular simultaneous localization and mapping (SLAM) aims to localize vehicles while mapping the landmarks in the environment. We propose a sequence of three Poisson multi-Bernoulli mixture (PMBM) based SLAM filters, which handle the entire SLAM problem in a theoretically optimal manner. The complexity of the three proposed SLAM filters is progressively reduced while sustaining high accuracy by deriving SLAM density approximation with the marginalization of nuisance parameters (either vehicle state or data association). Firstly, the PMBM SLAM filter serves as the foundation, for which we provide the first complete description based on a Rao-Blackwellized particle filter. Secondly, the Poisson multi-Bernoulli (PMB) SLAM filter is based on the standard reduction from PMBM to PMB, but involves a novel interpretation based on auxiliary variables and a relation to Bethe free energy. Finally, using the same auxiliary variable argument, we derive a marginalized PMB SLAM filter, which avoids particles and is instead implemented with a low-complexity cubature Kalman filter. We evaluate the three proposed SLAM filters in comparison with the probability hypothesis density (PHD) SLAM filter in 5G mmWave vehicular networks and show the computation-performance trade-off between them.

preprint · 2022 · arXiv

REVECA -- Rich Encoder-decoder framework for Video Event CAptioner

We describe an approach used in the Generic Boundary Event Captioning challenge at the Long-Form Video Understanding Workshop held at CVPR 2022. We designed a Rich Encoder-decoder framework for Video Event CAptioner (REVECA) that utilizes spatial and temporal information from the video to generate a caption for the corresponding event boundary. REVECA uses frame position embedding to incorporate information before and after the event boundary. Furthermore, it employs features extracted using the temporal segment network and temporal-based pairwise difference method to learn temporal information. A semantic segmentation mask for the attentional pooling process is adopted to learn the subject of an event. Finally, LoRA is applied to fine-tune the image encoder to enhance the learning efficiency. REVECA yielded an average score of 50.97 on the Kinetics-GEBC test data, which is an improvement of 10.17 over the baseline method. Our code is available at https://github.com/TooTouch/REVECA.

preprint · 2020 · arXiv

5G mmWave Cooperative Positioning and Mapping using Multi-Model PHD Filter and Map Fusion

5G millimeter wave (mmWave) signals can enable accurate positioning in vehicular networks when the base station and vehicles are equipped with large antenna arrays. However, radio-based positioning suffers from multipath signals generated by different types of objects in the physical environment. Multipath can be turned into a benefit by building up a radio map (comprising the number of objects, object type, and object state) and using this map to exploit all available signal paths for positioning. We propose a new method for cooperative vehicle positioning and mapping of the radio environment, comprising a multiple-model probability hypothesis density filter and a map fusion routine, which is able to consider different types of objects and different fields of view. Simulation results demonstrate the performance of the proposed method.

preprint · 2020 · arXiv

Boosted Locality Sensitive Hashing: Discriminative Binary Codes for Source Separation

Speech enhancement tasks have seen significant improvements with the advance of deep learning technology, but at the cost of increased computational complexity. In this study, we propose an adaptive boosting approach to learning locality sensitive hash codes, which represent audio spectra efficiently. We use the learned hash codes for single-channel speech denoising tasks as an alternative to a complex machine learning model, particularly to address resource-constrained environments. Our adaptive boosting algorithm learns simple logistic regressors as the weak learners. Once trained, their binary classification results transform each spectrum of test noisy speech into a bit string. Simple bitwise operations calculate Hamming distance to find the K-nearest matching frames in the dictionary of training noisy speech spectra, whose associated ideal binary masks are averaged to estimate the denoising mask for that test mixture. Our proposed learning algorithm differs from AdaBoost in the sense that the projections are trained to minimize the distances between the self-similarity matrix of the hash codes and that of the original spectra, rather than the misclassification rate. We evaluate our discriminative hash codes on the TIMIT corpus with various noise types, and show comparable performance to deep learning methods in terms of denoising performance and complexity.
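
The inference step described in this abstract (Hamming-distance lookup of the K nearest dictionary frames, then averaging their ideal binary masks) can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the 8-bit codes, the 4-bin masks, and the function names are hypothetical, and the boosted training of the hash functions is not shown.

```python
def hamming(a, b):
    """Hamming distance between two hash codes stored as Python ints."""
    return bin(a ^ b).count("1")


def estimate_mask(test_code, dictionary, k):
    """Average the ideal binary masks of the k dictionary frames nearest in Hamming distance.

    dictionary: list of (hash_code, ideal_binary_mask) pairs built from
    training noisy speech (hypothetical toy layout).
    """
    nearest = sorted(dictionary, key=lambda entry: hamming(test_code, entry[0]))[:k]
    n_bins = len(nearest[0][1])
    return [sum(mask[i] for _, mask in nearest) / len(nearest) for i in range(n_bins)]


# Toy dictionary (hypothetical): 8-bit codes, masks over 4 frequency bins.
dictionary = [
    (0b10110010, [1, 1, 0, 0]),
    (0b10110011, [1, 0, 0, 0]),
    (0b01001101, [0, 0, 1, 1]),
    (0b10100010, [1, 1, 0, 1]),
]

# The three nearest entries are at distances 0, 1 and 1; the third entry
# (distance 8) is excluded, so the mask is the mean of the other three.
mask = estimate_mask(0b10110010, dictionary, k=3)
```

The resulting soft mask takes values between 0 and 1 per frequency bin, because it is an average of binary masks.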

preprint · 2020 · arXiv

Experimental Demonstration of Location-aware Beam Alignment

The main focus of beam alignment is to quickly find the optimal beam, that is, the beam that yields the largest received signal strength (RSS). In this paper, we demonstrate an efficient beam alignment scheme with our testbed. The algorithm we test uses location information for computationally efficient beam alignment. The testbed transmits and receives a 13.8 GHz signal and steers a beam on both the transmitter and the receiver with various radio frequency (RF) components. The location information is estimated with the indoor positioning module. The experiment shows that the location-aware algorithm significantly reduces the time required for beam alignment compared with exhaustive search.
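
The abstract does not specify how the location estimate is used, so the following is only one plausible sketch of the idea: convert the estimated position into a bearing, pick the closest codebook beam, and measure RSS only in a small window around it. The uniform codebook, the window size, the positions, and the toy RSS function are all assumptions introduced for illustration.

```python
import math


def beam_angles(num_beams, span_deg=120.0):
    """Hypothetical uniform codebook: steering angles spread over +/- span/2 degrees."""
    step = span_deg / (num_beams - 1)
    return [-span_deg / 2 + i * step for i in range(num_beams)]


def exhaustive_search(measure_rss, num_beams):
    """Measure every beam. Returns (best_beam, number_of_measurements)."""
    best = max(range(num_beams), key=measure_rss)
    return best, num_beams


def location_aware_search(measure_rss, angles, tx_xy, rx_xy, window=2):
    """Measure only the beams near the bearing implied by the estimated receiver position."""
    dx, dy = rx_xy[0] - tx_xy[0], rx_xy[1] - tx_xy[1]
    bearing = math.degrees(math.atan2(dy, dx))
    center = min(range(len(angles)), key=lambda i: abs(angles[i] - bearing))
    lo, hi = max(0, center - window), min(len(angles) - 1, center + window)
    candidates = range(lo, hi + 1)
    best = max(candidates, key=measure_rss)
    return best, len(candidates)


angles = beam_angles(33)


def measure_rss(beam):
    """Toy RSS model (assumption): strongest for the beam closest to a true angle of 20 degrees."""
    return -abs(angles[beam] - 20.0)


# The position estimate is slightly off (bearing of about 21.8 degrees),
# but the search window still contains the best beam.
full = exhaustive_search(measure_rss, len(angles))
fast = location_aware_search(measure_rss, angles, tx_xy=(0.0, 0.0), rx_xy=(10.0, 4.0))
```

In this toy setup both searches select the same beam, while the location-aware search takes 5 measurements instead of 33. The window must be wide enough to absorb the positioning error, which is the main design trade-off in such a scheme.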

preprint · 2020 · arXiv

Transaction-level Model Simulator for Communication-Limited Accelerators

Rapid design space exploration in the early design stage is critical to algorithm-architecture co-design for accelerators. In this work, a pre-RTL cycle-accurate accelerator simulator based on SystemC transaction-level modeling (TLM), AccTLMSim, is proposed for convolutional neural network (CNN) accelerators. The accelerator simulator keeps track of each bus transaction between accelerator and DRAM, taking into account the communication bandwidth. The simulation results are validated against the implementation results on the Xilinx Zynq. Using the proposed simulator, it is shown that the communication bandwidth is severely affected by DRAM latency and bus protocol overhead. In addition, loop tiling is optimized to maximize the performance under the constraint of on-chip SRAM size. Furthermore, a new performance estimation model is proposed to speed up the design space exploration. Thanks to the proposed simulator and performance estimation model, it is possible to explore a design space of millions of architectural options within a few tens of minutes.