Source author record

Jong-Seok Lee

Jong-Seok Lee appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Computer Vision, eess.IV, Machine Learning, Multimedia, eess.SP, Human-Computer Interaction, Information Theory, math.IT

Catalog footprint

What is connected

14 works

8 topics

4 close collaborators

preprint, 2026, arXiv

Coarse-to-Fine: Progressive Image Compression for Semantically Hierarchical Classification

Recent advances in learned image compression (LIC) have enabled practical deployments, spurring active research into image compression for machines and progressive coding schemes. However, their integration remains under-explored: prior works on progressive machine codecs predominantly target sample-level difficulty adaptation (i.e., easy-to-hard), without considering semantic-level scalability. In this work, we introduce a semantic hierarchy-aware progressive codec that enables semantic scalability (i.e., coarse-to-fine) from a single bitstream. We first systematically categorize ImageNet-1K classes into CLIP embedding-based semantic hierarchies. Based on a channel-wise autoregressive framework, we decompose latent representations into hierarchically ordered channel blocks, each explicitly optimized for a corresponding semantic hierarchy. Extensive experiments demonstrate that our approach substantially improves coarse-level recognition at low bitrates while maintaining fine-grained accuracy at higher bitrates. By reframing progressive transmission through the lens of semantic scalability, our work provides an efficient and interpretable solution for task-adaptive image coding, outperforming existing progressive codecs under hierarchical evaluation.

preprint, 2022, arXiv

Deep Image Destruction: Vulnerability of Deep Image-to-Image Models against Adversarial Attacks

Recently, the vulnerability of deep image classification models to adversarial attacks has been investigated. However, such an issue has not been thoroughly studied for image-to-image tasks that take an input image and generate an output image (e.g., colorization, denoising, and deblurring). This paper presents comprehensive investigations into the vulnerability of deep image-to-image models to adversarial attacks. For five popular image-to-image tasks, 16 deep models are analyzed from various standpoints such as output quality degradation due to attacks, transferability of adversarial examples across different tasks, and characteristics of perturbations. We show that, unlike image classification tasks, the performance degradation on image-to-image tasks differs largely depending on various factors, e.g., attack methods and task objectives. In addition, we analyze the effectiveness of conventional defense methods used for classification models in improving the robustness of the image-to-image models.

preprint, 2022, arXiv

Joint Global and Local Hierarchical Priors for Learned Image Compression

Recently, learned image compression methods have outperformed traditional hand-crafted ones including BPG. One of the keys to this success is learned entropy models that estimate the probability distribution of the quantized latent representation. Like other vision tasks, most recent learned entropy models are based on convolutional neural networks (CNNs). However, CNNs have a limitation in modeling long-range dependencies due to their nature of local connectivity, which can be a significant bottleneck in image compression where reducing spatial redundancy is a key point. To overcome this issue, we propose a novel entropy model called Information Transformer (Informer) that exploits both global and local information in a content-dependent manner using an attention mechanism. Our experiments show that Informer improves rate-distortion performance over the state-of-the-art methods on the Kodak and Tecnick datasets without the quadratic computational complexity problem. Our source code is available at https://github.com/naver-ai/informer.

preprint, 2022, arXiv

Modeling, Quantifying, and Predicting Subjectivity of Image Aesthetics

Assessing image aesthetics is a challenging computer vision task. One reason is that aesthetic preference is highly subjective and may vary significantly among people for certain images. Thus, it is important to properly model and quantify such subjectivity, but there has not been much effort to resolve this issue. In this paper, we propose a novel unified probabilistic framework that can model and quantify subjective aesthetic preference based on subjective logic. In this framework, the rating distribution is modeled as a beta distribution, from which the probabilities of being definitely pleasing, being definitely unpleasing, and being uncertain can be obtained. We use the probability of being uncertain to define an intuitive metric of subjectivity. Furthermore, we present a method to learn deep neural networks for prediction of image aesthetics, which is shown to be effective in improving the performance of subjectivity prediction via experiments. We also present an application scenario where the framework is beneficial for aesthetics-based image recommendation.
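As a rough illustration of how a beta distribution can yield the three probabilities mentioned above, the sketch below applies the standard binomial-opinion mapping from subjective logic. The function name `beta_to_opinion`, the base rate of 0.5, and the prior weight of 2 are assumptions made for this example; the abstract does not state the parameterization the paper actually uses.

```python
def beta_to_opinion(alpha, beta, base_rate=0.5, prior_weight=2.0):
    """Map Beta(alpha, beta) to a subjective-logic opinion (illustrative only).

    base_rate and prior_weight are assumed defaults, not values from the paper.
    """
    # Evidence counts implied by the beta parameters.
    r = alpha - prior_weight * base_rate          # evidence for "pleasing"
    s = beta - prior_weight * (1.0 - base_rate)   # evidence for "unpleasing"
    if r < 0 or s < 0:
        raise ValueError("alpha and beta are too small for this base rate")
    total = r + s + prior_weight
    return {
        "belief": r / total,                  # definitely pleasing
        "disbelief": s / total,               # definitely unpleasing
        "uncertainty": prior_weight / total,  # candidate subjectivity score
    }


if __name__ == "__main__":
    # Little evidence gives high uncertainty; more evidence lowers it.
    print(beta_to_opinion(3.0, 1.0))
    print(beta_to_opinion(9.0, 9.0))
```

Under this mapping the three values always sum to one, and the uncertainty term shrinks as the amount of rating evidence grows.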

preprint, 2022, arXiv

TREND: Truncated Generalized Normal Density Estimation of Inception Embeddings for GAN Evaluation

Evaluating image generation models such as generative adversarial networks (GANs) is a challenging problem. A common approach is to compare the distributions of the set of ground truth images and the set of generated test images. The Fréchet Inception distance is one of the most widely used metrics for evaluation of GANs, which assumes that the features from a trained Inception model for a set of images follow a normal distribution. In this paper, we argue that this is an over-simplified assumption, which may lead to unreliable evaluation results, and more accurate density estimation can be achieved using a truncated generalized normal distribution. Based on this, we propose a novel metric for accurate evaluation of GANs, named TREND (TRuncated gEneralized Normal Density estimation of inception embeddings). We demonstrate that our approach significantly reduces errors of density estimation, which consequently eliminates the risk of faulty evaluation results. Furthermore, we show that the proposed metric significantly improves robustness of evaluation results against variation of the number of image samples.
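For context, the normality assumption criticized above is visible in the usual closed form of the Fréchet Inception distance, which depends only on the mean and covariance of the Inception features. The symbols below are notation chosen for this note, not taken from the abstract: $\mu_r, \Sigma_r$ denote the feature mean and covariance of the real images, and $\mu_g, \Sigma_g$ those of the generated images.

```latex
\[
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
\]
```

Because only first and second moments enter this expression, any departure of the features from a Gaussian shape is ignored, which is the gap the abstract says TREND addresses.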

preprint, 2021, arXiv

Ambiguity of Objective Image Quality Metrics: A New Methodology for Performance Evaluation

Objective image quality metrics try to estimate the perceptual quality of the given image by considering the characteristics of the human visual system. However, it is possible that the metrics produce different quality scores even for two images that are perceptually indistinguishable by human viewers, which has not been considered in the existing studies related to objective quality assessment. In this paper, we address the issue of ambiguity of objective image quality assessment. We propose an approach to obtain an ambiguity interval of an objective metric, within which the quality score difference is not perceptually significant. In particular, we use the visual difference predictor, which can consider viewing conditions that are important for visual quality perception. In order to demonstrate the usefulness of the proposed approach, we conduct experiments with 33 state-of-the-art image quality metrics from the viewpoint of their accuracy and ambiguity for three image quality databases. The results show that the ambiguity intervals can be applied as an additional figure of merit when conventional performance measurement does not determine superiority between the metrics. The effect of the viewing distance on the ambiguity interval is also shown.

preprint, 2021, arXiv

Emotional EEG Classification using Connectivity Features and Convolutional Neural Networks

Convolutional neural networks (CNNs) are widely used to recognize the user's state through electroencephalography (EEG) signals. In previous studies, the EEG signals are usually fed into the CNNs in the form of high-dimensional raw data. However, this approach makes it difficult to exploit the brain connectivity information that can be effective in describing the functional brain network and estimating the perceptual state of the user. We introduce a new classification system that utilizes brain connectivity with a CNN and validate its effectiveness via emotional video classification using three different types of connectivity measures. Furthermore, two data-driven methods to construct the connectivity matrix are proposed to maximize classification performance. Further analysis reveals that the level of concentration of the brain connectivity related to the emotional property of the target video is correlated with classification performance.

preprint, 2021, arXiv

Local Critic Training for Model-Parallel Learning of Deep Neural Networks

In this paper, we propose a novel model-parallel learning method, called local critic training, which trains neural networks using additional modules called local critic networks. The main network is divided into several layer groups and each layer group is updated through error gradients estimated by the corresponding local critic network. We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs). In addition, we demonstrate that the proposed method is guaranteed to converge to a critical point. We also show that networks trained by the proposed method can be used for structural optimization. Experimental results show that our method achieves satisfactory performance, reduces training time greatly, and decreases memory consumption per machine. Code is available at https://github.com/hjdw2/Local-critic-training.

preprint, 2021, arXiv

Wide Color Gamut Image Content Characterization: Method, Evaluation, and Applications

In this paper, we propose a novel framework to characterize wide color gamut image content based on perceived quality due to the processes that change color gamut, and demonstrate two practical use cases where the framework can be applied. We first introduce the main framework and implementation details. Then, we analyze existing wide color gamut datasets using quantitative characterization criteria, for which four criteria, i.e., coverage, total coverage, uniformity, and total uniformity, are proposed. Finally, the framework is applied to content selection in a gamut mapping evaluation scenario in order to enhance reliability and robustness of the evaluation results. As a result, the framework fulfils content characterization for studies where quality of experience of wide color gamut stimuli is involved.

preprint, 2020, arXiv

AIM 2020 Challenge on Efficient Super-Resolution: Methods and Results

This paper reviews the AIM 2020 challenge on efficient single image super-resolution with a focus on the proposed solutions and results. The challenge task was to super-resolve an input image with a magnification factor of ×4 based on a set of prior examples of low and corresponding high resolution images. The goal is to devise a network that reduces one or several aspects such as runtime, parameter count, FLOPs, activations, and memory consumption while at least maintaining the PSNR of MSRResNet. The track had 150 registered participants, and 25 teams submitted the final results. They gauge the state-of-the-art in efficient single image super-resolution.

preprint, 2020, arXiv

EmbraceNet for Activity: A Deep Multimodal Fusion Architecture for Activity Recognition

Human activity recognition using multiple sensors has been a challenging but promising task in recent decades. In this paper, we propose a deep multimodal fusion model for activity recognition based on the recently proposed feature fusion architecture named EmbraceNet. Our model processes each sensor's data independently, combines the features with the EmbraceNet architecture, and post-processes the fused feature to predict the activity. In addition, we propose additional processes to boost the performance of our model. We submit the results obtained from our proposed model to the SHL recognition challenge with the team name "Yonsei-MCML."

preprint, 2020, arXiv

MAMNet: Multi-path Adaptive Modulation Network for Image Super-Resolution

In recent years, single image super-resolution (SR) methods based on deep convolutional neural networks (CNNs) have made significant progress. However, due to the non-adaptive nature of the convolution operation, they cannot adapt to various characteristics of images, which limits their representational capability and, consequently, results in unnecessarily large model sizes. To address this issue, we propose a novel multi-path adaptive modulation network (MAMNet). Specifically, we propose a multi-path adaptive modulation block (MAMB), which is a lightweight yet effective residual block that adaptively modulates residual feature responses by fully exploiting their information via three paths. The three paths model three types of information suitable for SR: 1) channel-specific information (CSI) using global variance pooling, 2) inter-channel dependencies (ICD) based on the CSI, and 3) channel-specific spatial dependencies (CSD) via depth-wise convolution. We demonstrate that the proposed MAMB is more effective and parameter-efficient for image SR than other feature modulation methods. In addition, experimental results show that our MAMNet outperforms most of the state-of-the-art methods with a relatively small number of parameters.
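The global variance pooling step named above can be pictured as reducing each channel of a feature map to the variance of its spatial values. The sketch below is a plain-Python illustration of that idea only: the function name `global_variance_pooling`, the nested-list layout (channels, rows, columns), and the use of population variance are assumptions for this example and are not taken from the MAMNet implementation.

```python
from statistics import pvariance


def global_variance_pooling(feature_map):
    """Return one variance value per channel (illustrative sketch).

    feature_map is assumed to be a list of channels, each a list of rows.
    """
    pooled = []
    for channel in feature_map:
        # Flatten the H x W grid of this channel into a single list.
        values = [v for row in channel for v in row]
        # Population variance over all spatial positions (assumed choice).
        pooled.append(pvariance(values))
    return pooled


if __name__ == "__main__":
    features = [
        [[1.0, 1.0], [1.0, 1.0]],  # flat channel
        [[0.0, 2.0], [0.0, 2.0]],  # channel with spatial variation
    ]
    print(global_variance_pooling(features))
```

A flat channel pools to zero while a textured channel pools to a larger value, which gives a per-channel statistic that responds to spatial detail rather than to mean intensity.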

preprint, 2020, arXiv

SRZoo: An integrated repository for super-resolution using deep learning

Deep learning-based image processing algorithms, including image super-resolution methods, have been proposed with significant improvement in performance in recent years. However, their implementations and evaluations are dispersed across various deep learning frameworks and various evaluation criteria. In this paper, we propose an integrated repository for the super-resolution tasks, named SRZoo, to provide state-of-the-art super-resolution models in a single place. Our repository offers not only converted versions of existing pre-trained models, but also documentation and toolkits for converting other models. In addition, SRZoo provides platform-agnostic image reconstruction tools to obtain super-resolved images and evaluate the performance in place. It also brings the opportunity of extension to advanced image-based research and other image processing models. The software, documentation, and pre-trained models are publicly available on GitHub.

preprint, 2016, arXiv

QoE-aware Scalable Video Transmission in MIMO Systems

An important concept in wireless systems has been quality of experience (QoE)-aware video transmission. Such communications are considered not only connection-based communications but also content-aware communications, since the video quality is closely related to the content itself. It therefore becomes necessary for video communications to utilize a cross-layer design (also known as joint source and channel coding). To provide efficient methods of allocating network resources, the wireless network uses its cross-layer knowledge to perform unequal error protection (UEP) solutions. In this article, we summarize the latest video transmission technologies that are based on scalable video coding (SVC) over multiple-input multiple-output (MIMO) systems with cross-layer designs. To provide insight into video transmission in wireless networks, we investigate UEP solutions for delivering video over massive MIMO systems. Our results show that in terms of QoE, SVC layer prioritization, which was considered important in prior work, is not always beneficial in massive MIMO systems; consideration must be given to the content characteristics.
