Source author record

Jaesik Choi

Jaesik Choi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Computer Vision Artificial Intelligence Robotics Computation and Language Cryptography and Security Distributed, Parallel, and Cluster Computing Information Theory math.IT Neural and Evolutionary Computing

Catalog footprint

What is connected

21works

10topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

LoCO: Low-rank Compositional Rotation Fine-tuning

Parameter-efficient fine-tuning (PEFT) has emerged as an critical technique for adapting large-scale foundation models across natural language processing and computer vision. While existing methods such as low-rank adaptations achieve parameter efficiency via low-rank weight updates, they are limited in their ability to preserve the geometric structure of pretrained representations. We introduce Low-rank Compositional Orthogonal fine-tuning (LoCO), a novel PEFT method that constructs orthogonal transformations through low-rank skew-symmetric matrices and compositional rotation chains. We propose an approximation scheme that enables fully parallel computation of compositional rotations, making the approach practical for high-dimensional feature spaces. Our method maintains low computational complexity while maintaining orthogonality with controlled approximation error. We validate LoCO across diverse domains, including diffusion transformer fine-tuning, vision transformer adaptation, and language model adaptation. Our method demonstrates superior or competitive performance compared to both existing orthogonal and non-orthogonal methods.

preprint2026arXiv

Manifold-Aligned Guided Integrated Gradients for Reliable Feature Attribution

Feature attribution is central to diagnosing and trusting deep neural networks, and Integrated Gradients (IG) is widely used due to its axiomatic properties. However, IG can yield unreliable explanations when the integration path between a baseline and the input passes through regions with noisy gradients. While Guided Integrated Gradients reduces this sensitivity by adaptively updating low-gradient-magnitude features, input-space guidance still produces intermediate inputs that deviate from the data manifold. To address this limitation, we propose \emph{Manifold-Aligned Guided Integrated Gradients} (MA-GIG), which constructs attribution paths in the latent space of a pre-trained variational autoencoder. By decoding intermediate latent states, MA-GIG biases the path toward the learned generative manifold and reduces exposure to implausible input-space regions. Through qualitative and quantitative evaluations, we demonstrate that MA-GIG produces faithful explanations by aggregating gradients on path features proximal to the input. Consequently, our method reduces off-manifold noise and outperforms prior path-based attribution methods across multiple datasets and classifiers. Our code is available at https://github.com/leekwoon/ma-gig/.

preprint2026arXiv

Refining Compositional Diffusion for Reliable Long-Horizon Planning

Compositional diffusion planning generates long-horizon trajectories by stitching together overlapping short-horizon segments through score composition. However, when local plan distributions are multimodal, existing compositional methods suffer from mode-averaging, where averaging incompatible local modes leads to plans that are neither locally feasible nor globally coherent. We propose Refining Compositional Diffusion (RCD), a training-free guidance method that steers compositional sampling toward high-density, globally coherent plans. RCD leverages the self-reconstruction error of a pretrained diffusion model as a proxy for the log-density of composed plans, combined with an overlap consistency term that enforces consistency at segment boundaries. We show that the combined guidance concentrates sampling on high-density plans that mitigate mode-averaging. Experiments on challenging long-horizon tasks from OGBench, including locomotion, object manipulation, and pixel-based observations, demonstrate that RCD consistently outperforms existing methods.

preprint2026arXiv

Spectral Integrated Gradients for Coarse-to-Fine Feature Attribution

Integrated Gradients (IG) is a widely adopted feature attribution method that satisfies desirable axiomatic properties. However, the choice of integration path significantly affects the quality of attributions, and the standard straight-line path introduces all input features simultaneously, often accumulating noisy gradients along the way. To address this limitation, we propose Spectral Integrated Gradients, which constructs integration paths based on singular value decomposition (SVD) of the baseline-to-input difference. By progressively activating singular components from largest to smallest, SIG introduces global structure before fine-grained details, naturally following a coarse-to-fine progression. Through extensive evaluation across diverse image classification datasets, we demonstrate that SIG produces cleaner attribution maps with reduced noise and achieves improved quantitative performance compared to existing path-based attribution methods. Our code is available at https://github.com/leekwoon/sig/.

preprint2022arXiv

Can We Find Neurons that Cause Unrealistic Images in Deep Generative Networks?

Even though Generative Adversarial Networks (GANs) have shown a remarkable ability to generate high-quality images, GANs do not always guarantee the generation of photorealistic images. Occasionally, they generate images that have defective or unnatural objects, which are referred to as 'artifacts'. Research to investigate why these artifacts emerge and how they can be detected and removed has yet to be sufficiently carried out. To analyze this, we first hypothesize that rarely activated neurons and frequently activated neurons have different purposes and responsibilities for the progress of generating images. In this study, by analyzing the statistics and the roles for those neurons, we empirically show that rarely activated neurons are related to the failure results of making diverse objects and inducing artifacts. In addition, we suggest a correction method, called 'Sequential Ablation', to repair the defective part of the generated images without high computational cost and manual efforts.

preprint2022arXiv

Rarity Score : A New Metric to Evaluate the Uncommonness of Synthesized Images

Evaluation metrics in image synthesis play a key role to measure performances of generative models. However, most metrics mainly focus on image fidelity. Existing diversity metrics are derived by comparing distributions, and thus they cannot quantify the diversity or rarity degree of each generated image. In this work, we propose a new evaluation metric, called `rarity score', to measure the individual rarity of each image synthesized by generative models. We first show empirical observation that common samples are close to each other and rare samples are far from each other in nearest-neighbor distances of feature space. We then use our metric to demonstrate that the extent to which different generative models produce rare images can be effectively compared. We also propose a method to compare rarities between datasets that share the same concept such as CelebA-HQ and FFHQ. Finally, we analyze the use of metrics in different designs of feature spaces to better understand the relationship between feature spaces and resulting sparse images. Code will be publicly available online for the research community.

preprint2022arXiv

Why Do Neural Language Models Still Need Commonsense Knowledge to Handle Semantic Variations in Question Answering?

Many contextualized word representations are now learned by intricate neural network models, such as masked neural language models (MNLMs) which are made up of huge neural network structures and trained to restore the masked text. Such representations demonstrate superhuman performance in some reading comprehension (RC) tasks which extract a proper answer in the context given a question. However, identifying the detailed knowledge trained in MNLMs is challenging owing to numerous and intermingled model parameters. This paper provides new insights and empirical analyses on commonsense knowledge included in pretrained MNLMs. First, we use a diagnostic test that evaluates whether commonsense knowledge is properly trained in MNLMs. We observe that a large proportion of commonsense knowledge is not appropriately trained in MNLMs and MNLMs do not often understand the semantic meaning of relations accurately. In addition, we find that the MNLM-based RC models are still vulnerable to semantic variations that require commonsense knowledge. Finally, we discover the fundamental reason why some knowledge is not trained. We further suggest that utilizing an external commonsense knowledge repository can be an effective solution. We exemplify the possibility to overcome the limitations of the MNLM-based RC models by enriching text with the required knowledge from an external commonsense knowledge repository in controlled experiments.

preprint2021arXiv

Learning Compositional Sparse Gaussian Processes with a Shrinkage Prior

Choosing a proper set of kernel functions is an important problem in learning Gaussian Process (GP) models since each kernel structure has different model complexity and data fitness. Recently, automatic kernel composition methods provide not only accurate prediction but also attractive interpretability through search-based methods. However, existing methods suffer from slow kernel composition learning. To tackle large-scaled data, we propose a new sparse approximate posterior for GPs, MultiSVGP, constructed from groups of inducing points associated with individual additive kernels in compositional kernels. We demonstrate that this approximation provides a better fit to learn compositional kernels given empirical observations. We also provide theoretically justification on error bound when compared to the traditional sparse GP. In contrast to the search-based approach, we present a novel probabilistic algorithm to learn a kernel composition by handling the sparsity in the kernel selection with Horseshoe prior. We demonstrate that our model can capture characteristics of time series with significant reductions in computational time and have competitive regression performance on real-world data sets.

preprint2020arXiv

Confirmatory Bayesian Online Change Point Detection in the Covariance Structure of Gaussian Processes

In the analysis of sequential data, the detection of abrupt changes is important in predicting future changes. In this paper, we propose statistical hypothesis tests for detecting covariance structure changes in locally smooth time series modeled by Gaussian Processes (GPs). We provide theoretically justified thresholds for the tests, and use them to improve Bayesian Online Change Point Detection (BOCPD) by confirming statistically significant changes and non-changes. Our Confirmatory BOCPD (CBOCPD) algorithm finds multiple structural breaks in GPs even when hyperparameters are not tuned precisely. We also provide conditions under which CBOCPD provides the lower prediction error compared to BOCPD. Experimental results on synthetic and real-world datasets show that our new tests correctly detect changes in the covariance structure in GPs. The proposed algorithm also outperforms existing methods for the prediction of nonstationarity in terms of both regression error and log likelihood.

preprint2020arXiv

HetPipe: Enabling Large DNN Training on (Whimpy) Heterogeneous GPU Clusters through Integration of Pipelined Model Parallelism and Data Parallelism

Deep Neural Network (DNN) models have continuously been growing in size in order to improve the accuracy and quality of the models. Moreover, for training of large DNN models, the use of heterogeneous GPUs is inevitable due to the short release cycle of new GPU architectures. In this paper, we investigate how to enable training of large DNN models on a heterogeneous GPU cluster that possibly includes whimpy GPUs that, as a standalone, could not be used for training. We present a DNN training system, HetPipe (Heterogeneous Pipeline), that integrates pipelined model parallelism (PMP) with data parallelism (DP). In HetPipe, a group of multiple GPUs, called a virtual worker, processes minibatches in a pipelined manner, and multiple such virtual workers employ data parallelism for higher performance. We also propose a novel parameter synchronization model, which we refer to as Wave Synchronous Parallel (WSP) to accommodate both PMP and DP for virtual workers, and provide convergence proof of WSP. Our experimental results on a given heterogeneous setting show that with HetPipe, DNN models converge up to 49% faster compared to the state-of-the-art DP technique.

preprint2020arXiv

Interpretation of Deep Temporal Representations by Selective Visualization of Internally Activated Nodes

Recently deep neural networks demonstrate competitive performances in classification and regression tasks for many temporal or sequential data. However, it is still hard to understand the classification mechanisms of temporal deep neural networks. In this paper, we propose two new frameworks to visualize temporal representations learned from deep neural networks. Given input data and output, our algorithm interprets the decision of temporal neural network by extracting highly activated periods and visualizes a sub-sequence of input data which contributes to activate the units. Furthermore, we characterize such sub-sequences with clustering and calculate the uncertainty of the suggested type and actual data. We also suggest Layer-wise Relevance from the output of a unit, not from the final output, with backward Monte-Carlo dropout to show the relevance scores of each input point to activate units with providing a visual representation of the uncertainty about this impact.

preprint2020arXiv

Make Hawkes Processes Explainable by Decomposing Self-Triggering Kernels

Hawkes Processes capture self-excitation and mutual-excitation between events when the arrival of an event makes future events more likely to happen. Identification of such temporal covariance can reveal the underlying structure to better predict future events. In this paper, we present a new framework to decompose discrete events with a composition of multiple self-triggering kernels. The composition scheme allows us to decompose empirical covariance densities into the sum or the product of base kernels which are easily interpretable. Here, we present the first multiplicative kernel composition methods for Hawkes Processes. We demonstrate that the new automatic kernel decomposition procedure outperforms the existing methods on the prediction of discrete events in real-world data.

preprint2016arXiv

An End-to-End Robot Architecture to Manipulate Non-Physical State Changes of Objects

With the advance in robotic hardware and intelligent software, humanoid robot plays an important role in various tasks including service for human assistance and heavy job for hazardous industry. Recent advances in task learning enable humanoid robots to conduct dexterous manipulation tasks such as grasping objects and assembling parts of furniture. Operating objects without physical movements is an even more challenging task for humanoid robot because effects of actions may not be clearly seen in the physical configuration space and meaningful actions could be very complex in a long time horizon. As an example, playing a mobile game in a smart device has such challenges because both swipe actions and complex state transitions inside the smart devices in a long time horizon. In this paper, we solve this problem by introducing an integrated architecture which connects end-to-end dataflow from sensors to actuators in a humanoid robot to operate smart devices. We implement our integrated architecture in the Baxter Research Robot and experimentally demonstrate that the robot with our architecture could play a challenging mobile game, the 2048 game, as accurate as in a simulated environment.

preprint2016arXiv

Automatic Generation of Probabilistic Programming from Time Series Data

Probabilistic programming languages represent complex data with intermingled models in a few lines of code. Efficient inference algorithms in probabilistic programming languages make possible to build unified frameworks to compute interesting probabilities of various large, real-world problems. When the structure of model is given, constructing a probabilistic program is rather straightforward. Thus, main focus have been to learn the best model parameters and compute marginal probabilities. In this paper, we provide a new perspective to build expressive probabilistic program from continue time series data when the structure of model is not given. The intuition behind of our method is to find a descriptive covariance structure of time series data in nonparametric Gaussian process regression. We report that such descriptive covariance structure efficiently derives a probabilistic programming description accurately.

preprint2016arXiv

Global Deconvolutional Networks for Semantic Segmentation

Semantic image segmentation is a principal problem in computer vision, where the aim is to correctly classify each individual pixel of an image into a semantic label. Its widespread use in many areas, including medical imaging and autonomous driving, has fostered extensive research in recent years. Empirical improvements in tackling this task have primarily been motivated by successful exploitation of Convolutional Neural Networks (CNNs) pre-trained for image classification and object recognition. However, the pixel-wise labelling with CNNs has its own unique challenges: (1) an accurate deconvolution, or upsampling, of low-resolution output into a higher-resolution segmentation mask and (2) an inclusion of global information, or context, within locally extracted features. To address these issues, we propose a novel architecture to conduct the equivalent of the deconvolution operation globally and acquire dense predictions. We demonstrate that it leads to improved performance of state-of-the-art semantic segmentation models on the PASCAL VOC 2012 benchmark, reaching 74.0% mean IU accuracy on the test set.

preprint2016arXiv

Improving Imprecise Compressive Sensing Models

Random sampling in compressive sensing (CS) enables the compression of large amounts of input signals in an efficient manner, which is useful for many applications. CS reconstructs the compressed signals exactly with overwhelming probability when incoming data can be sparsely represented with a few components. However, the theory of CS framework including random sampling has been focused on exact recovery of signal; impreciseness in signal recovery has been neglected. This can be problematic when there is uncertainty in the number of sparse components such as signal sparsity in dynamic systems that can change over time. We present a new theoretical framework that handles uncertainty in signal recovery from the perspective of recovery success and quality. We show that the signal recovery success in our model is more accurate than the success probability analysis in the CS framework. Our model is then extended to the case where the success or failure of signal recovery can be relaxed. We represent the number of components included in signal recovery with a right-tailed distribution and focus on recovery quality. Experimental results confirm the accuracy of our model in dynamic systems.

preprint2016arXiv

Searching for Topological Symmetry in Data Haystack

Finding interesting symmetrical topological structures in high-dimensional systems is an important problem in statistical machine learning. Limited amount of available high-dimensional data and its sensitivity to noise pose computational challenges to find symmetry. Our paper presents a new method to find local symmetries in a low-dimensional 2-D grid structure which is embedded in high-dimensional structure. To compute the symmetry in a grid structure, we introduce three legal grid moves (i) Commutation (ii) Cyclic Permutation (iii) Stabilization on sets of local grid squares, grid blocks. The three grid moves are legal transformations as they preserve the statistical distribution of hamming distances in each grid block. We propose and coin the term of grid symmetry of data on the 2-D data grid as the invariance of statistical distributions of hamming distance are preserved after a sequence of grid moves. We have computed and analyzed the grid symmetry of data on multivariate Gaussian distributions and Gamma distributions with noise.

preprint2016arXiv

The Automatic Statistician: A Relational Perspective

Gaussian Processes (GPs) provide a general and analytically tractable way of modeling complex time-varying, nonparametric functions. The Automatic Bayesian Covariance Discovery (ABCD) system constructs natural-language description of time-series data by treating unknown time-series data nonparametrically using GP with a composite covariance kernel function. Unfortunately, learning a composite covariance kernel with a single time-series data set often results in less informative kernel that may not give qualitative, distinctive descriptions of data. We address this challenge by proposing two relational kernel learning methods which can model multiple time-series data sets by finding common, shared causes of changes. We show that the relational kernel learning methods find more accurate models for regression problems on several real-world data sets; US stock data, US house price index data and currency exchange rate data.

preprint2015arXiv

Learning Execution Contexts from System Call Distributions for Intrusion Detection in Embedded Systems

Existing techniques used for intrusion detection do not fully utilize the intrinsic properties of embedded systems. In this paper, we propose a lightweight method for detecting anomalous executions using a distribution of system call frequencies. We use a cluster analysis to learn the legitimate execution contexts of embedded applications and then monitor them at run-time to capture abnormal executions. We also present an architectural framework with minor processor modifications to aid in this process. Our prototype shows that the proposed method can effectively detect anomalous executions without relying on sophisticated analyses or affecting the critical execution paths.

preprint2012arXiv

Lifted Inference for Relational Continuous Models

Relational Continuous Models (RCMs) represent joint probability densities over attributes of objects, when the attributes have continuous domains. With relational representations, they can model joint probability distributions over large numbers of variables compactly in a natural way. This paper presents a new exact lifted inference algorithm for RCMs, thus it scales up to large models of real world applications. The algorithm applies to Relational Pairwise Models which are (relational) products of potentials of arity 2. Our algorithm is unique in two ways. First, it substantially improves the efficiency of lifted inference with variables of continuous domains. When a relational model has Gaussian potentials, it takes only linear-time compared to cubic time of previous methods. Second, it is the first exact inference algorithm which handles RCMs in a lifted way. The algorithm is illustrated over an example from econometrics. Experimental results show that our algorithm outperforms both a groundlevel inference algorithm and an algorithm built with previously-known lifted methods.

preprint2012arXiv

Lifted Relational Variational Inference

Hybrid continuous-discrete models naturally represent many real-world applications in robotics, finance, and environmental engineering. Inference with large-scale models is challenging because relational structures deteriorate rapidly during inference with observations. The main contribution of this paper is an efficient relational variational inference algorithm that factors largescale probability models into simpler variational models, composed of mixtures of iid (Bernoulli) random variables. The algorithm takes probability relational models of largescale hybrid systems and converts them to a close-to-optimal variational models. Then, it efficiently calculates marginal probabilities on the variational models by using a latent (or lifted) variable elimination or a lifted stochastic sampling. This inference is unique because it maintains the relational structure upon individual observations and during inference steps.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint

Fields this researcher appears in

Source provenance

Where this author record came from

arxivconfidence 95%

external id: arxiv:2605.02167:author:4:jaesik-choi

Imported May 20, 2026Synced May 21, 2026

arxivconfidence 95%

external id: arxiv:2605.03075:author:4:jaesik-choi

Imported May 20, 2026Synced May 21, 2026

arxivconfidence 95%

external id: arxiv:2605.19607:author:4:jaesik-choi

Imported May 20, 2026Synced May 21, 2026

arxivconfidence 95%

external id: arxiv:2605.15916:author:2:jaesik-choi

Imported May 20, 2026Synced May 21, 2026

7 works

Anh Tong

Researcher

Anh Tong contributes to research discovery and scholarly infrastructure.

Open to collaborate

4 works

Kyowoon Lee

Researcher

Kyowoon Lee contributes to research discovery and scholarly infrastructure.

Open to collaborate

3 works

Jiyeon Han

Researcher

Jiyeon Han contributes to research discovery and scholarly infrastructure.

Open to collaborate

2 works

Eyal Amir

Researcher

Eyal Amir contributes to research discovery and scholarly infrastructure.

Open to collaborate

Jaesik Choi

What is connected

Connect this record

See the researcher in context

Building this map preview

21 published item(s)

LoCO: Low-rank Compositional Rotation Fine-tuning

Manifold-Aligned Guided Integrated Gradients for Reliable Feature Attribution

Refining Compositional Diffusion for Reliable Long-Horizon Planning

Spectral Integrated Gradients for Coarse-to-Fine Feature Attribution

Can We Find Neurons that Cause Unrealistic Images in Deep Generative Networks?

Rarity Score : A New Metric to Evaluate the Uncommonness of Synthesized Images

Why Do Neural Language Models Still Need Commonsense Knowledge to Handle Semantic Variations in Question Answering?

Learning Compositional Sparse Gaussian Processes with a Shrinkage Prior

Confirmatory Bayesian Online Change Point Detection in the Covariance Structure of Gaussian Processes

HetPipe: Enabling Large DNN Training on (Whimpy) Heterogeneous GPU Clusters through Integration of Pipelined Model Parallelism and Data Parallelism

Interpretation of Deep Temporal Representations by Selective Visualization of Internally Activated Nodes

Make Hawkes Processes Explainable by Decomposing Self-Triggering Kernels

An End-to-End Robot Architecture to Manipulate Non-Physical State Changes of Objects

Automatic Generation of Probabilistic Programming from Time Series Data

Global Deconvolutional Networks for Semantic Segmentation

Improving Imprecise Compressive Sensing Models

Searching for Topological Symmetry in Data Haystack

The Automatic Statistician: A Relational Perspective

Learning Execution Contexts from System Call Distributions for Intrusion Detection in Embedded Systems

Lifted Inference for Relational Continuous Models

Lifted Relational Variational Inference