Source author record

Dongjin Lee

Dongjin Lee appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, Databases, math.NA, Neural and Evolutionary Computing, Numerical Analysis, Social and Information Networks, Artificial Intelligence, cond-mat.mtrl-sci, eess.AS, physics.comp-ph, Sound, Systems and Control

Catalog footprint

What is connected

11 works

12 topics

4 close collaborators


preprint · 2026 · arXiv

Data-driven dimensionally decomposed generalized polynomial chaos expansion for forward uncertainty quantification

Dimensionally decomposed generalized polynomial chaos expansion (DD-GPCE) efficiently performs forward uncertainty quantification (UQ) in complex engineering systems with high-dimensional random inputs of arbitrary distributions. However, constructing the measure-consistent orthonormal polynomial bases in DD-GPCE requires prior knowledge of input distributions, which is often unavailable in practice. This work introduces a data-driven DD-GPCE method that eliminates the need for such prior knowledge, extending its applicability to UQ with high-dimensional inputs. Input distributions are inferred directly from sample data using smoothed-bootstrap kernel density estimation (KDE), while the DD-GPCE framework enables KDE to handle high-dimensional inputs through low-dimensional marginal estimation. We then use the estimated input distributions to perform a whitening transformation via Monte Carlo simulation, which enables generation of measure-consistent orthonormal basis functions. We demonstrate the accuracy of the proposed method in both mathematical examples and stochastic dynamic analysis for a practical three-dimensional mobility design involving twenty random inputs. The results indicate that the proposed method produces more accurate estimates of the output mean and variance than the conventional data-driven approach that assumes Gaussian input distributions.
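The smoothed-bootstrap KDE step mentioned in the abstract can be illustrated with a minimal one-dimensional sketch. It is not the authors' implementation: the function name `smoothed_bootstrap`, the Gaussian kernel, and the normal-reference bandwidth rule are all assumptions made for illustration.

```python
import random
import statistics

def smoothed_bootstrap(data, n_draws, rng, bandwidth=None):
    """Draw n_draws samples from a Gaussian KDE fitted to 1-D data."""
    if bandwidth is None:
        # Normal-reference rule of thumb; an assumption of this sketch.
        bandwidth = 1.06 * statistics.stdev(data) * len(data) ** (-1 / 5)
    # Resample an observed point, then perturb it with Gaussian kernel noise.
    return [rng.choice(data) + rng.gauss(0.0, bandwidth) for _ in range(n_draws)]

rng = random.Random(0)
data = [rng.expovariate(1.0) for _ in range(500)]  # skewed, non-Gaussian marginal
draws = smoothed_bootstrap(data, 20000, rng)
print(statistics.mean(data), statistics.mean(draws))
```

Because each draw is an observed point plus zero-mean noise, the draws keep the skew of the data instead of forcing a Gaussian shape, which is the property the abstract contrasts with the Gaussian-assumption baseline.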

preprint · 2023 · arXiv

Bi-fidelity conditional value-at-risk estimation by dimensionally decomposed generalized polynomial chaos expansion

Digital twin models allow us to continuously assess the possible risk of damage and failure of a complex system. Yet high-fidelity digital twin models can be computationally expensive, making quick-turnaround assessment challenging. Towards this goal, this article proposes a novel bi-fidelity method for estimating the conditional value-at-risk (CVaR) for nonlinear systems subject to dependent and high-dimensional inputs. For models that can be evaluated fast, a method that integrates the dimensionally decomposed generalized polynomial chaos expansion (DD-GPCE) approximation with a standard sampling-based CVaR estimation is proposed. For expensive-to-evaluate models, a new bi-fidelity method is proposed that couples the DD-GPCE with a Fourier-polynomial expansion of the mapping between the stochastic low-fidelity and high-fidelity output data to ensure computational efficiency. The method employs measure-consistent orthonormal polynomials in the random variable of the low-fidelity output to approximate the high-fidelity output. Numerical results for a structural mechanics truss with 36-dimensional (dependent random variable) inputs indicate that the DD-GPCE method provides very accurate CVaR estimates that require much lower computational effort than standard GPCE approximations. A second example considers the realistic problem of estimating the risk of damage to a fiber-reinforced composite laminate. The high-fidelity model is a finite element simulation that is prohibitively expensive for risk analysis, such as CVaR computation. Here, the novel bi-fidelity method can accurately estimate CVaR as it includes low-fidelity models in the estimation procedure and uses only a few high-fidelity model evaluations to significantly increase accuracy.
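The "standard sampling-based CVaR estimation" named in the abstract can be sketched as follows. Several estimator conventions exist; this one, chosen here as an assumption, averages the order statistics at or beyond the empirical value-at-risk. The name `var_cvar` and the toy loss values are hypothetical, standing in for samples drawn from a cheap surrogate.

```python
import math

def var_cvar(samples, beta):
    """Empirical VaR and CVaR of loss samples at confidence level beta."""
    ordered = sorted(samples)
    n = len(ordered)
    k = min(max(math.ceil(beta * n), 1), n)  # 1-based rank of the VaR order statistic
    tail = ordered[k - 1:]                   # losses at or beyond the VaR
    return ordered[k - 1], sum(tail) / len(tail)

losses = [float(x) for x in range(1, 11)]    # stand-in for surrogate outputs
var, cvar = var_cvar(losses, 0.5)
print(var, cvar)
```

The estimate depends only on the tail samples, which is why a surrogate that is cheap to evaluate many times makes this kind of estimator practical.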

preprint · 2023 · arXiv

Graphlets over Time: A New Lens for Temporal Network Analysis

Graphs are widely used for modeling various types of interactions, such as email communications and online discussions. Many such real-world graphs are temporal, and specifically, they grow over time with new nodes and edges. Counting the instances of each graphlet (i.e., an induced subgraph isomorphism class) has been successful in characterizing local structures of graphs, with many applications. While graphlets have been extended for temporal graphs, the extensions are designed for examining temporally-local subgraphs composed of edges with close arrival times, instead of long-term changes in local structures. In this paper, as a new lens for temporal graph analysis, we study the evolution of distributions of graphlet instances over time in real-world graphs at three different levels (graphs, nodes, and edges). At the graph level, we first discover that the evolution patterns are significantly different from those in random graphs. Then, we suggest a graphlet transition graph for measuring the similarity of the evolution patterns of graphs, and we find a surprising similarity between graphs from the same domain. At the node and edge levels, we demonstrate that the local structures around nodes and edges in their early stage provide a strong signal regarding their future importance. In particular, we significantly improve the predictability of the future importance of nodes and edges using the counts of the roles (a.k.a. orbits) that they take within graphlets.
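As a small illustration of tracking graphlet counts while a graph grows, the sketch below counts the two connected three-node graphlets (open wedge and triangle) after each edge arrival. It assumes an undirected simple graph; the function name and the toy edge stream are hypothetical, and the paper's analysis covers more graphlets and levels than this.

```python
from itertools import combinations

def count_size3_graphlets(edges):
    """Count induced wedges and triangles in an undirected simple graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    wedges = 0
    triangle_hits = 0
    for center, neighbours in adj.items():
        for a, b in combinations(neighbours, 2):
            if b in adj[a]:
                triangle_hits += 1  # each triangle is seen once per corner
            else:
                wedges += 1         # induced path a - center - b
    return {"wedge": wedges, "triangle": triangle_hits // 3}

# Toy stream of (timestamp, u, v) edge arrivals.
stream = [(1, "a", "b"), (2, "b", "c"), (3, "a", "c"), (4, "c", "d")]
seen, history = [], []
for t, u, v in sorted(stream):
    seen.append((u, v))
    history.append((t, count_size3_graphlets(seen)))
print(history)
```

Recounting from scratch at every step is quadratic in the worst case; it is done here only to keep the sketch short.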

preprint · 2023 · arXiv

I'm Me, We're Us, and I'm Us: Tri-directional Contrastive Learning on Hypergraphs

Although machine learning on hypergraphs has attracted considerable attention, most of the works have focused on (semi-)supervised learning, which may cause heavy labeling costs and poor generalization. Recently, contrastive learning has emerged as a successful unsupervised representation learning method. Despite the prosperous development of contrastive learning in other domains, contrastive learning on hypergraphs remains little explored. In this paper, we propose TriCL (Tri-directional Contrastive Learning), a general framework for contrastive learning on hypergraphs. Its main idea is tri-directional contrast, and specifically, it aims to maximize in two augmented views the agreement (a) between the same node, (b) between the same group of nodes, and (c) between each group and its members. Together with simple but surprisingly effective data augmentation and negative sampling schemes, these three forms of contrast enable TriCL to capture both microscopic and mesoscopic structural information in node embeddings. Our extensive experiments using 13 baseline approaches, five datasets, and two tasks demonstrate the effectiveness of TriCL, and most noticeably, TriCL consistently outperforms not just unsupervised competitors but also (semi-)supervised competitors mostly by significant margins for node classification. The code and datasets are available at https://github.com/wooner49/TriCL.
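The abstract does not state which loss TriCL optimizes. As a generic stand-in, the sketch below uses an InfoNCE-style loss, a common choice in contrastive learning, to show what "maximizing agreement between the same node in two augmented views" can look like for contrast (a) only. The function names, the temperature `tau`, and the toy embeddings are assumptions.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def info_nce(view1, view2, tau=0.5):
    """Mean InfoNCE loss: row i of view1 should match row i of view2."""
    total = 0.0
    for i, anchor in enumerate(view1):
        logits = [cosine(anchor, other) / tau for other in view2]
        log_denominator = math.log(sum(math.exp(x) for x in logits))
        total += log_denominator - logits[i]  # -log softmax of the positive pair
    return total / len(view1)

nodes_view1 = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(nodes_view1, [[0.9, 0.1], [0.1, 0.9]])
swapped = info_nce(nodes_view1, [[0.1, 0.9], [0.9, 0.1]])
print(aligned, swapped)
```

The loss is lower when each node's embedding in one view is closest to its own embedding in the other view, and higher when the views disagree.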

preprint · 2022 · arXiv

AutoSNN: Towards Energy-Efficient Spiking Neural Networks

Spiking neural networks (SNNs) that mimic information transmission in the brain can energy-efficiently process spatio-temporal information through discrete and sparse spikes, thereby receiving considerable attention. To improve accuracy and energy efficiency of SNNs, most previous studies have focused solely on training methods, and the effect of architecture has rarely been studied. We investigate the design choices used in the previous studies in terms of the accuracy and number of spikes and find that they are not best suited for SNNs. To further improve the accuracy and reduce the spikes generated by SNNs, we propose a spike-aware neural architecture search framework called AutoSNN. We define a search space consisting of architectures without undesirable design choices. To enable the spike-aware architecture search, we introduce a fitness that considers both the accuracy and number of spikes. AutoSNN successfully searches for SNN architectures that outperform hand-crafted SNNs in accuracy and energy efficiency. We thoroughly demonstrate the effectiveness of AutoSNN on various datasets including neuromorphic datasets.

preprint · 2022 · arXiv

Energy-efficient Knowledge Distillation for Spiking Neural Networks

Spiking neural networks (SNNs) have been gaining interest as energy-efficient alternatives to conventional artificial neural networks (ANNs) due to their event-driven computation. Considering the future deployment of SNN models to constrained neuromorphic devices, many studies have applied techniques originally used for ANN model compression, such as network quantization, pruning, and knowledge distillation, to SNNs. Among them, existing works on knowledge distillation reported accuracy improvements of the student SNN model. However, analysis of energy efficiency, which is also an important feature of SNNs, was absent. In this paper, we thoroughly analyze the performance of the distilled SNN model in terms of accuracy and energy efficiency. In the process, we observe a substantial increase in the number of spikes, leading to energy inefficiency, when using the conventional knowledge distillation methods. Based on this analysis, to achieve energy efficiency, we propose a novel knowledge distillation method with heterogeneous temperature parameters. We evaluate our method on two different datasets and show that the resulting SNN student achieves both improved accuracy and a reduced number of spikes. On the MNIST dataset, our proposed student SNN achieves up to 0.09% higher accuracy and produces 65% fewer spikes compared to the student SNN trained with the conventional knowledge distillation method. We also compare the results with other SNN compression techniques and training methods.
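For readers unfamiliar with the temperature parameter, the sketch below shows the conventional distillation setup that the abstract uses as its baseline: teacher and student logits are softened by the same temperature and compared with a KL divergence. The heterogeneous-temperature scheme proposed in the paper is not described in the abstract and is not reproduced here; the function names and logit values are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                        # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature):
    """KL(teacher || student) between temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [4.0, 1.0, 0.5]
student = [2.0, 1.5, 0.2]
sharp = softmax(teacher, temperature=1.0)
soft = softmax(teacher, temperature=4.0)
loss = distillation_loss(teacher, student, temperature=4.0)
print(sharp, soft, loss)
```

A higher temperature flattens the teacher distribution, so the student also receives signal from the non-target classes. Some formulations additionally scale the loss by the squared temperature; that factor is omitted here.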

preprint · 2022 · arXiv

Machine Composition of Korean Music via Topological Data Analysis and Artificial Neural Network

Common AI music composition algorithms based on artificial neural networks train a machine by feeding it a large number of music pieces, producing networks that generate music similar to the input data. This approach is a black-box optimization; that is, the underlying composition algorithm is, in general, not known to users. In this paper, we present a way of machine composition that teaches a machine the composition principle embedded in the given music data instead of directly feeding it music pieces. We propose this approach by using the concept of the Overlap matrix proposed in [TPJ]. In [TPJ], a type of Korean music, the so-called Dodeuri music such as Suyeonjangjigok, has been analyzed using topological data analysis (TDA), particularly persistent homology. As the raw music data is not suitable for TDA, the music data is first reconstructed as a graph. Each node of the graph is defined as a two-dimensional vector composed of the pitch and duration of a music note. An edge between two nodes is created when those nodes appear consecutively in the music flow. Distance is defined based on the frequency of such appearances. Through TDA on the constructed graph, a unique set of cycles is found for the given music. In [TPJ], the new concept of the Overlap matrix has been proposed, which visualizes, in matrix form, how those cycles are interconnected over the music flow. In this paper, we explain how we use the Overlap matrix for machine composition. The Overlap matrix makes it possible to compose a new music piece algorithmically and also provides seed music for the desired artificial neural network. In this paper, we use the Dodeuri music and explain the detailed steps.
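The graph construction described in the abstract (nodes as pitch-duration pairs, edges between consecutively played nodes, distance derived from how often they co-occur) can be sketched as below. The abstract does not give the exact distance formula, so the reciprocal of the transition frequency is used purely as an assumption; treating edges as undirected, the function name, and the toy melody are also hypothetical.

```python
from collections import Counter

def build_music_graph(notes):
    """notes: sequence of (pitch, duration) pairs in playing order."""
    nodes = sorted(set(notes))
    # Count how often two distinct nodes appear consecutively (undirected).
    counts = Counter(
        tuple(sorted((a, b))) for a, b in zip(notes, notes[1:]) if a != b
    )
    # Assumed distance: more frequent transitions are "closer".
    distance = {edge: 1.0 / freq for edge, freq in counts.items()}
    return nodes, counts, distance

melody = [(60, 1.0), (62, 0.5), (60, 1.0), (62, 0.5), (64, 2.0)]
nodes, counts, distance = build_music_graph(melody)
print(nodes)
print(distance)
```

A distance table of this kind is the sort of input a persistent-homology computation would consume to find cycles; that step is outside the scope of this sketch.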

preprint · 2022 · arXiv

Robust Factorization of Real-world Tensor Streams with Patterns, Missing Values, and Outliers

Consider multiple seasonal time series being collected in real-time, in the form of a tensor stream. Real-world tensor streams often include missing entries (e.g., due to network disconnection) and at the same time unexpected outliers (e.g., due to system errors). Given such a real-world tensor stream, how can we estimate missing entries and predict future evolution accurately in real-time? In this work, we answer this question by introducing SOFIA, a robust factorization method for real-world tensor streams. In a nutshell, SOFIA smoothly and tightly integrates tensor factorization, outlier removal, and temporal-pattern detection, which naturally reinforce each other. Moreover, SOFIA integrates them in linear time, in an online manner, despite the presence of missing entries. We experimentally show that SOFIA is (a) robust and accurate: yielding up to 76% lower imputation error and 71% lower forecasting error; (b) fast: up to 935X faster than the second-most accurate competitor; and (c) scalable: scaling linearly with the number of new entries per time step.

preprint · 2021 · arXiv

SliceNStitch: Continuous CP Decomposition of Sparse Tensor Streams

Consider traffic data (i.e., triplets in the form of source-destination-timestamp) that grow over time. Tensors (i.e., multi-dimensional arrays) with a time mode are widely used for modeling and analyzing such multi-aspect data streams. In such tensors, however, new entries are added only once per period, which is often an hour, a day, or even a year. This discreteness of tensors has limited their usage for real-time applications, where new data should be analyzed instantly as it arrives. How can we analyze time-evolving multi-aspect sparse data 'continuously' using tensors where time is 'discrete'? We propose SLICENSTITCH for continuous CANDECOMP/PARAFAC (CP) decomposition, which has numerous time-critical applications, including anomaly detection, recommender systems, and stock market prediction. SLICENSTITCH changes the starting point of each period adaptively, based on the current time, and updates factor matrices (i.e., outputs of CP decomposition) instantly as new data arrives. We show, theoretically and experimentally, that SLICENSTITCH is (1) 'Any time': updating factor matrices immediately without having to wait until the current time period ends, (2) Fast: with constant-time updates up to 464x faster than online methods, and (3) Accurate: with fitness comparable (specifically, 72-100%) to offline methods.
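To make "factor matrices" and "fitness" concrete, the sketch below reconstructs entries of a three-mode tensor from CP factors and scores the fit. It assumes the commonly used definition of fitness, 1 - ||X - X_hat||_F / ||X||_F, which the abstract does not spell out; the function names and the toy rank-1 factors are hypothetical, and no streaming update is shown.

```python
import math

def cp_entry(factors, index):
    """Entry of the rank-R CP model: sum_r prod_m factors[m][index[m]][r]."""
    rank = len(factors[0][0])
    return sum(
        math.prod(factors[m][i][r] for m, i in enumerate(index))
        for r in range(rank)
    )

def fitness(tensor, factors, shape):
    """1 - ||X - X_hat||_F / ||X||_F for a sparse tensor stored as {index: value}."""
    error = 0.0
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                diff = tensor.get((i, j, k), 0.0) - cp_entry(factors, (i, j, k))
                error += diff * diff
    norm = math.sqrt(sum(v * v for v in tensor.values()))
    return 1.0 - math.sqrt(error) / norm

# Rank-1 toy factors for a 2 x 2 x 2 tensor (source, destination, time).
factors = [[[1.0], [2.0]], [[1.0], [0.0]], [[3.0], [1.0]]]
exact = {(0, 0, 0): 3.0, (0, 0, 1): 1.0, (1, 0, 0): 6.0, (1, 0, 1): 2.0}
noisy = dict(exact)
noisy[(0, 1, 0)] = 1.0  # one entry the rank-1 model cannot explain
print(fitness(exact, factors, (2, 2, 2)), fitness(noisy, factors, (2, 2, 2)))
```

A fitness of 1 means the factors reproduce the tensor exactly; unexplained entries pull the score down.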

preprint · 2020 · arXiv

Evaluating reliability of complex systems for Predictive maintenance

Predictive Maintenance (PdM) can only be implemented when online knowledge of the system condition is available, and this has become available with the deployment of on-equipment sensors. To date, most studies on predicting the remaining useful lifetime of a system have focused on either single-component systems or systems with deterministic reliability structures. This assumption is not applicable to some realistic problems, where there are uncertainties in the reliability structures of complex systems. In this paper, a PdM scheme is developed by employing a Discrete Time Markov Chain (DTMC) for forecasting the health of monitored components and a Bayesian Network (BN) for modeling the multi-component system reliability. Probabilistic inferences on the status of both the system and its components can therefore be made, and PdM can be scheduled on both levels.
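The DTMC part of the scheme can be illustrated with a minimal forecast of one component's health-state distribution. The three states and the transition probabilities below are hypothetical, not taken from the paper, and the Bayesian Network that combines components into a system-level estimate is not shown.

```python
def step(distribution, transition):
    """One DTMC step: new[j] = sum_i distribution[i] * transition[i][j]."""
    size = len(distribution)
    return [
        sum(distribution[i] * transition[i][j] for i in range(size))
        for j in range(size)
    ]

def forecast(distribution, transition, horizon):
    """State distributions for time steps 0..horizon."""
    history = [distribution]
    for _ in range(horizon):
        history.append(step(history[-1], transition))
    return history

# Hypothetical states: healthy, degraded, failed (absorbing).
P = [
    [0.90, 0.08, 0.02],
    [0.00, 0.85, 0.15],
    [0.00, 0.00, 1.00],
]
history = forecast([1.0, 0.0, 0.0], P, horizon=10)
failure_probability = [state[2] for state in history]
print(failure_probability)
```

Maintenance could then be scheduled for the first time step at which the forecast failure probability crosses a chosen threshold.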

preprint · 2014 · arXiv

Novel linear algebraic theory and one-hundred-million-atom quantum material simulations on the K computer

The present paper gives a review of our recent progress and latest results for novel linear-algebraic algorithms and their application to large-scale quantum material simulations or electronic structure calculations. The algorithms are Krylov-subspace (iterative) solvers for generalized shifted linear equations, in the form of (zS-H)x=b, instead of the conventional generalized eigenvalue equation. The method was implemented in our order-$N$ calculation code ELSES (http://www.elses.jp/) with modelled systems based on ab initio calculations. The code realized one-hundred-million-atom, or 100-nm-scale, quantum material simulations on the K computer with high parallel efficiency using up to all the built-in processor cores. The present paper also explains several methodological aspects, such as the use of XML files and a 'novice' mode for general users. A sparse matrix data library from our real problems (http://www.elses.jp/matrix/) was prepared. The internal eigenvalue problem is discussed as a general need from quantum material simulation. The present study is an interdisciplinary one and is sometimes called 'Application-Algorithm-Architecture co-design'. The co-design will play a crucial role in exa-scale scientific computations.
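The form (zS-H)x=b can be made concrete with a toy example that solves the same 2x2 system for several complex shifts z. This is a direct solve by Cramer's rule, not a Krylov-subspace method, and it has nothing to do with the ELSES code; the matrices, the right-hand side, and the shift values are assumed purely for illustration.

```python
def solve_shifted_2x2(z, S, H, b):
    """Solve (z*S - H) x = b for a 2x2 system by Cramer's rule."""
    a11 = z * S[0][0] - H[0][0]
    a12 = z * S[0][1] - H[0][1]
    a21 = z * S[1][0] - H[1][0]
    a22 = z * S[1][1] - H[1][1]
    det = a11 * a22 - a12 * a21
    return [(b[0] * a22 - a12 * b[1]) / det, (a11 * b[1] - a21 * b[0]) / det]

H = [[-1.0, 0.5], [0.5, 1.0]]  # toy symmetric Hamiltonian (assumed values)
S = [[1.0, 0.1], [0.1, 1.0]]   # toy overlap matrix (assumed values)
b = [1.0, 0.0]
# Shifts just above the real axis, so z*S - H stays non-singular.
shifts = [complex(energy, 0.05) for energy in (-1.0, 0.0, 1.0)]
solutions = {z: solve_shifted_2x2(z, S, H, b) for z in shifts}
for z, x in solutions.items():
    print(z, x)
```

Only z changes between the systems while S, H and b stay fixed, which is the structure that shifted solvers are designed to exploit at scale.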
