Researcher profile

Juho Lee

Juho Lee contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) · Verification L1 · Unclaimed author
12 works
0 followers
8 topics
4 close collaborators

Published work

12 published items

preprint · 2022 · arXiv

Improving Ensemble Distillation With Weight Averaging and Diversifying Perturbation

Ensembles of deep neural networks have demonstrated superior performance, but their heavy computational cost hinders their application in resource-limited environments. This motivates distilling knowledge from the ensemble teacher into a smaller student network, and there are two important design choices for this ensemble distillation: 1) how to construct the student network, and 2) what data should be shown during training. In this paper, we propose a weight averaging technique where a student with multiple subnetworks is trained to absorb the functional diversity of ensemble teachers, but then those subnetworks are properly averaged for inference, giving a single student network with no additional inference cost. We also propose a perturbation strategy that seeks inputs from which the diversities of teachers can be better transferred to the student. Combining these two, our method significantly improves upon previous methods on various image classification tasks.
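The weight-averaging step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a plain uniform average over subnetworks that share the same parameter names and shapes; the abstract does not say how the subnetworks are "properly averaged", so the function `average_subnetworks` and the tiny linear layer are hypothetical stand-ins, not the authors' procedure.

```python
import numpy as np

def average_subnetworks(subnet_params):
    """Uniformly average a list of parameter dicts with identical keys and shapes.

    Assumption: a plain mean; the paper may use a different averaging rule.
    """
    keys = subnet_params[0].keys()
    return {k: np.mean([p[k] for p in subnet_params], axis=0) for k in keys}

# Three hypothetical subnetworks of a tiny linear layer (4 inputs, 3 outputs).
rng = np.random.default_rng(0)
subnets = [{"W": rng.normal(size=(4, 3)), "b": rng.normal(size=3)} for _ in range(3)]
student = average_subnetworks(subnets)

x = rng.normal(size=(2, 4))
# For a purely linear layer, averaging the weights gives the same output as
# averaging the subnetwork outputs. With nonlinearities this no longer holds
# in general, which is why the averaging has to be designed with care.
avg_of_outputs = np.mean([x @ p["W"] + p["b"] for p in subnets], axis=0)
out_of_avg = x @ student["W"] + student["b"]
```

The point of the sketch is the cost argument: after averaging, inference uses one set of weights, so it costs the same as a single network.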

preprint · 2022 · arXiv

Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Uncertainty

Numerous recent works utilize bi-Lipschitz regularization of neural network layers to preserve relative distances between data instances in the feature spaces of each layer. This distance sensitivity with respect to the data aids in tasks such as uncertainty calibration and out-of-distribution (OOD) detection. In previous works, features extracted with a distance sensitive model are used to construct feature covariance matrices which are used in deterministic uncertainty estimation or OOD detection. However, in cases where there is a distribution over tasks, these methods result in covariances which are sub-optimal, as they may not leverage all of the meta information which can be shared among tasks. With the use of an attentive set encoder, we propose to meta learn either diagonal or diagonal plus low-rank factors to efficiently construct task specific covariance matrices. Additionally, we propose an inference procedure which utilizes scaled energy to achieve a final predictive distribution which is well calibrated under a distributional dataset shift.
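One reason a diagonal plus low-rank covariance is efficient can be shown with a short sketch. It assumes the covariance has the form Sigma = diag(d) + U Uᵀ and uses the Woodbury identity to evaluate a squared Mahalanobis distance; the function name and the way `d` and `U` are obtained are hypothetical, since the abstract does not give the parameterization or the meta-learning procedure.

```python
import numpy as np

def mahalanobis_diag_lowrank(x, mu, d, U):
    """Squared Mahalanobis distance under Sigma = diag(d) + U @ U.T.

    Uses the Woodbury identity, so only a small k x k system is solved,
    where k = U.shape[1] is the rank of the low-rank factor.
    """
    diff = x - mu
    dinv_diff = diff / d                        # D^{-1} (x - mu)
    dinv_U = U / d[:, None]                     # D^{-1} U
    cap = np.eye(U.shape[1]) + U.T @ dinv_U     # I + U^T D^{-1} U
    correction = dinv_U @ np.linalg.solve(cap, U.T @ dinv_diff)
    return float(diff @ (dinv_diff - correction))

# Illustrative values only: 50 feature dimensions, rank-3 factor.
rng = np.random.default_rng(0)
n, k = 50, 3
d = rng.uniform(0.5, 2.0, size=n)               # positive diagonal entries
U = rng.normal(size=(n, k))
x, mu = rng.normal(size=n), rng.normal(size=n)

fast = mahalanobis_diag_lowrank(x, mu, d, U)
# Dense reference: forms and solves the full n x n system.
dense = float((x - mu) @ np.linalg.solve(np.diag(d) + U @ U.T, x - mu))
```

The structured version avoids building or inverting the full n x n matrix, which matters when a separate covariance is constructed for each task.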

preprint · 2022 · arXiv

PolarDenseNet: A Deep Learning Model for CSI Feedback in MIMO Systems

In multiple-input multiple-output (MIMO) systems, high-resolution channel state information (CSI) is required at the base station (BS) to ensure optimal performance, especially in the case of multi-user MIMO (MU-MIMO) systems. In the absence of channel reciprocity in frequency division duplex (FDD) systems, the user needs to send the CSI to the BS. Often the large overhead associated with this CSI feedback in FDD systems becomes the bottleneck in improving the system performance. In this paper, we propose an AI-based CSI feedback scheme built on an auto-encoder architecture that encodes the CSI at the user equipment (UE) into a low-dimensional latent space and decodes it back at the BS, effectively reducing the feedback overhead while minimizing the loss during recovery. Our simulation results show that the proposed AI-based architecture outperforms the state-of-the-art high-resolution linear combination codebook using the DFT basis adopted in the 5G New Radio (NR) system.

preprint · 2022 · arXiv

Scale Mixtures of Neural Network Gaussian Processes

Recent works have revealed that infinitely-wide feed-forward or recurrent neural networks of any architecture correspond to Gaussian processes referred to as Neural Network Gaussian Processes (NNGPs). While these works have significantly extended the class of neural networks converging to Gaussian processes, there has been little focus on broadening the class of stochastic processes that such neural networks converge to. In this work, inspired by the scale mixture of Gaussian random variables, we propose the scale mixture of NNGPs for which we introduce a prior distribution on the scale of the last-layer parameters. We show that simply introducing a scale prior on the last-layer parameters can turn infinitely-wide neural networks of any architecture into a richer class of stochastic processes. With certain scale priors, we obtain heavy-tailed stochastic processes, and in the case of inverse gamma priors, we recover Student's $t$ processes. We further analyze the distributions of the neural networks initialized with our prior setting and trained with gradient descent and obtain results similar to those for NNGPs. We present a practical posterior-inference algorithm for the scale mixture of NNGPs and empirically demonstrate its usefulness on regression and classification tasks. In particular, we show that in both tasks, the heavy-tailed stochastic processes obtained from our framework are robust to out-of-distribution data.
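The scalar fact behind this construction can be checked numerically. The sketch below is a one-dimensional illustration, not the paper's NNGP algorithm: it draws a variance from an inverse gamma prior, then a Gaussian with that variance, and compares the result with the Student's t marginal. The values of `a`, `b` and `n` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, n = 5.0, 4.0, 200_000   # inverse-gamma shape and scale (illustrative values)

# sigma^2 ~ InvGamma(a, b) is equivalent to 1 / sigma^2 ~ Gamma(shape=a, rate=b).
# NumPy parameterizes the gamma distribution by scale = 1 / rate.
sigma2 = 1.0 / rng.gamma(shape=a, scale=1.0 / b, size=n)

# Conditionally Gaussian draw with the random scale.
samples = rng.normal(size=n) * np.sqrt(sigma2)

# Marginally, the samples follow a Student's t distribution with 2a degrees of
# freedom and scale sqrt(b / a); its variance is b / (a - 1) for a > 1.
empirical_var = samples.var()
theoretical_var = b / (a - 1.0)

# Heavier tails than a unit-variance Gaussian: compare how often |x| > 3.
tail_frac = np.mean(np.abs(samples) > 3.0)
gaussian_tail_frac = np.mean(np.abs(rng.normal(size=n)) > 3.0)
```

Both sample sets have variance close to one here, yet the scale mixture places noticeably more mass beyond three standard deviations, which is the heavy-tail behavior the abstract refers to.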

preprint · 2022 · arXiv

Sequential Reptile: Inter-Task Gradient Alignment for Multilingual Learning

Multilingual models jointly pretrained on multiple languages have achieved remarkable performance on various multilingual downstream tasks. Moreover, models finetuned on a single monolingual downstream task have been shown to generalize to unseen languages. In this paper, we first show that it is crucial for those tasks to align gradients between them in order to maximize knowledge transfer while minimizing negative transfer. Despite its importance, the existing methods for gradient alignment either have a completely different purpose, ignore inter-task alignment, or aim to solve continual learning problems in rather inefficient ways. As a result of the misaligned gradients between tasks, the model suffers from severe negative transfer in the form of catastrophic forgetting of the knowledge acquired from the pretraining. To overcome these limitations, we propose a simple yet effective method that can efficiently align gradients between tasks. Specifically, we perform each inner-optimization by sequentially sampling batches from all the tasks, followed by a Reptile outer update. Thanks to the gradients aligned between tasks by our method, the model becomes less vulnerable to negative transfer and catastrophic forgetting. We extensively validate our method on various multi-task learning and zero-shot cross-lingual transfer tasks, where our method largely outperforms all the relevant baselines we consider.
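The training loop described here (inner steps that sample batches across all tasks, then a Reptile outer update) can be sketched on a toy problem. Everything concrete below is an assumption made for illustration: each task is a quadratic loss with its own optimum, a task is drawn uniformly at random per inner step, and the function name, learning rates and step counts are hypothetical.

```python
import numpy as np

def sequential_reptile(centers, theta, outer_steps=200, inner_steps=4,
                       inner_lr=0.1, outer_lr=0.5, seed=0):
    """Toy sketch: task t has loss 0.5 * ||theta - centers[t]||^2."""
    rng = np.random.default_rng(seed)
    theta = theta.astype(float).copy()
    for _ in range(outer_steps):
        phi = theta.copy()                      # inner-loop parameters
        for _ in range(inner_steps):
            t = rng.integers(len(centers))      # sample a task for this inner step
            grad = phi - centers[t]             # gradient of the toy quadratic loss
            phi -= inner_lr * grad              # one gradient step on that task
        theta += outer_lr * (phi - theta)       # Reptile outer update
    return theta

# Two hypothetical tasks with different optima.
centers = np.array([[1.0, 0.0], [0.0, 1.0]])
theta = sequential_reptile(centers, np.zeros(2))
```

Because every inner trajectory mixes steps from several tasks, the outer update moves the shared parameters toward a point that serves all tasks, rather than toward the optimum of whichever task was trained on last. In this toy setting the result stays on the segment between the two task optima.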

preprint · 2022 · arXiv

Set Based Stochastic Subsampling

Deep models are designed to operate on huge volumes of high dimensional data such as images. In order to reduce the volume of data these models must process, we propose a set-based two-stage end-to-end neural subsampling model that is jointly optimized with an arbitrary downstream task network (e.g. classifier). In the first stage, we efficiently subsample candidate elements using conditionally independent Bernoulli random variables, capturing coarse grained global information with set encoding functions. In the second stage, we perform conditionally dependent autoregressive subsampling of the candidate elements using Categorical random variables, modeling pair-wise interactions with set attention networks. We apply our method to feature and instance selection and show that it outperforms the relevant baselines under low subsampling rates on a variety of tasks including image classification, image reconstruction, function reconstruction and few-shot classification. Additionally, for nonparametric models such as Neural Processes that need to leverage the whole training data at inference time, we show that our method enhances the scalability of these models.

preprint · 2021 · arXiv

Improving Uncertainty Calibration via Prior Augmented Data

Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators. However, they are often overconfident in their predictions, which leads to inaccurate and miscalibrated probabilistic predictions. The problem of overconfidence becomes especially apparent in cases where the test-time data distribution differs from that which was seen during training. We propose a solution to this problem by seeking out regions of feature space where the model is unjustifiably overconfident, and conditionally raising the entropy of those predictions towards that of the prior distribution of the labels. Our method results in a better calibrated network and is agnostic to the underlying model structure, so it can be applied to any neural network which produces a probability density as an output. We demonstrate the effectiveness of our method and validate its performance on both classification and regression problems, applying it to recent probabilistic neural network models.

preprint · 2020 · arXiv

Cost-effective Interactive Attention Learning with Neural Attention Processes

We propose a novel interactive learning framework which we refer to as Interactive Attention Learning (IAL), in which the human supervisors interactively manipulate the allocated attentions, to correct the model's behavior by updating the attention-generating network. However, such a model is prone to overfitting due to scarcity of human annotations, and requires costly retraining. Moreover, it is almost infeasible for the human annotators to examine attentions on tons of instances and features. We tackle these challenges by proposing a sample-efficient attention mechanism and a cost-effective reranking algorithm for instances and features. First, we propose Neural Attention Process (NAP), which is an attention generator that can update its behavior by incorporating new attention-level supervisions without any retraining. Secondly, we propose an algorithm which prioritizes the instances and the features by their negative impacts, such that the model can yield large improvements with minimal human feedback. We validate IAL on various time-series datasets from multiple domains (healthcare, real-estate, and computer vision) on which it significantly outperforms baselines with conventional attention mechanisms, or without cost-effective reranking, with substantially less retraining and human-model interaction cost.

preprint · 2020 · arXiv

Quasi-Fermi level splitting in nanoscale junctions from ab initio

The splitting of quasi-Fermi levels (QFLs) represents a key concept utilized to describe finite-bias operations of semiconductor devices, but its atomic-scale characterization remains a significant challenge. Herein, the non-equilibrium QFL or electrochemical potential profiles within single-molecule junctions obtained from the newly developed first-principles multi-space constrained-search density functional formalism are presented. Benchmarking the standard non-equilibrium Green's function calculation results, it is first established that algorithmically the notion of separate electrode-originated nonlocal QFLs should be maintained within the channel region during self-consistent finite-bias electronic structure calculations. For the insulating hexanedithiolate junction, the QFL profiles exhibit discontinuities at the left and right electrode interfaces, and across the molecule the accompanying electrostatic potential drops linearly and Landauer residual-resistivity dipoles are uniformly distributed. For the conducting hexatrienedithiolate junction, on the other hand, the electrode QFLs penetrate into the channel region and produce split QFLs. With the highest occupied molecular orbital entering the bias window and becoming a good transport channel, the split QFLs are also accompanied by the nonlinear electrostatic potential drop and asymmetric Landauer residual-resistivity dipole formation. Our findings underscore the importance of the first-principles extraction of QFLs in nanoscale junctions and point to a new direction for the computational design of next-generation electronic, optoelectronic, and electrochemical devices.

preprint · 2020 · arXiv

Sparse Vector Transmission: An Idea Whose Time Has Come

In recent years, we have witnessed a bewildering variety of automated services and applications of vehicles, robots, sensors, and machines powered by artificial intelligence technologies. The communication mechanism associated with these services is clearly distinct from human-centric communications. One important feature of machine-centric communications is that the amount of information to be transmitted is tiny. In view of the short packet transmission, relying on today's transmission mechanism would not be efficient due to the waste of resources, large decoding latency, and expensive operational cost. In this article, we present an overview of sparse vector transmission (SVT), a scheme to transmit short-sized information after a sparse transformation. We discuss the basics of SVT and two distinct SVT strategies, viz., frequency-domain sparse transmission and sparse vector coding, with detailed operations, and also demonstrate their effectiveness in realistic wireless environments.

preprint · 2018 · arXiv

Semimetallicity and Negative Differential Resistance from Hybrid Halide Perovskite Nanowires

In the rapidly progressing field of organometal halide perovskites, dimensional reduction could open up new opportunities for device applications. Herein, taking the recently synthesized trimethylsulfonium lead triiodide (CH$_3$)$_3$SPbI$_3$ perovskite as a representative example, we carry out first-principles calculations and study the nanostructuring and device application of halide perovskite nanowires. We find that the one-dimensional (1D) (CH$_3$)$_3$SPbI$_3$ structure is structurally stable, and the electronic structures of higher-dimensional forms are robustly determined at the 1D level. Remarkably, due to the face-sharing [PbI$_6$] octahedral atomic structure, the organic ligand-removed 1D PbI$_3$ frameworks are also found to be stable. Moreover, the PbI$_3$ columns avoid the Peierls distortion and assume a semimetallic character, contradicting the conventional assumption of semiconducting metal-halogen inorganic frameworks. Adopting the bundled nanowire junctions consisting of (CH$_3$)$_3$SPbI$_3$ channels with sub-5 nm dimensions sandwiched between PbI$_3$ electrodes, we finally obtain high current densities and large room-temperature negative differential resistance (NDR). We emphasize that the NDR originates from the combination of the near-Ohmic character of (CH$_3$)$_3$SPbI$_3$-PbI$_3$ contacts and a novel NDR mechanism that involves the quantum-mechanical hybridization between channel and electrode states. Our work demonstrates the great potential of low-dimensional hybrid perovskites toward advanced electronic devices beyond actively-pursued photonic applications.

preprint · 2018 · arXiv

Uncertainty-Aware Attention for Reliable Interpretation and Prediction

The attention mechanism is effective both in focusing deep learning models on relevant features and in interpreting them. However, attentions may be unreliable since the networks that generate them are often trained in a weakly-supervised manner. To overcome this limitation, we introduce the notion of input-dependent uncertainty to the attention mechanism, such that it generates attention for each feature with varying degrees of noise based on the given input, to learn larger variance on instances it is uncertain about. We learn this Uncertainty-aware Attention (UA) mechanism using variational inference, and validate it on various risk prediction tasks from electronic health records on which our model significantly outperforms existing attention models. The analysis of the learned attentions shows that our model generates attentions that comply with clinicians' interpretation, and provides richer interpretation via learned variance. Further evaluation of both the accuracy of the uncertainty calibration and the prediction performance with "I don't know" decision shows that UA yields networks with high reliability as well.
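The idea of attention with input-dependent noise can be sketched as a sampling step. The sketch makes several assumptions not stated in the abstract: attention logits are Gaussian with a mean and standard deviation that are both linear functions of the input, the noise enters through the reparameterization trick, and a sigmoid squashes the result. The names `stochastic_attention`, `W_mu` and `W_sigma` are hypothetical, and the variational training objective is omitted.

```python
import numpy as np

def softplus(z):
    """Numerically stable log(1 + exp(z)), used to keep the std positive."""
    return np.logaddexp(0.0, z)

def stochastic_attention(x, W_mu, W_sigma, rng):
    """Sample attention weights whose noise level depends on the input x.

    x: (features, dim) array; W_mu, W_sigma: (dim,) hypothetical parameters.
    """
    mu = x @ W_mu                          # input-dependent mean logit per feature
    sigma = softplus(x @ W_sigma)          # input-dependent positive std per feature
    eps = rng.standard_normal(mu.shape)    # reparameterization noise
    logits = mu + sigma * eps
    attn = 1.0 / (1.0 + np.exp(-logits))   # sigmoid attention in [0, 1]
    return attn, sigma

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                # 5 features, each embedded in 8 dims
W_mu, W_sigma = rng.normal(size=8), rng.normal(size=8)
attn, sigma = stochastic_attention(x, W_mu, W_sigma, rng)
context = attn @ x                         # attention-weighted sum of embeddings
```

In a sketch like this, the per-feature `sigma` is the quantity that would be inspected for interpretation: a large value signals that the attention assigned to that feature is uncertain for this particular input.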