Source author record

Juho Lee

Juho Lee appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning, Artificial Intelligence, cond-mat.mes-hall, Information Theory, math.IT, Computation and Language, cond-mat.str-el, eess.SP, Human-Computer Interaction, math.NA

Catalog footprint

What is connected

17 works

10 topics

4 close collaborators


preprint, 2022, arXiv

Improving Ensemble Distillation With Weight Averaging and Diversifying Perturbation

Ensembles of deep neural networks have demonstrated superior performance, but their heavy computational cost hinders their application in resource-limited environments. This motivates distilling knowledge from the ensemble teacher into a smaller student network, and there are two important design choices for this ensemble distillation: 1) how to construct the student network, and 2) what data should be shown during training. In this paper, we propose a weight averaging technique where a student with multiple subnetworks is trained to absorb the functional diversity of ensemble teachers, but then those subnetworks are properly averaged for inference, giving a single student network with no additional inference cost. We also propose a perturbation strategy that seeks inputs from which the diversities of teachers can be better transferred to the student. Combining these two, our method significantly improves upon previous methods on various image classification tasks.

preprint, 2022, arXiv

Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Uncertainty

Numerous recent works utilize bi-Lipschitz regularization of neural network layers to preserve relative distances between data instances in the feature spaces of each layer. This distance sensitivity with respect to the data aids in tasks such as uncertainty calibration and out-of-distribution (OOD) detection. In previous works, features extracted with a distance sensitive model are used to construct feature covariance matrices which are used in deterministic uncertainty estimation or OOD detection. However, in cases where there is a distribution over tasks, these methods result in covariances which are sub-optimal, as they may not leverage all of the meta information which can be shared among tasks. With the use of an attentive set encoder, we propose to meta learn either diagonal or diagonal plus low-rank factors to efficiently construct task specific covariance matrices. Additionally, we propose an inference procedure which utilizes scaled energy to achieve a final predictive distribution which is well calibrated under a distributional dataset shift.
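One reason a diagonal plus low-rank factorization is efficient can be sketched as follows: a covariance of the form diag(d) + U U^T can be inverted through the Woodbury identity, which only requires solving a small r x r system. This is a minimal illustration under assumed names (`d`, `U`, `solve_diag_plus_low_rank` are hypothetical), not the procedure used in the paper.

```python
import numpy as np

def solve_diag_plus_low_rank(d, U, x):
    """Solve (diag(d) + U @ U.T) y = x with the Woodbury identity.

    d: (n,) positive diagonal entries (assumed, illustrative)
    U: (n, r) low-rank factor with r << n (assumed, illustrative)
    x: (n,) right-hand side
    """
    d = np.asarray(d, dtype=float)
    U = np.asarray(U, dtype=float)
    x = np.asarray(x, dtype=float)

    dinv_x = x / d              # D^{-1} x
    dinv_U = U / d[:, None]     # D^{-1} U
    r = U.shape[1]

    # Small r x r "capacitance" matrix: I + U^T D^{-1} U
    capacitance = np.eye(r) + U.T @ dinv_U

    # (D + U U^T)^{-1} x = D^{-1} x - D^{-1} U (I + U^T D^{-1} U)^{-1} U^T D^{-1} x
    return dinv_x - dinv_U @ np.linalg.solve(capacitance, U.T @ dinv_x)
```

The cost is dominated by forming and solving the r x r system, so it scales linearly in the feature dimension n for fixed rank r, instead of cubically as with a dense solve.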

preprint, 2022, arXiv

PolarDenseNet: A Deep Learning Model for CSI Feedback in MIMO Systems

In multiple-input multiple-output (MIMO) systems, high-resolution channel state information (CSI) is required at the base station (BS) to ensure optimal performance, especially in the case of multi-user MIMO (MU-MIMO) systems. In the absence of channel reciprocity in frequency division duplex (FDD) systems, the user needs to send the CSI to the BS. Often the large overhead associated with this CSI feedback in FDD systems becomes the bottleneck in improving the system performance. In this paper, we propose an AI-based CSI feedback scheme built on an auto-encoder architecture that encodes the CSI at the user equipment (UE) into a low-dimensional latent space and decodes it back at the BS, effectively reducing the feedback overhead while minimizing the loss during recovery. Our simulation results show that the proposed AI-based architecture outperforms the state-of-the-art high-resolution linear combination codebook using the DFT basis adopted in the 5G New Radio (NR) system.

preprint, 2022, arXiv

Scale Mixtures of Neural Network Gaussian Processes

Recent works have revealed that infinitely-wide feed-forward or recurrent neural networks of any architecture correspond to Gaussian processes referred to as Neural Network Gaussian Processes (NNGPs). While these works have significantly extended the class of neural networks converging to Gaussian processes, there has been little focus on broadening the class of stochastic processes that such neural networks converge to. In this work, inspired by the scale mixture of Gaussian random variables, we propose the scale mixture of NNGPs for which we introduce a prior distribution on the scale of the last-layer parameters. We show that simply introducing a scale prior on the last-layer parameters can turn infinitely-wide neural networks of any architecture into a richer class of stochastic processes. With certain scale priors, we obtain heavy-tailed stochastic processes, and in the case of inverse gamma priors, we recover Student's $t$ processes. We further analyze the distributions of the neural networks initialized with our prior setting and trained with gradient descent and obtain similar results as for NNGPs. We present a practical posterior-inference algorithm for the scale mixture of NNGPs and empirically demonstrate its usefulness on regression and classification tasks. In particular, we show that in both tasks, the heavy-tailed stochastic processes obtained from our framework are robust to out-of-distribution data.
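The inverse gamma case can be illustrated with the standard finite-dimensional scale-mixture calculation. The notation here is assumed for illustration and is not taken from the paper: $\mathbf{f} \in \mathbb{R}^n$ collects function values at $n$ inputs, $K$ is the kernel matrix, and $a, b > 0$ are the shape and scale of the inverse gamma prior on the variance scale $\sigma^2$.

$$
\mathbf{f} \mid \sigma^2 \sim \mathcal{N}(\mathbf{0}, \sigma^2 K), \qquad \sigma^2 \sim \mathrm{InvGamma}(a, b)
$$

$$
p(\mathbf{f}) = \int_0^\infty \mathcal{N}(\mathbf{f}; \mathbf{0}, \sigma^2 K)\, \frac{b^a}{\Gamma(a)} (\sigma^2)^{-a-1} e^{-b/\sigma^2}\, d\sigma^2
= \frac{\Gamma\left(a + \tfrac{n}{2}\right)}{\Gamma(a)\,(2\pi b)^{n/2}\,|K|^{1/2}} \left(1 + \frac{\mathbf{f}^\top K^{-1} \mathbf{f}}{2b}\right)^{-\left(a + \frac{n}{2}\right)}
$$

This is a multivariate Student's $t$ density with $\nu = 2a$ degrees of freedom and scale matrix $(b/a) K$. Marginalizing the scale therefore replaces Gaussian tails with polynomially decaying ones.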

preprint, 2022, arXiv

Sequential Reptile: Inter-Task Gradient Alignment for Multilingual Learning

Multilingual models jointly pretrained on multiple languages have achieved remarkable performance on various multilingual downstream tasks. Moreover, models finetuned on a single monolingual downstream task have been shown to generalize to unseen languages. In this paper, we first show that it is crucial for those tasks to align gradients between them in order to maximize knowledge transfer while minimizing negative transfer. Despite its importance, the existing methods for gradient alignment either have a completely different purpose, ignore inter-task alignment, or aim to solve continual learning problems in rather inefficient ways. As a result of the misaligned gradients between tasks, the model suffers from severe negative transfer in the form of catastrophic forgetting of the knowledge acquired from the pretraining. To overcome the limitations, we propose a simple yet effective method that can efficiently align gradients between tasks. Specifically, we perform each inner-optimization by sequentially sampling batches from all the tasks, followed by a Reptile outer update. Thanks to the gradients aligned between tasks by our method, the model becomes less vulnerable to negative transfer and catastrophic forgetting. We extensively validate our method on various multi-task learning and zero-shot cross-lingual transfer tasks, where our method largely outperforms all the relevant baselines we consider.
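The update described in this abstract (inner steps on batches drawn sequentially from all tasks, then a Reptile outer step) can be sketched on a toy problem. This is a minimal sketch, not the authors' implementation: the quadratic per-task losses, the uniform task sampling, and all names and hyperparameters (`task_grad`, `sequential_reptile`, `inner_lr`, `outer_lr`) are assumptions made for illustration.

```python
import numpy as np

def task_grad(theta, center):
    # Gradient of the toy per-task loss 0.5 * ||theta - center||^2 (assumed loss).
    return theta - center

def mean_loss(theta, centers):
    # Average toy loss over all tasks.
    return float(np.mean([0.5 * np.sum((theta - c) ** 2) for c in centers]))

def sequential_reptile(theta, centers, outer_steps=200, inner_steps=8,
                       inner_lr=0.1, outer_lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    centers = np.asarray(centers, dtype=float)
    theta = np.asarray(theta, dtype=float).copy()
    for _ in range(outer_steps):
        phi = theta.copy()
        # Inner optimization: one trajectory whose batches come from all tasks,
        # one task sampled per step, instead of one trajectory per task.
        for _ in range(inner_steps):
            k = rng.integers(len(centers))
            phi -= inner_lr * task_grad(phi, centers[k])
        # Reptile outer update: move the shared parameters toward the inner result.
        theta += outer_lr * (phi - theta)
    return theta
```

For example, `sequential_reptile(np.array([5.0, 5.0]), [[1, 0], [0, 1], [-1, -1]])` moves the shared parameters from a distant start into the region spanned by the three task optima.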

preprint, 2022, arXiv

Set Based Stochastic Subsampling

Deep models are designed to operate on huge volumes of high dimensional data such as images. In order to reduce the volume of data these models must process, we propose a set-based two-stage end-to-end neural subsampling model that is jointly optimized with an arbitrary downstream task network (e.g., a classifier). In the first stage, we efficiently subsample candidate elements using conditionally independent Bernoulli random variables by capturing coarse grained global information using set encoding functions, followed by conditionally dependent autoregressive subsampling of the candidate elements using Categorical random variables by modeling pair-wise interactions using set attention networks in the second stage. We apply our method to feature and instance selection and show that it outperforms the relevant baselines under low subsampling rates on a variety of tasks including image classification, image reconstruction, function reconstruction and few-shot classification. Additionally, for nonparametric models such as Neural Processes that need to leverage the whole training data at inference time, we show that our method enhances the scalability of these models.

preprint, 2021, arXiv

Improving Uncertainty Calibration via Prior Augmented Data

Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators. However, they are often overconfident in their predictions, which leads to inaccurate and miscalibrated probabilistic predictions. The problem of overconfidence becomes especially apparent in cases where the test-time data distribution differs from that which was seen during training. We propose a solution to this problem by seeking out regions of feature space where the model is unjustifiably overconfident, and conditionally raising the entropy of those predictions towards that of the prior distribution of the labels. Our method results in a better calibrated network and is agnostic to the underlying model structure, so it can be applied to any neural network which produces a probability density as an output. We demonstrate the effectiveness of our method and validate its performance on both classification and regression problems, applying it to recent probabilistic neural network models.

preprint, 2020, arXiv

Cost-effective Interactive Attention Learning with Neural Attention Processes

We propose a novel interactive learning framework which we refer to as Interactive Attention Learning (IAL), in which the human supervisors interactively manipulate the allocated attentions, to correct the model's behavior by updating the attention-generating network. However, such a model is prone to overfitting due to scarcity of human annotations, and requires costly retraining. Moreover, it is almost infeasible for the human annotators to examine attentions on tons of instances and features. We tackle these challenges by proposing a sample-efficient attention mechanism and a cost-effective reranking algorithm for instances and features. First, we propose Neural Attention Process (NAP), which is an attention generator that can update its behavior by incorporating new attention-level supervisions without any retraining. Secondly, we propose an algorithm which prioritizes the instances and the features by their negative impacts, such that the model can yield large improvements with minimal human feedback. We validate IAL on various time-series datasets from multiple domains (healthcare, real-estate, and computer vision) on which it significantly outperforms baselines with conventional attention mechanisms, or without cost-effective reranking, with substantially less retraining and human-model interaction cost.

preprint, 2020, arXiv

Quasi-Fermi level splitting in nanoscale junctions from ab initio

The splitting of quasi-Fermi levels (QFLs) represents a key concept utilized to describe finite-bias operations of semiconductor devices, but its atomic-scale characterization remains a significant challenge. Herein, the non-equilibrium QFL or electrochemical potential profiles within single-molecule junctions obtained from the newly developed first-principles multi-space constrained-search density functional formalism are presented. Benchmarking the standard non-equilibrium Green's function calculation results, it is first established that algorithmically the notion of separate electrode-originated nonlocal QFLs should be maintained within the channel region during self-consistent finite-bias electronic structure calculations. For the insulating hexanedithiolate junction, the QFL profiles exhibit discontinuities at the left and right electrode interfaces, and across the molecule the accompanying electrostatic potential drops linearly and Landauer residual-resistivity dipoles are uniformly distributed. For the conducting hexatrienedithiolate junction, on the other hand, the electrode QFLs penetrate into the channel region and produce split QFLs. With the highest occupied molecular orbital entering the bias window and becoming a good transport channel, the split QFLs are also accompanied by the nonlinear electrostatic potential drop and asymmetric Landauer residual-resistivity dipole formation. Our findings underscore the importance of the first-principles extraction of QFLs in nanoscale junctions and point to a new direction for the computational design of next-generation electronic, optoelectronic, and electrochemical devices.

preprint, 2020, arXiv

Sparse Vector Transmission: An Idea Whose Time Has Come

In recent years, we have been witnessing a bewildering variety of automated services and applications of vehicles, robots, sensors, and machines powered by artificial intelligence technologies. The communication mechanism associated with these services is clearly distinct from human-centric communications. One important feature of machine-centric communications is that the amount of information to be transmitted is tiny. In view of the short packet transmission, relying on today's transmission mechanism would not be efficient due to the waste of resources, large decoding latency, and expensive operational cost. In this article, we present an overview of sparse vector transmission (SVT), a scheme to transmit short-sized information after a sparse transformation. We discuss the basics of SVT and two distinct SVT strategies, viz., frequency-domain sparse transmission and sparse vector coding, with detailed operations, and also demonstrate their effectiveness in realistic wireless environments.

preprint, 2018, arXiv

Semimetallicity and Negative Differential Resistance from Hybrid Halide Perovskite Nanowires

In the rapidly progressing field of organometal halide perovskites, dimensional reduction could open up new opportunities for device applications. Herein, taking the recently synthesized trimethylsulfonium lead triiodide (CH$_3$)$_3$SPbI$_3$ perovskite as a representative example, we carry out first-principles calculations and study the nanostructuring and device application of halide perovskite nanowires. We find that the one-dimensional (1D) (CH$_3$)$_3$SPbI$_3$ structure is structurally stable, and the electronic structures of higher-dimensional forms are robustly determined at the 1D level. Remarkably, due to the face-sharing [PbI$_6$] octahedral atomic structure, the organic ligand-removed 1D PbI$_3$ frameworks are also found to be stable. Moreover, the PbI$_3$ columns avoid the Peierls distortion and assume a semimetallic character, contradicting the conventional assumption of semiconducting metal-halogen inorganic frameworks. Adopting the bundled nanowire junctions consisting of (CH$_3$)$_3$SPbI$_3$ channels with sub-5 nm dimensions sandwiched between PbI$_3$ electrodes, we finally obtain high current densities and large room-temperature negative differential resistance (NDR). We emphasize that the NDR originates from the combination of the near-Ohmic character of (CH$_3$)$_3$SPbI$_3$-PbI$_3$ contacts and a novel NDR mechanism that involves the quantum-mechanical hybridization between channel and electrode states. Our work demonstrates the great potential of low-dimensional hybrid perovskites toward advanced electronic devices beyond actively-pursued photonic applications.

preprint, 2018, arXiv

Uncertainty-Aware Attention for Reliable Interpretation and Prediction

The attention mechanism is effective in both focusing deep learning models on relevant features and interpreting them. However, attentions may be unreliable since the networks that generate them are often trained in a weakly-supervised manner. To overcome this limitation, we introduce the notion of input-dependent uncertainty to the attention mechanism, such that it generates attention for each feature with varying degrees of noise based on the given input, to learn larger variance on instances it is uncertain about. We learn this Uncertainty-aware Attention (UA) mechanism using variational inference, and validate it on various risk prediction tasks from electronic health records on which our model significantly outperforms existing attention models. The analysis of the learned attentions shows that our model generates attentions that comply with clinicians' interpretation, and provide richer interpretation via learned variance. Further evaluation of both the accuracy of the uncertainty calibration and the prediction performance with "I don't know" decision shows that UA yields networks with high reliability as well.

preprint, 2015, arXiv

A modified $P_1$ - immersed finite element method

In recent years, the immersed finite element methods (IFEM) introduced in [Li2003], [Li2004] to solve elliptic problems having an interface in the domain due to the discontinuity of coefficients have been attracting more attention from researchers because of their simplicity and efficiency. Unlike the conventional finite element methods, the IFEM allows the interface to cut through the interior of the element, yet after the basis functions are altered so that they satisfy the flux jump conditions, it seems to show a reasonable order of convergence. In this paper, we propose an improved version of the $P_1$ based IFEM by adding the line integral of flux terms on each element. This technique resembles the discontinuous Galerkin (DG) method; however, our method has far fewer degrees of freedom than the DG methods since we use the same number of unknowns as the conventional $P_1$ finite element method. We prove $H^1$ and $L^2$ error estimates which are optimal both in order and regularity. Numerical experiments were carried out for several examples, which show the robustness of our scheme.

preprint, 2015, arXiv

Bayesian Hierarchical Clustering with Exponential Family: Small-Variance Asymptotics and Reducibility

Bayesian hierarchical clustering (BHC) is an agglomerative clustering method, where a probabilistic model is defined and its marginal likelihoods are evaluated to decide which clusters to merge. While BHC provides a few advantages over traditional distance-based agglomerative clustering algorithms, successive evaluation of marginal likelihoods and careful hyperparameter tuning are cumbersome and limit the scalability. In this paper we relax BHC into a non-probabilistic formulation, exploring small-variance asymptotics in conjugate-exponential models. We develop a novel clustering algorithm, referred to as relaxed BHC (RBHC), from the asymptotic limit of the BHC model that exhibits the scalability of distance-based agglomerative clustering algorithms as well as the flexibility of Bayesian nonparametric models. We also investigate the reducibility of the dissimilarity measure that emerges from the asymptotic limit of the BHC model, allowing us to use scalable algorithms such as the nearest neighbor chain algorithm. Numerical experiments on both synthetic and real-world datasets demonstrate the validity and high performance of our method.

preprint, 2015, arXiv

Dynamical Mean Field Theory for Diatomic Molecules and the Exact Double Counting

Dynamical mean field theory (DMFT) combined with the local density approximation (LDA) is widely used in solids to predict properties of correlated systems. In this paper, its application to one of the simplest strongly correlated systems, the hydrogen molecule H$_2$, is demonstrated to develop a parameter-free LDA+DMFT framework. We propose a method to calculate the exact intersection of LDA and DMFT that leads to highly accurate subtraction of the doubly counted correlation in both methods. When the exact double-counting treatment and a good projector to the correlated subspace are used, LDA+DMFT yields very accurate total energy and excitation spectrum of the H$_2$ molecule. We also discuss how this double-counting scheme can be extended to solid state calculations.

preprint, 2015, arXiv

Tree-Guided MCMC Inference for Normalized Random Measure Mixture Models

Normalized random measures (NRMs) provide a broad class of discrete random measures that are often used as priors for Bayesian nonparametric models. The Dirichlet process is a well-known example of NRMs. Most posterior inference methods for NRM mixture models rely on MCMC methods since they are easy to implement and their convergence is well studied. However, MCMC often suffers from slow convergence when the acceptance rate is low. Tree-based inference is an alternative deterministic posterior inference method, where Bayesian hierarchical clustering (BHC) or incremental Bayesian hierarchical clustering (IBHC) have been developed for DP or NRM mixture (NRMM) models, respectively. Although IBHC is a promising method for posterior inference for NRMM models due to its efficiency and applicability to online inference, its convergence is not guaranteed since it uses heuristics that simply select the best solution after multiple trials are made. In this paper, we present a hybrid inference algorithm for NRMM models, which combines the merits of both MCMC and IBHC. Trees built by IBHC outline partitions of data, which guide the Metropolis-Hastings procedure to employ appropriate proposals. Inheriting the nature of MCMC, our tree-guided MCMC (tgMCMC) is guaranteed to converge, and enjoys fast convergence thanks to the effective proposals guided by trees. Experiments on both synthetic and real-world datasets demonstrate the benefit of our method.

preprint, 2012, arXiv

Network Massive MIMO for Cell-Boundary Users: From a Precoding Normalization Perspective

In this paper, we propose network massive multiple-input multiple-output (MIMO) systems, where three radio units (RUs) connected via one digital unit (DU) support multiple user equipments (UEs) at a cell-boundary through the same radio resource, i.e., the same frequency/time band. For precoding designs, zero-forcing (ZF) and matched filter (MF) with vector or matrix normalization are considered. We also derive the formulae of the lower and upper bounds of the achievable sum rate for each precoding. Based on our analytical results, we observe that vector normalization is better for ZF while matrix normalization is better for MF. Given antenna configurations, we also derive the optimal switching point as a function of the number of active users in a network. Numerical simulations confirm our analytical
