Source author record

Chang-Han Rhee

Chang-Han Rhee appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

math.PR Computation Machine Learning math.OC q-fin.CP

Catalog footprint

What is connected

5 works

5 topics

4 close collaborators


Preprint, 2022, arXiv

Eliminating Sharp Minima from SGD with Truncated Heavy-tailed Noise

The empirical success of deep learning is often attributed to SGD's mysterious ability to avoid sharp local minima in the loss landscape, as sharp minima are known to lead to poor generalization. Recently, empirical evidence of heavy-tailed gradient noise was reported in many deep learning tasks, and Şimşekli (2019a,b) showed that SGD can escape sharp local minima in the presence of such heavy-tailed gradient noise, providing a partial solution to the mystery. In this work, we analyze a popular variant of SGD in which gradients are truncated above a fixed threshold. We show that it achieves a stronger notion of avoiding sharp minima: it can effectively eliminate sharp local minima entirely from its training trajectory. We characterize the dynamics of truncated SGD driven by heavy-tailed noise. First, we show that the truncation threshold and the width of the attraction field dictate the order of the first exit time from the associated local minimum. Moreover, when the objective function satisfies appropriate structural conditions, we prove that as the learning rate decreases, the dynamics of heavy-tailed truncated SGD closely resemble those of a continuous-time Markov chain that never visits any sharp minima. Real-data experiments on deep learning confirm our theoretical prediction that heavy-tailed SGD with gradient clipping finds "flatter" local minima and achieves better generalization.
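To make the truncation concrete, here is a minimal one-dimensional sketch under illustrative assumptions: the objective is the quadratic $f(x) = x^2/2$, the gradient noise is a random sign times a Pareto draw with tail index `alpha` (infinite variance when `alpha < 2`), and the whole update `lr * (gradient + noise)` is clipped to `[-b, b]`. The names `truncated_sgd` and `clip` are hypothetical, and this is not the authors' experimental code; clipping the raw gradient before scaling by the learning rate is an equally common variant.

```python
import random


def clip(v, b):
    """Truncate a scalar update to the interval [-b, b]."""
    return max(-b, min(b, v))


def truncated_sgd(grad, x0, lr, b, alpha, n_steps, seed=0):
    """1-D SGD with heavy-tailed gradient noise and truncated updates (sketch).

    Noise: a random sign times a Pareto(alpha) draw, so it is symmetric and
    has infinite variance when alpha < 2. Each update lr * (grad + noise)
    is clipped to [-b, b] before it is applied.
    """
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(n_steps):
        noise = rng.choice((-1.0, 1.0)) * rng.paretovariate(alpha)
        x -= clip(lr * (grad(x) + noise), b)  # truncated SGD step
        path.append(x)
    return path


# Quadratic objective f(x) = x**2 / 2, so grad(x) = x.
path = truncated_sgd(lambda x: x, x0=3.0, lr=0.01, b=0.05, alpha=1.2, n_steps=5000)
```

Setting `b=float("inf")` recovers plain heavy-tailed SGD, which makes it easy to compare the largest jumps with and without truncation: with clipping, no single step can move the iterate farther than `b`.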

Preprint, 2020, arXiv

Efficient Rare-Event Simulation for Multiple Jump Events in Regularly Varying Lévy Processes with Infinite Activities

In this paper we address the problem of rare-event simulation for heavy-tailed Lévy processes with infinite activities. We propose a strongly efficient importance sampling algorithm that builds upon the sample path large deviations for heavy-tailed Lévy processes, the stick-breaking approximation of extrema of Lévy processes, and the randomized debiasing Monte Carlo scheme. The proposed importance sampling algorithm applies to a broad class of Lévy processes and, in our numerical experiments, exhibits significant improvements in efficiency compared with the crude Monte Carlo method.
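The stick-breaking ingredient can be illustrated in isolation. The sketch below uses the representation of the supremum of a Lévy process on $[0, T]$ as a sum of positive parts of independent increments over uniformly broken sticks, applied to standard Brownian motion (the simplest Lévy process, and not heavy-tailed) only because $\mathbb{E}[\sup_{t\le 1} B_t] = \sqrt{2/\pi}$ gives a known answer to check against. The function names, truncation level and sample size are hypothetical choices; this is not the paper's importance sampling algorithm.

```python
import math
import random


def stick_breaking_sup_bm(T, n_sticks, rng):
    """One approximate draw of sup_{0<=t<=T} B_t for standard Brownian motion.

    Break [0, T] into lengths l_1, l_2, ... using uniform proportions of the
    remaining stick; given the lengths, draw independent xi_j ~ N(0, l_j)
    (the law of B at time l_j) and sum their positive parts. Stopping after
    n_sticks leaves a remaining stick of expected length T * 2**(-n_sticks).
    """
    remaining = T
    sup = 0.0
    for _ in range(n_sticks):
        length = remaining * rng.random()
        remaining -= length
        xi = rng.gauss(0.0, math.sqrt(length))
        sup += max(xi, 0.0)
    return sup


def estimate_expected_sup(T=1.0, n_sticks=30, n_samples=20_000, seed=0):
    """Plain Monte Carlo average of stick-breaking draws of the supremum."""
    rng = random.Random(seed)
    total = sum(stick_breaking_sup_bm(T, n_sticks, rng) for _ in range(n_samples))
    return total / n_samples


print(estimate_expected_sup(), math.sqrt(2 / math.pi))
```

Because each stick contributes an independent Gaussian draw, the supremum is sampled without discretizing the path on a time grid; for heavy-tailed Lévy processes the increments $\xi_j$ would instead be drawn from the law of $X_{\ell_j}$.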

Preprint, 2016, arXiv

Importance sampling of heavy-tailed iterated random functions

We consider a stochastic recurrence equation of the form $Z_{n+1} = A_{n+1} Z_n+B_{n+1}$, where $\mathbb{E}[\log A_1]<0$, $\mathbb{E}[\log^+ B_1]<\infty$ and $\{(A_n,B_n)\}_{n\in\mathbb{N}}$ is an i.i.d. sequence of positive random vectors. The stationary distribution of this Markov chain can be represented as the distribution of the random variable $Z \triangleq \sum_{n=0}^\infty B_{n+1}\prod_{k=1}^nA_k$. Such random variables arise in the analysis of probabilistic algorithms and in financial mathematics, where $Z$ is called a stochastic perpetuity. If one interprets $-\log A_n$ as the interest rate at time $n$, then $Z$ is the present value of a bond that generates $B_n$ units of money at each time point $n$. We are interested in estimating the probability of the rare event $\{Z>x\}$ when $x$ is large; we provide a consistent simulation estimator using state-dependent importance sampling for the case where $\log A_1$ is heavy-tailed and the so-called Cramér condition is not satisfied. Our algorithm leads to an estimator for $P(Z>x)$. We show that, under natural conditions, our estimator is strongly efficient. Furthermore, we extend our method to the case where $\{Z_n\}_{n\in\mathbb{N}}$ is defined via the recursive formula $Z_{n+1}=\Psi_{n+1}(Z_n)$ and $\{\Psi_n\}_{n\in\mathbb{N}}$ is a sequence of i.i.d. random Lipschitz functions.
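A small sketch of the objects in this abstract, under illustrative assumptions: `perpetuity_partial_sum` evaluates the series for $Z$ truncated after finitely many terms, and `crude_tail_estimate` is the crude Monte Carlo baseline for $P(Z>x)$ that importance sampling is meant to improve on, not the paper's state-dependent estimator. The distribution ($B_n = 1$, $\log A_n = Y_n - 1.5$ with $Y_n$ lognormal) is a hypothetical choice that keeps $\mathbb{E}[\log A_1] < 0$ while giving $\log A_1$ no finite exponential moments, so that Cramér's condition $\mathbb{E}[A_1^\theta] = 1$ has no solution $\theta > 0$.

```python
import math
import random


def perpetuity_partial_sum(a, b):
    """Return sum_{n=0}^{N} B_{n+1} * prod_{k=1}^{n} A_k.

    a = [A_1, ..., A_N] and b = [B_1, ..., B_{N+1}] as 0-based lists.
    """
    if len(b) != len(a) + 1:
        raise ValueError("need exactly one more B than A")
    total, discount = 0.0, 1.0  # discount = prod_{k=1}^{n} A_k (empty product at n = 0)
    for n in range(len(b)):
        total += b[n] * discount
        if n < len(a):
            discount *= a[n]
    return total


def crude_tail_estimate(x, n_terms=100, n_samples=2_000, seed=0):
    """Crude Monte Carlo estimate of P(Z > x) from truncated perpetuities.

    Hypothetical example distribution: B_n = 1 and log A_n = Y_n - 1.5 with
    Y_n ~ lognormal(0, 0.5), so E[log A_1] = exp(0.125) - 1.5 < 0.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        a = [math.exp(rng.lognormvariate(0.0, 0.5) - 1.5) for _ in range(n_terms)]
        z = perpetuity_partial_sum(a, [1.0] * (n_terms + 1))
        hits += z > x
    return hits / n_samples
```

With a constant $A_n = 1/2$ and $B_n = 1$, the series is geometric and `perpetuity_partial_sum` returns (up to rounding) $2$. Because `x` does not affect the sampling, a fixed seed gives `crude_tail_estimate(5.0) >= crude_tail_estimate(50.0)`; for large `x` almost every sample contributes zero, which is the inefficiency a strongly efficient estimator removes.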

Preprint, 2014, arXiv

Exact Estimation for Markov Chain Equilibrium Expectations

We introduce a new class of Monte Carlo methods, which we call exact estimation algorithms. Such algorithms provide unbiased estimators for equilibrium expectations associated with real-valued functionals defined on a Markov chain. We provide easily implemented algorithms for the class of positive Harris recurrent Markov chains and for chains that are contracting on average. We further argue that exact estimation in the Markov chain setting provides a significant theoretical relaxation relative to exact simulation methods.
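One way to obtain such an unbiased estimator for a chain that is contracting on average is sketched below, assuming the chain is written as iterated random maps $X_{n+1} = \phi_{n+1}(X_n)$. Under standard conditions the backward compositions $Y_k = \phi_1 \circ \cdots \circ \phi_k(x_0)$ converge almost surely to a random variable with the equilibrium distribution, so the telescoping sum $\sum_k (f(Y_k) - f(Y_{k-1}))$ can be truncated at a random geometric level $N$ and reweighted by $1/P(N \ge k)$. The AR(1) test chain, the parameter $r$ and all names are hypothetical; this is a sketch in the spirit of the abstract, not the authors' algorithm.

```python
import random


def make_ar1_map(rng, a=0.5, mu=1.0):
    """Hypothetical test chain: X_{n+1} = a * X_n + eps with eps ~ N(mu, 1)."""
    eps = rng.gauss(mu, 1.0)
    return lambda x: a * x + eps


def exact_estimate_once(sample_map, f, x0, r, rng):
    """One unbiased draw of the equilibrium expectation of f (sketch)."""
    # Random horizon N with P(N >= k) = r**k.
    n = 0
    while rng.random() < r:
        n += 1
    maps = [sample_map(rng) for _ in range(n)]  # phi_1, ..., phi_N
    prev = f(x0)  # f(Y_0) with Y_0 = x0
    estimate = prev
    for k in range(1, n + 1):
        # Backward composition Y_k = phi_1(phi_2(...phi_k(x0)...)).
        y = x0
        for phi in reversed(maps[:k]):
            y = phi(y)
        cur = f(y)
        estimate += (cur - prev) / r**k  # reweight the k-th telescoping term
        prev = cur
    return estimate


def exact_estimate(n_samples=40_000, r=0.5, seed=0):
    """Average of independent unbiased draws for the AR(1) example, f(x) = x."""
    rng = random.Random(seed)
    draws = (exact_estimate_once(make_ar1_map, lambda x: x, 0.0, r, rng)
             for _ in range(n_samples))
    return sum(draws) / n_samples
```

For this chain the equilibrium mean is $\mu/(1-a) = 2$, and the estimator has finite variance exactly when $r > a^2$; with $a = r = 1/2$ and $x_0 = 0$, each draw reduces to $2\sum_{k=1}^{N}\varepsilon_k$, whose variance is $12$.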

Preprint, 2012, arXiv

A new approach to unbiased estimation for SDE's

In this paper, we introduce a new approach to constructing unbiased estimators when computing expectations of path functionals associated with stochastic differential equations (SDEs). Our randomization idea is closely related to multilevel Monte Carlo and provides a simple mechanism for constructing a finite-variance unbiased estimator with "square root convergence rate" whenever a discretization scheme is available whose strong error for the path functional under consideration is of order greater than 1/2.
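The randomization can be sketched with a coupled-sum form of the estimator, under assumptions chosen so the answer can be checked: the SDE is geometric Brownian motion $dS_t = \mu S_t\,dt + \sigma S_t\,dW_t$, whose mean $\mathbb{E}[S_T] = S_0 e^{\mu T}$ is known; level $n$ uses the Milstein scheme (strong order 1, above the 1/2 threshold) with $2^n$ steps, coupled to level $n-1$ through shared Brownian increments; and the random level $N$ satisfies $P(N \ge n) = 2^{-1.5 n}$. Function names and parameter values are hypothetical.

```python
import math
import random


def milstein_gbm_pair(s0, mu, sigma, T, level, rng):
    """Coupled Milstein approximations of S_T on 2**level and 2**(level-1) steps.

    Both paths use the same Brownian increments; the coarse path sums them in
    pairs. Returns (fine, coarse), with coarse = 0.0 at level 0.
    """
    n_fine = 2 ** level
    h = T / n_fine
    dws = [rng.gauss(0.0, math.sqrt(h)) for _ in range(n_fine)]

    def milstein(step, increments):
        s = s0
        for dw in increments:
            s *= 1.0 + mu * step + sigma * dw + 0.5 * sigma**2 * (dw * dw - step)
        return s

    fine = milstein(h, dws)
    if level == 0:
        return fine, 0.0
    coarse_dws = [dws[2 * i] + dws[2 * i + 1] for i in range(n_fine // 2)]
    return fine, milstein(2 * h, coarse_dws)


def unbiased_gbm_mean(s0=1.0, mu=0.05, sigma=0.2, T=1.0, n_samples=20_000, seed=0):
    """Coupled-sum randomized estimator of E[S_T] (sketch).

    Milstein gives E[(Y_n - Y_{n-1})**2] = O(4**-n), so with P(N >= n) = p**n,
    p = 2**-1.5, both the variance sum and the expected cost sum 2**n * p**n
    converge.
    """
    p = 2.0 ** -1.5
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        n_max = 0
        while rng.random() < p:
            n_max += 1
        z = 0.0
        for level in range(n_max + 1):
            fine, coarse = milstein_gbm_pair(s0, mu, sigma, T, level, rng)
            z += (fine - coarse) / p**level  # reweighted level difference
        total += z
    return total / n_samples


print(unbiased_gbm_mean(), math.exp(0.05))
```

With an Euler scheme (strong order 1/2) the level differences typically shrink only like $2^{-n}$ in mean square, and no choice of $P(N \ge n)$ then keeps both the variance and the expected cost finite, which is the boundary the abstract refers to.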