Researcher profile

Michael C. H. Choi

Michael C. H. Choi contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
10topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2022arXiv

Landscape modification meets spin systems: from torpid to rapid mixing, tunneling and annealing in the low-temperature regime

Given a target Gibbs distribution $π^0_β \propto e^{-β\mathcal{H}}$ to sample from in the low-temperature regime on $Σ_N := \{-1,+1\}^N$, in this paper we propose and analyze Metropolis dynamics that instead target an alternative distribution $π^{f}_{α,c,1/β} \propto e^{-\mathcal{H}^{f}_{α,c,1/β}}$, where $\mathcal{H}^{f}_{α,c,1/β}$ is a transformed Hamiltonian whose landscape is suitably modified and controlled by the parameters $f,α,c$ and $β$ and shares the same set of stationary points as $\mathcal{H}$. With appropriate tuning of these parameters, the major advantage of the proposed Metropolis dynamics on the modified landscape is that it enjoys an $\mathcal{O}(1)$ critical height while its stationary distribution $π^{f}_{α,c,1/β}$ maintains close proximity with the original target $π^0_β$ in the low-temperature. We prove rapid mixing and tunneling on the modified landscape with polynomial dependence on the system size $N$ and the inverse temperature $β$, while the original Metropolis dynamics mixes torpidly with exponential dependence on the critical height and $β$. In the setting of simulated annealing, we prove its long-time convergence under a power-law cooling schedule that is faster than the typical logarithmic cooling in the classical setup. We illustrate our results on a host of models including the Ising model on various deterministic and random graphs as well as Derrida's Random Energy Model. In these applications, the original dynamics mixes torpidly while the proposed dynamics on the modified landscape mixes rapidly with polynomial dependence on both $β$ and $N$ and find the approximate ground state provably in $\mathcal{O}(N^4)$ time. This paper highlights a novel use of the geometry and structure of the landscape to the design of accelerated samplers or optimizers.

preprint2020arXiv

An improved variant of simulated annealing that converges under fast cooling

Given a target function $U$ to minimize on a finite state space $\mathcal{X}$, a proposal chain with generator $Q$ and a cooling schedule $T(t)$ that depends on time $t$, in this paper we study two types of simulated annealing (SA) algorithms with generators $M_{1,t}(Q,U,T(t))$ and $M_{2,t}(Q,U,T(t))$ respectively. While $M_{1,t}$ is the classical SA algorithm, we introduce a simple and improved variant that we call $M_{2,t}$ which provably converges faster. When $T(t) > c_{M_2}/\log(t+1)$ follows the logarithmic cooling schedule, our proposed algorithm is strongly ergodic both in total variation and in relative entropy, and converges to the set of global minima, where $c_{M_2}$ is a constant that we explicitly identify. If $c_{M_1}$ is the optimal hill-climbing constant that appears in logarithmic cooling of $M_{1,t}$, we show that $c_{M_1} \geq c_{M_2}$ and give simple conditions under which $c_{M_1} > c_{M_2}$. Our proposed $M_{2,t}$ thus converges under a faster logarithmic cooling in this regime. The other situation that we investigate corresponds to $c_{M_1} > c_{M_2} = 0$, where we give a class of fast and non-logarithmic cooling schedule that works for $M_{2,t}$ (but not for $M_{1,t}$). In addition to these asymptotic convergence results, we compare and analyze finite-time behaviour between these two annealing algorithms as well. Finally, we present two algorithms to simulate $M_{2,t}$.

preprint2020arXiv

On the convergence of an improved and adaptive kinetic simulated annealing

Inspired by the work of [Fang et al.. An improved annealing method and its large-time behaviour. Stochastic Process. Appl. (1997), Volume 71 Issue 1 Page 55-74.], who propose an improved simulated annealing algorithm based on a variant of overdamped Langevin diffusion with state-dependent diffusion coefficient, we cast this idea in the kinetic setting and develop an improved kinetic simulated annealing (IKSA) method for minimizing a target function $U$. To analyze its convergence, we utilize the framework recently introduced by [Monmarché. Hypocoercivity in metastable settings and kinetic simulated annealing. Probab. Theory Related Fields (2018), Volume 172 Page 1215-1248.] for the case of kinetic simulated annealing (KSA). The core idea of IKSA rests on introducing a parameter $c > \inf U$, which de facto modifies the optimization landscape and clips the critical height in IKSA at a maximum of $c - \inf U$. Consequently IKSA enjoys improved convergence with faster logarithmic cooling than KSA. To tune the parameter $c$, we propose an adaptive method that we call IAKSA which utilizes the running minimum generated by the algorithm on the fly, thus avoiding the need to manually adjust $c$ for better performance. We present positive numerical results on some standard global optimization benchmark functions that verify the improved convergence of IAKSA over other Langevin-based annealing methods.

preprint2019arXiv

Analysis of non-reversible Markov chains via similarity orbit

In this paper, we develop an in-depth analysis of non-reversible Markov chains on denumerable state space from a similarity orbit perspective. In particular, we study the class of Markov chains whose transition kernel is in the similarity orbit of a normal transition kernel, such as the one of birth-death chains or reversible Markov chains. We start by identifying a set of sufficient conditions for a Markov chain to belong to the similarity orbit of a birth-death one. As by-products, we obtain a spectral representation in terms of non-self-adjoint resolutions of identity in the sense of Dunford [21] and offer a detailed analysis on the convergence rate, separation cutoff and ${\rm{L}}^2$-cutoff of this class of non-reversible Markov chains. We also look into the problem of estimating the integral functionals from discrete observations for this class. In the last part of this paper, we investigate a particular similarity orbit of reversible Markov kernels, that we call the pure birth orbit, and analyze various possibly non-reversible variants of classical birth-death processes in this orbit.

preprint2019arXiv

Entropy flow and De Bruijn's identity for a class of stochastic differential equations driven by fractional Brownian motion

Motivated by the classical De Bruijn's identity for the additive Gaussian noise channel, in this paper we consider a generalized setting where the channel is modelled via stochastic differential equations driven by fractional Brownian motion with Hurst parameter $H\in(0,1)$. We derive generalized De Bruijn's identity for Shannon entropy and Kullback-Leibler divergence by means of Itô's formula, and present two applications. In the first application we demonstrate its equivalence with Stein's identity for Gaussian distributions, while in the second application, we show that for $H \in (0,1/2]$, the entropy power is concave in time while for $H \in (1/2,1)$ it is convex in time when the initial distribution is Gaussian. Compared with the classical case of $H = 1/2$, the time parameter plays an interesting and significant role in the analysis of these quantities.

preprint2019arXiv

Metropolis-Hastings reversiblizations of non-reversible Markov chains

We study two types of Metropolis-Hastings (MH) reversiblizations for non-reversible Markov chains with Markov kernel $P$. While the first type is the classical Metropolised version of $P$, we introduce a new self-adjoint kernel which captures the opposite transition effect of the first type, that we call the second MH kernel. We investigate the spectral relationship between $P$ and the two MH kernels. Along the way, we state a version of Weyl's inequality for the spectral gap of $P$ (and hence its additive reversiblization), as well as an expansion of $P$. Both results are expressed in terms of the spectrum of the two MH kernels. In the spirit of \cite{Fill91} and \cite{Paulin15}, we define a new pseudo-spectral gap based on the two MH kernels, and show that the total variation distance from stationarity can be bounded by this gap. We give variance bounds of the Markov chain in terms of the proposed gap, and offer spectral bounds in metastability and Cheeger's inequality in terms of the two MH kernels by comparison of Dirichlet form and Peskun ordering.

preprint2019arXiv

On hitting time, mixing time and geometric interpretations of Metropolis-Hastings reversiblizations

Given a target distribution $μ$ and a proposal chain with generator $Q$ on a finite state space, in this paper we study two types of Metropolis-Hastings (MH) generator $M_1(Q,μ)$ and $M_2(Q,μ)$ in a continuous-time setting. While $M_1$ is the classical MH generator, we define a new generator $M_2$ that captures the opposite movement of $M_1$ and provide a comprehensive suite of comparison results ranging from hitting time and mixing time to asymptotic variance, large deviations and capacity, which demonstrate that $M_2$ enjoys superior mixing properties than $M_1$. To see that $M_1$ and $M_2$ are natural transformations, we offer an interesting geometric interpretation of $M_1$, $M_2$ and their convex combinations as $\ell^1$ minimizers between $Q$ and the set of $μ$-reversible generators, extending the results by Billera and Diaconis (2001). We provide two examples as illustrations. In the first one we give explicit spectral analysis of $M_1$ and $M_2$ for Metropolised independent sampling, while in the second example we prove a Laplace transform order of the fastest strong stationary time between birth-death $M_1$ and $M_2$.