Source author record

Jeffrey S. Rosenthal

Jeffrey S. Rosenthal appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: math.PR, Computation, Applications, math.ST, Methodology, Statistics Theory, astro-ph.GA

Catalog footprint

What is connected

18 works

7 topics

4 close collaborators


arXiv preprint, 2024

Weak convergence of adaptive Markov chain Monte Carlo

This article develops general conditions for weak convergence of adaptive Markov chain Monte Carlo processes and shows that weak convergence implies a weak law of large numbers for bounded Lipschitz continuous functions. This yields an estimation theory for adaptive Markov chain Monte Carlo in settings where the previously developed theory in total variation may fail or be difficult to establish. Extensions of weak convergence to general Wasserstein distances are established, along with a weak law of large numbers for possibly unbounded Lipschitz functions. The results are applied to auto-regressive processes in various settings, unadjusted Langevin processes, and adaptive Metropolis-Hastings.
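As a hedged illustration of one application named above, the Python sketch below runs a plain (non-adaptive) unadjusted Langevin algorithm (ULA) for a standard normal target; for this target the ULA chain is also an auto-regressive process. It is not the article's construction: the target, the step size `h`, the seed, the test function `min(|x|, 1)` and the helper names `ula_gaussian`, `ula_stationary_variance` and `expected_clipped_abs` are all assumptions, chosen so that the chain's stationary law, N(0, 1/(1 - h/2)), is known in closed form and the ergodic average of a bounded Lipschitz function can be checked against it.

```python
import math
import random


def ula_gaussian(h, n_steps, x0=0.0, seed=0):
    """Unadjusted Langevin algorithm for the standard normal target (illustrative only).

    With pi(x) proportional to exp(-x**2 / 2), grad log pi(x) = -x, so each step is
        x <- x + h * (-x) + sqrt(2h) * N(0, 1),
    an AR(1) recursion with coefficient (1 - h). No Metropolis correction is applied.
    """
    rng = random.Random(seed)
    x, path = x0, []
    for _ in range(n_steps):
        x = (1.0 - h) * x + math.sqrt(2.0 * h) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def ula_stationary_variance(h):
    # AR(1) stationary variance 2h / (1 - (1 - h)**2), which simplifies to 1 / (1 - h/2).
    return 1.0 / (1.0 - h / 2.0)


def expected_clipped_abs(v):
    # Exact E[min(|X|, 1)] for X ~ N(0, v); min(|x|, 1) is bounded and 1-Lipschitz.
    return (math.sqrt(2.0 * v / math.pi) * (1.0 - math.exp(-1.0 / (2.0 * v)))
            + math.erfc(1.0 / math.sqrt(2.0 * v)))


h = 0.1
path = ula_gaussian(h, 200_000, seed=1)[1_000:]  # drop a short burn-in
lln_estimate = sum(min(abs(x), 1.0) for x in path) / len(path)
print(lln_estimate, expected_clipped_abs(ula_stationary_variance(h)))
```

The two printed numbers should be close: the first is the ergodic average along the chain, the second its exact value under the ULA stationary law, which is slightly over-dispersed relative to the N(0, 1) target because no accept/reject step is made.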

arXiv preprint, 2022

Complexity Results for MCMC derived from Quantitative Bounds

This paper considers how to obtain quantitative MCMC convergence bounds that can be translated into tight complexity bounds in high-dimensional settings. We propose a modified drift-and-minorization approach, which establishes generalized drift conditions defined in subsets of the state space. The subsets are called the "large sets" and are chosen to rule out some "bad" states which have poor drift properties when the dimension of the state space gets large. Using the "large sets" together with a "fitted family of drift functions", a quantitative bound can be obtained which can be translated into a tight complexity bound. As a demonstration, we analyze several Gibbs samplers and obtain complexity upper bounds for the mixing time. In particular, for one example of a Gibbs sampler related to the James-Stein estimator, we show that the number of iterations required for the Gibbs sampler to converge is constant under certain conditions on the observed data and the initial state. It is our hope that this modified drift-and-minorization approach can be employed in many other specific examples to obtain complexity bounds for high-dimensional Markov chains.

arXiv preprint, 2022

Convergence rate bounds for iterative random functions using one-shot coupling

One-shot coupling is a method of bounding the convergence rate between two copies of a Markov chain in total variation distance, which was first introduced by Roberts and Rosenthal and generalized by Madras and Sezer. The method is divided into two parts: the contraction phase, when the chains converge in expected distance, and the coalescing phase, which occurs at the last iteration, when there is an attempt to couple. One-shot coupling does not require the use of any exogenous variables such as a drift function or a minorization constant. In this paper, we summarize the one-shot coupling method in the One-Shot Coupling Theorem. We then apply the theorem to two families of Markov chains: the random functional autoregressive process and the autoregressive conditional heteroscedastic (ARCH) process. We provide multiple examples of how the theorem can be used on various models, including ones in high dimensions. These examples illustrate how the theorem's conditions can be verified in a straightforward way. The one-shot coupling method appears to generate tight geometric convergence rate bounds.
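A minimal sketch of the two phases for the simplest random functional autoregressive process, a Gaussian AR(1) chain X_{k+1} = a X_k + sigma * eps_k with |a| < 1. This is not the paper's One-Shot Coupling Theorem; the helper names and parameter values are hypothetical. The two copies share their noise for the first n - 1 steps (contraction phase), and at step n their Gaussian updates are maximally coupled (coalescing phase), so the probability of not coalescing, which bounds the total variation distance, equals the total variation distance between two normals with the same variance. Here the exact distance is also available in closed form, so the bound can be checked.

```python
import math


def tv_gaussians_same_var(mean_gap, sd):
    # TV between N(m1, sd^2) and N(m2, sd^2) with |m1 - m2| = mean_gap:
    # 2 * Phi(mean_gap / (2 sd)) - 1, which equals erf(mean_gap / (2 sqrt(2) sd)).
    return math.erf(mean_gap / (2.0 * math.sqrt(2.0) * sd))


def one_shot_bound(a, sigma, x0, y0, n):
    """One-shot coupling bound on TV(L(X_n), L(Y_n)) for X_{k+1} = a X_k + sigma * eps_k.

    Contraction: with shared noise, |X_{n-1} - Y_{n-1}| = |a|^(n-1) |x0 - y0| deterministically.
    Coalescing: the step-n laws N(a X_{n-1}, sigma^2) and N(a Y_{n-1}, sigma^2) are maximally
    coupled, so the chains fail to meet with probability equal to their TV distance.
    """
    gap = abs(a) ** n * abs(x0 - y0)
    return tv_gaussians_same_var(gap, sigma)


def exact_tv(a, sigma, x0, y0, n):
    # Exact laws: X_n ~ N(a^n x0, s_n^2), Y_n ~ N(a^n y0, s_n^2),
    # with s_n^2 = sigma^2 (1 - a^(2n)) / (1 - a^2).
    s_n = sigma * math.sqrt((1.0 - a ** (2 * n)) / (1.0 - a * a))
    return tv_gaussians_same_var(abs(a) ** n * abs(x0 - y0), s_n)


for n in (1, 5, 10, 20):
    print(n, one_shot_bound(0.8, 1.0, 10.0, -10.0, n), exact_tv(0.8, 1.0, 10.0, -10.0, n))
```

The bound decays geometrically at rate |a| in the mean gap and is exact at n = 1; for n > 1 it is conservative only because it ignores the extra spread that the shared noise accumulates before the final step.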

arXiv preprint, 2022

Dimension-free Mixing for High-dimensional Bayesian Variable Selection

Yang et al. (2016) proved that the symmetric random walk Metropolis-Hastings algorithm for Bayesian variable selection is rapidly mixing under mild high-dimensional assumptions. We propose a novel MCMC sampler using an informed proposal scheme, which we prove achieves a much faster mixing time that is independent of the number of covariates, under the same assumptions. To the best of our knowledge, this is the first high-dimensional result which rigorously shows that the mixing rate of informed MCMC methods can be fast enough to offset the computational cost of local posterior evaluation. Motivated by the theoretical analysis of our sampler, we further propose a new approach, called the "two-stage drift condition", for studying convergence rates of Markov chains on general state spaces, which can be useful for obtaining tight complexity bounds in high-dimensional settings. The practical advantages of our algorithm are illustrated by both simulation studies and real data analysis.

arXiv preprint, 2022

Optimal Strategies and Rules for the Game of Horse

We investigate the probability of scoring a point when playing the basketball shooting game called "Horse". We show that under the Traditional Rules, it is optimal to choose very easy shots. We propose alternative rules called Pops Rules, and show that they lead to more difficult optimal shots, and thus to a more interesting game.

arXiv preprint, 2021

Approximations of Geometrically Ergodic Reversible Markov Chains

A common tool in the practice of Markov chain Monte Carlo is to use approximating transition kernels to speed up computation when the desired kernel is slow to evaluate or intractable. Only a limited set of quantitative tools exists to assess the relative accuracy and efficiency of such approximations. We derive a set of tools for such analysis based on the Hilbert space $L_2(\pi)$ generated by the stationary distribution $\pi$ we intend to sample. Our results apply to approximations of reversible chains which are geometrically ergodic, as is typically the case for applications to Markov chain Monte Carlo. The focus of our work is on determining whether the approximating kernel will preserve the geometric ergodicity of the exact chain, and whether the approximating stationary distribution will be close to the original stationary distribution. For reversible chains, our results extend the results of Johndrow et al. [18] from the uniformly ergodic case to the geometrically ergodic case, under some additional regularity conditions. We then apply our results to a number of approximate MCMC algorithms.

arXiv preprint, 2021

Bayesian Inference of Globular Cluster Properties Using Distribution Functions

We present a Bayesian inference approach to estimating the cumulative mass profile and mean squared velocity profile of a globular cluster given the spatial and kinematic information of its stars. Mock globular clusters with a range of sizes and concentrations are generated from lowered isothermal dynamical models, and we use them to test, through repeated statistical simulation, how reliably the Bayesian method estimates the model parameters. We find that, given unbiased star samples, we can reconstruct with good accuracy the cluster parameters used to generate the mock cluster, as well as the cluster's cumulative mass and mean squared velocity profiles. We further explore how strongly biased sampling, which could result from observing constraints, may affect this approach. Our tests indicate that with biased samples our estimates can be off in ways that depend on cluster morphology. Overall, our findings motivate obtaining samples of stars that are as unbiased as possible. This may be achieved by combining information from multiple telescopes (e.g., Hubble and Gaia), but will require careful modeling of the measurement uncertainties through a hierarchical model, which we plan to pursue in future work.

arXiv preprint, 2020

Optimal Scaling of Random-Walk Metropolis Algorithms on General Target Distributions

One main limitation of the existing optimal scaling results for Metropolis-Hastings algorithms is that their assumptions on the target distribution are unrealistic. In this paper, we consider optimal scaling of random-walk Metropolis algorithms on general target distributions in high dimensions arising from practical MCMC models in Bayesian statistics. For optimal scaling by maximizing the expected squared jumping distance (ESJD), we show that the asymptotically optimal acceptance rate $0.234$ can be obtained under general, realistic sufficient conditions on the target distribution. The new sufficient conditions are easy to verify and may hold for some general classes of MCMC models arising in Bayesian statistics applications; they substantially generalize the product i.i.d. condition required in most of the existing optimal scaling literature. Furthermore, we show that one-dimensional diffusion limits can be obtained under slightly stronger conditions, which still allow dependent coordinates of the target distribution. We also connect the new diffusion limit results to complexity bounds of Metropolis algorithms in high dimensions.
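For context, the product i.i.d. calculation that the abstract describes as the existing setting can be sketched in a few lines. The sketch assumes a target made of d i.i.d. coordinates, random-walk proposals with variance ell^2/d, and the standard limiting acceptance rate 2 Phi(-ell sqrt(I)/2), where I is a scalar information constant of the marginal density (set to 1 by default). It maximizes the limiting diffusion speed, ell^2 times that acceptance rate, on a grid; it does not implement the paper's more general conditions, and the helper names are hypothetical.

```python
import math


def std_normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def limiting_acceptance(ell, info=1.0):
    # Limiting RWM acceptance rate, proposal variance ell^2 / d, d-fold i.i.d. product target.
    return 2.0 * std_normal_cdf(-ell * math.sqrt(info) / 2.0)


def diffusion_speed(ell, info=1.0):
    # Speed of the limiting diffusion; proportional to the expected squared jumping distance.
    return ell * ell * limiting_acceptance(ell, info)


grid = [k / 10_000 for k in range(1, 60_001)]  # candidate values of ell in (0, 6]
best_ell = max(grid, key=diffusion_speed)
print(round(best_ell, 2), round(limiting_acceptance(best_ell), 3))  # about 2.38 and 0.234
```

Changing `info` rescales the optimal `ell` by 1/sqrt(info) but leaves the optimal acceptance rate near 0.234, which is why the acceptance rate, rather than the step size itself, is the usual tuning target.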

arXiv preprint, 2016

Hitting Time and Convergence Rate Bounds for Symmetric Langevin Diffusions

We provide quantitative bounds on the convergence to stationarity of real-valued Langevin diffusions with symmetric target densities.

arXiv preprint, 2015

Stability of adversarial Markov chains, with an application to adaptive MCMC algorithms

We consider whether ergodic Markov chains with bounded step size remain bounded in probability when their transitions are modified by an adversary on a bounded subset. We provide counterexamples to show that the answer is no in general, and prove theorems to show that the answer is yes under various additional assumptions. We then use our results to prove convergence of various adaptive Markov chain Monte Carlo algorithms.

arXiv preprint, 2014

Complexity Bounds for MCMC via Diffusion Limits

We connect known results about diffusion limits of Markov chain Monte Carlo (MCMC) algorithms to the Computer Science notion of algorithm complexity. Our main result states that any diffusion limit of a Markov process implies a corresponding complexity bound (in an appropriate metric). We then combine this result with previously-known MCMC diffusion limit results to prove that under appropriate assumptions, the Random-Walk Metropolis (RWM) algorithm in $d$ dimensions takes $O(d)$ iterations to converge to stationarity, while the Metropolis-Adjusted Langevin Algorithm (MALA) takes $O(d^{1/3})$ iterations to converge to stationarity.

arXiv preprint, 2014

Minimising MCMC variance via diffusion limits, with an application to simulated tempering

We derive new results comparing the asymptotic variance of diffusions by writing them as appropriate limits of discrete-time birth-death chains which themselves satisfy Peskun orderings. We then apply our results to simulated tempering algorithms to establish which choice of inverse temperatures minimises the asymptotic variance of all functionals and thus leads to the most efficient MCMC algorithm.

arXiv preprint, 2014

On the efficiency of pseudo-marginal random walk Metropolis algorithms

We examine the behaviour of the pseudo-marginal random walk Metropolis algorithm, where evaluations of the target density for the accept/reject probability are estimated rather than computed exactly. Under relatively general conditions on the target distribution, we obtain limiting formulae for the acceptance rate and for the expected squared jump distance, as the dimension of the target approaches infinity, under the assumption that the noise in the estimate of the log-target is additive and is independent of the position. For targets with independent and identically distributed components, we also obtain a limiting diffusion for the first component. We then consider the overall efficiency of the algorithm, in terms of both speed of mixing and computational time. Assuming the additive noise is Gaussian, with variance inversely proportional to the number of unbiased estimates that are used, we prove that the algorithm is optimally efficient when the variance of the noise is approximately 3.283 and the acceptance rate is approximately 7.001%. We also find that the optimal scaling is insensitive to the noise and that the optimal variance of the noise is insensitive to the scaling. The theory is illustrated with a simulation study using the particle marginal random walk Metropolis.
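A hedged numerical sketch of the efficiency optimization described above. It assumes that the limiting acceptance rate takes the form 2 Phi(-sqrt(ell^2 + 2 sigma^2)/2) for a target scaled so that its information constant is 1, with sigma^2 the variance of the Gaussian noise in the log-target estimate, and that the computing cost per iteration is proportional to the number of estimates, hence to 1/sigma^2. Under those assumptions, a grid search over the scaling ell and the noise variance sigma^2 lands close to the values quoted in the abstract. The helper names are hypothetical.

```python
import math


def std_normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def pm_acceptance(ell, sigma2):
    # Assumed limiting acceptance rate: 2 * Phi(-sqrt(ell^2 + 2 * sigma2) / 2),
    # for a unit-information target and Gaussian log-target noise of variance sigma2.
    return 2.0 * std_normal_cdf(-math.sqrt(ell * ell + 2.0 * sigma2) / 2.0)


def pm_efficiency(ell, sigma2):
    # Expected squared jump distance (proportional to ell^2 * acceptance) per unit of work.
    # Work per iteration is assumed proportional to the number of estimates, i.e. to 1 / sigma2.
    return sigma2 * ell * ell * pm_acceptance(ell, sigma2)


# Grid search over ell in [1.00, 4.00] and sigma2 in [1.00, 6.00], both in steps of 0.01.
grid = ((i / 100.0, j / 100.0) for i in range(100, 401) for j in range(100, 601))
best_ell, best_sigma2 = max(grid, key=lambda pair: pm_efficiency(*pair))
best_acceptance = pm_acceptance(best_ell, best_sigma2)
print(best_ell, best_sigma2, round(best_acceptance, 4))
```

Setting both partial derivatives of this efficiency to zero gives ell^2 = 2 sigma^2 at the optimum, which the grid result reflects: 2.562^2 is about 2 x 3.283.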

arXiv preprint, 2013

Adaptive Gibbs samplers and related MCMC methods

We consider various versions of adaptive Gibbs and Metropolis-within-Gibbs samplers, which update their selection probabilities (and perhaps also their proposal distributions) on the fly during a run by learning as they go in an attempt to optimize the algorithm. We present a cautionary example of how even a simple-seeming adaptive Gibbs sampler may fail to converge. We then present various positive results guaranteeing convergence of adaptive Gibbs samplers under certain conditions.
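A minimal sketch of an adaptive random-scan Gibbs sampler, assuming a bivariate normal target with unit variances and correlation `rho`. The adaptation rule (lean towards updating the coordinate currently farther from 0) is hypothetical and is neither the paper's algorithm nor its cautionary example. It only shows where the selection probability `p` is learned on the fly, together with two safeguards of the kind used in positive convergence results: each change in `p` shrinks like 1/n (diminishing adaptation), and `p` is kept inside [eps, 1 - eps].

```python
import math
import random


def adaptive_random_scan_gibbs(rho, n_steps, eps=0.1, seed=0):
    """Random-scan Gibbs for a standard bivariate normal with correlation rho.

    p is the probability of updating x (otherwise y is updated). It is adapted by a
    hypothetical rule with step size 1/n and clipped to [eps, 1 - eps].
    """
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho * rho)  # sd of x | y and of y | x
    x = y = 0.0
    p = 0.5
    samples = []
    for n in range(1, n_steps + 1):
        if rng.random() < p:
            x = rho * y + cond_sd * rng.gauss(0.0, 1.0)  # exact draw from x | y
        else:
            y = rho * x + cond_sd * rng.gauss(0.0, 1.0)  # exact draw from y | x
        samples.append((x, y))
        # Hypothetical adaptation: favour the coordinate currently farther from 0.
        step = 1.0 if abs(x) > abs(y) else -1.0
        p = min(1.0 - eps, max(eps, p + step / n))
    return samples, p


def sample_correlation(samples):
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples)
    sxx = sum((s[0] - mx) ** 2 for s in samples)
    syy = sum((s[1] - my) ** 2 for s in samples)
    return sxy / math.sqrt(sxx * syy)


samples, final_p = adaptive_random_scan_gibbs(rho=0.8, n_steps=200_000, seed=3)
print(sample_correlation(samples[1_000:]), final_p)
```

For any fixed `p`, both coordinate updates leave the target invariant, so only the adaptation itself can cause trouble; the clipping and the shrinking step size are what keep this particular toy well behaved.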

arXiv preprint, 2013

Convergence rate of Markov chain methods for genomic motif discovery

We analyze the convergence rate of a simplified version of a popular Gibbs sampling method used for statistical discovery of gene regulatory binding motifs in DNA sequences. This sampler satisfies a very strong form of ergodicity (uniform ergodicity). However, we show that, due to multimodality of the posterior distribution, the rate of convergence often decreases exponentially as a function of the length of the DNA sequence. Specifically, we show that this occurs whenever there is more than one true repeating pattern in the data. In practice there are typically multiple such patterns in biological data, the goal being to detect the most well-conserved and frequently occurring of these. Our findings match empirical results, in which the motif-discovery Gibbs sampler has exhibited such poor convergence that it is used only for finding modes of the posterior distribution (candidate motifs) rather than for obtaining samples from that distribution. Ours are among the first meaningful bounds on the convergence rate of a Markov chain method for sampling from a multimodal posterior distribution, as a function of statistical quantities like the number of observations.

arXiv preprint, 2013

The Containment Condition and AdapFail algorithms

This short note investigates convergence of adaptive MCMC algorithms, i.e. algorithms which modify the Markov chain update probabilities on the fly. We focus on the Containment condition introduced by Roberts and Rosenthal (2007). We show that if the Containment condition is not satisfied, then the algorithm will perform very poorly. Specifically, with positive probability, the adaptive algorithm will be asymptotically less efficient than any nonadaptive ergodic MCMC algorithm. We call such algorithms AdapFail, and conclude that they should not be used.

arXiv preprint, 2011

Detecting multiple authorship of United States Supreme Court legal decisions using function words

This paper uses statistical analysis of function words in legal judgments written by United States Supreme Court justices to determine which justices have the most variable writing style (which may indicate greater reliance on their law clerks when writing opinions), and also the extent to which different justices' writing styles are distinguishable from each other.
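The abstract does not spell out the statistics used, so the following is only a hedged sketch of the general idea: represent each opinion by the relative frequencies of a fixed list of function words, then measure how widely one justice's opinions spread around their mean profile. The word list `FUNCTION_WORDS`, the squared Euclidean spread and both helper functions are hypothetical choices, not the paper's method.

```python
import re
from collections import Counter

# Hypothetical list of function words; not the word list used in the paper.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is", "it", "which", "upon", "but")


def function_word_profile(text):
    """Relative frequency (per word of text) of each word in FUNCTION_WORDS."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def style_variability(texts):
    """Mean squared Euclidean distance of each text's profile from the average profile.

    Larger values suggest a less consistent function-word style across the texts.
    """
    profiles = [function_word_profile(t) for t in texts]
    k = len(FUNCTION_WORDS)
    mean = [sum(p[i] for p in profiles) / len(profiles) for i in range(k)]
    return sum(sum((p[i] - mean[i]) ** 2 for i in range(k)) for p in profiles) / len(profiles)


opinions = [
    "The court holds that the statute is valid.",
    "It is upon the state to show that the law applies.",
]
print(style_variability(opinions))
```

Comparing this spread across justices, and comparing mean profiles between justices, are two simple ways to pose the questions the abstract raises.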

arXiv preprint, 2010

Adaptive Gibbs samplers

We consider various versions of adaptive Gibbs and Metropolis-within-Gibbs samplers, which update their selection probabilities (and perhaps also their proposal distributions) on the fly during a run, by learning as they go in an attempt to optimise the algorithm. We present a cautionary example of how even a simple-seeming adaptive Gibbs sampler may fail to converge. We then present various positive results guaranteeing convergence of adaptive Gibbs samplers under certain conditions.
