Source author record

Pavel Sountsov

Pavel Sountsov appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computation, Machine Learning, Artificial Intelligence, Computation and Language, Programming Languages

Catalog footprint

What is connected

5 works

5 topics

4 close collaborators

preprint · 2022 · arXiv

Focusing on Difficult Directions for Learning HMC Trajectory Lengths

Hamiltonian Monte Carlo (HMC) is a premier Markov chain Monte Carlo (MCMC) algorithm for continuous target distributions. Its full potential is realized only when its problem-dependent hyperparameters are tuned well. The adaptation of one such hyperparameter, the trajectory length ($τ$), has been closely examined by many research programs, with the No-U-Turn Sampler (NUTS) emerging as the preferred method in 2011. A decade later, the evolving hardware profile has led to the proliferation of personal and cloud-based SIMD hardware in the form of Graphics and Tensor Processing Units (GPUs, TPUs), which are hostile to certain algorithmic details of NUTS. This has left a gap in the MCMC toolkit for an algorithm that can learn $τ$ while maintaining good hardware utilization. In this work we build on recent advances in this direction and introduce SNAPER-HMC, a SIMD-accelerator-friendly adaptive-MCMC scheme for learning $τ$. The algorithm maximizes an upper bound on per-gradient effective sample size along an estimated principal component. We empirically show that SNAPER-HMC is stable when combined with mass-matrix adaptation and is tolerant of certain pathological target-distribution covariance spectra, while providing excellent long- and short-run sampling efficiency. We provide a complete implementation for continuous multi-chain adaptive HMC that combines trajectory learning with standard step-size and mass-matrix adaptation in one turnkey inference package.
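To make the hyperparameter $τ$ concrete, here is a minimal NumPy sketch of one plain, fixed-length HMC transition with a unit mass matrix. The trajectory length is $τ$ = `step_size * num_steps`. This is not SNAPER-HMC: the function names (`leapfrog`, `hmc_step`) and the standard-normal toy target are illustrative assumptions, and no adaptation of $τ$ is performed. It only shows the quantity that an adaptive scheme such as SNAPER-HMC would tune.

```python
import numpy as np

def leapfrog(x, p, grad_log_prob, step_size, num_steps):
    """Simulate Hamiltonian dynamics; trajectory length tau = step_size * num_steps."""
    p = p + 0.5 * step_size * grad_log_prob(x)        # initial half step for momentum
    for i in range(num_steps):
        x = x + step_size * p                         # full step for position
        if i < num_steps - 1:
            p = p + step_size * grad_log_prob(x)      # full momentum step, except at the end
    p = p + 0.5 * step_size * grad_log_prob(x)        # final half step for momentum
    return x, p

def hmc_step(x, log_prob, grad_log_prob, step_size, num_steps, rng):
    """One Metropolis-adjusted HMC transition with an identity mass matrix."""
    p0 = rng.standard_normal(np.shape(x))             # fresh Gaussian momentum
    x1, p1 = leapfrog(x, p0, grad_log_prob, step_size, num_steps)
    # Hamiltonian H(x, p) = -log_prob(x) + |p|^2 / 2
    h0 = -log_prob(x) + 0.5 * np.sum(p0 ** 2)
    h1 = -log_prob(x1) + 0.5 * np.sum(p1 ** 2)
    if np.log(rng.uniform()) < h0 - h1:               # accept with prob min(1, exp(h0 - h1))
        return x1
    return x

# Toy usage (assumed target): 1-D standard normal, tau = 0.2 * 10 = 2.0.
log_prob = lambda x: -0.5 * np.sum(x ** 2)
grad_log_prob = lambda x: -x
rng = np.random.default_rng(0)
x = np.zeros(1)
samples = []
for _ in range(5000):
    x = hmc_step(x, log_prob, grad_log_prob, 0.2, 10, rng)
    samples.append(x[0])
samples = np.array(samples)
print(samples.mean(), samples.var())
```

Changing `num_steps` at a fixed `step_size` changes $τ$ and therefore how far each proposal travels. Choosing it per problem is the job NUTS does with its U-turn check and SNAPER-HMC does with its ESS-based objective.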

preprint · 2022 · arXiv

MCMC Should Mix: Learning Energy-Based Model with Neural Transport Latent Space MCMC

Learning an energy-based model (EBM) requires MCMC sampling of the learned model as an inner loop of the learning algorithm. However, MCMC sampling of EBMs in high-dimensional data space generally does not mix, because the energy function, which is usually parametrized by a deep network, is highly multi-modal in the data space. This is a serious handicap for both the theory and the practice of EBMs. In this paper, we propose to learn an EBM with a flow-based model (or, more generally, a latent variable model) serving as a backbone, so that the EBM is a correction, or an exponential tilting, of the flow-based model. We show that the model has a particularly simple form in the space of the latent variables of the backbone model, and that MCMC sampling of the EBM in the latent space mixes well and traverses modes in the data space. This enables proper sampling and learning of EBMs.
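The "particularly simple form" in latent space can be seen with a change of variables. The following derivation is a sketch under assumed notation, not taken from the paper: the flow backbone maps $z \sim p_0(z) = \mathcal{N}(0, I)$ to $x = g_\alpha(z)$ with density $q_\alpha(x)$, and the EBM tilts it as $p_\theta(x) = \frac{1}{Z(\theta)} \exp(f_\theta(x))\, q_\alpha(x)$, where $f_\theta$ is a hypothetical name for the learned correction term.

```latex
p_\theta(z)
  = p_\theta\big(g_\alpha(z)\big)\left|\det \frac{\partial g_\alpha(z)}{\partial z}\right|
  = \frac{1}{Z(\theta)} \exp\big(f_\theta(g_\alpha(z))\big)\,
    q_\alpha\big(g_\alpha(z)\big)\left|\det \frac{\partial g_\alpha(z)}{\partial z}\right|
  = \frac{1}{Z(\theta)} \exp\big(f_\theta(g_\alpha(z))\big)\, p_0(z)
```

The last step uses the flow's own change-of-variables identity $q_\alpha(g_\alpha(z)) \left|\det \partial g_\alpha(z) / \partial z\right| = p_0(z)$. Under these assumptions, the latent-space target is a standard Gaussian tilted by $\exp(f_\theta \circ g_\alpha)$, with no Jacobian term left to evaluate, so a gradient-based sampler can run on $z$ and map samples back through $g_\alpha$.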

preprint · 2020 · arXiv

Hamiltonian Monte Carlo Swindles

Hamiltonian Monte Carlo (HMC) is a powerful Markov chain Monte Carlo (MCMC) algorithm for estimating expectations with respect to continuous un-normalized probability distributions. MCMC estimators typically have higher variance than classical Monte Carlo with i.i.d. samples because of autocorrelations; most MCMC research tries to reduce these autocorrelations. In this work, we explore a complementary approach to variance reduction based on two classical Monte Carlo "swindles": first, running an auxiliary coupled chain that targets a tractable approximation to the target distribution, and using the auxiliary samples as control variates; and second, generating anti-correlated ("antithetic") samples by running two chains with flipped randomness. Both ideas have been explored previously in the context of Gibbs samplers and random-walk Metropolis algorithms, but we argue that they are ripe for adaptation to HMC in light of recent coupling results from the HMC theory literature. For many posterior distributions, we find that these swindles generate effective sample sizes orders of magnitude larger than plain HMC, and are also more efficient than analogous swindles for the Metropolis-adjusted Langevin algorithm and random-walk Metropolis.
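The antithetic swindle can be sketched in a few lines of NumPy, assuming plain HMC with a unit mass matrix. Two chains share every random draw, but the second chain receives the negated momentum; the Metropolis uniform is shared as well. The function names and the Gaussian toy target $\mathcal{N}(3, 1)$ are hypothetical choices for illustration, and the control-variate swindle from the same abstract is not shown.

```python
import numpy as np

def leapfrog(x, p, grad_log_prob, step_size, num_steps):
    """Leapfrog integration of Hamiltonian dynamics (identity mass matrix)."""
    p = p + 0.5 * step_size * grad_log_prob(x)
    for i in range(num_steps):
        x = x + step_size * p
        if i < num_steps - 1:
            p = p + step_size * grad_log_prob(x)
    p = p + 0.5 * step_size * grad_log_prob(x)
    return x, p

def hmc_move(x, p, log_u, log_prob, grad_log_prob, step_size, num_steps):
    """HMC transition driven by externally supplied randomness (p, log_u)."""
    x1, p1 = leapfrog(x, p, grad_log_prob, step_size, num_steps)
    h0 = -log_prob(x) + 0.5 * np.sum(p ** 2)
    h1 = -log_prob(x1) + 0.5 * np.sum(p1 ** 2)
    return x1 if log_u < h0 - h1 else x

def antithetic_hmc(x, y, log_prob, grad_log_prob, step_size, num_steps, num_iters, rng):
    """Run two coupled chains: same uniform, momentum sign flipped for chain y."""
    xs, ys = [], []
    for _ in range(num_iters):
        p = rng.standard_normal(np.shape(x))
        log_u = np.log(rng.uniform())
        x = hmc_move(x, p, log_u, log_prob, grad_log_prob, step_size, num_steps)
        y = hmc_move(y, -p, log_u, log_prob, grad_log_prob, step_size, num_steps)
        xs.append(x[0])
        ys.append(y[0])
    return np.array(xs), np.array(ys)

# Toy usage: target N(3, 1); the chains start at unrelated points.
mu = 3.0
log_prob = lambda x: -0.5 * np.sum((x - mu) ** 2)
grad_log_prob = lambda x: -(x - mu)
rng = np.random.default_rng(1)
xs, ys = antithetic_hmc(np.array([5.0]), np.array([2.5]),
                        log_prob, grad_log_prob, 0.2, 10, 2000, rng)
pair_means = 0.5 * (xs + ys)              # antithetic estimator of E[x]
corr = np.corrcoef(xs[100:], ys[100:])[0, 1]
print(corr, pair_means[100:].mean())
```

On this symmetric Gaussian toy target the coupled chains quickly become near mirror images around the mean, so the pair average has very low variance. That is an idealized case; for non-Gaussian posteriors one should expect a weaker, though still useful, negative correlation.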

preprint · 2020 · arXiv

tfp.mcmc: Modern Markov Chain Monte Carlo Tools Built for Modern Hardware

Markov chain Monte Carlo (MCMC) is widely regarded as one of the most important algorithms of the 20th century. Its guarantees of asymptotic convergence, stability, and estimator-variance bounds using only unnormalized probability functions make it indispensable to probabilistic programming. In this paper, we introduce the TensorFlow Probability MCMC toolkit, and discuss some of the considerations that motivated its design.

preprint · 2016 · arXiv

Length bias in Encoder Decoder Models and a Case for Global Conditioning

Encoder-decoder networks are popular for modeling sequences probabilistically in many applications. These models use the power of the Long Short-Term Memory (LSTM) architecture to capture the full dependence among variables, unlike earlier models such as CRFs that typically assumed conditional independence among non-adjacent variables. However, in practice encoder-decoder models exhibit a bias towards short sequences that, surprisingly, gets worse with increasing beam size. In this paper we show that this phenomenon is due to a discrepancy between the full-sequence margin and the per-element margin enforced by the locally conditioned training objective of an encoder-decoder model. The discrepancy impacts long sequences more adversely, explaining the bias towards predicting short sequences. For the case where the predicted sequences come from a closed set, we show that a globally conditioned model alleviates the above problems of encoder-decoder models. From a practical point of view, our proposed model also eliminates the need for beam search during inference, which reduces to an efficient dot-product-based search in a vector space.
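A small NumPy sketch contrasts the two scoring styles. The probabilities are made-up toy numbers, not results from the paper. A locally normalized model scores a sequence by summing per-step log probabilities, so every extra step can only lower the score. For the closed-set case, the global model's inference is sketched as a dot product between an encoder vector and precomputed candidate embeddings. The names `sequence_log_prob` and `global_search` and the embedding values are hypothetical, and the paper's actual parameterization may differ.

```python
import numpy as np

def sequence_log_prob(token_probs):
    """Locally normalized score: sum of per-step log probabilities (EOS included)."""
    return float(np.sum(np.log(token_probs)))

# Toy numbers: a 1-step output whose only step has probability 0.3, versus a
# 15-step output in which every step is far more confident (0.9).
short_score = sequence_log_prob([0.3])        # about -1.20
long_score = sequence_log_prob([0.9] * 15)    # about -1.58
print(short_score > long_score)               # the shorter output still wins

def global_search(encoder_vec, candidate_embeddings):
    """Closed-set inference: score every candidate by a dot product, pick the best."""
    scores = candidate_embeddings @ encoder_vec
    return int(np.argmax(scores)), scores

# Hypothetical 2-D embeddings for three candidate output sequences.
E = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.7, 0.7]])
h = np.array([0.2, 1.0])                      # hypothetical encoder output
best, scores = global_search(h, E)
print(best, scores)
```

Because the global scores need no step-by-step decoding, there is no beam to size, and the search can be delegated to any nearest-neighbor index over the candidate embeddings.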