Source author record

Matthew D. Hoffman

Matthew D. Hoffman appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Computation Programming Languages Applications Computer Vision Distributed, Parallel, and Cluster Computing Information Theory math.IT Mathematical Software Sound

Catalog footprint

What is connected

12works

10topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Lossy Compression with Gaussian Diffusion

We consider a novel lossy compression approach based on unconditional diffusion generative models, which we call DiffC. Unlike modern compression schemes which rely on transform coding and quantization to restrict the transmitted information, DiffC relies on the efficient communication of pixels corrupted by Gaussian noise. We implement a proof of concept and find that it works surprisingly well despite the lack of an encoder transform, outperforming the state-of-the-art generative compression method HiFiC on ImageNet 64x64. DiffC only uses a single model to encode and denoise corrupted pixels at arbitrary bitrates. The approach further provides support for progressive coding, that is, decoding from partial bit streams. We perform a rate-distortion analysis to gain a deeper understanding of its performance, providing analytical results for multivariate Gaussian data as well as theoretic bounds for general distributions. Furthermore, we prove that a flow-based reconstruction achieves a 3 dB gain over ancestral sampling at high bitrates.

preprint2020arXiv

Automatically Batching Control-Intensive Programs for Modern Accelerators

We present a general approach to batching arbitrary computations for accelerators such as GPUs. We show orders-of-magnitude speedups using our method on the No U-Turn Sampler (NUTS), a workhorse algorithm in Bayesian statistics. The central challenge of batching NUTS and other Markov chain Monte Carlo algorithms is data-dependent control flow and recursion. We overcome this by mechanically transforming a single-example implementation into a form that explicitly tracks the current program point for each batch member, and only steps forward those in the same place. We present two different batching algorithms: a simpler, previously published one that inherits recursion from the host Python, and a more complex, novel one that implemenents recursion directly and can batch across it. We implement these batching methods as a general program transformation on Python source. Both the batching system and the NUTS implementation presented here are available as part of the popular TensorFlow Probability software package.

preprint2020arXiv

Hamiltonian Monte Carlo Swindles

Hamiltonian Monte Carlo (HMC) is a powerful Markov chain Monte Carlo (MCMC) algorithm for estimating expectations with respect to continuous un-normalized probability distributions. MCMC estimators typically have higher variance than classical Monte Carlo with i.i.d. samples due to autocorrelations; most MCMC research tries to reduce these autocorrelations. In this work, we explore a complementary approach to variance reduction based on two classical Monte Carlo "swindles": first, running an auxiliary coupled chain targeting a tractable approximation to the target distribution, and using the auxiliary samples as control variates; and second, generating anti-correlated ("antithetic") samples by running two chains with flipped randomness. Both ideas have been explored previously in the context of Gibbs samplers and random-walk Metropolis algorithms, but we argue that they are ripe for adaptation to HMC in light of recent coupling results from the HMC theory literature. For many posterior distributions, we find that these swindles generate effective sample sizes orders of magnitude larger than plain HMC, as well as being more efficient than analogous swindles for Metropolis-adjusted Langevin algorithm and random-walk Metropolis.

preprint2020arXiv

tfp.mcmc: Modern Markov Chain Monte Carlo Tools Built for Modern Hardware

Markov chain Monte Carlo (MCMC) is widely regarded as one of the most important algorithms of the 20th century. Its guarantees of asymptotic convergence, stability, and estimator-variance bounds using only unnormalized probability functions make it indispensable to probabilistic programming. In this paper, we introduce the TensorFlow Probability MCMC toolkit, and discuss some of the considerations that motivated its design.

preprint2015arXiv

A trust-region method for stochastic variational inference with applications to streaming data

Stochastic variational inference allows for fast posterior inference in complex Bayesian models. However, the algorithm is prone to local optima which can make the quality of the posterior approximation sensitive to the choice of hyperparameters and initialization. We address this problem by replacing the natural gradient step of stochastic varitional inference with a trust-region update. We show that this leads to generally better results and reduced sensitivity to hyperparameters. We also describe a new strategy for variational inference on streaming data and show that here our trust-region method is crucial for getting good performance.

preprint2015arXiv

The Stan Math Library: Reverse-Mode Automatic Differentiation in C++

As computational challenges in optimization and statistical inference grow ever harder, algorithms that utilize derivatives are becoming increasingly more important. The implementation of the derivatives that make these algorithms so powerful, however, is a substantial user burden and the practicality of these algorithms depends critically on tools like automatic differentiation that remove the implementation burden entirely. The Stan Math Library is a C++, reverse-mode automatic differentiation library designed to be usable, extensive and extensible, efficient, scalable, stable, portable, and redistributable in order to facilitate the construction and utilization of such algorithms. Usability is achieved through a simple direct interface and a cleanly abstracted functional interface. The extensive built-in library includes functions for matrix operations, linear algebra, differential equation solving, and most common probability functions. Extensibility derives from a straightforward object-oriented framework for expressions, allowing users to easily create custom functions. Efficiency is achieved through a combination of custom memory management, subexpression caching, traits-based metaprogramming, and expression templates. Partial derivatives for compound functions are evaluated lazily for improved scalability. Stability is achieved by taking care with arithmetic precision in algebraic expressions and providing stable, compound functions where possible. For portability, the library is standards-compliant C++ (03) and has been tested for all major compilers for Windows, Mac OS X, and Linux.

preprint2014arXiv

A Generative Product-of-Filters Model of Audio

We propose the product-of-filters (PoF) model, a generative model that decomposes audio spectra as sparse linear combinations of "filters" in the log-spectral domain. PoF makes similar assumptions to those used in the classic homomorphic filtering approach to signal processing, but replaces hand-designed decompositions built of basic signal processing operations with a learned decomposition based on statistical inference. This paper formulates the PoF model and derives a mean-field method for posterior inference and a variational EM algorithm to estimate the model's free parameters. We demonstrate PoF's potential for audio processing on a bandwidth expansion task, and show that PoF can serve as an effective unsupervised feature extractor for a speaker identification task.

preprint2014arXiv

Beta Process Non-negative Matrix Factorization with Stochastic Structured Mean-Field Variational Inference

Beta process is the standard nonparametric Bayesian prior for latent factor model. In this paper, we derive a structured mean-field variational inference algorithm for a beta process non-negative matrix factorization (NMF) model with Poisson likelihood. Unlike the linear Gaussian model, which is well-studied in the nonparametric Bayesian literature, NMF model with beta process prior does not enjoy the conjugacy. We leverage the recently developed stochastic structured mean-field variational inference to relax the conjugacy constraint and restore the dependencies among the latent variables in the approximating variational distribution. Preliminary results on both synthetic and real examples demonstrate that the proposed inference algorithm can reasonably recover the hidden structure of the data.

preprint2014arXiv

Image Classification and Retrieval from User-Supplied Tags

This paper proposes direct learning of image classification from user-supplied tags, without filtering. Each tag is supplied by the user who shared the image online. Enormous numbers of these tags are freely available online, and they give insight about the image categories important to users and to image classification. Our approach is complementary to the conventional approach of manual annotation, which is extremely costly. We analyze of the Flickr 100 Million Image dataset, making several useful observations about the statistics of these tags. We introduce a large-scale robust classification algorithm, in order to handle the inherent noise in these tags, and a calibration procedure to better predict objective annotations. We show that freely available, user-supplied tags can obtain similar or superior results to large databases of costly manual annotations.

preprint2014arXiv

Structured Stochastic Variational Inference

Stochastic variational inference makes it possible to approximate posterior distributions induced by large datasets quickly using stochastic optimization. The algorithm relies on the use of fully factorized variational distributions. However, this "mean-field" independence approximation limits the fidelity of the posterior approximation, and introduces local optima. We show how to relax the mean-field approximation to allow arbitrary dependencies between global parameters and local hidden variables, producing better parameter estimates by reducing bias, sensitivity to local optima, and sensitivity to hyperparameters.

preprint2011arXiv

The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo

Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo (MCMC) algorithm that avoids the random walk behavior and sensitivity to correlated parameters that plague many MCMC methods by taking a series of steps informed by first-order gradient information. These features allow it to converge to high-dimensional target distributions much more quickly than simpler methods such as random walk Metropolis or Gibbs sampling. However, HMC's performance is highly sensitive to two user-specified parameters: a step size ε and a desired number of steps L. In particular, if L is too small then the algorithm exhibits undesirable random walk behavior, while if L is too large the algorithm wastes computation. We introduce the No-U-Turn Sampler (NUTS), an extension to HMC that eliminates the need to set a number of steps L. NUTS uses a recursive algorithm to build a set of likely candidate points that spans a wide swath of the target distribution, stopping automatically when it starts to double back and retrace its steps. Empirically, NUTS perform at least as efficiently as and sometimes more efficiently than a well tuned standard HMC method, without requiring user intervention or costly tuning runs. We also derive a method for adapting the step size parameter ε on the fly based on primal-dual averaging. NUTS can thus be used with no hand-tuning at all. NUTS is also suitable for applications such as BUGS-style automatic inference engines that require efficient "turnkey" sampling algorithms.

preprint2010arXiv

Approximate Maximum A Posteriori Inference with Entropic Priors

In certain applications it is useful to fit multinomial distributions to observed data with a penalty term that encourages sparsity. For example, in probabilistic latent audio source decomposition one may wish to encode the assumption that only a few latent sources are active at any given time. The standard heuristic of applying an L1 penalty is not an option when fitting the parameters to a multinomial distribution, which are constrained to sum to 1. An alternative is to use a penalty term that encourages low-entropy solutions, which corresponds to maximum a posteriori (MAP) parameter estimation with an entropic prior. The lack of conjugacy between the entropic prior and the multinomial distribution complicates this approach. In this report I propose a simple iterative algorithm for MAP estimation of multinomial distributions with sparsity-inducing entropic priors.

Matthew D. Hoffman

What is connected

Connect this record

See the researcher in context

Building this map preview

12 published item(s)

Lossy Compression with Gaussian Diffusion

Automatically Batching Control-Intensive Programs for Modern Accelerators

Hamiltonian Monte Carlo Swindles

tfp.mcmc: Modern Markov Chain Monte Carlo Tools Built for Modern Hardware

A trust-region method for stochastic variational inference with applications to streaming data

The Stan Math Library: Reverse-Mode Automatic Differentiation in C++

A Generative Product-of-Filters Model of Audio

Beta Process Non-negative Matrix Factorization with Stochastic Structured Mean-Field Variational Inference

Image Classification and Retrieval from User-Supplied Tags

Structured Stochastic Variational Inference

The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo

Approximate Maximum A Posteriori Inference with Entropic Priors