Source author record

Justin Domke

Justin Domke appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Artificial Intelligence Computer Vision

Catalog footprint

What is connected

15works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Variational Inference with Locally Enhanced Bounds for Hierarchical Models

Hierarchical models represent a challenging setting for inference algorithms. MCMC methods struggle to scale to large models with many local variables and observations, and variational inference (VI) may fail to provide accurate approximations due to the use of simple variational families. Some variational methods (e.g. importance weighted VI) integrate Monte Carlo methods to give better accuracy, but these tend to be unsuitable for hierarchical models, as they do not allow for subsampling and their performance tends to degrade for high dimensional models. We propose a new family of variational bounds for hierarchical models, based on the application of tightening methods (e.g. importance weighting) separately for each group of local random variables. We show that our approach naturally allows the use of subsampling to get unbiased gradients, and that it fully leverages the power of methods that build tighter lower bounds by applying them independently in lower dimensional spaces, leading to better results and more accurate posterior approximations than relevant baselines.

preprint2022arXiv

Variational Marginal Particle Filters

Variational inference for state space models (SSMs) is known to be hard in general. Recent works focus on deriving variational objectives for SSMs from unbiased sequential Monte Carlo estimators. We reveal that the marginal particle filter is obtained from sequential Monte Carlo by applying Rao-Blackwellization operations, which sacrifices the trajectory information for reduced variance and differentiability. We propose the variational marginal particle filter (VMPF), which is a differentiable and reparameterizable variational filtering objective for SSMs based on an unbiased estimator. We find that VMPF with biased gradients gives tighter bounds than previous objectives, and the unbiased reparameterization gradients are sometimes beneficial.

preprint2021arXiv

An Easy to Interpret Diagnostic for Approximate Inference: Symmetric Divergence Over Simulations

It is important to estimate the errors of probabilistic inference algorithms. Existing diagnostics for Markov chain Monte Carlo methods assume inference is asymptotically exact, and are not appropriate for approximate methods like variational inference or Laplace's method. This paper introduces a diagnostic based on repeatedly simulating datasets from the prior and performing inference on each. The central observation is that it is possible to estimate a symmetric KL-divergence defined over these simulations.

preprint2020arXiv

Divide and Couple: Using Monte Carlo Variational Objectives for Posterior Approximation

Recent work in variational inference (VI) uses ideas from Monte Carlo estimation to tighten the lower bounds on the log-likelihood that are used as objectives. However, there is no systematic understanding of how optimizing different objectives relates to approximating the posterior distribution. Developing such a connection is important if the ideas are to be applied to inference-i.e., applications that require an approximate posterior and not just an approximation of the log-likelihood. Given a VI objective defined by a Monte Carlo estimator of the likelihood, we use a "divide and couple" procedure to identify augmented proposal and target distributions. The divergence between these is equal to the gap between the VI objective and the log-likelihood. Thus, after maximizing the VI objective, the augmented variational distribution may be used to approximate the posterior distribution.

preprint2020arXiv

Moment-Matching Conditions for Exponential Families with Conditioning or Hidden Data

Maximum likelihood learning with exponential families leads to moment-matching of the sufficient statistics, a classic result. This can be generalized to conditional exponential families and/or when there are hidden data. This document gives a first-principles explanation of these generalized moment-matching conditions, along with a self-contained derivation.

preprint2020arXiv

Provable Smoothness Guarantees for Black-Box Variational Inference

Black-box variational inference tries to approximate a complex target distribution though a gradient-based optimization of the parameters of a simpler distribution. Provable convergence guarantees require structural properties of the objective. This paper shows that for location-scale family approximations, if the target is M-Lipschitz smooth, then so is the objective, if the entropy is excluded. The key proof idea is to describe gradients in a certain inner-product space, thus permitting use of Bessel's inequality. This result gives insight into how to parameterize distributions, gives bounds the location of the optimal parameters, and is a key ingredient for convergence guarantees.

preprint2020arXiv

Thompson Sampling with Approximate Inference

We study the effects of approximate inference on the performance of Thompson sampling in the $k$-armed bandit problems. Thompson sampling is a successful algorithm for online decision-making but requires posterior inference, which often must be approximated in practice. We show that even small constant inference error (in $α$-divergence) can lead to poor performance (linear regret) due to under-exploration (for $α<1$) or over-exploration (for $α>0$) by the approximation. While for $α> 0$ this is unavoidable, for $α\leq 0$ the regret can be improved by adding a small amount of forced exploration even when the inference error is a large constant.

preprint2015arXiv

Clamping Improves TRW and Mean Field Approximations

We examine the effect of clamping variables for approximate inference in undirected graphical models with pairwise relationships and discrete variables. For any number of variable labels, we demonstrate that clamping and summing approximate sub-partition functions can lead only to a decrease in the partition function estimate for TRW, and an increase for the naive mean field method, in each case guaranteeing an improvement in the approximation and bound. We next focus on binary variables, add the Bethe approximation to consideration and examine ways to choose good variables to clamp, introducing new methods. We show the importance of identifying highly frustrated cycles, and of checking the singleton entropy of a variable. We explore the value of our methods by empirical analysis and draw lessons to guide practitioners.

preprint2015arXiv

Maximum Likelihood Learning With Arbitrary Treewidth via Fast-Mixing Parameter Sets

Inference is typically intractable in high-treewidth undirected graphical models, making maximum likelihood learning a challenge. One way to overcome this is to restrict parameters to a tractable set, most typically the set of tree-structured parameters. This paper explores an alternative notion of a tractable set, namely a set of "fast-mixing parameters" where Markov chain Monte Carlo (MCMC) inference can be guaranteed to quickly converge to the stationary distribution. While it is common in practice to approximate the likelihood gradient using samples obtained from MCMC, such procedures lack theoretical guarantees. This paper proves that for any exponential family with bounded sufficient statistics, (not just graphical models) when parameters are constrained to a fast-mixing set, gradient descent with gradients approximated by sampling will approximate the maximum likelihood solution inside the set with high-probability. When unregularized, to find a solution epsilon-accurate in log-likelihood requires a total amount of effort cubic in 1/epsilon, disregarding logarithmic factors. When ridge-regularized, strong convexity allows a solution epsilon-accurate in parameter distance with effort quadratic in 1/epsilon. Both of these provide of a fully-polynomial time randomized approximation scheme.

preprint2014arXiv

Finito: A Faster, Permutable Incremental Gradient Method for Big Data Problems

Recent advances in optimization theory have shown that smooth strongly convex finite sums can be minimized faster than by treating them as a black box "batch" problem. In this work we introduce a new method in this class with a theoretical convergence rate four times faster than existing methods, for sums with sufficiently many terms. This method is also amendable to a sampling without replacement scheme that in practice gives further speed-ups. We give empirical results showing state of the art performance.

preprint2014arXiv

Projecting Ising Model Parameters for Fast Mixing

Inference in general Ising models is difficult, due to high treewidth making tree-based algorithms intractable. Moreover, when interactions are strong, Gibbs sampling may take exponential time to converge to the stationary distribution. We present an algorithm to project Ising model parameters onto a parameter set that is guaranteed to be fast mixing, under several divergences. We find that Gibbs sampling using the projected parameters is more accurate than with the original parameters when interaction strengths are strong and when limited time is available for sampling.

preprint2014arXiv

Projecting Markov Random Field Parameters for Fast Mixing

Markov chain Monte Carlo (MCMC) algorithms are simple and extremely powerful techniques to sample from almost arbitrary distributions. The flaw in practice is that it can take a large and/or unknown amount of time to converge to the stationary distribution. This paper gives sufficient conditions to guarantee that univariate Gibbs sampling on Markov Random Fields (MRFs) will be fast mixing, in a precise sense. Further, an algorithm is given to project onto this set of fast-mixing parameters in the Euclidean norm. Following recent work, we give an example use of this to project in various divergence measures, comparing univariate marginals obtained by sampling after projection to common variational methods and Gibbs sampling on the original parameters.

preprint2014arXiv

Structured Learning via Logistic Regression

A successful approach to structured learning is to write the learning objective as a joint function of linear parameters and inference messages, and iterate between updates to each. This paper observes that if the inference problem is "smoothed" through the addition of entropy terms, for fixed messages, the learning objective reduces to a traditional (non-structured) logistic regression problem with respect to parameters. In these logistic regression problems, each training example has a bias term determined by the current set of messages. Based on this insight, the structured energy function can be extended from linear factors to any function class where an "oracle" exists to minimize a logistic loss.

preprint2013arXiv

Learning Graphical Model Parameters with Approximate Marginal Inference

Likelihood based-learning of graphical models faces challenges of computational-complexity and robustness to model mis-specification. This paper studies methods that fit parameters directly to maximize a measure of the accuracy of predicted marginals, taking into account both model and inference approximations at training time. Experiments on imaging problems suggest marginalization-based learning performs better than likelihood-based approximations on difficult problems where the model being fit is approximate in nature.

preprint2012arXiv

Learning Convex Inference of Marginals

Graphical models trained using maximum likelihood are a common tool for probabilistic inference of marginal distributions. However, this approach suffers difficulties when either the inference process or the model is approximate. In this paper, the inference process is first defined to be the minimization of a convex function, inspired by free energy approximations. Learning is then done directly in terms of the performance of the inference process at univariate marginal prediction. The main novelty is that this is a direct minimization of emperical risk, where the risk measures the accuracy of predicted marginals.

Justin Domke

What is connected

Connect this record

See the researcher in context

Building this map preview

15 published item(s)

Variational Inference with Locally Enhanced Bounds for Hierarchical Models

Variational Marginal Particle Filters

An Easy to Interpret Diagnostic for Approximate Inference: Symmetric Divergence Over Simulations

Divide and Couple: Using Monte Carlo Variational Objectives for Posterior Approximation

Moment-Matching Conditions for Exponential Families with Conditioning or Hidden Data

Provable Smoothness Guarantees for Black-Box Variational Inference

Thompson Sampling with Approximate Inference

Clamping Improves TRW and Mean Field Approximations

Maximum Likelihood Learning With Arbitrary Treewidth via Fast-Mixing Parameter Sets

Finito: A Faster, Permutable Incremental Gradient Method for Big Data Problems

Projecting Ising Model Parameters for Fast Mixing

Projecting Markov Random Field Parameters for Fast Mixing

Structured Learning via Logistic Regression

Learning Graphical Model Parameters with Approximate Marginal Inference

Learning Convex Inference of Marginals