Source author record

Tamara Broderick

Tamara Broderick appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Methodology; Machine Learning; math.ST; Statistics Theory; Computation; Applications; Artificial Intelligence; Human-Computer Interaction; math.PR; astro-ph.HE; astro-ph.IM; cs.CY; Distributed, Parallel, and Cluster Computing; physics.ao-ph

Catalog footprint

What is connected

35 works

14 topics

4 close collaborators

preprint, 2023, arXiv

Gaussian processes at the Helm(holtz): A more fluid model for ocean currents

Given sparse observations of buoy velocities, oceanographers are interested in reconstructing ocean currents away from the buoys and identifying divergences in a current vector field. As a first and modular step, we focus on the time-stationary case -- for instance, by restricting to short time periods. Since we expect current velocity to be a continuous but highly non-linear function of spatial location, Gaussian processes (GPs) offer an attractive model. But we show that applying a GP with a standard stationary kernel directly to buoy data can struggle at both current reconstruction and divergence identification, due to some physically unrealistic prior assumptions. To better reflect known physical properties of currents, we propose to instead put a standard stationary kernel on the divergence-free and curl-free components of a vector field obtained through a Helmholtz decomposition. We show that, because this decomposition relates to the original vector field just via mixed partial derivatives, we can still perform inference given the original data with only a small constant multiple of additional computational expense. We illustrate the benefits of our method with theory and experiments on synthetic and real ocean data.
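As a rough illustration of the decomposition this abstract refers to, the sketch below builds a 2D vector field as the sum of a curl-free part (the gradient of a potential) and a divergence-free part (the rotated gradient of a stream function), then checks both properties numerically. It is not the paper's GP model. The potentials `phi` and `psi`, the step size `H`, and all function names are hypothetical choices made only for this example.

```python
import math

H = 1e-3  # finite-difference step (illustrative choice)

def d_dx(f, x, y):
    """Central-difference partial derivative with respect to x."""
    return (f(x + H, y) - f(x - H, y)) / (2 * H)

def d_dy(f, x, y):
    """Central-difference partial derivative with respect to y."""
    return (f(x, y + H) - f(x, y - H)) / (2 * H)

# Hypothetical scalar potentials, chosen only for illustration.
def phi(x, y):
    return math.sin(x) * math.cos(y)

def psi(x, y):
    return x * x * y + math.cos(x + y)

def grad_phi(x, y):
    """Curl-free component: (d phi/dx, d phi/dy)."""
    return d_dx(phi, x, y), d_dy(phi, x, y)

def rot_psi(x, y):
    """Divergence-free component: (d psi/dy, -d psi/dx)."""
    return d_dy(psi, x, y), -d_dx(psi, x, y)

def velocity(x, y):
    """Full field: sum of the two Helmholtz components."""
    gx, gy = grad_phi(x, y)
    rx, ry = rot_psi(x, y)
    return gx + rx, gy + ry

def divergence(field, x, y):
    return (d_dx(lambda a, b: field(a, b)[0], x, y)
            + d_dy(lambda a, b: field(a, b)[1], x, y))

def curl(field, x, y):
    return (d_dx(lambda a, b: field(a, b)[1], x, y)
            - d_dy(lambda a, b: field(a, b)[0], x, y))

print("curl of grad(phi):", curl(grad_phi, 0.3, -0.7))
print("div of rot(psi):  ", divergence(rot_psi, 0.3, -0.7))
print("div of full field:", divergence(velocity, 0.3, -0.7))
```

Both components are obtained from the potentials through partial derivatives only, which is the structural fact the abstract relies on. All of the full field's divergence comes from `phi`.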

preprint, 2022, arXiv

A Performance Evaluation of Nomon: A Flexible Interface for Noisy Single-Switch Users

Some individuals with motor impairments communicate using a single switch -- such as a button click, air puff, or blink. Row-column scanning provides a method for choosing items arranged in a grid using a single switch. An alternative, Nomon, allows potential selections to be arranged arbitrarily rather than requiring a grid (as desired for gaming, drawing, etc.) -- and provides an alternative probabilistic selection method. While past results suggest that Nomon may be faster and easier to use than row-column scanning, no work has yet quantified performance of the two methods over longer time periods or in tasks beyond writing. In this paper, we also develop and validate a webcam-based switch that allows a user without a motor impairment to approximate the response times of a motor-impaired single-switch user; although the approximation is not a replacement for testing with single-switch users, it allows us to better initialize, calibrate, and evaluate our method. Over 10 sessions with the webcam switch, we found users typed faster and more easily with Nomon than with row-column scanning. The benefits of Nomon were even more pronounced in a picture-selection task. Evaluation and feedback from a motor-impaired switch user further support the promise of Nomon.

preprint, 2022, arXiv

Developing a Series of AI Challenges for the United States Department of the Air Force

Through a series of federal initiatives and orders, the U.S. Government has been making a concerted effort to ensure American leadership in AI. These broad strategy documents have influenced organizations such as the United States Department of the Air Force (DAF). The DAF-MIT AI Accelerator is an initiative between the DAF and MIT to bridge the gap between AI researchers and DAF mission requirements. Several projects supported by the DAF-MIT AI Accelerator are developing public challenge problems that address numerous Federal AI research priorities. These challenges target priorities by making large, AI-ready datasets publicly available, incentivizing open-source solutions, and creating a demand signal for dual use technologies that can stimulate further research. In this article, we describe these public challenges being developed and how their application contributes to scientific advances.

preprint, 2022, arXiv

Evaluating Sensitivity to the Stick-Breaking Prior in Bayesian Nonparametrics

Bayesian models based on the Dirichlet process and other stick-breaking priors have been proposed as core ingredients for clustering, topic modeling, and other unsupervised learning tasks. However, due to the flexibility of these models, the consequences of prior choices can be opaque. And so prior specification can be relatively difficult. At the same time, prior choice can have a substantial effect on posterior inferences. Thus, considerations of robustness need to go hand in hand with nonparametric modeling. In the current paper, we tackle this challenge by exploiting the fact that variational Bayesian methods, in addition to having computational advantages in fitting complex nonparametric models, also yield sensitivities with respect to parametric and nonparametric aspects of Bayesian models. In particular, we demonstrate how to assess the sensitivity of conclusions to the choice of concentration parameter and stick-breaking distribution for inferences under Dirichlet process mixtures and related mixture models. We provide both theoretical and empirical support for our variational approach to Bayesian sensitivity analysis.
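For readers unfamiliar with stick-breaking priors, here is a minimal sketch of the standard Dirichlet-process stick-breaking construction: break off a Beta(1, alpha) fraction of the remaining stick at each step. It only illustrates how the concentration parameter shapes the weights and does not implement the paper's sensitivity analysis. The function name, the truncation level, and the values of `alpha` are assumptions for this example.

```python
import random

def stick_breaking_weights(alpha, num_sticks, rng):
    """Return the first `num_sticks` stick-breaking weights and the leftover mass.

    Each step breaks off a Beta(1, alpha) fraction of whatever stick remains.
    """
    weights = []
    remaining = 1.0
    for _ in range(num_sticks):
        v = rng.betavariate(1.0, alpha)  # fraction broken off at this step
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights, remaining

rng = random.Random(0)
for alpha in (0.5, 5.0):  # hypothetical concentration values
    weights, leftover = stick_breaking_weights(alpha, 20, rng)
    print(f"alpha={alpha}: largest weight={max(weights):.3f}, "
          f"mass beyond truncation={leftover:.6f}")
```

A smaller `alpha` tends to put most of the mass on the first few sticks, while a larger `alpha` spreads it over many. This is one reason inferences can depend on the choice of concentration parameter.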

preprint, 2022, arXiv

Local Exchangeability

Exchangeability -- in which the distribution of an infinite sequence is invariant to reorderings of its elements -- implies the existence of a simple conditional independence structure that may be leveraged in the design of statistical models and inference procedures. In this work, we study a relaxation of exchangeability in which this invariance need not hold precisely. We introduce the notion of local exchangeability -- where swapping data associated with nearby covariates causes a bounded change in the distribution. We prove that locally exchangeable processes correspond to independent observations from an underlying measure-valued stochastic process. Using this main probabilistic result, we show that the local empirical measure of a finite collection of observations provides an approximation of the underlying measure-valued process and Bayesian posterior predictive distributions. The paper concludes with applications of the main theoretical results to a model from Bayesian nonparametrics and covariate-dependent permutation tests.

preprint, 2022, arXiv

Many processors, little time: MCMC for partitions via optimal transport couplings

Markov chain Monte Carlo (MCMC) methods are often used in clustering since they guarantee asymptotically exact expectations in the infinite-time limit. In finite time, though, slow mixing often leads to poor performance. Modern computing environments offer massive parallelism, but naive implementations of parallel MCMC can exhibit substantial bias. In MCMC samplers of continuous random variables, Markov chain couplings can overcome bias. But these approaches depend crucially on paired chains meeting after a small number of transitions. We show that straightforward applications of existing coupling ideas to discrete clustering variables fail to meet quickly. This failure arises from the "label-switching problem": semantically equivalent cluster relabelings impede fast meeting of coupled chains. We instead consider chains as exploring the space of partitions rather than partitions' (arbitrary) labelings. Using a metric on the partition space, we formulate a practical algorithm using optimal transport couplings. Our theory confirms our method is accurate and efficient. In experiments ranging from clustering of genes or seeds to graph colorings, we show the benefits of our coupling in the highly parallel, time-limited regime.

preprint, 2022, arXiv

Measuring the robustness of Gaussian processes to kernel choice

Gaussian processes (GPs) are used to make medical and scientific decisions, including in cardiac care and monitoring of atmospheric carbon dioxide levels. Notably, the choice of GP kernel is often somewhat arbitrary. In particular, uncountably many kernels typically align with qualitative prior knowledge (e.g., function smoothness or stationarity). But in practice, data analysts choose among a handful of convenient standard kernels (e.g., squared exponential). In the present work, we ask: Would decisions made with a GP differ under other, qualitatively interchangeable kernels? We show how to answer this question by solving a constrained optimization problem over a finite-dimensional space. We can then use standard optimizers to identify substantive changes in relevant decisions made with a GP. We demonstrate in both synthetic and real-world examples that decisions made with a GP can exhibit non-robustness to kernel choice, even when prior draws are qualitatively interchangeable to a user.

preprint, 2021, arXiv

More for less: Predicting and maximizing genetic variant discovery via Bayesian nonparametrics

While the cost of sequencing genomes has decreased dramatically in recent years, this expense often remains non-trivial. Under a fixed budget, then, scientists face a natural trade-off between quantity and quality; they can spend resources to sequence a greater number of genomes (quantity) or spend resources to sequence genomes with increased accuracy (quality). Our goal is to find the optimal allocation of resources between quantity and quality. Optimizing resource allocation promises to reveal as many new variations in the genome as possible, and thus as many new scientific insights as possible. In this paper, we consider the common setting where scientists have already conducted a pilot study to reveal variants in a genome and are contemplating a follow-up study. We introduce a Bayesian nonparametric methodology to predict the number of new variants in the follow-up study based on the pilot study. When experimental conditions are kept constant between the pilot and follow-up, we demonstrate on real data from the gnomAD project that our prediction is more accurate than three recent proposals, and competitive with a more classic proposal. Unlike existing methods, though, our method allows practitioners to change experimental conditions between the pilot and the follow-up. We demonstrate how this distinction allows our method to be used for (i) more realistic predictions and (ii) optimal allocation of a fixed budget between quality and quantity.

preprint, 2020, arXiv

A Swiss Army Infinitesimal Jackknife

The error or variability of machine learning algorithms is often assessed by repeatedly re-fitting a model with different weighted versions of the observed data. The ubiquitous tools of cross-validation (CV) and the bootstrap are examples of this technique. These methods are powerful in large part due to their model agnosticism but can be slow to run on modern, large data sets due to the need to repeatedly re-fit the model. In this work, we use a linear approximation to the dependence of the fitting procedure on the weights, producing results that can be faster than repeated re-fitting by an order of magnitude. This linear approximation is sometimes known as the "infinitesimal jackknife" in the statistics literature, where it is mostly used as a theoretical tool to prove asymptotic results. We provide explicit finite-sample error bounds for the infinitesimal jackknife in terms of a small number of simple, verifiable assumptions. Our results apply whether the weights and data are stochastic or deterministic, and so can be used as a tool for proving the accuracy of the infinitesimal jackknife on a wide variety of problems. As a corollary, we state mild regularity conditions under which our approximation consistently estimates true leave-$k$-out cross-validation for any fixed $k$. These theoretical results, together with modern automatic differentiation software, support the application of the infinitesimal jackknife to a wide variety of practical problems in machine learning, providing a "Swiss Army infinitesimal jackknife". We demonstrate the accuracy of our methods on a range of simulated and real datasets.
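The linear approximation described in this abstract can be illustrated on the simplest possible estimator. The sketch below assumes a weighted squared-error loss, so the fitted parameter is the sample mean, and compares the first-order leave-one-out approximation against exact re-fitting. The data values and function names are hypothetical, and this toy case is not the paper's general procedure or its error bounds.

```python
def ij_leave_one_out_means(xs):
    """First-order (infinitesimal jackknife) approximation to leave-one-out means.

    The estimator minimizes sum_i w_i * 0.5 * (theta - x_i)**2 with all w_i = 1.
    Differentiating the optimality condition in the weights gives
    d theta / d w_i = -grad_i / hessian, so dropping point i (w_i: 1 -> 0)
    moves theta by approximately +grad_i / hessian.
    """
    n = len(xs)
    theta_hat = sum(xs) / n
    hessian = float(n)  # sum of second derivatives of the per-point losses
    approx = []
    for x in xs:
        grad_i = theta_hat - x  # gradient of the i-th loss at theta_hat
        approx.append(theta_hat + grad_i / hessian)
    return theta_hat, approx

def exact_leave_one_out_means(xs):
    """Exact re-fit: the mean of the data with each point removed in turn."""
    total, n = sum(xs), len(xs)
    return [(total - x) / (n - 1) for x in xs]

data = [2.0, 3.5, 1.0, 4.2, 2.8, 3.1, 0.7, 5.0]  # hypothetical observations
theta_hat, approx = ij_leave_one_out_means(data)
exact = exact_leave_one_out_means(data)
max_gap = max(abs(a - e) for a, e in zip(approx, exact))
print("full-data estimate:", theta_hat)
print("largest gap between approximation and exact re-fit:", max_gap)
```

In this toy case the gap for point i is exactly |theta_hat - x_i| / (n (n - 1)), so it shrinks quickly as the dataset grows. For models without a closed-form fit, the gradient and Hessian would typically come from automatic differentiation.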

preprint, 2020, arXiv

Approximate Cross-Validation in High Dimensions with Guarantees

Leave-one-out cross-validation (LOOCV) can be particularly accurate among cross-validation (CV) variants for machine learning assessment tasks -- e.g., assessing methods' error or variability. But it is expensive to re-fit a model $N$ times for a dataset of size $N$. Previous work has shown that approximations to LOOCV can be both fast and accurate -- when the unknown parameter is of small, fixed dimension. But these approximations incur a running time roughly cubic in dimension -- and we show that, besides computational issues, their accuracy dramatically deteriorates in high dimensions. Authors have suggested many potential and seemingly intuitive solutions, but these methods have not yet been systematically evaluated or compared. We find that all but one perform so poorly as to be unusable for approximating LOOCV. Crucially, though, we are able to show, both empirically and theoretically, that one approximation can perform well in high dimensions -- in cases where the high-dimensional parameter exhibits sparsity. Under interpretable assumptions, our theory demonstrates that the problem can be reduced to working within an empirically recovered (small) support. This procedure is straightforward to implement, and we prove that its running time and error depend on the (small) support size even when the full parameter dimension is large.

preprint, 2020, arXiv

Validated Variational Inference via Practical Posterior Error Bounds

Variational inference has become an increasingly attractive fast alternative to Markov chain Monte Carlo methods for approximate Bayesian inference. However, a major obstacle to the widespread use of variational methods is the lack of post-hoc accuracy measures that are both theoretically justified and computationally efficient. In this paper, we provide rigorous bounds on the error of posterior mean and uncertainty estimates that arise from full-distribution approximations, as in variational inference. Our bounds are widely applicable, as they require only that the approximating and exact posteriors have polynomial moments. Our bounds are also computationally efficient for variational inference because they require only standard values from variational objectives, straightforward analytic calculations, and simple Monte Carlo estimates. We show that our analysis naturally leads to a new and improved workflow for validated variational inference. Finally, we demonstrate the utility of our proposed workflow and error bounds on a robust regression problem and on a real-data example with a widely used multilevel hierarchical model.

preprint, 2019, arXiv

Truncated Random Measures

Completely random measures (CRMs) and their normalizations are a rich source of Bayesian nonparametric priors. Examples include the beta, gamma, and Dirichlet processes. In this paper we detail two major classes of sequential CRM representations---series representations and superposition representations---within which we organize both novel and existing sequential representations that can be used for simulation and posterior inference. These two classes and their constituent representations subsume existing ones that have previously been developed in an ad hoc manner for specific processes. Since a complete infinite-dimensional CRM cannot be used explicitly for computation, sequential representations are often truncated for tractability. We provide truncation error analyses for each type of sequential representation, as well as their normalized versions, thereby generalizing and improving upon existing truncation error bounds in the literature. We analyze the computational complexity of the sequential representations, which in conjunction with our error bounds allows us to directly compare representations and discuss their relative efficiency. We include numerous applications of our theoretical results to commonly-used (normalized) CRMs, demonstrating that our results enable a straightforward representation and analysis of CRMs that has not previously been available in a Bayesian nonparametric context.

preprint, 2018, arXiv

Exchangeable Trait Allocations

Trait allocations are a class of combinatorial structures in which data may belong to multiple groups and may have different levels of belonging in each group. Often the data are also exchangeable, i.e., their joint distribution is invariant to reordering. In clustering---a special case of trait allocation---exchangeability implies the existence of both a de Finetti representation and an exchangeable partition probability function (EPPF), distributional representations useful for computational and theoretical purposes. In this work, we develop the analogous de Finetti representation and exchangeable trait probability function (ETPF) for trait allocations, along with a characterization of all trait allocations with an ETPF. Unlike previous feature allocation characterizations, our proofs fully capture single-occurrence "dust" groups. We further introduce a novel constrained version of the ETPF that we use to establish an intuitive connection between the probability functions for clustering, feature allocations, and trait allocations. As an application of our general theory, we characterize the distribution of all edge-exchangeable graphs, a class of recently-developed models that captures realistic sparse graph sequences.

preprint, 2016, arXiv

Completely random measures for modeling power laws in sparse graphs

Network data appear in a number of applications, such as online social networks and biological networks, and there is growing interest in both developing models for networks as well as studying the properties of such data. Since individual network datasets continue to grow in size, it is necessary to develop models that accurately represent the real-life scaling properties of networks. One behavior of interest is having a power law in the degree distribution. However, other types of power laws that have been observed empirically and considered for applications such as clustering and feature allocation models have not been studied as frequently in models for graph data. In this paper, we enumerate desirable asymptotic behavior that may be of interest for modeling graph data, including sparsity and several types of power laws. We outline a general framework for graph generative models using completely random measures; by contrast to the pioneering work of Caron and Fox (2015), we consider instantiating more of the existing atoms of the random measure as the dataset size increases rather than adding new atoms to the measure. We see that these two models can be complementary; they respectively yield interpretations as (1) time passing among existing members of a network and (2) new individuals joining a network. We detail a particular instance of this framework and show simulated results that suggest this model exhibits some desirable asymptotic power-law behavior.

preprint, 2016, arXiv

Edge-exchangeable graphs and sparsity

A known failing of many popular random graph models is that the Aldous-Hoover Theorem guarantees these graphs are dense with probability one; that is, the number of edges grows quadratically with the number of nodes. This behavior is considered unrealistic in observed graphs. We define a notion of edge exchangeability for random graphs in contrast to the established notion of infinite exchangeability for random graphs --- which has traditionally relied on exchangeability of nodes (rather than edges) in a graph. We show that, unlike node exchangeability, edge exchangeability encompasses models that are known to provide a projective sequence of random graphs that circumvent the Aldous-Hoover Theorem and exhibit sparsity, i.e., sub-quadratic growth of the number of edges with the number of nodes. We show how edge-exchangeability of graphs relates naturally to existing notions of exchangeability from clustering (a.k.a. partitions) and other familiar combinatorial structures.

preprint, 2016, arXiv

Fast Measurements of Robustness to Changing Priors in Variational Bayes

In Bayesian analysis, the posterior follows from the data and a choice of a prior and a likelihood. One hopes that the posterior is robust to reasonable variation in the choice of prior, since this choice is made by the modeler and is often somewhat subjective. A different, equally subjectively plausible choice of prior may result in a substantially different posterior, and so different conclusions drawn from the data. Were this to be the case, our conclusions would not be robust to the choice of prior. To determine whether our model is robust, we must quantify how sensitive our posterior is to perturbations of our prior. Despite the importance of the problem and a considerable body of literature, generic, easy-to-use methods to quantify Bayesian robustness are still lacking. In this paper, we demonstrate that powerful measures of robustness can be easily calculated from Variational Bayes (VB) approximate posteriors. We begin with local robustness, which measures the effect of infinitesimal changes to the prior on a posterior mean of interest. In particular, we show that the influence function of Gustafson (2012) has a simple, easy-to-calculate closed form expression for VB approximations. We then demonstrate how local robustness measures can be inadequate for non-local prior changes, such as replacing one prior entirely with another. We propose a simple approximate non-local robustness measure and demonstrate its effectiveness on a simulated data set.

preprint, 2016, arXiv

Fast robustness quantification with variational Bayes

Bayesian hierarchical models are increasingly popular in economics. When using hierarchical models, it is useful not only to calculate posterior expectations, but also to measure the robustness of these expectations to reasonable alternative prior choices. We use variational Bayes and linear response methods to provide fast, accurate posterior means and robustness measures with an application to measuring the effectiveness of microcredit in the developing world.

preprint, 2016, arXiv

Posteriors, conjugacy, and exponential families for completely random measures

We demonstrate how to calculate posteriors for general CRM-based priors and likelihoods for Bayesian nonparametric models. We further show how to represent Bayesian nonparametric priors as a sequence of finite draws using a size-biasing approach---and how to represent full Bayesian nonparametric models via finite marginals. Motivated by conjugate priors based on exponential family representations of likelihoods, we introduce a notion of exponential families for CRMs, which we call exponential CRMs. This construction allows us to specify automatic Bayesian nonparametric conjugate priors for exponential CRM likelihoods. We demonstrate that our exponential CRMs allow particularly straightforward recipes for size-biased and marginal representations of Bayesian nonparametric models. Along the way, we prove that the gamma process is a conjugate prior for the Poisson likelihood process and the beta prime process is a conjugate prior for a process we call the odds Bernoulli process. We deliver a size-biased representation of the gamma process and a marginal representation of the gamma process coupled with a Poisson likelihood process.

preprint, 2015, arXiv

A translation of "The characteristic function of a random phenomenon" by Bruno de Finetti

This article is a translation of Bruno de Finetti's paper "Funzione Caratteristica di un fenomeno aleatorio" which appeared in Atti del Congresso Internazionale dei Matematici, Bologna 3-10 Settembre 1928, Tomo VI, pp. 179-190, originally published by Nicola Zanichelli Editore S.p.A. The translation was made as close as possible to the original in form and style, except for apparent mistakes found in the original document, which were corrected and are mentioned as footnotes. Most of these were resolved by comparing against a longer version of this work by de Finetti, published shortly after this one under the same title. The interested reader is highly encouraged to consult this other version for a more detailed treatment of the topics covered here. Footnotes regarding the translation are labeled with letters to distinguish them from de Finetti's original footnotes.

preprint, 2015, arXiv

Covariance Matrices and Influence Scores for Mean Field Variational Bayes

Mean field variational Bayes (MFVB) is a popular posterior approximation method due to its fast runtime on large-scale data sets. However, it is well known that a major failing of MFVB is that it underestimates the uncertainty of model variables (sometimes severely) and provides no information about model variable covariance. We develop a fast, general methodology for exponential families that augments MFVB to deliver accurate uncertainty estimates for model variables -- both for individual variables and coherently across variables. MFVB for exponential families defines a fixed-point equation in the means of the approximating posterior, and our approach yields a covariance estimate by perturbing this fixed point. Inspired by linear response theory, we call our method linear response variational Bayes (LRVB). We also show how LRVB can be used to quickly calculate a measure of the influence of individual data points on parameter point estimates. We demonstrate the accuracy and scalability of our method by learning Gaussian mixture models for both simulated and real data.

preprint, 2015, arXiv

Robust Inference with Variational Bayes

In Bayesian analysis, the posterior follows from the data and a choice of a prior and a likelihood. One hopes that the posterior is robust to reasonable variation in the choice of prior and likelihood, since this choice is made by the modeler and is necessarily somewhat subjective. Despite the fundamental importance of the problem and a considerable body of literature, the tools of robust Bayes are not commonly used in practice. This is in large part due to the difficulty of calculating robustness measures from MCMC draws. Although methods for computing robustness measures from MCMC draws exist, they lack generality and often require additional coding or computation. In contrast to MCMC, variational Bayes (VB) techniques are readily amenable to robustness analysis. The derivative of a posterior expectation with respect to a prior or data perturbation is a measure of local robustness to the prior or likelihood. Because VB casts posterior inference as an optimization problem, its methodology is built on the ability to calculate derivatives of posterior quantities with respect to model parameters, even in very complex models. In the present work, we develop local prior robustness measures for mean-field variational Bayes (MFVB), a VB technique which imposes a particular factorization assumption on the variational posterior approximation. We start by outlining existing local prior measures of robustness. Next, we use these results to derive closed-form measures of the sensitivity of mean-field variational posterior approximation to prior specification. We demonstrate our method on a meta-analysis of randomized controlled interventions in access to microcredit in developing countries.

preprint, 2014, arXiv

Covariance Matrices for Mean Field Variational Bayes

Mean Field Variational Bayes (MFVB) is a popular posterior approximation method due to its fast runtime on large-scale data sets. However, it is well known that a major failing of MFVB is its (sometimes severe) underestimates of the uncertainty of model variables and lack of information about model variable covariance. We develop a fast, general methodology for exponential families that augments MFVB to deliver accurate uncertainty estimates for model variables -- both for individual variables and coherently across variables. MFVB for exponential families defines a fixed-point equation in the means of the approximating posterior, and our approach yields a covariance estimate by perturbing this fixed point. Inspired by linear response theory, we call our method linear response variational Bayes (LRVB). We demonstrate the accuracy of our method on simulated data sets.

preprint, 2014, arXiv

Variational Bayes for Merging Noisy Databases

Bayesian entity resolution merges together multiple, noisy databases and returns the minimal collection of unique individuals represented, together with their true, latent record values. Bayesian methods allow flexible generative models that share power across databases as well as principled quantification of uncertainty for queries of the final, resolved database. However, existing Bayesian methods for entity resolution use Markov chain Monte Carlo (MCMC) approximations and are too slow to run on modern databases containing millions or billions of records. Instead, we propose applying variational approximations to allow scalable Bayesian inference in these models. We derive a coordinate-ascent approximation for mean-field variational Bayes, qualitatively compare our algorithm to existing methods, note unique challenges for inference that arise from the expected distribution of cluster sizes in entity resolution, and discuss directions for future work in this domain.

preprint, 2013, arXiv

Cluster and Feature Modeling from Combinatorial Stochastic Processes

One of the focal points of the modern literature on Bayesian nonparametrics has been the problem of clustering, or partitioning, where each data point is modeled as being associated with one and only one of some collection of groups called clusters or partition blocks. Underlying these Bayesian nonparametric models are a set of interrelated stochastic processes, most notably the Dirichlet process and the Chinese restaurant process. In this paper we provide a formal development of an analogous problem, called feature modeling, for associating data points with arbitrary nonnegative integer numbers of groups, now called features or topics. We review the existing combinatorial stochastic process representations for the clustering problem and develop analogous representations for the feature modeling problem. These representations include the beta process and the Indian buffet process as well as new representations that provide insight into the connections between these processes. We thereby bring the same level of completeness to the treatment of Bayesian nonparametric feature modeling that has previously been achieved for Bayesian nonparametric clustering.

preprint, 2013, arXiv

Combinatorial clustering and the beta negative binomial process

We develop a Bayesian nonparametric approach to a general family of latent class problems in which individuals can belong simultaneously to multiple classes and where each class can be exhibited multiple times by an individual. We introduce a combinatorial stochastic process known as the negative binomial process (NBP) as an infinite-dimensional prior appropriate for such problems. We show that the NBP is conjugate to the beta process, and we characterize the posterior distribution under the beta-negative binomial process (BNBP) and hierarchical models based on the BNBP (the HBNBP). We study the asymptotic properties of the BNBP and develop a three-parameter extension of the BNBP that exhibits power-law behavior. We derive MCMC algorithms for posterior inference under the HBNBP, and we present experiments using these algorithms in the domains of image segmentation, object recognition, and document analysis.

preprint2013arXiv

Feature allocations, probability functions, and paintboxes

The problem of inferring a clustering of a data set has been the subject of much research in Bayesian analysis, and there currently exists a solid mathematical foundation for Bayesian approaches to clustering. In particular, the class of probability distributions over partitions of a data set has been characterized in a number of ways, including via exchangeable partition probability functions (EPPFs) and the Kingman paintbox. Here, we develop a generalization of the clustering problem, called feature allocation, where we allow each data point to belong to an arbitrary, non-negative integer number of groups, now called features or topics. We define and study an "exchangeable feature probability function" (EFPF)---analogous to the EPPF in the clustering setting---for certain types of feature models. Moreover, we introduce a "feature paintbox" characterization---analogous to the Kingman paintbox for clustering---of the class of exchangeable feature models. We provide a further characterization of the subclass of feature allocations that have EFPF representations.

preprint2013arXiv

MAD-Bayes: MAP-based Asymptotic Derivations from Bayes

The classical mixture of Gaussians model is related to K-means via small-variance asymptotics: as the covariances of the Gaussians tend to zero, the negative log-likelihood of the mixture of Gaussians model approaches the K-means objective, and the EM algorithm approaches the K-means algorithm. Kulis & Jordan (2012) used this observation to obtain a novel K-means-like algorithm from a Gibbs sampler for the Dirichlet process (DP) mixture. We instead consider applying small-variance asymptotics directly to the posterior in Bayesian nonparametric models. This framework is independent of any specific Bayesian inference algorithm, and it has the major advantage that it generalizes immediately to a range of models beyond the DP mixture. To illustrate, we apply our framework to the feature learning setting, where the beta process and Indian buffet process provide an appropriate Bayesian nonparametric prior. We obtain a novel objective function that goes beyond clustering to learn (and penalize new) groupings for which we relax the mutual exclusivity and exhaustivity assumptions of clustering. We demonstrate several other algorithms, all of which are scalable and simple to implement. Empirical results demonstrate the benefits of the new framework.
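As a rough illustration of the K-means-like algorithm attributed above to Kulis & Jordan (2012), the sketch below runs a K-means loop in which a point farther than a penalty `lam` (in squared distance) from every center opens a new cluster, and it evaluates an objective equal to the K-means cost plus `lam` per cluster. The function names, the initialization at the global mean, and the handling of empty clusters are assumptions made for this example; it does not implement the feature-learning objective that the abstract introduces.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def penalized_kmeans(points, lam, max_iter=100):
    """K-means-style loop where a new cluster costs `lam` (assumed penalty name)."""
    centers = [mean(points)]
    assign = [0] * len(points)
    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            dists = [sq_dist(p, c) for c in centers]
            k = min(range(len(centers)), key=dists.__getitem__)
            if dists[k] > lam:
                # Cheaper to pay the penalty than to join the nearest cluster.
                centers.append(p)
                k = len(centers) - 1
            if assign[i] != k:
                assign[i] = k
                changed = True
        # Recompute centers, dropping clusters that ended up empty.
        members = {}
        for i, k in enumerate(assign):
            members.setdefault(k, []).append(points[i])
        kept = sorted(members)
        relabel = {k: j for j, k in enumerate(kept)}
        centers = [mean(members[k]) for k in kept]
        assign = [relabel[k] for k in assign]
        if not changed:
            break
    return centers, assign


def objective(points, centers, assign, lam):
    """K-means cost plus a penalty of `lam` for each cluster in use."""
    cost = sum(sq_dist(p, centers[k]) for p, k in zip(points, assign))
    return cost + lam * len(centers)


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, assign = penalized_kmeans(data, lam=4.0)
    print(centers, assign, objective(data, centers, assign, 4.0))
```

Unlike plain K-means, the number of clusters is not fixed in advance: a larger `lam` yields fewer clusters, a smaller `lam` yields more.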

preprint2013arXiv

Optimistic Concurrency Control for Distributed Unsupervised Learning

Research on distributed machine learning algorithms has focused primarily on one of two extremes - algorithms that obey strict concurrency constraints or algorithms that obey few or no such constraints. We consider an intermediate alternative in which algorithms optimistically assume that conflicts are unlikely and if conflicts do arise a conflict-resolution protocol is invoked. We view this "optimistic concurrency control" paradigm as particularly appropriate for large-scale machine learning algorithms, particularly in the unsupervised setting. We demonstrate our approach in three problem areas: clustering, feature learning and online facility location. We evaluate our methods via large-scale experiments in a cluster computing environment.

preprint2013arXiv

Real-time semiparametric regression

We develop algorithms for performing semiparametric regression analysis in real time, with data processed as it is collected and made immediately available via modern telecommunications technologies. Our definition of semiparametric regression is quite broad and includes, as special cases, generalized linear mixed models, generalized additive models, geostatistical models, wavelet nonparametric regression models and their various combinations. Fast updating of regression fits is achieved by couching semiparametric regression into a Bayesian hierarchical model or, equivalently, graphical model framework and employing online mean field variational ideas. An internet site attached to this article, realtime-semiparametric-regression.net, illustrates the methodology for continually arriving stock market, real estate and airline data. Flexible real-time analyses, based on increasingly ubiquitous streaming data sources, stand to benefit.

preprint2013arXiv

Streaming Variational Bayes

We present SDA-Bayes, a framework for (S)treaming, (D)istributed, (A)synchronous computation of a Bayesian posterior. The framework makes streaming updates to the estimated posterior according to a user-specified approximation batch primitive. We demonstrate the usefulness of our framework, with variational Bayes (VB) as the primitive, by fitting the latent Dirichlet allocation model to two large-scale document collections. We demonstrate the advantages of our algorithm over stochastic variational inference (SVI) by comparing the two after a single pass through a known amount of data---a case where SVI may be applied---and in the streaming setting, where SVI does not apply.
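The streaming idea described above, in which each batch updates the current posterior estimate through a user-supplied batch primitive, can be sketched with a toy model. The example below is an assumption-laden stand-in: it uses an exact conjugate Beta-Bernoulli update as the primitive rather than variational Bayes, it is neither distributed nor asynchronous, and all function names are hypothetical.

```python
def beta_bernoulli_primitive(batch, prior):
    """Batch primitive: map (data batch, prior) to a posterior.

    Here the posterior is exact because the Beta prior is conjugate to
    Bernoulli observations; `prior` is a (a, b) pair of Beta parameters.
    """
    a, b = prior
    successes = sum(batch)
    return (a + successes, b + len(batch) - successes)


def streaming_posterior(batches, prior, primitive):
    """Treat the posterior after each batch as the prior for the next one."""
    posterior = prior
    for batch in batches:
        posterior = primitive(batch, posterior)
    return posterior


if __name__ == "__main__":
    stream = [[1, 0, 1], [1, 1], [0, 0, 1, 1]]
    print(streaming_posterior(stream, (1, 1), beta_bernoulli_primitive))
```

With an exact primitive, processing the batches one at a time gives the same result as a single update on all the data; with an approximate primitive such as VB, the streamed result is itself an approximation.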

preprint2012arXiv

Combining Spatial and Telemetric Features for Learning Animal Movement Models

We introduce a new graphical model for tracking radio-tagged animals and learning their movement patterns. The model provides a principled way to combine radio telemetry data with an arbitrary set of user-defined spatial features. We describe an efficient stochastic gradient algorithm for fitting model parameters to data and demonstrate its effectiveness via asymptotic analysis and synthetic experiments. We also apply our model to real datasets, and show that it outperforms the most popular radio telemetry software package used in ecology. We conclude that integration of different data sources under a single statistical framework, coupled with appropriate parameter and state estimation procedures, produces both accurate location estimates and an interpretable statistical model of animal movement.

preprint2011arXiv

Beta processes, stick-breaking, and power laws

The beta-Bernoulli process provides a Bayesian nonparametric prior for models involving collections of binary-valued features. A draw from the beta process yields an infinite collection of probabilities in the unit interval, and a draw from the Bernoulli process turns these into binary-valued features. Recent work has provided stick-breaking representations for the beta process analogous to the well-known stick-breaking representation for the Dirichlet process. We derive one such stick-breaking representation directly from the characterization of the beta process as a completely random measure. This approach motivates a three-parameter generalization of the beta process, and we study the power laws that can be obtained from this generalized beta process. We present a posterior inference algorithm for the beta-Bernoulli process that exploits the stick-breaking representation, and we present experimental results for a discrete factor-analysis model.
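For orientation, the well-known stick-breaking representation of the Dirichlet process mentioned above can be sketched in a few lines: each weight is a Beta(1, alpha) fraction of the stick that remains after earlier breaks. This illustrates only the Dirichlet process construction, truncated at a fixed number of sticks; it is not the beta process representation derived in the paper, and the function and parameter names are assumptions.

```python
import random


def dp_stick_breaking(alpha, num_sticks, rng):
    """Return truncated Dirichlet process weights and the leftover stick length."""
    weights, remaining = [], 1.0
    for _ in range(num_sticks):
        v = rng.betavariate(1.0, alpha)  # fraction broken off the remaining stick
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights, remaining


if __name__ == "__main__":
    weights, leftover = dp_stick_breaking(alpha=2.0, num_sticks=10, rng=random.Random(0))
    print(weights, leftover)
```

The weights and the leftover stick always sum to one, so the truncation error is exactly the leftover length.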

preprint2011arXiv

Rapid, Machine-Learned Resource Allocation: Application to High-redshift GRB Follow-up

As the number of observed Gamma-Ray Bursts (GRBs) continues to grow, follow-up resources need to be used more efficiently in order to maximize science output from limited telescope time. As such, it is becoming increasingly important to rapidly identify bursts of interest as soon as possible after the event, before the afterglows fade beyond detectability. Studying the most distant (highest redshift) events, for instance, remains a primary goal for many in the field. Here we present our Random forest Automated Triage Estimator for GRB redshifts (RATE GRB-z) for rapid identification of high-redshift candidates using early-time metrics from the three telescopes onboard Swift. While the basic RATE methodology is generalizable to a number of resource allocation problems, here we demonstrate its utility for telescope-constrained follow-up efforts with the primary goal to identify and study high-z GRBs. For each new GRB, RATE GRB-z provides a recommendation - based on the available telescope time - of whether the event warrants additional follow-up resources. We train RATE GRB-z using a set consisting of 135 Swift bursts with known redshifts, only 18 of which are z > 4. Cross-validated performance metrics on this training data suggest that ~56% of high-z bursts can be captured from following up the top 20% of the ranked candidates, and ~84% of high-z bursts are identified after following up the top ~40% of candidates. We further use the method to rank 200+ Swift bursts with unknown redshifts according to their likelihood of being high-z.

preprint2010arXiv

Classification and categorical inputs with treed Gaussian process models

Recognizing the successes of treed Gaussian process (TGP) models as an interpretable and thrifty model for nonparametric regression, we seek to extend the model to classification. Both treed models and Gaussian processes (GPs) have, separately, enjoyed great success in application to classification problems. An example of the former is Bayesian CART. In the latter, real-valued GP output may be utilized for classification via latent variables, which provide classification rules by means of a softmax function. We formulate a Bayesian model averaging scheme to combine these two models and describe a Monte Carlo method for sampling from the full posterior distribution with joint proposals for the tree topology and the GP parameters corresponding to latent variables at the leaves. We concentrate on efficient sampling of the latent variables, which is important to obtain good mixing in the expanded parameter space. The tree structure is particularly helpful for this task and also for developing an efficient scheme for handling categorical predictors, which commonly arise in classification problems. Our proposed classification TGP (CTGP) methodology is illustrated on a collection of synthetic and real data sets. We assess performance relative to existing methods and thereby show how CTGP is highly flexible, offers tractable inference, produces rules that are easy to interpret, and performs well out of sample.

preprint2010arXiv

Fast and flexible selection with a single switch

Selection methods that require only a single-switch input, such as a button click or blink, are potentially useful for individuals with motor impairments, mobile technology users, and individuals wishing to transmit information securely. We present a single-switch selection method, "Nomon," that is general and efficient. Existing single-switch selection methods require selectable options to be arranged in ways that limit potential applications. By contrast, traditional operating systems, web browsers, and free-form applications (such as drawing) place options at arbitrary points on the screen. Nomon, however, has the flexibility to select any point on a screen. Nomon adapts automatically to an individual's clicking ability; it allows a person who clicks precisely to make a selection quickly and allows a person who clicks imprecisely more time to make a selection without error. Nomon reaps gains in information rate by allowing the specification of beliefs (priors) about option selection probabilities and by avoiding tree-based selection schemes in favor of direct (posterior) inference. We have developed both a Nomon-based writing application and a drawing application. To evaluate Nomon's performance, we compared the writing application with a popular existing method for single-switch writing (row-column scanning). Novice users wrote 35% faster with the Nomon interface than with the scanning interface. An experienced user (author TB, with > 10 hours practice) wrote at speeds of 9.3 words per minute with Nomon, using 1.2 clicks per character and making no errors in the final text.

35 published items

Gaussian processes at the Helm(holtz): A more fluid model for ocean currents

A Performance Evaluation of Nomon: A Flexible Interface for Noisy Single-Switch Users

Developing a Series of AI Challenges for the United States Department of the Air Force

Evaluating Sensitivity to the Stick-Breaking Prior in Bayesian Nonparametrics

Local Exchangeability

Many processors, little time: MCMC for partitions via optimal transport couplings

Measuring the robustness of Gaussian processes to kernel choice

More for less: Predicting and maximizing genetic variant discovery via Bayesian nonparametrics

A Swiss Army Infinitesimal Jackknife

Approximate Cross-Validation in High Dimensions with Guarantees

Validated Variational Inference via Practical Posterior Error Bounds

Truncated Random Measures

Exchangeable Trait Allocations

Completely random measures for modeling power laws in sparse graphs

Edge-exchangeable graphs and sparsity

Fast Measurements of Robustness to Changing Priors in Variational Bayes

Fast robustness quantification with variational Bayes

Posteriors, conjugacy, and exponential families for completely random measures

A translation of "The characteristic function of a random phenomenon" by Bruno de Finetti

Covariance Matrices and Influence Scores for Mean Field Variational Bayes

Robust Inference with Variational Bayes

Covariance Matrices for Mean Field Variational Bayes

Variational Bayes for Merging Noisy Databases

Cluster and Feature Modeling from Combinatorial Stochastic Processes

Combinatorial clustering and the beta negative binomial process

Feature allocations, probability functions, and paintboxes

MAD-Bayes: MAP-based Asymptotic Derivations from Bayes

Optimistic Concurrency Control for Distributed Unsupervised Learning

Real-time semiparametric regression

Streaming Variational Bayes

Combining Spatial and Telemetric Features for Learning Animal Movement Models

Beta processes, stick-breaking, and power laws

Rapid, Machine-Learned Resource Allocation: Application to High-redshift GRB Follow-up

Classification and categorical inputs with treed Gaussian process models

Fast and flexible selection with a single switch