Source author record

David Dunson

David Dunson appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Methodology · Machine Learning · Computation · math.ST · Statistics Theory · Applications · Distributed, Parallel, and Cluster Computing · Computer Vision · Cryptography and Security · Multimedia · Numerical Analysis · physics.comp-ph

Catalog footprint

What is connected

31 works

12 topics

4 close collaborators


preprint · 2026 · arXiv

Using prior information to boost power in correlation structure support recovery

Hypothesis testing of structure in correlation and covariance matrices is of broad interest in many application areas. In high dimensions and/or small to moderate sample sizes, high error rates in testing are a substantial concern. This article focuses on increasing power through a frequentist assisted by Bayes (FAB) procedure. This FAB approach boosts power by including prior information on the correlation parameters. In particular, we suppose there is one of two sources of prior information: (i) a prior dataset that is distinct from the current data but related enough that it may contain valuable information about the correlation structure in the current data; and (ii) knowledge about a tendency for the correlations in different parameters to be similar so that it is appropriate to consider a hierarchical model. When the prior information is relevant, the proposed FAB approach can have significant gains in power. A divide-and-conquer algorithm is developed to reduce computational complexity in massive testing dimensions. We show improvements in power for detecting correlated gene pairs in genomic studies while maintaining control of Type I error or false discovery rate (FDR).

preprint · 2022 · arXiv

Posterior computation with the Gibbs zig-zag sampler

An intriguing new class of piecewise deterministic Markov processes (PDMPs) has recently been proposed as an alternative to Markov chain Monte Carlo (MCMC). In order to facilitate the application to a larger class of problems, we propose a new class of PDMPs termed Gibbs zig-zag samplers, which allow parameters to be updated in blocks with a zig-zag sampler applied to certain parameters and traditional MCMC-style updates to others. We demonstrate the flexibility of this framework on posterior sampling for logistic models with shrinkage priors for high-dimensional regression and random effects and provide conditions for geometric ergodicity and the validity of a central limit theorem.

preprint · 2022 · arXiv

Predicting Phenotypes from Brain Connection Structure

This article focuses on the problem of predicting a response variable based on a network-valued predictor. Our motivation is the development of interpretable and accurate predictive models for cognitive traits and neuro-psychiatric disorders based on an individual's brain connection network (connectome). Current methods reduce the complex, high dimensional brain network into low-dimensional pre-specified features prior to applying standard predictive algorithms. These methods are sensitive to feature choice and inevitably discard important information. Instead, we propose a nonparametric Bayes class of models that utilize the entire adjacency matrix defining brain region connections to adaptively detect predictive algorithms, while maintaining interpretability. The Bayesian Connectomics (BaCon) model class utilizes Poisson-Dirichlet processes to find a lower-dimensional, bidirectional (covariate, subject) pattern in the adjacency matrix. The small n, large p problem is transformed into a "small n, small q" problem, facilitating an effective stochastic search of the predictors. A spike-and-slab prior for the cluster predictors strikes a balance between regression model parsimony and flexibility, resulting in improved inferences and test case predictions. We describe basic properties of the BaCon model and develop efficient algorithms for posterior computation. The resulting methods are found to outperform existing approaches and are applied to a creative reasoning data set.

preprint · 2020 · arXiv

Bayesian Hierarchical Factor Regression Models to Infer Cause of Death From Verbal Autopsy Data

In low-resource settings where vital registration of death is not routine it is often of critical interest to determine and study the cause of death (COD) for individuals and the cause-specific mortality fraction (CSMF) for populations. Post-mortem autopsies, considered the gold standard for COD assignment, are often difficult or impossible to implement due to deaths occurring outside the hospital, expense, and/or cultural norms. For this reason, Verbal Autopsies (VAs) are commonly conducted, consisting of a questionnaire administered to next of kin recording demographic information, known medical conditions, symptoms, and other factors for the decedent. This article proposes a novel class of hierarchical factor regression models that avoid restrictive assumptions of standard methods, allow both the mean and covariance to vary with COD category, and can include covariate information on the decedent, region, or events surrounding death. Taking a Bayesian approach to inference, this work develops an MCMC algorithm and validates the FActor Regression for Verbal Autopsy (FARVA) model in simulation experiments. An application of FARVA to real VA data shows improved goodness-of-fit and better predictive performance in inferring COD and CSMF over competing methods. Code and a user manual are made available at https://github.com/kelrenmor/farva.

preprint · 2020 · arXiv

Bayesian joint modeling of chemical structure and dose response curves

Today there are approximately 85,000 chemicals regulated under the Toxic Substances Control Act, with around 2,000 new chemicals introduced each year. It is impossible to screen all of these chemicals for potential toxic effects either via full organism in vivo studies or in vitro high-throughput screening (HTS) programs. Toxicologists face the challenge of choosing which chemicals to screen, and predicting the toxicity of as-yet-unscreened chemicals. Our goal is to describe how variation in chemical structure relates to variation in toxicological response to enable in silico toxicity characterization designed to meet both of these challenges. With our Bayesian partially Supervised Sparse and Smooth Factor Analysis ($\text{BS}^3\text{FA}$) model, we learn a distance between chemicals targeted to toxicity, rather than one based on molecular structure alone. Our model also enables the prediction of chemical dose-response profiles based on chemical structure (that is, without in vivo or in vitro testing) by taking advantage of a large database of chemicals that have already been tested for toxicity in HTS programs. We show superior simulation performance in distance learning and modest to large gains in predictive ability compared to existing methods. Results from the high-throughput screening data application elucidate the relationship between chemical structure and a toxicity-relevant high-throughput assay. An R package for $\text{BS}^3\text{FA}$ is available online at https://github.com/kelrenmor/bs3fa.

preprint · 2020 · arXiv

Bayesian neural networks and dimensionality reduction

In conducting non-linear dimensionality reduction and feature learning, it is common to suppose that the data lie near a lower-dimensional manifold. A class of model-based approaches for such problems includes latent variables in an unknown non-linear regression function; this includes Gaussian process latent variable models and variational auto-encoders (VAEs) as special cases. VAEs are artificial neural networks (ANNs) that employ approximations to make computation tractable; however, current implementations lack adequate uncertainty quantification in estimating the parameters, predictive densities, and lower-dimensional subspace, and can be unstable and lack interpretability in practice. We attempt to solve these problems by deploying Markov chain Monte Carlo sampling algorithms (MCMC) for Bayesian inference in ANN models with latent variables. We address issues of identifiability by imposing constraints on the ANN parameters as well as by using anchor points. This is demonstrated on simulated and real data examples. We find that current MCMC sampling schemes face fundamental challenges in neural networks involving latent variables, motivating new research directions.

preprint · 2020 · arXiv

Fiedler Regularization: Learning Neural Networks with Graph Sparsity

We introduce a novel regularization approach for deep learning that incorporates and respects the underlying graphical structure of the neural network. Existing regularization methods often focus on dropping/penalizing weights in a global manner that ignores the connectivity structure of the neural network. We propose to use the Fiedler value of the neural network's underlying graph as a tool for regularization. We provide theoretical support for this approach via spectral graph theory. We list several useful properties of the Fiedler value that make it suitable for regularization. We provide an approximate, variational approach for fast computation in practical training of neural networks. We provide bounds on such approximations. We provide an alternative but equivalent formulation of this framework in the form of a structurally weighted L1 penalty, thus linking our approach to sparsity induction. We performed experiments on datasets comparing Fiedler regularization with traditional regularization methods such as dropout and weight decay. Results demonstrate the efficacy of Fiedler regularization.
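
For readers unfamiliar with the term, the Fiedler value of a graph is the second-smallest eigenvalue of its Laplacian, and it is zero exactly when the graph is disconnected. The following is a minimal illustrative sketch of that definition only, not the paper's regularizer or its variational approximation; the function name `fiedler_value` and the toy graphs are assumptions made here for illustration.

```python
import numpy as np

def fiedler_value(adjacency):
    """Second-smallest eigenvalue of the graph Laplacian L = D - A.

    `adjacency` is a symmetric matrix of non-negative edge weights
    (hypothetical input format chosen for this sketch).
    """
    A = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(A.sum(axis=1)) - A
    eigenvalues = np.linalg.eigvalsh(laplacian)  # returned in ascending order
    return eigenvalues[1]

# A path on three nodes (connected) and a graph with an isolated node.
path_graph = [[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]]
split_graph = [[0, 1, 0],
               [1, 0, 0],
               [0, 0, 0]]

fiedler_path = fiedler_value(path_graph)    # positive: graph is connected
fiedler_split = fiedler_value(split_graph)  # zero: graph is disconnected
```

A smaller Fiedler value indicates a graph that is closer to being disconnected, which gives some intuition for why penalizing it can encourage sparsity in connectivity.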

preprint · 2020 · arXiv

Maximum Pairwise Bayes Factors for Covariance Structure Testing

Hypothesis testing of structure in covariance matrices is of significant importance, but faces great challenges in high-dimensional settings. Although consistent frequentist one-sample covariance tests have been proposed, there is a lack of simple, computationally scalable, and theoretically sound Bayesian testing methods for large covariance matrices. Motivated by this gap and by the need for tests that are powerful against sparse alternatives, we propose a novel testing framework based on the maximum pairwise Bayes factor. Our initial focus is on one-sample covariance testing; the proposed test can optimally distinguish null and alternative hypotheses in a frequentist asymptotic sense. We then propose diagonal tests and a scalable covariance graph selection procedure that are shown to be consistent. A simulation study evaluates the proposed approach relative to competitors. We illustrate advantages of our graph selection method on a gene expression data set.

preprint · 2020 · arXiv

Principal Ellipsoid Analysis (PEA): Efficient non-linear dimension reduction & clustering

Even with the rise in popularity of over-parameterized models, simple dimensionality reduction and clustering methods, such as PCA and k-means, are still routinely used in an amazing variety of settings. A primary reason is the combination of simplicity, interpretability and computational efficiency. The focus of this article is on improving upon PCA and k-means, by allowing non-linear relations in the data and more flexible cluster shapes, without sacrificing the key advantages. The key contribution is a new framework for Principal Elliptical Analysis (PEA), defining a simple and computationally efficient alternative to PCA that fits the best elliptical approximation through the data. We provide theoretical guarantees on the proposed PEA algorithm using Vapnik-Chervonenkis (VC) theory to show strong consistency and uniform concentration bounds. Toy experiments illustrate the performance of PEA, and the ability to adapt to non-linear structure and complex cluster shapes. In a rich variety of real data clustering applications, PEA is shown to do as well as k-means for simple datasets, while dramatically improving performance in more complex settings.

preprint · 2019 · arXiv

Discontinuous Hamiltonian Monte Carlo for discrete parameters and discontinuous likelihoods

Hamiltonian Monte Carlo has emerged as a standard tool for posterior computation. In this article, we present an extension that can efficiently explore target distributions with discontinuous densities. Our extension in particular enables efficient sampling from ordinal parameters through embedding of probability mass functions into continuous spaces. We motivate our approach through a theory of discontinuous Hamiltonian dynamics and develop a corresponding numerical solver. The proposed solver is the first of its kind, with a remarkable ability to exactly preserve the Hamiltonian. We apply our algorithm to challenging posterior inference problems to demonstrate its wide applicability and competitive performance.

preprint · 2019 · arXiv

Recycling intermediate steps to improve Hamiltonian Monte Carlo

Hamiltonian Monte Carlo (HMC) and related algorithms have become routinely used in Bayesian computation. In this article, we present a simple and provably accurate method to improve the efficiency of HMC and related algorithms with essentially no extra computational cost. This is achieved by recycling the intermediate states along simulated trajectories of Hamiltonian dynamics. Standard algorithms use only the end points of trajectories, wastefully discarding all the intermediate states. Compared to the alternative methods for utilizing the intermediate states, our algorithm is simpler to apply in practice and requires little programming effort beyond the usual implementations of HMC and related algorithms. Our algorithm applies straightforwardly to the no-U-turn sampler, arguably the most popular variant of HMC. Through a variety of experiments, we demonstrate that our recycling algorithm yields substantial computational efficiency gains.

preprint · 2016 · arXiv

DECOrrelated feature space partitioning for distributed sparse regression

Fitting statistical models is computationally challenging when the sample size or the dimension of the dataset is huge. An attractive approach for down-scaling the problem size is to first partition the dataset into subsets and then fit using distributed algorithms. The dataset can be partitioned either horizontally (in the sample space) or vertically (in the feature space). While the majority of the literature focuses on sample space partitioning, feature space partitioning is more effective when $p\gg n$. Existing methods for partitioning features, however, are either vulnerable to high correlations or inefficient in reducing the model dimension. In this paper, we solve these problems through a new embarrassingly parallel framework named DECO for distributed variable selection and parameter estimation. In DECO, variables are first partitioned and allocated to $m$ distributed workers. The decorrelated subset data within each worker are then fitted via any algorithm designed for high-dimensional problems. We show that by incorporating the decorrelation step, DECO can achieve consistent variable selection and parameter estimation on each subset with (almost) no assumptions. In addition, the convergence rate is nearly minimax optimal for both sparse and weakly sparse models and does NOT depend on the partition number $m$. Extensive numerical experiments are provided to illustrate the performance of the new framework.

preprint · 2016 · arXiv

No penalty no tears: Least squares in high-dimensional linear models

Ordinary least squares (OLS) is the default method for fitting linear models, but is not applicable for problems with dimensionality larger than the sample size. For these problems, we advocate the use of a generalized version of OLS motivated by ridge regression, and propose two novel three-step algorithms involving least squares fitting and hard thresholding. The algorithms are methodologically simple to understand intuitively, computationally easy to implement efficiently, and theoretically appealing for choosing models consistently. Numerical exercises comparing our methods with penalization-based approaches in simulations and data analyses illustrate the great potential of the proposed algorithms.
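
The abstract does not spell out the two algorithms, so the following is only one plausible reading of "least squares fitting and hard thresholding" when the dimension exceeds the sample size: compute a ridge-type estimate that remains defined for p > n, keep the largest coefficients, and refit by OLS on the retained columns. The function name `threshold_then_refit`, the ridge constant `r`, and the submodel size `d` are all assumptions introduced here, not taken from the paper.

```python
import numpy as np

def threshold_then_refit(X, y, d, r=1.0):
    """Illustrative three-step fit for p > n (not the paper's exact method)."""
    n, p = X.shape
    # Step 1: ridge-type estimate in its dual form, X^T (X X^T + r I)^{-1} y,
    # which only requires solving an n-by-n system.
    beta_init = X.T @ np.linalg.solve(X @ X.T + r * np.eye(n), y)
    # Step 2: hard thresholding, keeping the d largest coefficients in magnitude.
    keep = np.argsort(np.abs(beta_init))[-d:]
    # Step 3: ordinary least squares on the retained columns only.
    coef, *_ = np.linalg.lstsq(X[:, keep], y, rcond=None)
    beta = np.zeros(p)
    beta[keep] = coef
    return beta

# Synthetic example with more predictors than observations.
rng = np.random.default_rng(1)
n_obs, n_features, n_keep = 30, 100, 5
X_demo = rng.standard_normal((n_obs, n_features))
beta_true = np.zeros(n_features)
beta_true[:3] = [4.0, -3.0, 2.0]
y_demo = X_demo @ beta_true + 0.1 * rng.standard_normal(n_obs)

beta_hat = threshold_then_refit(X_demo, y_demo, d=n_keep)
```

The dual form in step 1 is algebraically the same as the usual ridge estimate, which is why it stays well defined when the Gram matrix of the predictors is singular.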

preprint · 2016 · arXiv

Variable length trajectory compressible hybrid Monte Carlo

Hybrid Monte Carlo (HMC) generates samples from a prescribed probability distribution in a configuration space by simulating Hamiltonian dynamics, followed by the Metropolis (-Hastings) acceptance/rejection step. Compressible HMC (CHMC) generalizes HMC to a situation in which the dynamics is reversible but not necessarily Hamiltonian. This article presents a framework to further extend the algorithm. Within the existing framework, each trajectory of the dynamics must be integrated for the same amount of (random) time to generate a valid Metropolis proposal. Our generalized acceptance/rejection mechanism allows a more deliberate choice of the integration time for each trajectory. The proposed algorithm in particular enables an effective application of variable step size integrators to HMC-type sampling algorithms based on reversible dynamics. The potential of our framework is further demonstrated by another extension of HMC which reduces the wasted computations due to unstable numerical approximations and corresponding rejected proposals.
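
As background for the baseline described in the first sentence, here is a minimal sketch of standard HMC with a fixed number of leapfrog steps: simulate Hamiltonian dynamics, then apply a Metropolis accept/reject step. It does not implement the compressible or variable-length extensions the abstract proposes. The names `leapfrog` and `hmc_step`, the identity mass matrix, and the standard normal target are assumptions chosen for illustration.

```python
import numpy as np

def leapfrog(q, p, grad_U, step, n_steps):
    """Integrate Hamiltonian dynamics for H(q, p) = U(q) + |p|^2 / 2."""
    q, p = np.array(q, dtype=float), np.array(p, dtype=float)  # work on copies
    p -= 0.5 * step * grad_U(q)            # initial half step for momentum
    for i in range(n_steps):
        q += step * p                      # full step for position
        if i < n_steps - 1:
            p -= step * grad_U(q)          # full step for momentum
    p -= 0.5 * step * grad_U(q)            # final half step for momentum
    return q, p

def hmc_step(q, U, grad_U, step, n_steps, rng):
    """One HMC transition: fresh momentum, trajectory, Metropolis test."""
    p0 = rng.standard_normal(q.shape)
    q_new, p_new = leapfrog(q, p0, grad_U, step, n_steps)
    h_old = U(q) + 0.5 * p0 @ p0
    h_new = U(q_new) + 0.5 * p_new @ p_new
    if np.log(rng.uniform()) < h_old - h_new:
        return q_new, True                 # proposal accepted
    return q, False                        # proposal rejected, stay put

# Standard normal target: U(q) = |q|^2 / 2.
def U(q):
    return 0.5 * q @ q

def grad_U(q):
    return q

q_start, p_start = np.array([1.0]), np.array([0.5])
q_end, p_end = leapfrog(q_start, p_start, grad_U, step=0.1, n_steps=10)
# Reversibility check: flip the momentum and integrate again.
q_back, p_back = leapfrog(q_end, -p_end, grad_U, step=0.1, n_steps=10)
```

In this sketch every trajectory uses the same `n_steps`, which corresponds to the fixed integration time constraint that the abstract sets out to relax.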

preprint · 2015 · arXiv

Data augmentation for models based on rejection sampling

We present a data augmentation scheme to perform Markov chain Monte Carlo inference for models where data generation involves a rejection sampling algorithm. Our idea, which seems to be missing in the literature, is a simple scheme to instantiate the rejected proposals preceding each data point. The resulting joint probability over observed and rejected variables can be much simpler than the marginal distribution over the observed variables, which often involves intractable integrals. We consider three problems, the first being the modeling of flow-cytometry measurements subject to truncation. The second is a Bayesian analysis of the matrix Langevin distribution on the Stiefel manifold, and the third, Bayesian inference for a nonparametric Gaussian process density model. The latter two are instances of problems where Markov chain Monte Carlo inference is doubly-intractable. Our experiments demonstrate superior performance over state-of-the-art sampling algorithms for such problems.

preprint · 2014 · arXiv

Anisotropic function estimation using multi-bandwidth Gaussian processes

In nonparametric regression problems involving multiple predictors, there is typically interest in estimating an anisotropic multivariate regression surface in the important predictors while discarding the unimportant ones. Our focus is on defining a Bayesian procedure that leads to the minimax optimal rate of posterior contraction (up to a log factor) adapting to the unknown dimension and anisotropic smoothness of the true surface. We propose such an approach based on a Gaussian process prior with dimension-specific scalings, which are assigned carefully-chosen hyperpriors. We additionally show that using a homogeneous Gaussian process with a single bandwidth leads to a sub-optimal rate in anisotropic cases.

preprint · 2014 · arXiv

Finite sample posterior concentration in high-dimensional regression

We study the behavior of the posterior distribution in high-dimensional Bayesian Gaussian linear regression models having $p\gg n$, with $p$ the number of predictors and $n$ the sample size. Our focus is on obtaining quantitative finite sample bounds ensuring sufficient posterior probability assigned in neighborhoods of the true regression coefficient vector, $β^0$, with high probability. We assume that $β^0$ is approximately $S$-sparse and obtain universal bounds, which provide insight into the role of the prior in controlling concentration of the posterior. Based on these finite sample bounds, we examine the implied asymptotic contraction rates for several examples showing that sparsely-structured and heavy-tail shrinkage priors exhibit rapid contraction rates. We also demonstrate that a stronger result holds for the Uniform-Gaussian prior (a binary vector of indicators $γ$ is drawn from the uniform distribution on the set of binary sequences with exactly $S$ ones, and then each $β_i\sim\mathcal{N}(0,V^2)$ if $γ_i=1$ and $β_i=0$ if $γ_i=0$). These types of finite sample bounds provide guidelines for designing and evaluating priors for high-dimensional problems.

preprint · 2014 · arXiv

Median Selection Subset Aggregation for Parallel Inference

For massive data sets, efficient computation commonly relies on distributed algorithms that store and process subsets of the data on different machines, minimizing communication costs. Our focus is on regression and classification problems involving many features. A variety of distributed algorithms have been proposed in this context, but challenges arise in defining an algorithm with low communication, theoretical guarantees and excellent practical performance in general settings. We propose a MEdian Selection Subset AGgregation Estimator (message) algorithm, which attempts to solve these problems. The algorithm applies feature selection in parallel for each subset using Lasso or another method, calculates the 'median' feature inclusion index, estimates coefficients for the selected features in parallel for each subset, and then averages these estimates. The algorithm is simple, involves very minimal communication, scales efficiently in both sample and feature size, and has theoretical guarantees. In particular, we show model selection consistency and coefficient estimation efficiency. Extensive experiments show excellent performance in variable selection, estimation, prediction, and computation time relative to usual competitors.
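
The four steps named in the abstract (select per subset, take the median inclusion vector, estimate per subset, average) can be sketched serially as below. This is a minimal illustration under stated assumptions: the per-subset selector is a simple top-k correlation screen standing in for Lasso (the abstract allows "another method"), the subsets are processed in a loop instead of on separate machines, and the names `select_top_k` and `message_sketch` are hypothetical.

```python
import numpy as np

def select_top_k(X, y, k):
    """Stand-in selector: keep the k features most correlated with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    score = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    keep = np.zeros(X.shape[1], dtype=bool)
    keep[np.argsort(score)[-k:]] = True
    return keep

def message_sketch(X, y, m, k):
    n, p = X.shape
    blocks = np.array_split(np.arange(n), m)   # partition the samples
    # Step 1: feature selection on each subset.
    inclusion = np.array([select_top_k(X[b], y[b], k) for b in blocks])
    # Step 2: median inclusion indicator across subsets (ties count as included).
    support = np.median(inclusion.astype(float), axis=0) >= 0.5
    # Step 3: least squares on the selected features within each subset.
    coefs = []
    for b in blocks:
        beta_b, *_ = np.linalg.lstsq(X[b][:, support], y[b], rcond=None)
        coefs.append(beta_b)
    # Step 4: average the subset estimates.
    beta = np.zeros(p)
    beta[support] = np.mean(coefs, axis=0)
    return support, beta

# Synthetic data: three strong signals among twenty features.
rng = np.random.default_rng(0)
X_sim = rng.standard_normal((600, 20))
beta_sim = np.zeros(20)
beta_sim[:3] = [3.0, -2.0, 2.0]
y_sim = X_sim @ beta_sim + 0.5 * rng.standard_normal(600)

support_hat, beta_avg = message_sketch(X_sim, y_sim, m=4, k=3)
```

Only the binary inclusion vectors and the per-subset coefficient estimates need to be shared between workers in this layout, which is consistent with the abstract's emphasis on minimal communication.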

preprint · 2014 · arXiv

Posterior contraction in sparse Bayesian factor models for massive covariance matrices

Sparse Bayesian factor models are routinely implemented for parsimonious dependence modeling and dimensionality reduction in high-dimensional applications. We provide theoretical understanding of such Bayesian procedures in terms of posterior convergence rates in inferring high-dimensional covariance matrices where the dimension can be larger than the sample size. Under relevant sparsity assumptions on the true covariance matrix, we show that commonly-used point mass mixture priors on the factor loadings lead to consistent estimation in the operator norm even when $p\gg n$. One of our major contributions is to develop a new class of continuous shrinkage priors and provide insights into their concentration around sparse vectors. Using such priors for the factor loadings, we obtain a rate of convergence similar to that obtained with point mass mixture priors. To obtain the convergence rates, we construct test functions to separate points in the space of high-dimensional covariance matrices using insights from random matrix theory; the tools developed may be of independent interest. We also derive minimax rates and show that the Bayesian posterior rates of convergence coincide with the minimax rates up to a $\sqrt{\log n}$ term.

preprint · 2014 · arXiv

Scalable multiscale density estimation

Although Bayesian density estimation using discrete mixtures has good performance in modest dimensions, there is a lack of statistical and computational scalability to high-dimensional multivariate cases. To combat the curse of dimensionality, it is necessary to assume the data are concentrated near a lower-dimensional subspace. However, Bayesian methods for learning this subspace along with the density of the data scale poorly computationally. To solve this problem, we propose an empirical Bayes approach, which estimates a multiscale dictionary using geometric multiresolution analysis in a first stage. We use this dictionary within a multiscale mixture model, which allows uncertainty in component allocation, mixture weights and scaling factors over a binary tree. A computational algorithm is proposed, which scales efficiently to massive dimensional problems. We provide some theoretical support for this geometric density estimation (GEODE) method, and illustrate the performance through simulated and real data examples.

preprint · 2013 · arXiv

Bayesian crack detection in ultra high resolution multimodal images of paintings

The preservation of our cultural heritage is of paramount importance. Thanks to recent developments in digital acquisition techniques, powerful image analysis algorithms are developed which can be useful non-invasive tools to assist in the restoration and preservation of art. In this paper we propose a semi-supervised crack detection method that can be used for high-dimensional acquisitions of paintings coming from different modalities. Our dataset consists of a recently acquired collection of images of the Ghent Altarpiece (1432), one of Northern Europe's most important art masterpieces. Our goal is to build a classifier that is able to discern crack pixels from the background consisting of non-crack pixels, making optimal use of the information that is provided by each modality. To accomplish this we employ a recently developed non-parametric Bayesian classifier, that uses tensor factorizations to characterize any conditional probability. A prior is placed on the parameters of the factorization such that every possible interaction between predictors is allowed while still identifying a sparse subset among these predictors. The proposed Bayesian classifier, which we will refer to as conditional Bayesian tensor factorization or CBTF, is assessed by visually comparing classification results with the Random Forest (RF) algorithm.

preprint · 2013 · arXiv

Bayesian factorizations of big sparse tensors

It has become routine to collect data that are structured as multiway arrays (tensors). There is an enormous literature on low rank and sparse matrix factorizations, but limited consideration of extensions to the tensor case in statistics. The most common low rank tensor factorization relies on parallel factor analysis (PARAFAC), which expresses a rank $k$ tensor as a sum of rank one tensors. When observations are only available for a tiny subset of the cells of a big tensor, the low rank assumption is not sufficient and PARAFAC has poor performance. We induce an additional layer of dimension reduction by allowing the effective rank to vary across dimensions of the table. For concreteness, we focus on a contingency table application. Taking a Bayesian approach, we place priors on terms in the factorization and develop an efficient Gibbs sampler for posterior computation. Theory is provided showing posterior concentration rates in high-dimensional settings, and the methods are shown to have excellent performance in simulations and several real data applications.
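
To make the PARAFAC statement concrete, the sketch below builds a three-way tensor as a weighted sum of $k$ rank-one tensors, each the outer product of one column from each factor matrix. It illustrates only the baseline factorization mentioned in the abstract, not the proposed model with dimension-varying effective rank; the function name `parafac_tensor` and the factor shapes are assumptions for illustration.

```python
import numpy as np

def parafac_tensor(weights, A, B, C):
    """T[i, j, l] = sum_r weights[r] * A[i, r] * B[j, r] * C[l, r]."""
    return np.einsum("r,ir,jr,lr->ijl", weights, A, B, C)

rng = np.random.default_rng(0)
rank = 2
weights = np.array([1.0, 0.5])
A = rng.standard_normal((4, rank))
B = rng.standard_normal((3, rank))
C = rng.standard_normal((5, rank))

tensor = parafac_tensor(weights, A, B, C)

# The same tensor written explicitly as a sum of rank-one outer products.
explicit = sum(
    weights[r] * np.multiply.outer(np.multiply.outer(A[:, r], B[:, r]), C[:, r])
    for r in range(rank)
)
```

The factor matrices hold (4 + 3 + 5) × 2 = 24 numbers plus 2 weights, compared with 60 cells in the full tensor, which is the parsimony the low rank assumption buys.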

preprint · 2013 · arXiv

Generalized double Pareto shrinkage

We propose a generalized double Pareto prior for Bayesian shrinkage estimation and inferences in linear models. The prior can be obtained via a scale mixture of Laplace or normal distributions, forming a bridge between the Laplace and Normal-Jeffreys' priors. While it has a spike at zero like the Laplace density, it also has a Student's $t$-like tail behavior. Bayesian computation is straightforward via a simple Gibbs sampling algorithm. We investigate the properties of the maximum a posteriori estimator, as sparse estimation plays an important role in many problems, reveal connections with some well-established regularization procedures, and show some asymptotic results. The performance of the prior is tested through simulations and an application.
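
The scale-mixture statement can be checked with a short calculation. The derivation below is a sketch under an assumed parametrization (a Laplace density with rate $\lambda$ and a Gamma mixing distribution with shape $\alpha$ and rate $\eta$); the symbols $\lambda$, $\eta$ and $\xi = \eta/\alpha$ are introduced here for illustration and may differ from the paper's notation.

```latex
\begin{align*}
p(\beta \mid \lambda) &= \frac{\lambda}{2}\, e^{-\lambda |\beta|},
\qquad
p(\lambda) = \frac{\eta^{\alpha}}{\Gamma(\alpha)}\, \lambda^{\alpha - 1} e^{-\eta \lambda}, \\[4pt]
p(\beta)
&= \int_0^\infty \frac{\lambda}{2}\, e^{-\lambda |\beta|}\,
   \frac{\eta^{\alpha}}{\Gamma(\alpha)}\, \lambda^{\alpha - 1} e^{-\eta \lambda}\, d\lambda
 = \frac{\eta^{\alpha}}{2\,\Gamma(\alpha)} \cdot
   \frac{\Gamma(\alpha + 1)}{(\eta + |\beta|)^{\alpha + 1}} \\[4pt]
&= \frac{\alpha}{2\eta} \left(1 + \frac{|\beta|}{\eta}\right)^{-(\alpha + 1)}
 = \frac{1}{2\xi} \left(1 + \frac{|\beta|}{\alpha \xi}\right)^{-(\alpha + 1)},
\qquad \xi = \frac{\eta}{\alpha}.
\end{align*}
```

The resulting density has a Laplace-like cusp at zero and a polynomial tail of order $|\beta|^{-(\alpha+1)}$, matching the Student's $t$-like tail behavior described above. Because a Laplace density with rate $\lambda$ is itself a normal scale mixture, $\beta \mid \tau \sim \mathcal{N}(0, \tau)$ with $\tau \sim \text{Exp}(\lambda^2/2)$, the same prior can also be written as a scale mixture of normals.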

preprint · 2012 · arXiv

Bayesian inference on dependence in multivariate longitudinal data

In many applications, it is of interest to assess the dependence structure in multivariate longitudinal data. Discovering such dependence is challenging due to the dimensionality involved. By concatenating the random effects from component models for each response, dependence within and across longitudinal responses can be characterized through a large random effects covariance matrix. Motivated by the common problems in estimating this matrix, especially the off-diagonal elements, we propose a Bayesian approach that relies on shrinkage priors for parameters in a modified Cholesky decomposition. Without adjustment, such priors and previous related approaches are order-dependent and tend to shrink strongly toward an AR-type structure. We propose moment-matching (MM) priors to mitigate such problems. Efficient Gibbs samplers are developed for posterior computation. The methods are illustrated through simulated examples and are applied to a longitudinal epidemiologic study of hormones and oxidative stress.

preprint · 2012 · arXiv

Bayesian Watermark Attacks

This paper presents an application of statistical machine learning to the field of watermarking. We propose a new attack model on additive spread-spectrum watermarking systems. The proposed attack is based on Bayesian statistics. We consider the scenario in which a watermark signal is repeatedly embedded in specific, possibly chosen based on a secret message bitstream, segments (signals) of the host data. The host signal can represent a patch of pixels from an image or a video frame. We propose a probabilistic model that infers the embedded message bitstream and watermark signal, directly from the watermarked data, without access to the decoder. We develop an efficient Markov chain Monte Carlo sampler for updating the model parameters from their conjugate full conditional posteriors. We also provide a variational Bayesian solution, which further increases the convergence speed of the algorithm. Experiments with synthetic and real image signals demonstrate that the attack model is able to correctly infer a large part of the message bitstream and obtain a very accurate estimate of the watermark signal.

preprint · 2012 · arXiv

Beta-Negative Binomial Process and Poisson Factor Analysis

A beta-negative binomial (BNB) process is proposed, leading to a beta-gamma-Poisson process, which may be viewed as a "multi-scoop" generalization of the beta-Bernoulli process. The BNB process is augmented into a beta-gamma-gamma-Poisson hierarchical structure, and applied as a nonparametric Bayesian prior for an infinite Poisson factor analysis model. A finite approximation for the beta process Levy random measure is constructed for convenient implementation. Efficient MCMC computations are performed with data augmentation and marginalization techniques. Encouraging results are shown on document count matrix factorization.

preprint · 2012 · arXiv

Ensemble Methods for Convex Regression with Applications to Geometric Programming Based Circuit Design

Convex regression is a promising area for bridging statistical estimation and deterministic convex optimization. New piecewise linear convex regression methods are fast and scalable, but can have instability when used to approximate constraints or objective functions for optimization. Ensemble methods, like bagging, smearing and random partitioning, can alleviate this problem and maintain the theoretical properties of the underlying estimator. We empirically examine the performance of ensemble methods for prediction and optimization, and then apply them to device modeling and constraint approximation for geometric programming based circuit design.

preprint · 2012 · arXiv

Lognormal and Gamma Mixed Negative Binomial Regression

In regression analysis of counts, a lack of simple and efficient algorithms for posterior computation has made Bayesian approaches appear unattractive and thus underdeveloped. We propose a lognormal and gamma mixed negative binomial (NB) regression model for counts, and present efficient closed-form Bayesian inference; unlike conventional Poisson models, the proposed approach has two free parameters to include two different kinds of random effects, and allows the incorporation of prior information, such as sparsity in the regression coefficients. By placing a gamma distribution prior on the NB dispersion parameter r, and connecting a lognormal distribution prior with the logit of the NB probability parameter p, efficient Gibbs sampling and variational Bayes inference are both developed. The closed-form updates are obtained by exploiting conditional conjugacy via both a compound Poisson representation and a Polya-Gamma distribution based data augmentation approach. The proposed Bayesian inference can be implemented routinely, while being easily generalizable to more complex settings involving multivariate dependence structures. The algorithms are illustrated using real examples.

preprint2011arXiv

Bayesian Nonparametric Covariance Regression

Although there is a rich literature on methods for allowing the variance in a univariate regression model to vary with predictors, time and other factors, relatively little has been done in the multivariate case. Our focus is on developing a class of nonparametric covariance regression models, which allow an unknown $p \times p$ covariance matrix to change flexibly with predictors. The proposed modeling framework induces a prior on a collection of covariance matrices indexed by predictors through priors for predictor-dependent loadings matrices in a factor model. In particular, the predictor-dependent loadings are characterized as a sparse combination of a collection of unknown dictionary functions (e.g., Gaussian process random functions). The induced covariance is then a regularized quadratic function of these dictionary elements. Our proposed framework leads to a highly flexible but computationally tractable formulation with simple conjugate posterior updates that can readily handle missing data. Theoretical properties are discussed and the methods are illustrated through simulation studies and an application to the Google Flu Trends data.
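One way to write the structure described here, with notation that is assumed for illustration rather than taken from the abstract: let $\Lambda(x)$ be a $p \times k$ predictor-dependent loadings matrix, $\Theta$ a sparse $p \times L$ weight matrix, $\xi(x)$ an $L \times k$ array of dictionary functions, and $\Sigma_0$ a diagonal residual covariance.

```latex
\[
\Sigma(x) = \Lambda(x)\,\Lambda(x)^{\top} + \Sigma_0,
\qquad
\Lambda(x) = \Theta\,\xi(x)
\quad\Longrightarrow\quad
\Sigma(x) = \Theta\,\xi(x)\,\xi(x)^{\top}\,\Theta^{\top} + \Sigma_0 .
\]
```

Under these assumptions $\Sigma(x)$ is quadratic in the dictionary elements $\xi(x)$, and it is positive definite for every $x$ as long as $\Sigma_0$ has positive diagonal entries.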

preprint2011arXiv

Density Estimation and Classification via Bayesian Nonparametric Learning of Affine Subspaces

It is now practically the norm for data to be very high dimensional in areas such as genetics, machine vision, image analysis and many others. When analyzing such data, parametric models are often too inflexible, while nonparametric procedures tend to be non-robust because of insufficient data on these high-dimensional spaces. It is often the case with high-dimensional data that most of the variability tends to be along a few directions, or more generally along a much lower-dimensional submanifold of the data space. In this article, we propose a class of models that flexibly learns this submanifold and its dimension while simultaneously performing dimension reduction. As a result, density estimation is carried out efficiently. When performing classification with a large predictor space, our approach allows the category probabilities to vary nonparametrically with a few features expressed as linear combinations of the predictors. As opposed to many black-box methods for dimensionality reduction, the proposed model is appealing in having clearly interpretable and identifiable parameters. Gibbs sampling methods are developed for posterior computation, and the methods are illustrated in simulated and real data applications.

preprint2011arXiv

Efficient Gaussian Process Regression for Large Data Sets

Gaussian processes (GPs) are widely used in nonparametric regression, classification and spatio-temporal modeling, motivated in part by a rich literature on theoretical properties. However, a well-known drawback of GPs that limits their use is the expensive computation, typically $O(n^3)$ in performing the necessary matrix inversions, with $n$ denoting the number of data points. In large data sets, data storage and processing also lead to computational bottlenecks, and the numerical stability of the estimates and predicted values degrades with $n$. To address these problems, a rich variety of methods have been proposed, with recent options including predictive processes in spatial data analysis and subset of regressors in machine learning. The underlying idea in these approaches is to use a subset of the data, leading to questions of sensitivity to the subset and limitations in estimating fine-scale structure in regions that are not well covered by the subset. Motivated by the literature on compressive sensing, we propose an alternative random projection of all the data points onto a lower-dimensional subspace. We demonstrate the superiority of this approach from a theoretical perspective and through the use of simulated and real data examples. Keywords: Bayesian; Compressive Sensing; Dimension Reduction; Gaussian Processes; Random Projections; Subset Selection
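The following sketch shows the general flavor of replacing a data subset with a random projection of all data points. It is an assumption-laden illustration, not the authors' algorithm: the latent function values f are summarized by the m-dimensional vector Phi f, with Phi a Gaussian random matrix, and the predictive mean is computed from the resulting low-rank approximation. The kernel choice, function names and default values are all hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq / lengthscale ** 2)

def projected_gp_mean(X, y, X_new, m, noise_var=0.1, lengthscale=1.0,
                      jitter=1e-10, seed=0):
    """Approximate GP predictive mean using an m-dimensional random projection.

    With C = Phi K Phi^T and PK = Phi K, the mean at new inputs is
        K_new Phi^T (noise_var * C + PK PK^T)^{-1} PK y,
    so only an m x m linear system is solved instead of an n x n one.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Phi = rng.standard_normal((m, n)) / np.sqrt(m)  # random projection matrix
    K = rbf_kernel(X, X, lengthscale)                # n x n prior covariance
    K_new = rbf_kernel(X_new, X, lengthscale)        # covariance with new inputs
    PK = Phi @ K                                     # m x n
    C = PK @ Phi.T                                   # m x m projected covariance
    M = noise_var * C + PK @ PK.T + jitter * np.eye(m)
    weights = np.linalg.solve(M, PK @ y)             # m-dimensional solve
    return K_new @ Phi.T @ weights

def exact_gp_mean(X, y, X_new, noise_var=0.1, lengthscale=1.0):
    """Reference O(n^3) predictive mean, for comparison on small problems."""
    K = rbf_kernel(X, X, lengthscale)
    K_new = rbf_kernel(X_new, X, lengthscale)
    return K_new @ np.linalg.solve(K + noise_var * np.eye(len(X)), y)
```

When m equals n and Phi is invertible, the projected formula reduces algebraically to the exact mean, which makes a convenient correctness check. Note that this sketch still forms the full kernel matrix, so it only removes the cubic solve; it does not address storage.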

31 published items

Using prior information to boost power in correlation structure support recovery

Posterior computation with the Gibbs zig-zag sampler

Predicting Phenotypes from Brain Connection Structure

Bayesian Hierarchical Factor Regression Models to Infer Cause of Death From Verbal Autopsy Data

Bayesian joint modeling of chemical structure and dose response curves

Bayesian neural networks and dimensionality reduction

Fiedler Regularization: Learning Neural Networks with Graph Sparsity

Maximum Pairwise Bayes Factors for Covariance Structure Testing

Principal Ellipsoid Analysis (PEA): Efficient non-linear dimension reduction & clustering

Discontinuous Hamiltonian Monte Carlo for discrete parameters and discontinuous likelihoods

Recycling intermediate steps to improve Hamiltonian Monte Carlo

DECOrrelated feature space partitioning for distributed sparse regression

No penalty no tears: Least squares in high-dimensional linear models

Variable length trajectory compressible hybrid Monte Carlo

Data augmentation for models based on rejection sampling

Anisotropic function estimation using multi-bandwidth Gaussian processes

Finite sample posterior concentration in high-dimensional regression

Median Selection Subset Aggregation for Parallel Inference

Posterior contraction in sparse Bayesian factor models for massive covariance matrices

Scalable multiscale density estimation

Bayesian crack detection in ultra high resolution multimodal images of paintings

Bayesian factorizations of big sparse tensors

Generalized double Pareto shrinkage

Bayesian inference on dependence in multivariate longitudinal data

Bayesian Watermark Attacks

Beta-Negative Binomial Process and Poisson Factor Analysis

Ensemble Methods for Convex Regression with Applications to Geometric Programming Based Circuit Design

Lognormal and Gamma Mixed Negative Binomial Regression

Bayesian Nonparametric Covariance Regression

Density Estimation and Classification via Bayesian Nonparametric Learning of Affine Subspaces

Efficient Gaussian Process Regression for Large Data Sets