Source author record

Abhra Sarkar

Abhra Sarkar appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Methodology, Applications, math.ST, Statistics Theory

Catalog footprint

What is connected

11 works

4 topics

4 close collaborators

preprint, 2022, arXiv

Bayesian Scalable Precision Factor Analysis for Massive Sparse Gaussian Graphical Models

We propose a novel approach to estimating the precision matrix of multivariate Gaussian data that relies on decomposing it into a low-rank and a diagonal component. Such decompositions are very popular for modeling large covariance matrices as they admit a latent factor based representation that allows easy inference. The same is, however, not true for precision matrices due to the lack of computationally convenient representations, which restricts inference to low-to-moderate dimensional problems. We address this remarkable gap in the literature by building on a latent variable representation for such decompositions of precision matrices. The construction leads to an efficient Gibbs sampler that scales very well to high-dimensional problems far beyond the limits of the current state-of-the-art. The ability to efficiently explore the full posterior space also allows the model uncertainty to be easily assessed. Crucially, the decomposition also allows us to adapt sparsity inducing priors to shrink the insignificant entries of the precision matrix toward zero, making the approach adaptable to high-dimensional small-sample-size sparse settings. Exact zeros in the matrix encoding the underlying conditional independence graph are then determined via a novel posterior false discovery rate control procedure. A near minimax optimal posterior concentration rate for estimating precision matrices is attained by our method under mild regularity assumptions. We evaluate the method's empirical performance through synthetic experiments and illustrate its practical utility in data sets from two different application domains.
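As a rough illustration of the low-rank plus diagonal structure this abstract describes, the sketch below builds a small precision matrix of the form Omega = lam lam^T + diag(delta) with a single factor and inverts it with the Sherman-Morrison identity. The names `lam`, `delta`, and the helper functions are hypothetical, and this is not the paper's Gibbs sampler or prior; it only suggests why such a decomposition keeps the linear algebra cheap.

```python
# Illustrative only: rank-one-plus-diagonal precision matrix, not the paper's sampler.

def precision_matrix(lam, delta):
    """Omega = lam lam^T + diag(delta)."""
    p = len(lam)
    return [[lam[i] * lam[j] + (delta[i] if i == j else 0.0) for j in range(p)]
            for i in range(p)]

def covariance_via_sherman_morrison(lam, delta):
    """Inverse of Omega without a general-purpose matrix inversion."""
    p = len(lam)
    u = [lam[i] / delta[i] for i in range(p)]            # diag(delta)^{-1} lam
    denom = 1.0 + sum(lam[i] * u[i] for i in range(p))   # 1 + lam^T diag(delta)^{-1} lam
    return [[(1.0 / delta[i] if i == j else 0.0) - u[i] * u[j] / denom
             for j in range(p)] for i in range(p)]

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

lam = [0.8, -0.5, 0.3, 0.0]     # hypothetical loadings
delta = [1.0, 2.0, 0.5, 1.5]    # hypothetical positive diagonal entries
omega = precision_matrix(lam, delta)
sigma = covariance_via_sherman_morrison(lam, delta)
product = matmul(omega, sigma)  # should be numerically close to the identity
```

In this toy example the fourth variable has a zero loading, so its off-diagonal precision entries are zero, which in a Gaussian graphical model corresponds to conditional independence from the other variables.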

preprint, 2022, arXiv

Bayesian Semiparametric Covariate Informed Multivariate Density Deconvolution

Estimating the marginal and joint densities of the long-term average intakes of different dietary components is an important problem in nutritional epidemiology. Since these variables cannot be directly measured, data are usually collected in the form of 24-hour recalls of the intakes. The problem of estimating the density of the latent long-term average intakes from their observed but error contaminated recalls then becomes a problem of multivariate deconvolution of densities. The underlying densities could potentially vary with the subjects' demographic characteristics such as sex, ethnicity, age, etc. The problem of density deconvolution in the presence of associated precisely measured covariates has, however, never been considered before, not even in the univariate setting. We present a flexible Bayesian semiparametric approach to covariate informed multivariate deconvolution. Building on recent advances in copula deconvolution and conditional tensor factorization techniques, our proposed method not only allows the joint and the marginal densities to vary flexibly with the associated predictors but also allows automatic selection of the most influential predictors. Importantly, the method also allows the density of interest and the density of the measurement errors to vary with potentially different sets of predictors. We design Markov chain Monte Carlo algorithms that enable efficient posterior inference, appropriately accommodating uncertainty in all aspects of our analysis. The empirical efficacy of the proposed method is illustrated through simulation experiments. Its practical utility is demonstrated in the afore-described nutritional epidemiology applications in estimating covariate-adjusted long-term intakes of different dietary components. Supplementary materials, including substantive additional details and R code, are available online.

preprint, 2022, arXiv

Bayesian Semiparametric Hidden Markov Tensor Partition Models for Longitudinal Data with Local Variable Selection

We present a flexible Bayesian semiparametric mixed model for longitudinal data analysis in the presence of potentially high-dimensional categorical covariates. Building on a novel hidden Markov tensor decomposition technique, our proposed method allows the fixed effects components to vary between dependent random partitions of the covariate space at different time points. The mechanism not only allows different sets of covariates to be included in the model at different time points but also allows the selected predictors' influences to vary flexibly over time. Smooth time-varying additive random effects are used to capture subject-specific heterogeneity. We establish posterior convergence guarantees for both function estimation and variable selection. We design a Markov chain Monte Carlo algorithm for posterior computation. We evaluate the method's empirical performance through synthetic experiments and demonstrate its practical utility through real-world applications.

preprint, 2022, arXiv

Bayesian Tensor Factorized Vector Autoregressive Models for Inferring Granger Causality Patterns from High-Dimensional Multi-subject Panel Neuroimaging Data

Understanding the dynamics of functional brain connectivity patterns using noninvasive neuroimaging techniques is an important focus in human neuroscience. Vector autoregressive (VAR) processes and Granger causality analysis (GCA) have been extensively used for this purpose. While high-resolution multi-subject neuroimaging data are now routinely collected, the statistics literature on VAR models has remained heavily focused on small-to-moderate dimensional problems and single-subject data. Motivated by these issues, we develop a novel Bayesian random effects panel VAR model for multi-subject high-dimensional neuroimaging data. We begin with a single-subject model that structures the VAR coefficients as a three-way tensor and then reduces the dimensions by applying a Tucker tensor decomposition. A novel sparsity-inducing shrinkage prior allows data-adaptive rank and lag selection. We then extend the approach to a novel random effects model for multi-subject data that carefully avoids an explosion in dimension as the number of subjects grows while flexibly accommodating subject-specific heterogeneity. We design a Markov chain Monte Carlo algorithm for posterior computation. Finally, GCA with posterior false discovery control is performed on the posterior samples. The method shows excellent empirical performance in simulation experiments. Applied to our motivating functional magnetic resonance imaging study, the approach allows the directional connectivity of human brain networks to be studied in fine detail, revealing meaningful but previously unsubstantiated cortical connectivity patterns.
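To give a sense of the dimension reduction a Tucker decomposition can provide, here is a minimal parameter-counting sketch. It assumes a VAR with `d` series and `L` lags whose coefficients form a d x d x L tensor, and a Tucker representation with a core of size r1 x r2 x r3 plus three factor matrices. The function names, problem sizes, and ranks are hypothetical and are not taken from the paper.

```python
def full_var_params(d, L):
    """Number of coefficients in an unstructured VAR(L) on d series."""
    return d * d * L

def tucker_var_params(d, L, r1, r2, r3):
    """Core tensor (r1 x r2 x r3) plus factor matrices (d x r1, d x r2, L x r3)."""
    return r1 * r2 * r3 + d * r1 + d * r2 + L * r3

d, L = 100, 5                                         # hypothetical problem size
full = full_var_params(d, L)                          # 100 * 100 * 5
reduced = tucker_var_params(d, L, r1=5, r2=5, r3=2)   # 50 + 500 + 500 + 10
```

With these assumed sizes the unstructured model has 50000 coefficients while the Tucker form has 1060, which is the kind of saving that makes high-dimensional settings tractable.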

preprint, 2020, arXiv

Bayesian Semiparametric Longitudinal Drift-Diffusion Mixed Models for Tone Learning in Adults

Studying how adult humans learn non-native speech categories such as tone information has yielded novel insights into the mechanisms underlying experience-dependent brain plasticity. Scientists have traditionally examined these questions using longitudinal learning experiments under a multi-category decision making paradigm. Drift-diffusion processes are popular in such contexts for their ability to mimic underlying neural mechanisms. Motivated by these problems, we develop a novel Bayesian semiparametric inverse Gaussian drift-diffusion mixed model for multi-alternative decision making in longitudinal settings. We design a Markov chain Monte Carlo algorithm for posterior computation. We evaluate the method's empirical performance through synthetic experiments. Applied to our motivating longitudinal tone learning study, the method provides novel insights into how the biologically interpretable model parameters evolve with learning, differ between input-response tone combinations, and differ between well and poorly performing adults.

preprint, 2016, arXiv

Bayesian Semiparametric Mixed Effects Markov Chains

Studying the neurological, genetic and evolutionary basis of human vocal communication mechanisms using animal vocalization models is an important field of neuroscience. The data sets typically comprise structured sequences of syllables or 'songs' produced by animals from different genotypes under different social contexts. We develop a novel Bayesian semiparametric framework for inference in such data sets. Our approach is built on a novel class of mixed effects Markov transition models for the songs that accommodates exogenous influences of genotype and context as well as animal-specific heterogeneity. We design efficient Markov chain Monte Carlo algorithms for posterior computation. Crucial advantages of the proposed approach include its ability to provide insights into key scientific queries related to global and local influences of the exogenous predictors on the transition dynamics via automated tests of hypotheses. The methodology is illustrated using simulation experiments and the aforementioned motivating application in neuroscience.

preprint, 2016, arXiv

Bayesian Semiparametric Multivariate Density Deconvolution

We consider the problem of multivariate density deconvolution when the interest lies in estimating the distribution of a vector-valued random variable but precise measurements of the variable of interest are not available, observations being contaminated with additive measurement errors. The existing sparse literature on the problem assumes the density of the measurement errors to be completely known. We propose robust Bayesian semiparametric multivariate deconvolution approaches when the measurement error density is not known but replicated proxies are available for each unobserved value of the random vector. Additionally, we allow the variability of the measurement errors to depend on the associated unobserved value of the vector of interest through unknown relationships, which also automatically includes the case of multivariate multiplicative measurement errors. Basic properties of finite mixture models, multivariate normal kernels and exchangeable priors are exploited in many novel ways to meet the modeling and computational challenges. Theoretical results that show the flexibility of the proposed methods are provided. We illustrate the efficiency of the proposed methods in recovering the true density of interest through simulation experiments. The methodology is applied to estimate the joint consumption pattern of different dietary components from contaminated 24-hour recalls.
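The role of replicated proxies can be seen in a much simpler setting than the one the abstract treats. The sketch below is a univariate method-of-moments correction, assuming additive normal errors with constant variance; the simulated values, sample sizes, and variable names are all hypothetical. It is not the Bayesian semiparametric approach described above, which handles unknown error densities and error variability that depends on the latent value.

```python
import random
import statistics

random.seed(0)
n, reps = 2000, 3

# Latent values X and replicated error-contaminated proxies W = X + U.
x = [random.gauss(2.0, 1.0) for _ in range(n)]
w = [[xi + random.gauss(0.0, 0.5) for _ in range(reps)] for xi in x]

subject_means = [statistics.fmean(wi) for wi in w]

# Within-subject variation across replicates estimates the error variance.
err_var = statistics.fmean([statistics.variance(wi) for wi in w])

# Var(subject mean) = Var(X) + Var(U) / reps, so subtract the error part.
latent_var = statistics.variance(subject_means) - err_var / reps
```

Without replicates, the error variance and the latent variance could not be separated from the observed proxies alone, which is why the replicated design matters when the error density is unknown.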

preprint, 2015, arXiv

Bayesian Nonparametric Modeling of Higher Order Markov Chains

We consider the problem of flexible modeling of higher order Markov chains when an upper bound on the order of the chain is known but the true order and nature of the serial dependence are unknown. We propose Bayesian nonparametric methodology based on conditional tensor factorizations, which can characterize any transition probability with a specified maximal order. The methodology selects the important lags and captures higher order interactions among the lags, while also facilitating calculation of Bayes factors for a variety of hypotheses of interest. We design efficient Markov chain Monte Carlo algorithms for posterior computation, allowing for uncertainty in the set of important lags to be included and in the nature and order of the serial dependence. The methods are illustrated using simulation experiments and real world applications.
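The idea of "important lags" can be made concrete with a small toy example. The sketch below assumes a fully known transition table for an order-3 chain on two states and checks which lags actually change the transition distribution. The table, the function `important_lags`, and the parameter counts are hypothetical illustrations; the paper infers the important lags from data through conditional tensor factorizations, which this sketch does not attempt.

```python
from itertools import product

def important_lags(trans, n_states, order):
    """Return the lags (1-based) whose value changes the transition distribution."""
    important = set()
    for hist in product(range(n_states), repeat=order):
        for lag in range(order):
            for alt in range(n_states):
                # Same history except for the state at this one lag.
                other = hist[:lag] + (alt,) + hist[lag + 1:]
                if trans[hist] != trans[other]:
                    important.add(lag + 1)
    return important

# Hypothetical order-3 chain on 2 states in which only lag 2 matters.
# hist = (y_{t-1}, y_{t-2}, y_{t-3}); trans[hist] = [P(y_t = 0), P(y_t = 1)].
trans = {h: ([0.9, 0.1] if h[1] == 0 else [0.2, 0.8])
         for h in product(range(2), repeat=3)}

lags = important_lags(trans, n_states=2, order=3)
free_params_full = 2 ** 3 * (2 - 1)     # unrestricted order-3 chain
free_params_sparse = 2 ** 1 * (2 - 1)   # only one lag is relevant
```

An unrestricted chain of order q on C states has C^q (C - 1) free probabilities, so identifying the few lags that matter can shrink the model considerably.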

preprint, 2013, arXiv

Adaptive Posterior Convergence Rates in Bayesian Density Deconvolution with Supersmooth Errors

Bayesian density deconvolution using nonparametric prior distributions is a useful alternative to the frequentist kernel based deconvolution estimators due to its potentially wide range of applicability, straightforward uncertainty quantification and generalizability to more sophisticated models. This article is the first substantive effort to theoretically quantify the behavior of the posterior in this recent line of research. In particular, assuming a known supersmooth error density, a Dirichlet process mixture of Normals on the true density leads to a posterior convergence rate equal to the minimax rate $(\log n)^{-\eta/\beta}$ adaptively over the smoothness $\eta$ of an appropriate Hölder space of densities, where $\beta$ is the degree of smoothness of the error distribution. Our main contribution is achieving adaptive minimax rates with respect to the $L_p$ norm for $2 \leq p \leq \infty$ under mild regularity conditions on the true density. En route, we develop tight concentration bounds for a class of kernel based deconvolution estimators which might be of independent interest.
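To get a feel for how slowly a logarithmic rate shrinks, the following sketch evaluates the expression quoted in the abstract for a few sample sizes. The smoothness values passed for `eta` and `beta` are hypothetical, and constants that a full rate statement would carry are ignored.

```python
import math

def minimax_rate(n, eta, beta):
    """(log n)^(-eta/beta), the rate quoted in the abstract, up to constants."""
    return math.log(n) ** (-eta / beta)

# Hypothetical smoothness values: eta = 2 for the density, beta = 2 for the errors.
rates = {n: minimax_rate(n, eta=2.0, beta=2.0) for n in (10**2, 10**4, 10**6)}
```

With eta equal to beta the rate is 1 / log n, so squaring the sample size only halves it, in contrast to the polynomial rates typical of problems without supersmooth errors.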

preprint, 2013, arXiv

Bayesian Low Rank and Sparse Covariance Matrix Decomposition

We consider the problem of estimating high-dimensional covariance matrices of a particular structure, which is a summation of low rank and sparse matrices. This covariance structure has a wide range of applications including factor analysis and random effects models. We propose a Bayesian method of estimating the covariance matrices by representing the covariance model in the form of a factor model with an unknown number of latent factors. We introduce binary indicators for factor selection and rank estimation for the low rank component, combined with a Bayesian lasso method for the sparse component estimation. Simulation studies show that our method can recover the rank of the low rank component as well as the sparsity of the sparse component. We further extend our method to a graphical factor model where both the graphical model of the residuals and the number of factors are of interest. We employ a hyper-inverse Wishart prior for modeling decomposable graphs of the residuals, and a Bayesian graphical lasso selection method for unrestricted graphs. We show through simulations that the extended models can recover both the number of latent factors and the graphical model of the residuals successfully when the sample size is sufficient relative to the dimension.

preprint, 2012, arXiv

Nonparametric Bayesian Approaches to Non-homogeneous Hidden Markov Models

In this article, a flexible Bayesian non-parametric model is proposed for non-homogeneous hidden Markov models. The model is developed through the amalgamation of the ideas of hidden Markov models and predictor dependent stick-breaking processes. Computation is carried out using an auxiliary variable representation of the model, which enables us to perform exact MCMC sampling from the posterior. Furthermore, the model is extended to the situation where the predictors can simultaneously influence the transition dynamics of the hidden states as well as the emission distribution. Estimates of few-steps-ahead conditional predictive distributions of the response are used as performance diagnostics for these models. The proposed methodology is illustrated through simulation experiments as well as analysis of a real data set concerned with the prediction of rainfall-induced malaria epidemics.
