Source author record

Bani K. Mallick

Bani K. Mallick appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Methodology · Applications · Computation · Genomics · Machine Learning · math.ST · Statistics Theory

Catalog footprint

What is connected

17 works

7 topics

4 close collaborators

preprint · 2022 · arXiv

A Bayesian Survival Tree Partition Model Using Latent Gaussian Processes

Survival models are used to analyze time-to-event data in a variety of disciplines. Proportional hazard models provide interpretable parameter estimates, but proportional hazards assumptions are not always appropriate. Non-parametric models are more flexible but often lack a clear inferential framework. We propose a Bayesian tree partition model which is both flexible and inferential. Inference is obtained through the posterior tree structure and flexibility is preserved by modeling the hazard function in each partition using a latent exponentiated Gaussian process. An efficient reversible jump Markov chain Monte Carlo algorithm is obtained by marginalizing the parameters in each partition element via a Laplace approximation. Consistency properties for the estimator are established. The method can be used to help determine subgroups as well as prognostic and/or predictive biomarkers in time-to-event data. The method is applied to a liver survival dataset and is compared with some existing methods on simulated data.

preprint · 2022 · arXiv

Adaptive Bayesian Variable Clustering via Structural Learning of Breast Cancer Data

Clustering of proteins is of interest in cancer cell biology. This article proposes a hierarchical Bayesian model for protein (variable) clustering hinging on correlation structure. Starting from a multivariate normal likelihood, we enforce the clustering through prior modeling using an angle-based unconstrained reparameterization of correlations and assume a truncated Poisson distribution (to penalize a large number of clusters) as the prior on the number of clusters. The posterior distributions of the parameters are not in explicit form, and a reversible jump Markov chain Monte Carlo (RJMCMC) based technique is used to simulate the parameters from the posteriors. The end products of the proposed method are the estimated cluster configuration of the proteins (variables) along with the number of clusters. The Bayesian method is flexible enough to cluster the proteins as well as to estimate the number of clusters. The performance of the proposed method has been substantiated with extensive simulation studies and one protein expression dataset with a hereditary disposition in breast cancer where the proteins come from different pathways.
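To make the role of the truncated Poisson prior concrete, here is a minimal Python sketch of such a prior on the number of clusters. The support 1..k_max, the rate value and the function name `truncated_poisson_pmf` are assumptions chosen for illustration; the abstract does not give the truncation bounds or hyperparameters.

```python
import math

def truncated_poisson_pmf(k, lam, k_max):
    """P(K = k) for a Poisson(lam) restricted to 1..k_max and renormalized.

    The support 1..k_max is an assumption made for this sketch.
    """
    if not 1 <= k <= k_max:
        return 0.0
    # Unnormalized Poisson weights on the truncated support.
    weights = [math.exp(-lam) * lam**j / math.factorial(j) for j in range(1, k_max + 1)]
    return weights[k - 1] / sum(weights)

# Prior mass on 1..10 clusters with a hypothetical rate of 2.
pmf = [truncated_poisson_pmf(k, lam=2.0, k_max=10) for k in range(1, 11)]
```

With a small rate, the prior mass falls off quickly as the number of clusters grows, which is the penalizing effect the abstract refers to.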

preprint · 2020 · arXiv

Directionally Dependent Multi-View Clustering Using Copula Model

In recent biomedical scientific problems, it is a fundamental issue to integratively cluster a set of objects from multiple sources of datasets. Such problems are mostly encountered in genomics, where data are collected from various sources and typically represent distinct yet complementary information. Integrating these data sources for multi-source clustering is challenging due to their complex dependence structure, including directional dependency. Particularly in genomics studies, it is known that there is certain directional dependence between DNA expression, DNA methylation, and RNA expression, widely called The Central Dogma. Most of the existing multi-view clustering methods assume either an independent structure or pair-wise (non-directional) dependency, thereby ignoring the directional relationship. Motivated by this, we propose a copula-based multi-view clustering model where a copula enables the model to accommodate the directional dependence existing in the datasets. We conduct a simulation experiment in which the simulated datasets exhibit inherent directional dependence: it turns out that ignoring the directional dependence negatively affects the clustering performance. As a real application, we applied our model to the breast cancer tumor samples collected from The Cancer Genome Atlas (TCGA).

preprint · 2020 · arXiv

Estimation of COVID-19 spread curves integrating global data and borrowing information

Currently, the novel coronavirus disease 2019 (COVID-19) is a big threat to global health. The rapid spread of the virus has created a pandemic, and countries all over the world are struggling with a surge in COVID-19 infected cases. There are no drugs or other therapeutics approved by the US Food and Drug Administration to prevent or treat COVID-19: information on the disease is very limited and scattered even if it exists. This motivates the use of data integration, combining data from diverse sources and eliciting useful information with a unified view of them. In this paper, we propose a Bayesian hierarchical model that integrates global data for real-time prediction of infection trajectory for multiple countries. Because the proposed model takes advantage of borrowing information across multiple countries, it outperforms an existing individual country-based model. Because a fully Bayesian approach has been adopted, the model provides a powerful predictive tool endowed with uncertainty quantification. Additionally, a joint variable selection technique has been integrated into the proposed modeling scheme, which aims to identify possible country-level risk factors for severe disease due to COVID-19.

preprint · 2020 · arXiv

Quantile Graphical Models: Bayesian Approaches

Graphical models are ubiquitous tools to describe the interdependence between variables measured simultaneously such as large-scale gene or protein expression data. Gaussian graphical models (GGMs) are well-established tools for probabilistic exploration of dependence structures using precision matrices and they are generated under a multivariate normal joint distribution. However, they suffer from several shortcomings since they are based on Gaussian distribution assumptions. In this article, we propose a Bayesian quantile based approach for sparse estimation of graphs. We demonstrate that the resulting graph estimation is robust to outliers and applicable under general distributional assumptions. Furthermore, we develop efficient variational Bayes approximations to scale the methods for large data sets. Our methods are applied to a novel cancer proteomics dataset wherein multiple proteomic antibodies are simultaneously assessed on tumor samples using reverse-phase protein arrays (RPPA) technology.

preprint · 2016 · arXiv

Bayesian Semiparametric Multivariate Density Deconvolution

We consider the problem of multivariate density deconvolution when the interest lies in estimating the distribution of a vector-valued random variable but precise measurements of the variable of interest are not available, observations being contaminated with additive measurement errors. The existing sparse literature on the problem assumes the density of the measurement errors to be completely known. We propose robust Bayesian semiparametric multivariate deconvolution approaches when the measurement error density is not known but replicated proxies are available for each unobserved value of the random vector. Additionally, we allow the variability of the measurement errors to depend on the associated unobserved value of the vector of interest through unknown relationships which also automatically includes the case of multivariate multiplicative measurement errors. Basic properties of finite mixture models, multivariate normal kernels and exchangeable priors are exploited in many novel ways to meet the modeling and computational challenges. Theoretical results that show the flexibility of the proposed methods are provided. We illustrate the efficiency of the proposed methods in recovering the true density of interest through simulation experiments. The methodology is applied to estimate the joint consumption pattern of different dietary components from contaminated 24 hour recalls.

preprint · 2016 · arXiv

Fast sampling with Gaussian scale-mixture priors in high-dimensional regression

We propose an efficient way to sample from a class of structured multivariate Gaussian distributions which routinely arise as conditional posteriors of model parameters that are assigned a conditionally Gaussian prior. The proposed algorithm only requires matrix operations in the form of matrix multiplications and linear system solutions. We show that the computational complexity of the proposed algorithm grows linearly with the dimension, unlike existing algorithms relying on Cholesky factorizations with cubic orders of complexity. The algorithm should be broadly applicable in settings where Gaussian scale mixture priors are used on high dimensional model parameters. We provide an illustration through posterior sampling in a high dimensional regression setting with a horseshoe prior on the vector of regression coefficients.
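As an illustration of the kind of sampler this abstract describes, the following Python sketch draws from N(mu, Sigma) with Sigma = (Phi^T Phi + D^{-1})^{-1} and mu = Sigma Phi^T alpha, using only matrix products and one n-by-n linear solve. The target form, the step sequence and every name (`sample_conditional_gaussian`, `Phi`, `d`, `alpha`) are assumptions made for this sketch; the abstract itself does not spell out the algorithm.

```python
import numpy as np

def sample_conditional_gaussian(Phi, d, alpha, rng):
    """Draw one sample from N(mu, Sigma), where
    Sigma = (Phi^T Phi + D^{-1})^{-1}, mu = Sigma Phi^T alpha, D = diag(d).

    Only an n x n system is solved, so no p x p factorization is needed.
    """
    n, p = Phi.shape
    u = rng.normal(size=p) * np.sqrt(d)   # u ~ N(0, D)
    delta = rng.normal(size=n)            # delta ~ N(0, I_n)
    v = Phi @ u + delta
    M = (Phi * d) @ Phi.T + np.eye(n)     # Phi D Phi^T + I_n
    w = np.linalg.solve(M, alpha - v)
    return u + d * (Phi.T @ w)            # u + D Phi^T w

# Hypothetical small problem: n = 3 observations, p = 5 coefficients.
rng = np.random.default_rng(0)
Phi = rng.normal(size=(3, 5))
d = np.array([0.5, 1.0, 1.5, 1.0, 0.8])  # prior variances (assumed values)
alpha = rng.normal(size=3)
theta = sample_conditional_gaussian(Phi, d, alpha, rng)
```

When p is much larger than n, the dominant cost is forming Phi D Phi^T, which scales linearly in p for fixed n; a direct approach would instead factorize a p-by-p matrix.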

preprint · 2015 · arXiv

Bayesian Variable Selection with Structure Learning: Applications in Integrative Genomics

Significant advances in biotechnology have allowed for simultaneous measurement of molecular data points across multiple genomic and transcriptomic levels from a single tumor/cancer sample. This has motivated systematic approaches to integrate multi-dimensional structured datasets since cancer development and progression are driven by numerous co-ordinated molecular alterations and the interactions between them. We propose a novel two-step Bayesian approach that combines a variable selection framework with integrative structure learning between multiple sources of data. The structure learning in the first step is accomplished through novel joint graphical models for heterogeneous (mixed scale) data allowing for flexible incorporation of prior knowledge. This structure learning subsequently informs the variable selection in the second step to identify groups of molecular features within and across platforms associated with outcomes of cancer progression. The variable selection strategy adjusts for collinearity and multiplicity, and also has theoretical justifications. We evaluate our methods through simulations and apply them to motivating genomic (DNA copy number and methylation) and transcriptomic (mRNA expression) data for assessing important markers associated with Glioblastoma progression.

preprint · 2014 · arXiv

Bayesian sparse graphical models for classification with application to protein expression data

Reverse-phase protein array (RPPA) analysis is a powerful, relatively new platform that allows for high-throughput, quantitative analysis of protein networks. One of the challenges that currently limit the potential of this technology is the lack of methods that allow for accurate data modeling and identification of related networks and samples. Such models may improve the accuracy of biological sample classification based on patterns of protein network activation and provide insight into the distinct biological relationships underlying different types of cancer. Motivated by RPPA data, we propose a Bayesian sparse graphical modeling approach that uses selection priors on the conditional relationships in the presence of class information. The novelty of our Bayesian model lies in the ability to draw information from the network data as well as from the associated categorical outcome in a unified hierarchical model for classification. In addition, our method allows for intuitive integration of a priori network information directly in the model and allows for posterior inference on the network topologies both within and between classes. Applying our methodology to an RPPA data set generated from panels of human breast cancer and ovarian cancer cell lines, we demonstrate that the model is able to distinguish the different cancer cell types more accurately than several existing models and to identify differential regulation of components of a critical signaling network (the PI3K-AKT pathway) between these two types of cancer. This approach represents a powerful new tool that can be used to improve our understanding of protein networks in cancer.

preprint · 2013 · arXiv

Adaptive Posterior Convergence Rates in Bayesian Density Deconvolution with Supersmooth Errors

Bayesian density deconvolution using nonparametric prior distributions is a useful alternative to the frequentist kernel based deconvolution estimators due to its potentially wide range of applicability, straightforward uncertainty quantification and generalizability to more sophisticated models. This article is the first substantive effort to theoretically quantify the behavior of the posterior in this recent line of research. In particular, assuming a known supersmooth error density, a Dirichlet process mixture of Normals on the true density leads to a posterior convergence rate that is the same as the minimax rate $(\log n)^{-η/β}$ adaptively over the smoothness $η$ of an appropriate Hölder space of densities, where $β$ is the degree of smoothness of the error distribution. Our main contribution is achieving adaptive minimax rates with respect to the $L_p$ norm for $2 \leq p \leq \infty$ under mild regularity conditions on the true density. En route, we develop tight concentration bounds for a class of kernel based deconvolution estimators which might be of independent interest.

preprint · 2013 · arXiv

Bayes Regularized Graphical Model Estimation in High Dimensions

There has been an intense development of Bayes graphical model estimation approaches over the past decade; however, most of the existing methods are restricted to moderate dimensions. We propose a novel approach suitable for high dimensional settings, by decoupling model fitting and covariance selection. First, a full model based on a complete graph is fit under a novel class of continuous shrinkage priors on the precision matrix elements, which induces shrinkage under an equivalence with Cholesky-based regularization while enabling conjugate updates of entire precision matrices. Subsequently, we propose a post-fitting graphical model estimation step which proceeds using penalized joint credible regions to perform neighborhood selection sequentially for each node. The posterior computation proceeds using straightforward fully Gibbs sampling, and the approach is scalable to high dimensions. The proposed approach is shown to be asymptotically consistent in estimating the graph structure for fixed $p$ when the truth is a Gaussian graphical model. Simulations show that our approach compares favorably with Bayesian competitors both in terms of graphical model estimation and computational efficiency. We apply our methods to high dimensional gene expression and microRNA datasets in cancer genomics.

preprint · 2013 · arXiv

Bayesian Low Rank and Sparse Covariance Matrix Decomposition

We consider the problem of estimating high-dimensional covariance matrices of a particular structure, which is a summation of low rank and sparse matrices. This covariance structure has a wide range of applications including factor analysis and random effects models. We propose a Bayesian method of estimating the covariance matrices by representing the covariance model in the form of a factor model with unknown number of latent factors. We introduce binary indicators for factor selection and rank estimation for the low rank component combined with a Bayesian lasso method for the sparse component estimation. Simulation studies show that our method can recover the rank as well as the sparsity of the two components respectively. We further extend our method to a graphical factor model where the graphical model of the residuals as well as selecting the number of factors is of interest. We employ a hyper-inverse Wishart prior for modeling decomposable graphs of the residuals, and a Bayesian graphical lasso selection method for unrestricted graphs. We show through simulations that the extended models can recover both the number of latent factors and the graphical model of the residuals successfully when the sample size is sufficient relative to the dimension.

preprint · 2013 · arXiv

Bayesian object classification of gold nanoparticles

The properties of materials synthesized with nanoparticles (NPs) are highly correlated to the sizes and shapes of the nanoparticles. The transmission electron microscopy (TEM) imaging technique can be used to measure the morphological characteristics of NPs, which can be simple circles or more complex irregular polygons with varying degrees of scales and sizes. A major difficulty in analyzing the TEM images is the overlapping of objects, having different morphological properties with no specific information about the number of objects present. Furthermore, the objects lying along the boundary render automated image analysis much more difficult. To overcome these challenges, we propose a Bayesian method based on the marked-point process representation of the objects. We derive models, both for the marks which parameterize the morphological aspects and the points which determine the location of the objects. The proposed model is an automatic image segmentation and classification procedure, which simultaneously detects the boundaries and classifies the NPs into one of the predetermined shape families. We execute the inference by sampling the posterior distribution using Markov chain Monte Carlo (MCMC) since the posterior is doubly intractable. We apply our novel method to several TEM imaging samples of gold NPs, producing the needed statistical characterization of their morphology.

preprint · 2013 · arXiv

Bayesian sparse graphical models and their mixtures using lasso selection priors

We propose Bayesian methods for Gaussian graphical models that lead to sparse and adaptively shrunk estimators of the precision (inverse covariance) matrix. Our methods are based on lasso-type regularization priors leading to parsimonious parameterization of the precision matrix, which is essential in several applications involving learning relationships among the variables. In this context, we introduce a novel type of selection prior that develops a sparse structure on the precision matrix by making most of the elements exactly zero, in addition to ensuring positive definiteness -- thus conducting model selection and estimation simultaneously. We extend these methods to finite and infinite mixtures of Gaussian graphical models for clustered data using Dirichlet process priors. We discuss appropriate posterior simulation schemes to implement posterior inference in the proposed models, including the evaluation of normalizing constants that are functions of parameters of interest which result from the restrictions on the correlation matrix. We evaluate the operating characteristics of our method via several simulations and in application to real data sets.

preprint · 2012 · arXiv

Investigating international new product diffusion speed: A semiparametric approach

Global marketing managers are interested in understanding the speed of the new product diffusion process and how the speed has changed in our ever more technologically advanced and global marketplace. Understanding the process allows firms to forecast the expected rate of return on their new products and develop effective marketing strategies. The most recent major study on this topic [Marketing Science 21 (2002) 97--114] investigated new product diffusions in the United States. We expand upon that study in three important ways. (1) Van den Bulte notes that a similar study is needed in the international context, especially in developing countries. Our study covers four new product diffusions across 31 developed and developing nations from 1980--2004. Our sample accounts for about 80% of the global economic output and 60% of the global population, allowing us to examine more general phenomena. (2) His model contains the implicit assumption that the diffusion speed parameter is constant throughout the diffusion life cycle of a product. Recognizing the likely effects on the speed parameter of recent changes in the marketplace, we model the parameter as a semiparametric function, allowing it the flexibility to change over time. (3) We perform a variable selection to determine that the number of internet users and the consumer price index are strongly associated with the speed of diffusion.

preprint · 2012 · arXiv

Nonparametric Bayesian Approaches to Non-homogeneous Hidden Markov Models

In this article a flexible Bayesian non-parametric model is proposed for non-homogeneous hidden Markov models. The model is developed through the amalgamation of the ideas of hidden Markov models and predictor dependent stick-breaking processes. Computation is carried out using an auxiliary variable representation of the model which enables us to perform exact MCMC sampling from the posterior. Furthermore, the model is extended to the situation when the predictors can simultaneously influence the transition dynamics of the hidden states as well as the emission distribution. Estimates of few-steps-ahead conditional predictive distributions of the response have been used as performance diagnostics for these models. The proposed methodology is illustrated through simulation experiments as well as analysis of a real data set concerned with the prediction of rainfall induced malaria epidemics.
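The idea of a predictor dependent stick-breaking process can be sketched in a few lines of Python: each stick fraction depends on a covariate x, so the resulting state weights change with x. The logistic link, the truncation to a finite number of states and the names `stick_breaking_weights`, `a`, `b` are assumptions for this sketch and are not taken from the abstract.

```python
import numpy as np

def stick_breaking_weights(x, a, b):
    """Weights over K = len(a) + 1 states for a scalar predictor x.

    Stick fractions are V_k(x) = logistic(a_k + b_k * x) for k < K and
    V_K = 1, so the weights sum to one (finite truncation, assumed here).
    """
    v = 1.0 / (1.0 + np.exp(-(a + b * x)))
    v = np.append(v, 1.0)
    # Length of stick remaining before each break: 1, (1-V_1), (1-V_1)(1-V_2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

# Hypothetical coefficients for a 4-state example.
a = np.array([0.0, 0.5, -0.5])
b = np.array([1.0, -1.0, 0.5])
weights_low = stick_breaking_weights(-1.0, a, b)
weights_high = stick_breaking_weights(1.0, a, b)
```

In a non-homogeneous hidden Markov model, one such weight vector per current state would give a transition distribution that varies with the predictor.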

preprint · 2011 · arXiv

A generalized linear mixed model for longitudinal binary data with a marginal logit link function

Longitudinal studies of a binary outcome are common in the health, social, and behavioral sciences. In general, a feature of random effects logistic regression models for longitudinal binary data is that the marginal functional form, when integrated over the distribution of the random effects, is no longer of logistic form. Recently, Wang and Louis [Biometrika 90 (2003) 765--775] proposed a random intercept model in the clustered binary data setting where the marginal model has a logistic form. An acknowledged limitation of their model is that it allows only a single random effect that varies from cluster to cluster. In this paper we propose a modification of their model to handle longitudinal data, allowing separate, but correlated, random intercepts at each measurement occasion. The proposed model allows for a flexible correlation structure among the random intercepts, where the correlations can be interpreted in terms of Kendall's $τ$. For example, the marginal correlations among the repeated binary outcomes can decline with increasing time separation, while the model retains the property of having matching conditional and marginal logit link functions. Finally, the proposed method is used to analyze data from a longitudinal study designed to monitor cardiac abnormalities in children born to HIV-infected women.
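The statement that the marginal form of a random effects logistic model is no longer logistic can be checked numerically. The Python sketch below assumes a normal random intercept with standard deviation sigma and uses Gauss-Hermite quadrature to integrate it out; the normal distribution, the value of sigma and the function names are assumptions for illustration, not the model proposed in the abstract.

```python
import numpy as np

def marginal_probability(eta, sigma, degree=40):
    """E[logistic(eta + b)] with b ~ N(0, sigma^2), by Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite.hermgauss(degree)
    b = np.sqrt(2.0) * sigma * nodes  # change of variables for the N(0, sigma^2) density
    return np.sum(weights / (1.0 + np.exp(-(eta + b)))) / np.sqrt(np.pi)

def logit(p):
    return np.log(p / (1.0 - p))

# Conditional linear predictors and the logits of the implied marginal probabilities.
etas = np.array([0.0, 1.0, 2.0])
marginal_logits = np.array([logit(marginal_probability(e, sigma=2.0)) for e in etas])
```

Under these assumptions the marginal logits are pulled toward zero relative to the conditional values in `etas`, so the conditional coefficients do not carry over to the marginal scale. That mismatch is what a model with matching conditional and marginal logit links is designed to avoid.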