Source author record

Silvia Liverani

Silvia Liverani appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Applications, Computation, Methodology

Catalog footprint

What is connected

5 works

3 topics

4 close collaborators


Preprint, 2022, arXiv

Variance matrix priors for Dirichlet process mixture models with Gaussian kernels

The Dirichlet Process Mixture Model (DPMM) is a Bayesian non-parametric approach widely used for density estimation and clustering. In this manuscript, we study the choice of prior for the variance or precision matrix when Gaussian kernels are adopted. Typically, in the relevant literature, the assessment of mixture models is done by considering observations in a space of only a handful of dimensions. Instead, we are concerned with more realistic problems of higher dimensionality, in a space of up to 20 dimensions. We observe that the choice of prior is increasingly important as the dimensionality of the problem increases. After identifying certain undesirable properties of standard priors in problems of higher dimensionality, we review and implement possible alternative priors. The most promising priors are identified, as well as other factors that affect the convergence of MCMC samplers. Our results show that the choice of prior is critical for deriving reliable posterior inferences. This manuscript offers a thorough overview and comparative investigation into possible priors, with detailed guidelines for their implementation. Although our work focuses on the use of the DPMM in clustering, it is also applicable to density estimation.

Preprint, 2020, arXiv

Dirichlet Process Mixture Models for Regression Discontinuity Designs

The Regression Discontinuity Design (RDD) is a quasi-experimental design that estimates the causal effect of a treatment when its assignment is defined by a threshold value for a continuous assignment variable. The RDD assumes that subjects with measurements within a bandwidth around the threshold belong to a common population, so that the threshold can be seen as a randomising device assigning treatment to those falling just above the threshold and withholding it from those who fall just below. Bandwidth selection is a critical decision in the RDD analysis, as the results may be highly sensitive to its choice. A number of methods to select the optimal bandwidth, mainly originating from the econometric literature, have been proposed. However, their use in practice is limited. We propose a methodology that, tackling the problem from an applied point of view, considers units' exchangeability, i.e., their similarity with respect to measured covariates, as the main criterion to select subjects for the analysis, irrespective of their distance from the threshold. We carry out clustering on the sample using a Dirichlet process mixture model to identify balanced and homogeneous clusters. Our proposal exploits the posterior similarity matrix, which contains the pairwise probabilities that two observations are allocated to the same cluster in the MCMC sample. Thus we include in the RDD analysis only those clusters for which we have stronger evidence of exchangeability. We illustrate the validity of our methodology with both a simulated experiment and a motivating example on the effect of statins to lower cholesterol level, using UK primary care data.
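
The posterior similarity matrix mentioned in this abstract can be illustrated with a minimal sketch. It assumes the MCMC output is available as a list of allocation vectors, one per sweep, each holding one cluster label per observation. The function name `posterior_similarity_matrix` and the toy draws are hypothetical and are not taken from the paper or from any package.

```python
def posterior_similarity_matrix(allocations):
    """Estimate pairwise co-clustering probabilities from MCMC allocations.

    allocations: list of MCMC draws; each draw is a list of cluster labels,
    one label per observation (hypothetical input format).
    Returns an n x n list of lists where entry [i][j] is the fraction of
    draws in which observations i and j share a cluster label.
    """
    if not allocations:
        raise ValueError("need at least one MCMC draw")
    n = len(allocations[0])
    counts = [[0] * n for _ in range(n)]
    for draw in allocations:
        if len(draw) != n:
            raise ValueError("every draw must label the same observations")
        for i in range(n):
            for j in range(n):
                # Labels are only compared within a draw, so the estimate
                # is unaffected by label switching between draws.
                if draw[i] == draw[j]:
                    counts[i][j] += 1
    num_draws = len(allocations)
    return [[c / num_draws for c in row] for row in counts]


# Toy example: 4 observations, 4 MCMC draws (made-up labels).
draws = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 2, 2],
    [0, 1, 1, 1],
]
psm = posterior_similarity_matrix(draws)
```

Because labels are compared only within each draw, the matrix does not depend on how clusters happen to be numbered from one sweep to the next, which is one reason it is a convenient summary of clustering uncertainty.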

Preprint, 2016, arXiv

Modelling collinear and spatially correlated data

In this work we present a statistical approach to distinguish and interpret the complex relationship between several predictors and a response variable at the small area level, in the presence of i) high correlation between the predictors and ii) spatial correlation for the response. Covariates that are highly correlated create collinearity problems when used in a standard multiple regression model. Many methods have been proposed in the literature to address this issue. A very common approach is to create an index which aggregates all the highly correlated variables of interest. For example, it is well known that there is a relationship between social deprivation measured through the Index of Multiple Deprivation (IMD) and air pollution; this index is then used as a confounder in assessing the effect of air pollution on health outcomes (e.g. respiratory hospital admissions or mortality). However, it would be more informative to look specifically at each domain of the IMD and at its relationship with air pollution to better understand its role as a confounder in the epidemiological analyses. In this paper we illustrate how the complex relationships between the domains of IMD and air pollution can be deconstructed and analysed using profile regression, a Bayesian non-parametric model for clustering responses and covariates simultaneously. Moreover, we include an intrinsic spatial conditional autoregressive (ICAR) term to account for the spatial correlation of the response variable.

Preprint, 2014, arXiv

PReMiuM: An R Package for Profile Regression Mixture Models using Dirichlet Processes

PReMiuM is a recently developed R package for Bayesian clustering using a Dirichlet process mixture model. This model is an alternative to regression models, non-parametrically linking a response vector to covariate data through cluster membership. The package allows Bernoulli, Binomial, Poisson, Normal and categorical responses, as well as Normal and discrete covariates. Additionally, predictions may be made for the response, and missing values for the covariates are handled. Several samplers and label switching moves are implemented along with diagnostic tools to assess convergence. A number of R functions for post-processing of the output are also provided. Beyond fitting mixtures, it may also be of interest to determine which covariates actively drive the mixture components. This is implemented in the package as variable selection.

Preprint, 2014, arXiv

Sampling from Dirichlet process mixture models with unknown concentration parameter: Mixing issues in large data implementations

We consider the question of Markov chain Monte Carlo sampling from a general stick-breaking Dirichlet process mixture model, with concentration parameter alpha. This paper introduces a Gibbs sampling algorithm that combines the slice sampling approach of Walker (2007) and the retrospective sampling approach of Papaspiliopoulos and Roberts (2008). Our general algorithm is implemented as efficient open source C++ software, available as an R package, and is based on a blocking strategy similar to that suggested by Papaspiliopoulos (2008) and implemented by Yau et al. (2011). We discuss the difficulties of achieving good mixing in MCMC samplers of this nature and investigate sensitivity to initialisation. We also consider the challenges that arise when a further layer of hierarchy is added such that joint inference is to be made on alpha. We introduce a new label switching move and compute the marginal model posterior to help to surmount these difficulties. Our work is illustrated using a profile regression (Molitor et al., 2010) application, where we demonstrate good mixing behaviour for both synthetic and real examples.
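
As background for the stick-breaking construction named in this abstract, the following minimal sketch draws mixture weights from a truncated stick-breaking prior with concentration parameter alpha. It is not the paper's algorithm: the slice and retrospective samplers described above avoid a fixed truncation, whereas this sketch fixes the number of components for simplicity. The function name `stick_breaking_weights` and the chosen values are hypothetical.

```python
import random


def stick_breaking_weights(alpha, num_components, rng):
    """Draw mixture weights from a truncated stick-breaking prior.

    Each break fraction V_k ~ Beta(1, alpha); weight k is V_k times the
    length of stick left after the first k - 1 breaks. The last weight
    takes whatever stick remains, so the weights sum to one.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if num_components < 1:
        raise ValueError("need at least one component")
    weights = []
    remaining = 1.0
    for _ in range(num_components - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    # Truncation step (an assumption of this sketch): assign the leftover
    # stick to the final component.
    weights.append(remaining)
    return weights


rng = random.Random(0)  # fixed seed so the draw is reproducible
weights = stick_breaking_weights(alpha=2.0, num_components=20, rng=rng)
```

Smaller values of alpha concentrate the mass on the first few components, while larger values spread it over many, which is why inference on alpha affects the number of occupied clusters.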
