Perfecting MCMC Sampling: Recipes and Reservations
This review paper is intended for the Handbook of Markov chain Monte Carlo's second edition. The authors will be grateful for any suggestions that could perfect it.
Discover
Workspaces
Network
Opportunities
Account
Researcher profile
Radu V. Craiu contributes to research discovery and scholarly infrastructure.
Trust snapshot
Actions
Identity and collaboration
Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.
Log in to claimDirect collaboration
Claim this author entity first to unlock direct invitations.
Research graph
Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.
BZPEER is loading the nearby papers, people, topics and institutions for this page.
Published work
This review paper is intended for the Handbook of Markov chain Monte Carlo's second edition. The authors will be grateful for any suggestions that could perfect it.
Establishing a low-dimensional representation of the data leads to efficient data learning strategies. In many cases, the reduced dimension needs to be explicitly stated and estimated from the data. We explore the estimation of dimension in finite samples as a constrained optimization problem, where the estimated dimension is a maximizer of a penalized profile likelihood criterion within the framework of a probabilistic principal components analysis. Unlike other penalized maximization problems that require an "optimal" penalty tuning parameter, we propose a data-averaging procedure whereby the estimated dimension emerges as the most favourable choice over a range of plausible penalty parameters. The proposed heuristic is compared to a large number of alternative criteria in simulations and an application to gene expression data. Extensive simulation studies reveal that none of the methods uniformly dominate the other and highlight the importance of subject-specific knowledge in choosing statistical methods for dimension learning. Our application results also suggest that gene expression data have a higher intrinsic dimension than previously thought. Overall, our proposed heuristic strikes a good balance and is the method of choice when model assumptions deviated moderately.
Multi-collinearity is a wide-spread phenomenon in modern statistical applications and when ignored, can negatively impact model selection and statistical inference. Classic tools and measures that were developed for "$n>p$" data are not applicable nor interpretable in the high-dimensional regime. Here we propose 1) new individualized measures that can be used to visualize patterns of multi-collinearity, and subsequently 2) global measures to assess the overall burden of multi-collinearity without limiting the observed data dimensions. We applied these measures to genomic applications to investigate patterns of multi-collinearity in genetic variations across individuals with diverse ancestral backgrounds. The measures were able to visually distinguish genomic regions of excessive multi-collinearity and contrast the level of multi-collinearity between different continental populations.
Many imputation methods are based on statistical models that assume that the variable of interest is a noisy observation of a function of the auxiliary variables or covariates. Misspecification of this model may lead to severe errors in estimates and to misleading conclusions. A new imputation method for item nonresponse in surveys is proposed based on a nonparametric estimation of the functional dependence between the variable of interest and the auxiliary variables. We consider the use of smoothing spline estimation within an additive model framework to flexibly build an imputation model in the case of multiple auxiliary variables. The performance of our method is assessed via numerical experiments involving simulated and real data.