Source author record

Radu V. Craiu

Radu V. Craiu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Topics: Methodology, Computation, Applications, math.PR, math.ST, Statistics Theory

Catalog footprint

What is connected

13 works

6 topics

4 close collaborators


preprint 2024 arXiv

Perfecting MCMC Sampling: Recipes and Reservations

This review paper is intended for the Handbook of Markov chain Monte Carlo's second edition. The authors will be grateful for any suggestions that could perfect it.

preprint 2022 arXiv

Exploring dimension learning via a penalized probabilistic principal component analysis

Establishing a low-dimensional representation of the data leads to efficient data learning strategies. In many cases, the reduced dimension needs to be explicitly stated and estimated from the data. We explore the estimation of dimension in finite samples as a constrained optimization problem, where the estimated dimension is a maximizer of a penalized profile likelihood criterion within the framework of a probabilistic principal components analysis. Unlike other penalized maximization problems that require an "optimal" penalty tuning parameter, we propose a data-averaging procedure whereby the estimated dimension emerges as the most favourable choice over a range of plausible penalty parameters. The proposed heuristic is compared to a large number of alternative criteria in simulations and an application to gene expression data. Extensive simulation studies reveal that none of the methods uniformly dominates the others and highlight the importance of subject-specific knowledge in choosing statistical methods for dimension learning. Our application results also suggest that gene expression data have a higher intrinsic dimension than previously thought. Overall, our proposed heuristic strikes a good balance and is the method of choice when model assumptions deviate moderately.

preprint 2022 arXiv

Measuring the severity of multi-collinearity in high dimensions

Multi-collinearity is a widespread phenomenon in modern statistical applications and, when ignored, can negatively impact model selection and statistical inference. Classic tools and measures that were developed for "$n>p$" data are neither applicable nor interpretable in the high-dimensional regime. Here we propose 1) new individualized measures that can be used to visualize patterns of multi-collinearity, and subsequently 2) global measures to assess the overall burden of multi-collinearity without limiting the observed data dimensions. We applied these measures to genomic applications to investigate patterns of multi-collinearity in genetic variations across individuals with diverse ancestral backgrounds. The measures were able to visually distinguish genomic regions of excessive multi-collinearity and contrast the level of multi-collinearity between different continental populations.
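For context, one of the classic "$n>p$" measures the abstract alludes to is the variance inflation factor (VIF). The following is a minimal sketch of that classic tool only, assuming NumPy; the helper name `vif` and the toy data are illustrative assumptions, and this is not one of the measures proposed in the paper.

```python
import numpy as np

def vif(X):
    """Classic variance inflation factors for the columns of X (requires n > p).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the
    other columns. It equals the j-th diagonal entry of the inverse of the
    predictors' correlation matrix.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    if n <= p:
        # The sample correlation matrix is singular here, so the classic VIF
        # is undefined; this is the high-dimensional regime.
        raise ValueError("classic VIF needs more observations than predictors")
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Toy data (hypothetical): x2 is nearly collinear with x1, x3 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)
vif_values = vif(np.column_stack([x1, x2, x3]))
print(vif_values)
```

The first two entries come out large and the third stays close to 1. The explicit `n <= p` check marks where this classic measure stops being defined.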

preprint 2017 arXiv

Nonparametric imputation method for nonresponse in surveys

Many imputation methods are based on statistical models that assume that the variable of interest is a noisy observation of a function of the auxiliary variables or covariates. Misspecification of this model may lead to severe errors in estimates and to misleading conclusions. A new imputation method for item nonresponse in surveys is proposed based on a nonparametric estimation of the functional dependence between the variable of interest and the auxiliary variables. We consider the use of smoothing spline estimation within an additive model framework to flexibly build an imputation model in the case of multiple auxiliary variables. The performance of our method is assessed via numerical experiments involving simulated and real data.

preprint 2015 arXiv

Embarrassingly Parallel Sequential Markov-chain Monte Carlo for Large Sets of Time Series

Bayesian computation crucially relies on Markov chain Monte Carlo (MCMC) algorithms. In the case of massive data sets, running the Metropolis-Hastings sampler to draw from the posterior distribution becomes prohibitive due to the large number of likelihood terms that need to be calculated at each iteration. In order to perform Bayesian inference for a large set of time series, we consider an algorithm that combines "divide and conquer" ideas previously used to design MCMC algorithms for big data with a sequential MCMC strategy. The performance of the method is illustrated using a large set of financial data.
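To illustrate why the per-iteration cost grows with the data size, here is a minimal sketch of a random-walk Metropolis sampler (a Metropolis-Hastings sampler with a symmetric proposal) in which every iteration evaluates one likelihood term per observation. The toy Gaussian model, the flat prior, and the names `rw_metropolis` and `log_post` are assumptions made for illustration; this is not the parallel sequential algorithm described in the paper.

```python
import numpy as np

def rw_metropolis(log_post, theta0, n_iter=5000, step=0.5, seed=0):
    """Random-walk Metropolis for a scalar parameter."""
    rng = np.random.default_rng(seed)
    theta = float(theta0)
    lp = log_post(theta)
    draws = np.empty(n_iter)
    for i in range(n_iter):
        prop = theta + step * rng.normal()   # symmetric Gaussian proposal
        lp_prop = log_post(prop)             # full pass over the data
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        draws[i] = theta
    return draws

# Toy data (hypothetical): 1000 observations from N(2, 1).
data_rng = np.random.default_rng(1)
y = data_rng.normal(loc=2.0, scale=1.0, size=1000)

def log_post(theta):
    # Flat prior and unit-variance Gaussian likelihood: one term per observation.
    return -0.5 * np.sum((y - theta) ** 2)

draws = rw_metropolis(log_post, theta0=0.0, n_iter=5000, step=0.08)
posterior_mean = draws[1000:].mean()   # discard the first 1000 draws as burn-in
print(posterior_mean, y.mean())
```

Each call to `log_post` sums over all observations, so the cost of one iteration is linear in the sample size. That is the bottleneck that divide-and-conquer strategies aim to avoid.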

preprint 2015 arXiv

Stability of adversarial Markov chains, with an application to adaptive MCMC algorithms

We consider whether ergodic Markov chains with bounded step size remain bounded in probability when their transitions are modified by an adversary on a bounded subset. We provide counterexamples to show that the answer is no in general, and prove theorems to show that the answer is yes under various additional assumptions. We then use our results to prove convergence of various adaptive Markov chain Monte Carlo algorithms.

preprint 2014 arXiv

Additive Models for Conditional Copulas

Conditional copulas are flexible statistical tools that couple joint conditional and marginal conditional distributions. In a linear regression setting with more than one covariate and two dependent outcomes, we propose the use of additive models for conditional bivariate copula models and discuss computation and model selection tools for performing Bayesian inference. The method is illustrated using simulations and a real example.

preprint 2012 arXiv

Bayesian Latent Variable Modeling of Longitudinal Family Data for Genetic Pleiotropy Studies

Motivated by genetic association studies of pleiotropy, we propose here a Bayesian latent variable approach to jointly study multiple outcomes or phenotypes. The proposed method models both continuous and binary phenotypes, and it accounts for serial and familial correlations when longitudinal and pedigree data have been collected. We present a Bayesian estimation method for the model parameters, and we develop a novel MCMC algorithm that builds upon hierarchical centering and parameter expansion techniques to efficiently sample the posterior distribution. We discuss phenotype and model selection in the Bayesian setting, and we study the performance of two selection strategies based on Bayes factors and spike-and-slab priors. We evaluate the proposed method via extensive simulations and demonstrate its utility with an application to a genome-wide association study of various complication phenotypes related to type 1 diabetes.

preprint 2012 arXiv

Statistical Testing for Conditional Copulas

In conditional copula models, the copula parameter is deterministically linked to a covariate via the calibration function. The latter is of central interest for inference and is usually estimated nonparametrically. However, when a parametric model for the calibration function is appropriate, the resulting estimator exhibits significant gains in statistical efficiency and requires smaller computational costs. We develop methodology for testing a parametric formulation of the calibration function against a general alternative and propose a generalized likelihood ratio-type test that enables conditional copula model diagnostics. We derive the asymptotic null distribution of the proposed test and study its finite sample performance using simulations. The method is applied to two data examples.

preprint 2011 arXiv

Bayesian methods to overcome the winner's curse in genetic studies

Parameter estimates for associated genetic variants, reported in the initial discovery samples, are often grossly inflated compared to the values observed in the follow-up replication samples. This type of bias is a consequence of the sequential procedure in which the estimated effect of an associated genetic marker must first pass a stringent significance threshold. We propose a hierarchical Bayes method in which a spike-and-slab prior is used to account for the possibility that the significant test result may be due to chance. We examine the robustness of the method using different priors corresponding to different degrees of confidence in the testing results and propose a Bayesian model averaging procedure to combine estimates produced by different models. The Bayesian estimators yield smaller variance compared to the conditional likelihood estimator and outperform the latter in studies with low power. We investigate the performance of the method with simulations and applications to four real data examples.
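The selection bias described here can be reproduced in a small simulation. The following is a minimal sketch, assuming normally distributed effect estimates. The true effect, standard error, and threshold are hypothetical values chosen for illustration, and the function name `simulate_winners_curse` is likewise an assumption. It shows the bias only, not the hierarchical Bayes correction proposed in the paper.

```python
import numpy as np

def simulate_winners_curse(beta=0.1, se=0.05, z_crit=3.29,
                           n_studies=100_000, seed=0):
    """Compare the average estimate over all studies with the average over
    the studies that pass a two-sided significance threshold."""
    rng = np.random.default_rng(seed)
    # Each simulated discovery study reports an unbiased, noisy estimate.
    beta_hat = rng.normal(loc=beta, scale=se, size=n_studies)
    # Only estimates whose z-statistic exceeds the threshold are "discovered".
    significant = np.abs(beta_hat / se) > z_crit
    return beta_hat.mean(), beta_hat[significant].mean(), significant.mean()

mean_all, mean_sig, frac_sig = simulate_winners_curse()
print(f"mean over all studies:         {mean_all:.3f}")
print(f"mean over significant studies: {mean_sig:.3f}")
print(f"fraction significant:          {frac_sig:.3f}")
```

The unconditional mean stays near the true effect, while the mean among the significant studies is noticeably larger. The lower the power, the larger the gap.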

preprint 2010 arXiv

Interacting Multiple Try Algorithms with Different Proposal Distributions

We propose a new class of interacting Markov chain Monte Carlo (MCMC) algorithms designed for increasing the efficiency of a modified multiple-try Metropolis (MTM) algorithm. The extension with respect to the existing MCMC literature is twofold. The sampler proposed extends the basic MTM algorithm by allowing different proposal distributions in the multiple-try generation step. We exploit the structure of the MTM algorithm with different proposal distributions to naturally introduce an interacting MTM mechanism (IMTM) that expands the class of population Monte Carlo methods. We show the validity of the algorithm and discuss the choice of the selection weights and of the different proposals. We provide numerical studies which show that the new algorithm can perform better than the basic MTM algorithm and that the interaction mechanism allows the IMTM to efficiently explore the state space.

preprint 2009 arXiv

A Mixture-Based Approach to Regional Adaptation for MCMC

Recent advances in adaptive Markov chain Monte Carlo (AMCMC) include the need for regional adaptation in situations when the optimal transition kernel is different across different regions of the sample space. Motivated by these findings, we propose a mixture-based approach to determine the partition needed for regional AMCMC. The mixture model is fitted using an online EM algorithm (see Andrieu and Moulines, 2006) which allows us to bypass simultaneously the heavy computational load and to implement the regional adaptive algorithm with online recursion (RAPTOR). The method is tried on simulated as well as real data examples.

preprint 2009 arXiv

Nonparametric Covariate Adjustment for Receiver Operating Characteristic Curves

The accuracy of a diagnostic test is typically characterised using the receiver operating characteristic (ROC) curve. Summarising indexes such as the area under the ROC curve (AUC) are used to compare different tests as well as to measure the difference between two populations. Often additional information is available on some of the covariates which are known to influence the accuracy of such measures. We propose nonparametric methods for covariate adjustment of the AUC. Models with normal errors and non-normal errors are discussed and analysed separately. Nonparametric regression is used for estimating mean and variance functions in both scenarios. In the general noise case we propose a covariate-adjusted Mann-Whitney estimator for AUC estimation which effectively uses available data to construct working samples at any covariate value of interest and is computationally efficient for implementation. This provides a generalisation of the Mann-Whitney approach for comparing two populations by taking covariate effects into account. We derive asymptotic properties for the AUC estimators in both settings, including asymptotic normality, optimal strong uniform convergence rates and MSE consistency. The usefulness of the proposed methods is demonstrated through simulated and real data examples.
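As a point of reference, the unadjusted Mann-Whitney estimator of the AUC, which the abstract says is generalised to account for covariates, can be sketched as follows. This assumes NumPy; the function name `mann_whitney_auc` and the sample values are illustrative assumptions, and the covariate-adjusted estimator from the paper is not implemented here.

```python
import numpy as np

def mann_whitney_auc(healthy, diseased):
    """Unadjusted Mann-Whitney estimate of the AUC.

    Estimates P(Y > X) + 0.5 * P(Y = X), where X is a test result from the
    healthy group and Y one from the diseased group, by averaging over all
    (healthy, diseased) pairs.
    """
    x = np.asarray(healthy, dtype=float)
    y = np.asarray(diseased, dtype=float)
    diff = y[:, None] - x[None, :]           # all pairwise differences
    return np.mean((diff > 0) + 0.5 * (diff == 0))

# Hypothetical test results for the two groups.
auc = mann_whitney_auc([0.2, 0.4, 0.5, 0.7], [0.6, 0.8, 0.9])
print(auc)
```

An estimate of 0.5 corresponds to a test that does not separate the two groups, and 1.0 to perfect separation.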
