Source author record

Longhai Li

Longhai Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher

Unclaimed source record

Methodology Computation

Catalog footprint

What is connected

3 works

2 topics

4 close collaborators

preprint 2022 arXiv

Model diagnostics for censored regression via randomized survival probabilities

Residuals in normal regression are used to assess a model's goodness-of-fit (GOF) and discover directions for improving the model. However, there is a lack of residuals with a characterized reference distribution for censored regression. In this paper, we propose to diagnose censored regression with normalized randomized survival probabilities (RSP). The key idea of RSP is to replace the survival probability of a censored failure time with a uniform random number between 0 and the survival probability of the censored time. We prove that RSPs always have the uniform distribution on $(0,1)$ under the true model with the true generating parameters. Therefore, we can transform RSPs into normally distributed residuals with the normal quantile function. We call such residuals normalized RSP (NRSP) residuals. We conduct simulation studies to investigate the sizes and powers of statistical tests based on NRSP residuals in detecting an incorrect choice of distribution family and non-linear effects in covariates. Our simulation studies show that, although the GOF tests with NRSP residuals are not as powerful as a traditional GOF test method, a non-linearity test based on NRSP residuals has significantly higher power in detecting non-linearity. We also compare these model diagnostic methods on a breast-cancer recurrence-free time dataset. The results show that the NRSP residual diagnostics successfully capture a subtle non-linear relationship in the dataset, which is not detected by the graphical diagnostics with Cox-Snell (CS) residuals or by existing GOF tests.
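The construction described in this abstract can be illustrated with a minimal Python sketch. It assumes an exponential failure-time model with a known rate and independent exponential censoring; the function names, rates and sample size are hypothetical choices for illustration and are not taken from the paper. Observed failure times keep their survival probability, while censored times get a uniform draw between 0 and the survival probability at the censoring time, after which the normal quantile function is applied.

```python
import math
import random
from statistics import NormalDist


def exp_survival(t, rate):
    """Survival function S(t) = exp(-rate * t) of an exponential model (assumed model)."""
    return math.exp(-rate * t)


def nrsp_residuals(times, observed, survival, rng):
    """Normalized randomized survival probabilities.

    times    : observed times (failure or censoring, whichever came first)
    observed : True where the failure was observed, False where censored
    survival : callable returning the model survival probability at t
    """
    std_normal = NormalDist()
    out = []
    for t, is_event in zip(times, observed):
        p = survival(t)
        if not is_event:
            # Censored: replace S(t) by a uniform draw on (0, S(t)).
            p = rng.uniform(0.0, p)
        # Keep p strictly inside (0, 1) so the quantile function is defined.
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        out.append(std_normal.inv_cdf(p))
    return out


# Simulate from the true model, so the residuals should look standard normal.
rng = random.Random(0)
true_rate, censor_rate, n = 0.5, 0.3, 2000  # hypothetical values
failure = [rng.expovariate(true_rate) for _ in range(n)]
censor = [rng.expovariate(censor_rate) for _ in range(n)]
times = [min(f, c) for f, c in zip(failure, censor)]
observed = [f <= c for f, c in zip(failure, censor)]

residuals = nrsp_residuals(times, observed, lambda t: exp_survival(t, true_rate), rng)
mean = sum(residuals) / n
var = sum((r - mean) ** 2 for r in residuals) / (n - 1)
print(f"mean={mean:.3f} variance={var:.3f}")
```

Because the data are generated from the same model used to compute the survival probabilities, the sample mean and variance of the residuals should be close to 0 and 1. Fitting a misspecified model instead would be expected to shift these residuals away from the standard normal reference.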

preprint2019arXiv

Randomized Predictive P-values: A Versatile Model Diagnostic Tool with Unified Reference Distribution

Examining residuals such as Pearson and deviance residuals is a standard tool for assessing normal regression. However, for discrete responses, these residuals cluster on lines corresponding to distinct response values. Their distributions are far from normal; graphical and quantitative inspection of these residuals provides little information for model diagnosis. Marshall and Spiegelhalter (2003) defined a cross-validatory predictive p-value for identifying outliers. Predictive p-values are uniformly distributed for continuous responses but not for discrete responses. We propose to use randomized predictive p-values (RPP) for diagnosing models with discrete responses. RPPs can be transformed to "residuals" with a normal distribution, which we call NRPPs. NRPPs can be used to diagnose all regression models with a scalar response in the same way as normal regression. The NRPPs are nearly the same as the randomized quantile residuals (RQR), which were proposed by Dunn and Smyth (1996) but remain little known among statisticians. This paper provides an exposition of RQR from the RPP perspective. The contributions of this exposition include: (1) we give a rigorous proof of the uniformity of RPP and illustrative examples to explain the uniformity under the true model; (2) we conduct extensive simulation studies to demonstrate the normality of NRPPs under the true model; (3) our simulation studies also show that the NRPP method is a versatile diagnostic tool for detecting many kinds of model inadequacies due to lack of complexity. The effectiveness of NRPP is further demonstrated with a health utilization dataset.
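The randomization idea for discrete responses can be sketched in Python as follows. This is a minimal illustration in the style of the randomized quantile residuals of Dunn and Smyth (1996), assuming a Poisson regression with known means; it is not the paper's exact RPP definition, and the helper names, coefficients and sample size are hypothetical. For each count y, a value is drawn uniformly between F(y-1) and F(y), where F is the model CDF, and then mapped through the normal quantile function.

```python
import math
import random
from statistics import NormalDist


def poisson_pmf(k, mu):
    """Poisson probability P(Y = k), computed on the log scale for stability."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def poisson_cdf(k, mu):
    """Poisson CDF P(Y <= k); returns 0 for k < 0."""
    if k < 0:
        return 0.0
    return min(1.0, sum(poisson_pmf(j, mu) for j in range(k + 1)))


def sample_poisson(mu, rng):
    """Draw a Poisson variate by multiplying uniforms (Knuth's method, small mu only)."""
    limit = math.exp(-mu)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def randomized_quantile_residual(y, mu, rng):
    """Uniform draw on the interval between F(y-1) and F(y), mapped to a normal quantile."""
    lower = poisson_cdf(y - 1, mu)
    upper = poisson_cdf(y, mu)
    u = lower + rng.random() * (upper - lower)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inside (0, 1) for inv_cdf
    return NormalDist().inv_cdf(u)


# Simulate counts from the true model: log(mu) = 0.5 + 0.8 * x (hypothetical values).
rng = random.Random(1)
n = 2000
xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
mus = [math.exp(0.5 + 0.8 * x) for x in xs]
ys = [sample_poisson(mu, rng) for mu in mus]

rqr = [randomized_quantile_residual(y, mu, rng) for y, mu in zip(ys, mus)]
rqr_mean = sum(rqr) / n
rqr_var = sum((r - rqr_mean) ** 2 for r in rqr) / (n - 1)
print(f"mean={rqr_mean:.3f} variance={rqr_var:.3f}")
```

Unlike Pearson or deviance residuals, these values do not cluster on lines for distinct counts, so under the true model they can be inspected with the usual tools for normal residuals, such as a QQ plot.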

preprint2017arXiv

Fully Bayesian Classification with Heavy-tailed Priors for Selection in High-dimensional Features with Grouping Structure

Feature selection is in demand in many modern scientific research problems that use high-dimensional data. A typical example is to find the most useful genes that are related to a certain disease (e.g., cancer) from high-dimensional gene expressions. The expressions of genes have grouping structures; for example, a group of co-regulated genes that have similar biological functions tend to have similar expressions. Many statistical methods have been proposed to take the grouping structure into consideration in feature selection, including group LASSO, supervised group LASSO, and regression on group representatives. In this paper, we propose a fully Bayesian Robit regression method with heavy-tailed (sparsity) priors (abbreviated as FBRHT) for selecting features with grouping structure. The main features of FBRHT are that it discards unrelated features more aggressively than LASSO, and that it can select features within groups automatically without a pre-specified grouping structure. We use simulated and real datasets to demonstrate that the predictive power of the sparse feature subsets selected by FBRHT is comparable with that of other much larger feature subsets selected by LASSO, group LASSO, supervised group LASSO, penalized logistic regression and random forest, and that the succinct feature subsets selected by FBRHT have significantly better predictive power than the feature subsets of the same size taken from the top features selected by the aforementioned methods.