Researcher profile

Yixin Fang

Yixin Fang contributes to research discovery and scholarly infrastructure.

Researcher - Affiliation not imported - Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
6 works
0 followers
5 topics
4 close collaborators

Published work

6 published items

preprint - 2026 - arXiv

A Targeted Learning Framework for Estimating Restricted Mean Survival Time Difference using Pseudo-observations

A targeted learning (TL) framework is developed to estimate the difference in the restricted mean survival time (RMST) for a clinical trial with time-to-event outcomes. The approach starts by defining the target estimand as the RMST difference between investigational and control treatments. Next, an efficient estimation method is introduced: a targeted minimum loss estimator (TMLE) utilizing pseudo-observations. Moreover, a version of the copy reference (CR) approach is developed to perform a sensitivity analysis for right-censoring. The proposed TL framework is demonstrated using a real data application.
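For orientation, the quantities named in this abstract can be written in standard notation. This is a sketch of the usual textbook definitions, not a reproduction of the paper's formulas: the truncation time \tau, the potential event time T^{a} under arm a, the survival function S_a, and the leave-one-out estimate \hat{\theta}^{(-i)} are notation assumed here.

```latex
% RMST in arm a (a = 1 investigational, a = 0 control), truncated at time \tau
\mu_a(\tau) = E\left[\min(T^{a}, \tau)\right] = \int_0^{\tau} S_a(t)\,dt

% Target estimand: the RMST difference between the two arms
\Delta(\tau) = \mu_1(\tau) - \mu_0(\tau)

% Jackknife pseudo-observation for subject i, given an estimator \hat{\theta}
% computed on all n subjects and \hat{\theta}^{(-i)} computed without subject i
\hat{\theta}_i = n\,\hat{\theta} - (n - 1)\,\hat{\theta}^{(-i)}
```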

preprint - 2014 - arXiv

A model-free estimation for the covariate-adjusted Youden index and its associated cut-point

In medical research, continuous markers are widely employed in diagnostic tests to distinguish diseased from non-diseased subjects. The accuracy of such diagnostic tests is commonly assessed using the receiver operating characteristic (ROC) curve. To summarize an ROC curve and determine its optimal cut-point, the Youden index is popularly used. In the literature, estimation of the Youden index has been widely studied via various statistical modeling strategies on the conditional density. This paper proposes a new model-free estimation method, which directly estimates the covariate-adjusted cut-point without estimating the conditional density. Consequently, the covariate-adjusted Youden index can be estimated based on the estimated cut-point. The proposed method formulates the estimation problem in a large margin classification framework, which allows flexible modeling of the covariate-adjusted Youden index through kernel machines. The advantage of the proposed method is demonstrated in a variety of simulated experiments as well as a real application to the Pima Indians diabetes study.
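As background for this abstract, the plain (unadjusted) empirical Youden index is J = max over cut-points c of sensitivity(c) + specificity(c) - 1. The sketch below computes it by scanning the observed marker values. It is not the covariate-adjusted, kernel-based method the paper proposes; the function name `youden_index` and the convention that values above the cut are called diseased are assumptions made for illustration.

```python
def youden_index(diseased, healthy):
    """Return (J, cut) maximizing sensitivity + specificity - 1.

    Assumed convention: a marker value strictly greater than `cut`
    is classified as diseased. Ties in J keep the smallest cut.
    """
    if not diseased or not healthy:
        raise ValueError("both groups must be non-empty")
    candidates = sorted(set(diseased) | set(healthy))
    best_j, best_cut = -1.0, None
    for c in candidates:
        # Sensitivity: fraction of diseased subjects above the cut.
        sens = sum(x > c for x in diseased) / len(diseased)
        # Specificity: fraction of healthy subjects at or below the cut.
        spec = sum(x <= c for x in healthy) / len(healthy)
        j = sens + spec - 1.0
        if j > best_j:
            best_j, best_cut = j, c
    return best_j, best_cut


if __name__ == "__main__":
    # Toy marker values, made up for illustration only.
    print(youden_index([3, 4, 5, 6], [1, 2, 3, 4]))
```

A covariate-adjusted version would let the cut-point depend on subject covariates, which is the part this sketch leaves out.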

preprint - 2013 - arXiv

A note on selection stability: combining stability and prediction

Recently, many regularized procedures have been proposed for variable selection in linear regression, but their performance depends on the tuning parameter selection. Here a criterion for tuning parameter selection is proposed, which combines the strengths of both stability selection and cross-validation and is therefore referred to as prediction and stability selection (PASS). Selection consistency is established assuming the data-generating model is a subset of the full model, and the small-sample performance is demonstrated through simulation studies in which the assumption either holds or is violated.

preprint - 2013 - arXiv

Consistent selection of tuning parameters via variable selection stability

Penalized regression models are popularly used in high-dimensional data analysis to conduct variable selection and model fitting simultaneously. Whereas success has been widely reported in the literature, their performance depends largely on the tuning parameters that balance the trade-off between model fitting and model sparsity. Existing tuning criteria mainly follow the route of minimizing the estimated prediction error or maximizing the posterior model probability, such as cross-validation, AIC and BIC. This article introduces a general tuning parameter selection criterion based on a novel concept of variable selection stability. The key idea is to select the tuning parameters so that the resultant penalized regression model is stable in variable selection. The asymptotic selection consistency is established for both fixed and diverging dimensions. The effectiveness of the proposed criterion is also demonstrated in a variety of simulated examples as well as an application to the prostate cancer data.
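One way to make "stable in variable selection" concrete is to fit the same penalized model on two random halves of the data and measure how well the two selected variable sets agree. The sketch below scores that agreement with Cohen's kappa. The abstract does not state which agreement measure the paper uses, so the choice of kappa, the function name `selection_kappa`, and the handling of the degenerate case are all assumptions made for illustration.

```python
def selection_kappa(selected_a, selected_b, p):
    """Cohen's kappa between two selected-variable sets out of p variables.

    `selected_a` and `selected_b` are iterables of variable indices in
    range(p). Returns 0.0 in the degenerate case where chance agreement
    equals 1 (both sets empty or both full); that is a convention assumed here.
    """
    a, b = set(selected_a), set(selected_b)
    if not (a | b) <= set(range(p)):
        raise ValueError("indices must lie in range(p)")
    n11 = len(a & b)          # selected by both fits
    n10 = len(a - b)          # selected by the first fit only
    n01 = len(b - a)          # selected by the second fit only
    n00 = p - len(a | b)      # selected by neither fit
    observed = (n11 + n00) / p
    expected = ((n11 + n10) * (n11 + n01) + (n01 + n00) * (n10 + n00)) / p**2
    if expected == 1.0:
        return 0.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Two hypothetical selections out of 10 candidate variables.
    print(selection_kappa({0, 1, 2}, {0, 1, 3}, p=10))
```

A tuning procedure in this spirit would average such a score over many random splits for each candidate tuning parameter and prefer values where the average is high.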

preprint - 2013 - arXiv

Tuning Parameter Selection in Regularized Estimations of Large Covariance Matrices

Recently, many regularized estimators of large covariance matrices have been proposed, and the tuning parameters in these estimators are usually selected via cross-validation. However, there is no guideline on the number of folds for conducting cross-validation, and there is no comparison between cross-validation and the methods based on bootstrap. Through extensive simulations, we suggest that 10-fold cross-validation (nine-tenths for training and one-tenth for validation) is appropriate when the estimation accuracy is measured in the Frobenius norm, while 2-fold cross-validation (half for training and half for validation) or reverse 3-fold cross-validation (one-third for training and two-thirds for validation) is appropriate in the operator norm. We also suggest that the "optimal" cross-validation is more appropriate than the methods based on bootstrap for both types of norm.
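The fold schemes mentioned in this abstract differ only in which side of each split is used for training. A minimal sketch of the index bookkeeping follows, assuming a hypothetical helper `cv_splits` with a `reverse` flag. It only generates the splits, does not shuffle, and does not estimate any covariance matrix.

```python
def cv_splits(n, k, reverse=False):
    """Yield (train, valid) index lists for k-fold cross-validation on n items.

    Ordinary k-fold trains on k-1 folds and validates on the held-out fold.
    With reverse=True the roles swap: train on one fold, validate on the rest
    (e.g. k=3 gives one-third for training and two-thirds for validation).
    Folds are assigned by striding over the indices, with no shuffling.
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, held in enumerate(folds):
        rest = [j for m, f in enumerate(folds) if m != i for j in f]
        yield (held, rest) if reverse else (rest, held)


if __name__ == "__main__":
    for train, valid in cv_splits(9, 3, reverse=True):
        print(len(train), len(valid))
```

In practice the indices would be shuffled first and each split would be used to fit the regularized estimator on the training part and score it on the validation part.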

preprint - 2012 - arXiv

A divergence formula for regularization methods with an L2 constraint

We derive a divergence formula for a group of regularization methods with an L2 constraint. The formula is useful for regularization parameter selection, because it provides an unbiased estimate for the number of degrees of freedom. We begin by deriving the formula for smoothing splines and then extend it to other settings such as penalized splines, ridge regression, and functional linear regression.
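For context, the link between divergence and degrees of freedom can be sketched with standard definitions. These are general background facts, not the paper's formula: the second display covers only the familiar penalized form of ridge regression with a fixed penalty \lambda, whereas the abstract concerns the L2-constrained form. The symbols \hat{y}, X, \lambda, and the singular values d_j are notation assumed here.

```latex
% Degrees of freedom as the expected divergence of the fitted values
% (Gaussian errors with known variance assumed)
\mathrm{df} = E\left[\sum_{i=1}^{n} \frac{\partial \hat{y}_i}{\partial y_i}\right]

% Penalized ridge regression with fixed \lambda: the fit is linear in y,
% so the divergence is the trace of the hat matrix
\hat{y} = X (X^{\top} X + \lambda I)^{-1} X^{\top} y,
\qquad
\mathrm{df}(\lambda) = \sum_{j} \frac{d_j^{2}}{d_j^{2} + \lambda}
```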