Researcher profile

Hung Hung

Hung Hung contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) | Verification L1 | Unclaimed author
6 works
0 followers
4 topics
4 close collaborators

Published work

6 published item(s)

Preprint, 2022, arXiv

Robust self-tuning semiparametric PCA for contaminated elliptical distribution

Principal component analysis (PCA) is one of the most popular dimension reduction methods. The usual PCA is known to be sensitive to the presence of outliers, and thus many robust PCA methods have been developed. Among them, Tyler's M-estimator is shown to be the most robust scatter estimator under the elliptical distribution. However, when the underlying distribution is contaminated and deviates from ellipticity, Tyler's M-estimator might not work well. In this article, we apply the semiparametric theory to propose a robust semiparametric PCA. The merits of our proposal are twofold. First, it is robust to heavy-tailed elliptical distributions as well as to non-elliptical outliers. Second, it pairs well with a data-driven tuning procedure, which is based on the active ratio and can adapt to different degrees of data outlyingness. Theoretical properties are derived, including the influence functions for various statistical functionals and asymptotic normality. Simulation studies and a data analysis demonstrate the superiority of our method.
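
For orientation, the baseline that this abstract compares against, Tyler's M-estimator, can be sketched with the standard fixed-point iteration. This is a minimal illustration and not the semiparametric PCA proposed in the preprint; the function name `tyler_scatter`, the trace normalization, and the assumption that the data are already centered are choices made here for the example.

```python
import numpy as np

def tyler_scatter(X, n_iter=100, tol=1e-8):
    """Tyler's M-estimator of scatter via fixed-point iteration.

    X is an (n, p) array of observations assumed to be centered at the
    origin (an assumption of this sketch). The result is normalized so
    that its trace equals p, since Tyler's estimator is only defined
    up to scale.
    """
    n, p = X.shape
    sigma = np.eye(p)
    for _ in range(n_iter):
        inv = np.linalg.inv(sigma)
        # d_i = x_i^T Sigma^{-1} x_i for every observation
        d = np.einsum("ij,jk,ik->i", X, inv, X)
        # Weighted outer products: (p / n) * sum_i x_i x_i^T / d_i
        new = (p / n) * (X / d[:, None]).T @ X
        new *= p / np.trace(new)  # fix the scale
        converged = np.linalg.norm(new - sigma, "fro") < tol
        sigma = new
        if converged:
            break
    return sigma
```

The eigenvectors of the returned matrix (for example via `np.linalg.eigh`) would then play the role of robust principal directions under an elliptical model.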

Preprint, 2020, arXiv

A generalized information criterion for high-dimensional PCA rank selection

Principal component analysis (PCA) is the most commonly used statistical procedure for dimension reduction. An important issue in applying PCA is determining the rank, which is the number of dominant eigenvalues of the covariance matrix. The Akaike information criterion (AIC) and Bayesian information criterion (BIC) are among the most widely used rank selection methods. Both use the number of free parameters for assessing model complexity. In this work, we adopt the generalized information criterion (GIC) to propose a new method for PCA rank selection under the high-dimensional framework. The GIC model complexity takes into account the sizes of covariance eigenvalues and adapts better to practical applications. Asymptotic properties of GIC are derived and the selection consistency is established under the generalized spiked covariance model.

Preprint, 2012, arXiv

A Two-Stage Dimension Reduction Method for Induced Responses and Its Applications

Researchers in the biological sciences nowadays often encounter the curse of high-dimensionality, which many previously developed statistical models fail to overcome. To tackle this problem, sufficient dimension reduction aims to estimate the central subspace (CS), in which all the necessary information supplied by the covariates regarding the response of interest is contained. Subsequent statistical analysis can then be made in a lower-dimensional space while preserving relevant information. Studies are often interested in a certain transformation of the response (the induced response), instead of the original one, whose corresponding CS may vary. When estimating the CS of the induced response, however, existing dimension reduction methods may suffer from inefficiency. In this article, we propose a more efficient two-stage estimation procedure to estimate the CS of an induced response. This approach is further extended to the case of censored responses. An application for combining multiple biomarkers is also illustrated. Simulation studies and two data examples provide further evidence of the usefulness of the proposed method.

Preprint, 2011, arXiv

Matrix Variate Logistic Regression Model with Application to EEG Data

Logistic regression has been widely applied in the field of biomedical research for a long time. In some applications, covariates of interest have a natural structure, such as being a matrix, at the time of collection. The rows and columns of the covariate matrix then have certain physical meanings, and they must contain useful information regarding the response. If we simply stack the covariate matrix as a vector and fit the conventional logistic regression model, relevant information can be lost, and the problem of inefficiency will arise. Motivated by these considerations, we propose in this paper the matrix variate logistic (MV-logistic) regression model. Advantages of the MV-logistic regression model include the preservation of the inherent matrix structure of covariates and the parsimony of parameters needed. In the EEG Database Data Set, we successfully extract the structural effects of the covariate matrix, and a high classification accuracy is achieved.

Preprint, 2011, arXiv

Nonparametric Methodology for the Time-Dependent Partial Area under the ROC Curve

To assess the classification accuracy of a continuous diagnostic result, the receiver operating characteristic (ROC) curve is commonly used in applications. The partial area under the ROC curve (pAUC) is one of the widely accepted summary measures due to its generality and ease of probability interpretation. In the field of life science, a direct extension of the pAUC into the time-to-event setting can be used to measure the usefulness of a biomarker for disease detection over time. Without using a trapezoidal rule, we propose nonparametric estimators, which are easily computed and have closed-form expressions, for the time-dependent pAUC. The asymptotic Gaussian processes of the estimators are established and the estimated variance-covariance functions are provided, which are essential in the construction of confidence intervals. The finite sample performance of the proposed inference procedures is investigated through a series of simulations. Our method is further applied to evaluate the classification ability of CD4 cell counts on patients' survival time in the AIDS Clinical Trials Group (ACTG) 175 study. In addition, the inferences can be generalized to compare the time-dependent pAUCs between patients who received prior antiretroviral therapy and those who did not.
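
To make the "probability interpretation" of the pAUC concrete, here is a minimal sketch of a closed-form, trapezoid-free estimator in the simpler static setting (no time dependence, no censoring). It is not the time-dependent estimator of the preprint. The function name `partial_auc`, the restriction to false positive rates in [0, q], and the use of `np.quantile` for the control threshold are assumptions made for this illustration.

```python
import numpy as np

def partial_auc(cases, controls, q):
    """Empirical pAUC over false positive rates in [0, q].

    Estimates P(case > control, control > c), where c is the empirical
    (1 - q) quantile of the control markers, as a proportion of
    case-control pairs. Higher marker values are assumed to indicate
    disease.
    """
    cases = np.asarray(cases, dtype=float)
    controls = np.asarray(controls, dtype=float)
    # Threshold that leaves roughly a fraction q of controls above it
    c = np.quantile(controls, 1.0 - q)
    high = controls[controls > c]
    # Count pairs where the case exceeds a high-valued control
    wins = (cases[:, None] > high[None, :]).sum()
    return wins / (cases.size * controls.size)
```

With a perfectly separating marker the estimate approaches q, its maximum over that range of false positive rates.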

Preprint, 2011, arXiv

On Multilinear Principal Component Analysis of Order-Two Tensors

Principal Component Analysis (PCA) is a commonly used tool for dimension reduction in analyzing high-dimensional data; Multilinear Principal Component Analysis (MPCA) has the potential to serve a similar function for analyzing tensor-structured data. MPCA and other tensor decomposition methods have proved effective in reducing dimensionality in both real data analyses and simulation studies (Ye, 2005; Lu, Plataniotis and Venetsanopoulos, 2008; Kolda and Bader, 2009; Li, Kim and Altman, 2010). In this paper, we investigate MPCA's statistical properties and provide explanations for its advantages. Conventional PCA, which vectorizes the tensor data, may lead to inefficient and unstable prediction due to its extremely large dimensionality. On the other hand, MPCA, which tries to preserve the data structure, searches for low-dimensional multilinear projections and decreases the dimensionality efficiently. The asymptotic theories for order-two MPCA, including asymptotic distributions for principal components, associated projections and the explained variance, are developed. Finally, MPCA is shown to improve on conventional PCA in analyzing the Olivetti Faces data set, by constructing a more module-oriented basis for reconstructing the test faces.
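
The "low-dimensional multilinear projections" for order-two tensors (matrices) are commonly computed by an alternating eigen-decomposition. The sketch below follows that general recipe and is not taken from the preprint; the function name `mpca_order_two`, the fixed iteration count, and the identity-based starting value are assumptions of this example.

```python
import numpy as np

def mpca_order_two(X, p0, q0, n_iter=20):
    """Alternating estimation of row and column projections.

    X has shape (n, p, q): n matrix-valued observations. Returns the
    sample mean, a (p, p0) matrix A and a (q, q0) matrix B with
    orthonormal columns, chosen to retain as much of the variation in
    A^T (X_i - mean) B as the alternating updates can find.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    q = Xc.shape[2]
    B = np.eye(q)[:, :q0]  # simple starting value (an assumption)
    for _ in range(n_iter):
        # Update A: leading eigenvectors of sum_i X_i B B^T X_i^T
        XB = Xc @ B
        M_A = np.einsum("ipk,irk->pr", XB, XB)
        A = np.linalg.eigh(M_A)[1][:, ::-1][:, :p0]
        # Update B: leading eigenvectors of sum_i X_i^T A A^T X_i
        XA = np.einsum("ipq,pk->ikq", Xc, A)
        M_B = np.einsum("ikq,iks->qs", XA, XA)
        B = np.linalg.eigh(M_B)[1][:, ::-1][:, :q0]
    return mean, A, B
```

Each observation is then summarized by the p0 x q0 core `A.T @ (X_i - mean) @ B`, which needs p*p0 + q*q0 projection parameters instead of the (p*q) x r loading matrix that PCA on vectorized data would require for r components.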