Researcher profile

Su-Yun Huang

Su-Yun Huang contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 19 - Unverified | Verification L1 | Unclaimed author
5 works
0 followers
6 topics
4 close collaborators

Published work

5 published items

preprint | 2022 | arXiv

Robust self-tuning semiparametric PCA for contaminated elliptical distribution

Principal component analysis (PCA) is one of the most popular dimension reduction methods. The usual PCA is known to be sensitive to the presence of outliers, and thus many robust PCA methods have been developed. Among them, Tyler's M-estimator has been shown to be the most robust scatter estimator under the elliptical distribution. However, when the underlying distribution is contaminated and deviates from ellipticity, Tyler's M-estimator might not work well. In this article, we apply semiparametric theory to propose a robust semiparametric PCA. The merits of our proposal are twofold. First, it is robust to heavy-tailed elliptical distributions as well as to non-elliptical outliers. Second, it pairs well with a data-driven tuning procedure, which is based on the active ratio and can adapt to different degrees of data outlyingness. Theoretical properties are derived, including the influence functions for various statistical functionals and asymptotic normality. Simulation studies and a data analysis demonstrate the superiority of our method.

preprint | 2021 | arXiv

Two-stage dimension reduction for noisy high-dimensional images and application to Cryogenic Electron Microscopy

Principal component analysis (PCA) is arguably the most widely used dimension-reduction method for vector-type data. When applied to a sample of images, PCA requires vectorization of the image data, which in turn entails solving an eigenvalue problem for the sample covariance matrix. We propose herein a two-stage dimension reduction (2SDR) method for image reconstruction from high-dimensional noisy image data. The first stage treats the image as a matrix, which is a tensor of order 2, and uses multilinear principal component analysis (MPCA) for matrix rank reduction and image denoising. The second stage vectorizes the reduced-rank matrix and achieves further dimension and noise reduction. Simulation studies demonstrate excellent performance of 2SDR, for which we also develop an asymptotic theory that establishes consistency of its rank selection. Applications to cryo-EM (cryogenic electron microscopy), which has revolutionized structural biology, organic and medical chemistry, and cellular and molecular physiology in the past decade, are also provided and illustrated with benchmark cryo-EM datasets. Connections to other contemporaneous developments in image reconstruction and high-dimensional statistical inference are also discussed.

preprint | 2020 | arXiv

A Consistency Theorem for Randomized Singular Value Decomposition

The singular value decomposition (SVD) and principal component analysis are fundamental tools and probably the most popular methods for data dimension reduction. The rapid growth in the size of data matrices has led to a need for efficient large-scale SVD algorithms. Randomized SVD was proposed, and its potential was demonstrated for computing a low-rank SVD (Rokhlin et al., 2009). In this article, we provide a consistency theorem for the randomized SVD algorithm and a numerical example to show how the random projections to low dimension affect the consistency.
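As a rough illustration of the randomized SVD idea mentioned in this abstract, the sketch below projects the data matrix onto a small random subspace and then computes an exact SVD of the reduced matrix. It is a generic textbook-style version and not necessarily the algorithm analyzed in the paper; the function name `randomized_svd` and the parameters `oversample` and `n_power_iter` are assumptions introduced here for illustration.

```python
import numpy as np

def randomized_svd(A, rank, oversample=10, n_power_iter=2, seed=0):
    """Approximate rank-`rank` SVD of A via a Gaussian random projection.

    Hypothetical helper for illustration; parameter names are assumptions.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Sketch size: target rank plus a few extra columns, capped by the matrix size.
    k = min(rank + oversample, m, n)
    # Random projection to k dimensions, then an orthonormal basis for its range.
    Q, _ = np.linalg.qr(A @ rng.standard_normal((n, k)))
    # Optional power iterations sharpen the basis when singular values decay slowly.
    for _ in range(n_power_iter):
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)
    # Exact SVD of the small k x n matrix, mapped back to the original space.
    Ub, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    U = Q @ Ub
    return U[:, :rank], s[:rank], Vt[:rank]
```

Calling `randomized_svd(A, rank=5)` returns factors `U`, `s`, `Vt` such that `(U * s) @ Vt` approximates `A`; how good the approximation is depends on the sketch size and on how quickly the singular values decay.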

preprint | 2020 | arXiv

A generalized information criterion for high-dimensional PCA rank selection

Principal component analysis (PCA) is the most commonly used statistical procedure for dimension reduction. An important issue in applying PCA is determining the rank, which is the number of dominant eigenvalues of the covariance matrix. The Akaike information criterion (AIC) and Bayesian information criterion (BIC) are among the most widely used rank selection methods. Both use the number of free parameters to assess model complexity. In this work, we adopt the generalized information criterion (GIC) to propose a new method for PCA rank selection under the high-dimensional framework. The GIC model complexity takes into account the sizes of the covariance eigenvalues and can adapt better to practical applications. Asymptotic properties of GIC are derived, and selection consistency is established under the generalized spiked covariance model.

preprint | 2011 | arXiv

On Multilinear Principal Component Analysis of Order-Two Tensors

Principal Component Analysis (PCA) is a commonly used tool for dimension reduction in analyzing high-dimensional data; Multilinear Principal Component Analysis (MPCA) has the potential to serve a similar function for analyzing tensor-structured data. MPCA and other tensor decomposition methods have proved effective in reducing dimensions in both real data analyses and simulation studies (Ye, 2005; Lu, Plataniotis and Venetsanopoulos, 2008; Kolda and Bader, 2009; Li, Kim and Altman, 2010). In this paper, we investigate MPCA's statistical properties and provide explanations for its advantages. Conventional PCA, which vectorizes the tensor data, may lead to inefficient and unstable prediction due to its extremely large dimensionality. MPCA, by contrast, tries to preserve the data structure: it searches for low-dimensional multilinear projections and decreases the dimensionality efficiently. The asymptotic theories for order-two MPCA, including asymptotic distributions for principal components, associated projections and the explained variance, are developed. Finally, MPCA is shown to improve on conventional PCA in analyzing the Olivetti Faces data set, by constructing a more module-oriented basis for reconstructing the test faces.
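To make the notion of "low-dimensional multilinear projections" concrete, here is a minimal sketch of order-two MPCA using a common alternating eigen-decomposition scheme. It assumes the data are a stack of n matrices of size p by q and looks for a column projection `A` and a row projection `B`; the function names, the fixed iteration count, and the initialization are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def top_eigvecs(S, k):
    """Leading k eigenvectors of a symmetric matrix S (eigh sorts ascending)."""
    _, V = np.linalg.eigh(S)
    return V[:, ::-1][:, :k]

def mpca_order_two(X, p0, q0, n_iter=20):
    """Alternating order-two MPCA sketch (hypothetical helper).

    X has shape (n, p, q). Returns orthonormal A (p x p0) and B (q x q0)
    chosen to capture the variation of A.T @ (X_i - mean) @ B.
    """
    Xc = X - X.mean(axis=0)                        # center the matrix observations
    q = Xc.shape[2]
    B = np.eye(q)[:, :q0]                          # simple starting value (assumption)
    for _ in range(n_iter):
        XB = Xc @ B                                # shape (n, p, q0)
        SA = np.einsum('nij,nkj->ik', XB, XB)      # sum_i (X_i B)(X_i B)^T
        A = top_eigvecs(SA, p0)
        XA = np.einsum('nij,ik->nkj', Xc, A)       # A^T X_i, shape (n, p0, q)
        SB = np.einsum('nki,nkj->ij', XA, XA)      # sum_i (A^T X_i)^T (A^T X_i)
        B = top_eigvecs(SB, q0)
    return A, B
```

Each observation is then summarized by the p0 by q0 core matrix `A.T @ (X_i - mean) @ B`, which has far fewer entries than the p times q coordinates that conventional PCA would handle after vectorization.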