Source author record

Xingdong Feng

Xingdong Feng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Methodology, Applications, Computation, Machine Learning, math.ST, Statistics Theory

Catalog footprint

What is connected

5 works

6 topics

4 close collaborators

preprint, 2020, arXiv

$\ell_0$-Regularized High-dimensional Accelerated Failure Time Model

We develop a constructive approach for $\ell_0$-penalized estimation in the sparse accelerated failure time (AFT) model with high-dimensional covariates. Our proposed method is based on Stute's weighted least squares criterion combined with $\ell_0$-penalization. This method is a computational algorithm that generates a sequence of solutions iteratively, based on active sets derived from primal and dual information and root finding according to the KKT conditions. We refer to the proposed method as AFT-SDAR (for support detection and root finding). An important aspect of our theoretical results is that they directly concern the sequence of solutions generated by the AFT-SDAR algorithm. We prove that the estimation errors of the solution sequence decay exponentially to the optimal error bound with high probability, as long as the covariate matrix satisfies a mild regularity condition which is necessary and sufficient for model identification even in the setting of high-dimensional linear regression. We also propose an adaptive version of AFT-SDAR, or AFT-ASDAR, which determines the support size of the estimated coefficient in a data-driven fashion. We conduct simulation studies to demonstrate the superior performance of the proposed method over the lasso and MCP in terms of accuracy and speed. We also apply the proposed method to a real data set to illustrate its application.
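The abstract above describes alternating between support detection (from primal and dual information) and root finding on the detected active set. As a rough illustration only, the sketch below applies that idea to ordinary, unweighted least squares with a fixed support size `T`. It leaves out Stute's weights, censoring, and the adaptive choice of the support size, and the function name `sdar_least_squares` and the simulated data are assumptions made for this example, not the authors' code.

```python
import numpy as np


def sdar_least_squares(X, y, T, max_iter=50):
    """Support detection and root finding for plain least squares.

    Hypothetical helper for illustration; assumes the columns of X are
    roughly standardized and that T (the support size) is given.
    """
    n, p = X.shape
    beta = np.zeros(p)                  # primal variable
    d = X.T @ (y - X @ beta) / n        # dual variable (scaled gradient)
    previous_active = None
    for _ in range(max_iter):
        # Support detection: keep the T largest entries of |beta + d|.
        active = np.sort(np.argsort(-np.abs(beta + d))[:T])
        if previous_active is not None and np.array_equal(active, previous_active):
            break                       # active set has stabilized
        # Root finding: solve the KKT system, i.e. least squares on the
        # active set with all other coefficients fixed at zero.
        beta = np.zeros(p)
        beta[active] = np.linalg.lstsq(X[:, active], y, rcond=None)[0]
        d = X.T @ (y - X @ beta) / n
        d[active] = 0.0
        previous_active = active
    return beta, active


# Simulated example (assumed data, not from the paper).
rng = np.random.default_rng(0)
n, p = 500, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[[3, 17, 40]] = [3.0, -2.0, 2.5]
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_hat, active = sdar_least_squares(X, y, T=3)
```

In this toy setting the signal is strong relative to the noise, so the active set typically stabilizes after one or two passes; harder designs may need more iterations or a data-driven choice of `T`.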

preprint, 2020, arXiv

Metric learning by Similarity Network for Deep Semi-Supervised Learning

Deep semi-supervised learning has been widely implemented in the real world owing to the rapid development of deep learning. Recently, attention has shifted to approaches such as Mean-Teacher, which penalize the inconsistency between two perturbed input sets. Although these methods may achieve positive results, they ignore the relationship information between data instances. To solve this problem, we propose a novel method named Metric Learning by Similarity Network (MLSN), which aims to learn a distance metric adaptively on different domains. By co-training with the classification network, the similarity network can learn more information about pairwise relationships and performs better on some empirical tasks than state-of-the-art methods.

preprint, 2020, arXiv

Parallel subgroup analysis of high-dimensional data via M-regression

Identifying subgroup structures is an interesting problem in data analysis, as populations are likely to be heterogeneous in practice. In this paper, we consider M-estimators together with both concave and pairwise fusion penalties, which can deal with high-dimensional data containing some outliers. The penalties are applied to both covariates and treatment effects, so that the estimation is expected to achieve variable selection and data clustering simultaneously. An algorithm is proposed to process relatively large datasets based on parallel computing. We establish the convergence analysis of the proposed algorithm, the oracle property of the penalized M-estimators, and the selection consistency of the proposed criterion. Our numerical study demonstrates that the proposed method is promising for efficiently identifying subgroups hidden in high-dimensional data.

preprint, 2014, arXiv

Statistical inference based on robust low-rank data matrix approximation

The singular value decomposition is widely used to approximate data matrices with lower rank matrices. Feng and He [Ann. Appl. Stat. 3 (2009) 1634-1654] developed tests on dimensionality of the mean structure of a data matrix based on the singular value decomposition. However, the first singular values and vectors can be driven by a small number of outlying measurements. In this paper, we consider a robust alternative that moderates the effect of outliers in low-rank approximations. Under the assumption of random row effects, we provide the asymptotic representations of the robust low-rank approximation. These representations may be used in testing the adequacy of a low-rank approximation. We use oligonucleotide gene microarray data to demonstrate how robust singular value decomposition compares with its traditional counterparts. Examples show that the robust methods often lead to a more meaningful assessment of the dimensionality of gene intensity data matrices.
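The sensitivity mentioned in the abstract above, where the first singular value and vectors are driven by a few outlying measurements, can be seen in a small numerical sketch. The code below uses only the ordinary SVD and does not implement the robust alternative studied in the paper; the matrix sizes, the single contaminated entry, and the helper names `leading_triplet` and `rank_one_approx` are assumptions chosen for illustration.

```python
import numpy as np


def leading_triplet(M):
    """Return the first singular value and its left/right singular vectors."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return s[0], U[:, 0], Vt[0]


def rank_one_approx(M):
    """Best rank-one approximation of M in the least-squares sense."""
    s0, u0, v0 = leading_triplet(M)
    return s0 * np.outer(u0, v0)


# An exactly rank-one 20 x 10 matrix (assumed toy data).
u = np.linspace(1.0, 2.0, 20)
v = np.linspace(1.0, 2.0, 10)
clean = np.outer(u, v)

# Contaminate a single entry with a large outlier.
contaminated = clean.copy()
contaminated[0, 0] += 1000.0

s_clean, u_clean, _ = leading_triplet(clean)
s_cont, u_cont, _ = leading_triplet(contaminated)

# Absolute cosine between the leading left singular vectors
# (absolute value because singular vectors are defined up to sign).
alignment = abs(u_clean @ u_cont)

# The clean matrix is recovered by its rank-one approximation.
clean_fit_error = np.linalg.norm(clean - rank_one_approx(clean))
```

Here one contaminated cell is enough to inflate the leading singular value and pull the leading singular vector toward the outlying row, which is the kind of behavior a robust low-rank approximation is meant to moderate.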

preprint, 2010, arXiv

Inference on low-rank data matrices with applications to microarray data

Probe-level microarray data are usually stored in matrices, where the rows and columns correspond to arrays and probes, respectively. Scientists routinely summarize each array by a single index as the expression level of each probe set (gene). We examine the adequacy of a unidimensional summary for characterizing the data matrix of each probe set. To do so, we propose a low-rank matrix model for the probe-level intensities, and develop a useful framework for testing the adequacy of unidimensionality against targeted alternatives. This is an interesting statistical problem where inference has to be made based on one data matrix whose entries are not i.i.d. We analyze the asymptotic properties of the proposed test statistics, and use Monte Carlo simulations to assess their small sample performance. Applications of the proposed tests to GeneChip data show that evidence against a unidimensional model is often indicative of practically relevant features of a probe set.
