Researcher profile

Fangzheng Xie

Fangzheng Xie contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 17 (Unverified) | Verification L1 | Unclaimed author
4 works
0 followers
4 topics
4 close collaborators

Published work

4 published item(s)

Preprint | 2022 | arXiv

An approximate Bayes factor based high dimensional MANOVA using Random Projections

High-dimensional mean vector testing problems for two or more groups remain a very active research area. In these settings, traditional tests are not applicable because they involve the inversion of a rank-deficient group covariance matrix. Current approaches address this problem by assuming a sparse or diagonal covariance matrix, potentially ignoring complex dependency between features. In this paper, we develop a Bayes factor (BF) based testing procedure for comparing two or more population means in (very) high-dimensional settings. Two versions of the Bayes factor based test statistic are considered, both based on a random projection (RP) approach. RPs are appealing since they make no assumption about the form of the dependency across features in the data. The final test statistic is based on an ensemble of Bayes factors corresponding to multiple replications of randomly projected data. Both proposed test statistics are compared through a battery of simulation settings. Finally, they are applied to the analysis of a publicly available genomic single-cell RNA-seq (scRNA-seq) dataset.
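
As a rough illustration of the random-projection idea in the abstract above, the sketch below projects p-dimensional samples down to k dimensions, where the pooled covariance matrix is invertible, and averages a test statistic over many projections. It is not the authors' procedure: it swaps in the classical two-sample Hotelling T^2 statistic in place of a Bayes factor, and the function name projected_hotelling_t2, the Gaussian projection matrix, and the parameters k and n_proj are assumptions made purely for illustration.

```python
import numpy as np


def projected_hotelling_t2(x, y, k, n_proj=50, seed=0):
    """Average a two-sample Hotelling T^2 over random k-dimensional projections.

    Illustrative only: a Hotelling T^2 statistic stands in for the Bayes
    factor described in the abstract (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    n1, p = x.shape
    n2 = y.shape[0]
    stats = []
    for _ in range(n_proj):
        # Gaussian random projection from p to k dimensions
        r = rng.standard_normal((p, k)) / np.sqrt(k)
        xp, yp = x @ r, y @ r
        diff = xp.mean(axis=0) - yp.mean(axis=0)
        # Pooled covariance in the projected space; invertible when
        # k is smaller than n1 + n2 - 2
        s1 = np.atleast_2d(np.cov(xp, rowvar=False))
        s2 = np.atleast_2d(np.cov(yp, rowvar=False))
        pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
        t2 = (n1 * n2) / (n1 + n2) * (diff @ np.linalg.solve(pooled, diff))
        stats.append(t2)
    return float(np.mean(stats))


# Toy data with p = 200 features but only 20 samples per group
rng = np.random.default_rng(1)
x = rng.standard_normal((20, 200))
y = rng.standard_normal((20, 200))
null_stat = projected_hotelling_t2(x, y, k=5)
shift_stat = projected_hotelling_t2(x, y + 1.0, k=5)
```

With p = 200 and 40 samples in total, the full pooled covariance is rank deficient, while each 5-dimensional projected covariance can be inverted. Shifting one group's mean should make shift_stat noticeably larger than null_stat.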

Preprint | 2022 | arXiv

Eigenvector-Assisted Statistical Inference for Signal-Plus-Noise Matrix Models

In this paper, we develop a generalized Bayesian inference framework for a collection of signal-plus-noise matrix models arising in high-dimensional statistics and many applications. The framework is built upon an asymptotically unbiased estimating equation with the assistance of the leading eigenvectors of the data matrix. The solution to the estimating equation coincides with the maximizer of an appropriate statistical criterion function. The generalized posterior distribution is constructed by replacing the usual log-likelihood function in the Bayes formula with the criterion function. The proposed framework does not require the complete specification of the sampling distribution and is convenient for uncertainty quantification via a Markov Chain Monte Carlo sampler, circumventing the inconvenience of resampling the data matrix. Under mild regularity conditions, we establish the large sample properties of the estimating equation estimator and the generalized posterior distributions. In particular, the generalized posterior credible sets have the correct frequentist nominal coverage probability provided that the so-called generalized information equality holds. The validity and usefulness of the proposed framework are demonstrated through the analysis of synthetic datasets and the real-world ENZYMES network datasets.

Preprint | 2022 | arXiv

Entrywise limit theorems of eigenvectors for signal-plus-noise matrix models with weak signals

We establish a finite-sample Berry-Esseen theorem for the entrywise limits of the eigenvectors for a broad collection of signal-plus-noise random matrix models under challenging weak signal regimes. The signal strength is characterized by a scaling factor $\rho_n$ through $n\rho_n$, where $n$ is the dimension of the random matrix, and we allow $n\rho_n$ to grow at the rate of $\log n$. The key technical contribution is a sharp finite-sample entrywise eigenvector perturbation bound. The existing error bounds on the two-to-infinity norms of the higher-order remainders are not sufficient when $n\rho_n$ is proportional to $\log n$. We apply the general entrywise eigenvector analysis results to the symmetric noisy matrix completion problem, random dot product graphs, and two subsequent inference tasks for random graphs: the estimation of pure nodes in mixed membership stochastic block models and the hypothesis testing of the equality of latent positions in random graphs.
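
To make the notion of an entrywise eigenvector error concrete, the following sketch simulates one signal-plus-noise matrix and measures the largest entrywise deviation of its leading eigenvector from the truth. Everything specific here is an assumption for illustration: the rank-one signal, the symmetric Gaussian noise, the function name leading_eigvec_error, and the default values. In particular, the defaults give a strong signal (n * rho = 200), not the weak-signal regime with n * rho of order log n that the abstract studies.

```python
import numpy as np


def leading_eigvec_error(n=400, rho=0.5, seed=0):
    """Simulate signal + noise and return (max entrywise error, cosine).

    Hypothetical toy model: rank-one signal with eigenvalue n * rho plus
    symmetric Gaussian noise with unit off-diagonal variance.
    """
    rng = np.random.default_rng(seed)
    u = np.ones(n) / np.sqrt(n)            # true unit-norm eigenvector
    signal = n * rho * np.outer(u, u)      # rank-one signal matrix
    g = rng.standard_normal((n, n))
    noise = (g + g.T) / np.sqrt(2.0)       # symmetrized noise
    vals, vecs = np.linalg.eigh(signal + noise)
    u_hat = vecs[:, -1]                    # eigh sorts eigenvalues ascending
    if u_hat @ u < 0:                      # eigenvectors are defined up to sign
        u_hat = -u_hat
    max_entry_err = float(np.max(np.abs(u_hat - u)))
    cosine = float(u_hat @ u)
    return max_entry_err, cosine


max_entry_err, cosine = leading_eigvec_error()
```

Lowering rho toward log(n) / n in this toy model is one way to see the estimate degrade, which is the regime where sharper perturbation bounds matter.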

Preprint | 2020 | arXiv

A theoretical framework of the scaled Gaussian stochastic process in prediction and calibration

Model calibration or data inversion is one of the fundamental tasks in uncertainty quantification. In this work, we study the theoretical properties of the scaled Gaussian stochastic process (S-GaSP) as a model for the discrepancy between reality and imperfect mathematical models. We establish the explicit connection between the Gaussian stochastic process (GaSP) and S-GaSP through the orthogonal series representation. The predictive mean estimator in the S-GaSP calibration model converges to the reality at the same rate as the GaSP with a suitable choice of the regularization and scaling parameters. We also show that the calibrated mathematical model in the S-GaSP calibration converges to the one that minimizes the $L_2$ loss between reality and the mathematical model, whereas the GaSP model with other widely used covariance functions does not have this property. Numerical examples confirm the excellent finite sample performance of our approaches compared to a few recent approaches.
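
The $L_2$ calibration target mentioned in the abstract above can be illustrated with a small worked example. The sketch below does not implement GaSP or S-GaSP; it only computes the parameter value that minimizes the $L_2$ loss between a known "reality" and a one-parameter model on [0, 1]. The reality function 2x + 0.5x^2, the linear model theta * x, and the helper name l2_calibrate are all hypothetical choices made for illustration.

```python
import numpy as np


def l2_calibrate(reality, model_basis, grid):
    """Return theta minimizing a grid approximation of the L2 loss
    integral of (reality(x) - theta * model_basis(x))^2 over the grid.

    For a model that is linear in theta, the minimizer is the ratio
    of inner products <basis, reality> / <basis, basis>.
    """
    y = reality(grid)
    b = model_basis(grid)
    return float(np.sum(b * y) / np.sum(b * b))


# Hypothetical reality and imperfect model on [0, 1]
grid = np.linspace(0.0, 1.0, 10001)
theta_star = l2_calibrate(lambda x: 2.0 * x + 0.5 * x**2, lambda x: x, grid)
```

For this toy problem the exact minimizer is (2/3 + 1/8) / (1/3) = 19/8 = 2.375, and the grid approximation should agree with it closely. The calibrated model is the best linear fit in the $L_2$ sense even though no value of theta reproduces the reality exactly.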