Researcher profile

Xianyang Zhang

Xianyang Zhang contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2026arXiv

A Modern Theory for High-dimensional Cox Regression Models

The proportional hazards model has been extensively used in many fields such as biomedicine to estimate and perform statistical significance testing on the effects of covariates influencing the survival time of patients. The classical theory of maximum partial-likelihood estimation (MPLE) is used by most software packages to produce inference, e.g., the coxph function in R and the PHREG procedure in SAS. In this paper, we investigate the asymptotic behavior of the MPLE in the regime in which the number of parameters p is of the same order as the number of samples n. The main results are (i) existence of the MPLE undergoes a sharp 'phase transition'; (ii) the classical MPLE theory leads to invalid inference in the high-dimensional regime. We show that the asymptotic behavior of the MPLE is governed by a new asymptotic theory. These findings are further corroborated through numerical studies. The main technical tool in our proofs is the Convex Gaussian Min-max Theorem (CGMT), which has not been previously used in the analysis of partial likelihood. Our results thus extend the scope of CGMT and shed new light on the use of CGMT for examining the existence of MPLE and non-separable objective functions.

preprint2026arXiv

Fair Regression under Demographic Parity: A Unified Framework

We propose a unified framework for fair regression tasks formulated as risk minimization problems subject to a demographic parity constraint. Unlike many existing approaches that are limited to specific loss functions or rely on challenging non-convex optimization, our framework is applicable to a broad spectrum of regression tasks. Examples include linear regression with squared loss, binary classification with cross-entropy loss, quantile regression with pinball loss, and robust regression with Huber loss. We derive a novel characterization of the fair risk minimizer, which yields a computationally efficient estimation procedure for general loss functions. Theoretically, we establish the asymptotic consistency of the proposed estimator and derive its convergence rates under mild assumptions. We illustrate the method's versatility through detailed discussions of several common loss functions. Numerical results demonstrate that our approach effectively minimizes risk while satisfying fairness constraints across various regression settings.

preprint2023arXiv

High Dimensional Analysis of Variance in Multivariate Linear Regression

In this paper, we develop a systematic theory for high dimensional analysis of variance in multivariate linear regression, where the dimension and the number of coefficients can both grow with the sample size. We propose a new \emph{U}~type test statistic to test linear hypotheses and establish a high dimensional Gaussian approximation result under fairly mild moment assumptions. Our general framework and theory can be applied to deal with the classical one-way multivariate ANOVA and the nonparametric one-way MANOVA in high dimensions. To implement the test procedure in practice, we introduce a sample-splitting based estimator of the second moment of the error covariance and discuss its properties. A simulation study shows that our proposed test outperforms some existing tests in various settings.

preprint2022arXiv

LinDA: linear models for differential abundance analysis of microbiome compositional data

Differential abundance analysis is at the core of statistical analysis of microbiome data. The compositional nature of microbiome sequencing data makes false positive control challenging. Here, we show that the compositional effects can be addressed by a simple, yet highly flexible and scalable, approach. The proposed method, LinDA, only requires fitting linear regression models on the centered log-ratio transformed data, and correcting the bias due to compositional effects. We show that LinDA enjoys asymptotic FDR control and can be extended to mixed-effect models for correlated microbiome data. Using simulations and real examples, we demonstrate the effectiveness of LinDA.

preprint2021arXiv

Kernel-Distance-Based Covariate Balancing

A common concern in observational studies focuses on properly evaluating the causal effect, which usually refers to the average treatment effect or the average treatment effect on the treated. In this paper, we propose a data preprocessing method, the Kernel-distance-based covariate balancing, for observational studies with binary treatments. This proposed method yields a set of unit weights for the treatment and control groups, respectively, such that the reweighted covariate distributions can satisfy a set of pre-specified balance conditions. This preprocessing methodology can effectively reduce confounding bias of subsequent estimation of causal effects. We demonstrate the implementation and performance of Kernel-distance-based covariate balancing with Monte Carlo simulation experiments and a real data analysis.

preprint2020arXiv

Covariate Adaptive False Discovery Rate Control with Applications to Omics-Wide Multiple Testing

Conventional multiple testing procedures often assume hypotheses for different features are exchangeable. However, in many scientific applications, additional covariate information regarding the patterns of signals and nulls are available. In this paper, we introduce an FDR control procedure in large-scale inference problem that can incorporate covariate information. We develop a fast algorithm to implement the proposed procedure and prove its asymptotic validity even when the underlying model is misspecified and the p-values are weakly dependent (e.g., strong mixing). Extensive simulations are conducted to study the finite sample performance of the proposed method and we demonstrate that the new approach improves over the state-of-the-art approaches by being flexible, robust, powerful and computationally efficient. We finally apply the method to several omics datasets arising from genomics studies with the aim to identify omics features associated with some clinical and biological phenotypes. We show that the method is overall the most powerful among competing methods, especially when the signal is sparse. The proposed Covariate Adaptive Multiple Testing procedure is implemented in the R package CAMT.

preprint2013arXiv

Fixed-smoothing asymptotics for time series

In this paper, we derive higher order Edgeworth expansions for the finite sample distributions of the subsampling-based t-statistic and the Wald statistic in the Gaussian location model under the so-called fixed-smoothing paradigm. In particular, we show that the error of asymptotic approximation is at the order of the reciprocal of the sample size and obtain explicit forms for the leading error terms in the expansions. The results are used to justify the second-order correctness of a new bootstrap method, the Gaussian dependent bootstrap, in the context of Gaussian location model.