Researcher profile

Ji Zhu

Ji Zhu contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
7 works
0 followers
5 topics
4 close collaborators

Published work

7 published items

preprint · 2026 · arXiv

Locally sparse varying coefficient mixed model with application to longitudinal microbiome differential abundance

Differential abundance (DA) analysis in microbiome studies has recently been used to uncover a plethora of associations between microbial composition and various health conditions. While current approaches to DA typically apply only to cross-sectional data, many studies feature a longitudinal design to better understand the underlying microbial dynamics. To study DA in longitudinal microbial studies, we introduce a novel varying coefficient mixed-effects model with local sparsity. The proposed method can identify time intervals of significant group differences while accounting for temporal dependence. Specifically, we exploit a penalized kernel smoothing approach for parameter estimation and include a random effect to account for serial correlation. In particular, our method operates effectively regardless of whether sampling times are shared across subjects, accommodating irregular sampling and missing observations. Simulation studies demonstrate the necessity of modeling dependence for precise estimation and support recovery. The application of our method to a longitudinal study of the mouse oral microbiome during cancer development revealed significant scientific insights that were otherwise not discernible through cross-sectional analyses. An R implementation is available at https://github.com/fontaine618/LSVCMM.

preprint · 2022 · arXiv

Adaptive Algorithm for Quantum Amplitude Estimation

Quantum amplitude estimation is a key sub-routine of a number of quantum algorithms with various applications. We propose an adaptive algorithm for interval estimation of amplitudes. The quantum part of the algorithm is based only on Grover's algorithm. The key ingredient is the introduction of an adjustment factor, which adjusts the amplitude of good states such that the amplitude after the adjustment, and the original amplitude, can be estimated without ambiguity in the subsequent step. We show with numerical studies that the proposed algorithm uses a similar number of quantum queries to achieve the same level of precision $ε$ compared to state-of-the-art algorithms, but the classical part, i.e., the non-quantum part, has substantially lower computational complexity. We rigorously prove that the number of oracle queries achieves $O(1/ε)$, i.e., a quadratic speedup over classical Monte Carlo sampling, and the computational complexity of the classical part achieves $O(\log(1/ε))$, both up to a double-logarithmic factor.
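
As a rough illustration of the Grover-based amplification that the abstract builds on (this is not the authors' adaptive algorithm, and the adjustment factor is not modeled): writing the amplitude as a = sin^2(theta), the textbook identity is that k Grover iterations change the probability of measuring a good state to sin^2((2k + 1) * theta). The function name `good_state_probability` is hypothetical, and the code only evaluates this formula classically.

```python
import math


def good_state_probability(a: float, k: int) -> float:
    """Probability of measuring a good state after k Grover iterations.

    Textbook identity: if a = sin^2(theta), then k iterations give
    sin^2((2k + 1) * theta). This evaluates the formula classically;
    it is not a quantum simulation and not the adaptive algorithm.
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    theta = math.asin(math.sqrt(a))
    return math.sin((2 * k + 1) * theta) ** 2


if __name__ == "__main__":
    # Illustrative value only; a = 0.01 is an assumption, not from the paper.
    a = 0.01
    for k in range(8):
        print(k, round(good_state_probability(a, k), 4))
```

Because the measured probability oscillates in k, different amplitudes can produce the same outcome statistics for a fixed k, which is the kind of ambiguity the abstract says the adjustment factor is designed to avoid.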

preprint · 2020 · arXiv

Detection of the number of principal components by extended AIC-type method

Estimating the number of principal components is one of the fundamental problems in many scientific fields such as signal processing (or the spiked covariance model). In this paper, we first demonstrate that, for fixed $p$, any penalty term of the form $k'(p-k'/2+1/2)C_n$ may lead to an asymptotically consistent estimator under the condition that $C_n\to\infty$ and $C_n/n\to0$. We also extend our results to the case $n,p\to\infty$, with $p/n\to c>0$. In this case, for $k=o(n^{\frac{1}{3}})$, we first investigate the limiting laws for the leading eigenvalues of the sample covariance matrix $S_n$ under the condition that $λ_k>1+\sqrt{c}$. At low SNR, since the AIC tends to underestimate the number of signals $k$, the AIC should be re-defined in this case. As a natural extension of the AIC for fixed $p$, we propose the extended AIC (EAIC), i.e., the AIC-type method with tuning parameter $γ=φ(c)=1/2+\sqrt{1/c}-\log(1+\sqrt{c})/c$, and demonstrate that the EAIC-type method, i.e., the AIC-type method with tuning parameter $γ>φ(c)$, can select the number of signals $k$ consistently. In the following two cases, (1) $p$ fixed, $n\to\infty$, (2) $n,p\to\infty$ with $p/n\to 0$, if the AIC is defined as the degeneration of the EAIC in the case $n,p\to\infty$ with $p/n\to c>0$, i.e., $γ=\lim_{c\rightarrow 0+0}φ(c)=1$, then we have essentially demonstrated that, to achieve the consistency of the AIC-type method in the above two cases, $γ>1$ is required. Moreover, we show that the EAIC-type method is essentially tuning-free and outperforms the well-known KN estimator proposed in Kritchman and Nadler (2008) and the BCF estimator proposed in Bai, Choi and Fujikoshi (2018). Numerical studies indicate that the proposed method works well.
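
The tuning threshold quoted in the abstract, φ(c) = 1/2 + sqrt(1/c) - log(1 + sqrt(c))/c, can be evaluated directly. The sketch below (the function name `phi` is our own label) computes it for a few illustrative values of c and lets one check numerically that it approaches 1 as c tends to 0 from above, consistent with the limit stated in the abstract.

```python
import math


def phi(c: float) -> float:
    """Threshold phi(c) = 1/2 + sqrt(1/c) - log(1 + sqrt(c)) / c, for c > 0."""
    if c <= 0:
        raise ValueError("c must be positive")
    root = math.sqrt(c)
    # log1p(x) computes log(1 + x) accurately for small x.
    return 0.5 + 1.0 / root - math.log1p(root) / c


if __name__ == "__main__":
    # Illustrative values of c = lim p/n; these are assumptions, not from the paper.
    for c in (1e-6, 0.01, 0.5, 1.0, 2.0):
        print(f"c = {c:g}: phi(c) = {phi(c):.6f}")
```

A series expansion of the logarithm gives φ(c) ≈ 1 - sqrt(c)/3 for small c, which is why the values drift below 1 as c grows from 0.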

preprint · 2020 · arXiv

High-dimensional Gaussian graphical model for network-linked data

Graphical models are commonly used to represent conditional dependence relationships between variables. There are multiple methods available for exploring them from high-dimensional data, but almost all of them rely on the assumption that the observations are independent and identically distributed. At the same time, observations connected by a network are becoming increasingly common, and tend to violate these assumptions. Here we develop a Gaussian graphical model for observations connected by a network with potentially different mean vectors, varying smoothly over the network. We propose an efficient estimation algorithm and demonstrate its effectiveness on both simulated and real data, obtaining meaningful and interpretable results on a statistics coauthorship network. We also prove that our method estimates both the inverse covariance matrix and the corresponding graph structure correctly under the assumption of network “cohesion”, which refers to the empirically observed phenomenon of network neighbors sharing similar traits.

preprint · 2020 · arXiv

Network cross-validation by edge sampling

While many statistical models and methods are now available for network analysis, resampling network data remains a challenging problem. Cross-validation is a useful general tool for model selection and parameter tuning, but is not directly applicable to networks since splitting network nodes into groups requires deleting edges and destroys some of the network structure. Here we propose a new network resampling strategy based on splitting node pairs rather than nodes, applicable to cross-validation for a wide range of network model selection tasks. We provide a theoretical justification for our method in a general setting and examples of how our method can be used in specific network model selection and parameter tuning tasks. Numerical results on simulated networks and on a citation network of statisticians show that this cross-validation approach works well for model selection.
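
The idea of splitting node pairs rather than nodes can be sketched as follows. This is a minimal illustration under our own assumptions (an undirected network without self-loops, a uniformly random split, and the hypothetical helper name `split_node_pairs`). How the held-out pairs are then used for model fitting and evaluation is not described in the abstract and is not shown here.

```python
import random
from itertools import combinations


def split_node_pairs(n_nodes, holdout_fraction=0.1, seed=0):
    """Randomly split all unordered node pairs (i, j), i < j, into two sets.

    Returns (train_pairs, holdout_pairs). Every node stays in the network;
    only the observation status of individual pairs differs between sets.
    """
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must lie strictly between 0 and 1")
    pairs = list(combinations(range(n_nodes), 2))
    rng = random.Random(seed)  # local generator, so the split is reproducible
    rng.shuffle(pairs)
    n_holdout = max(1, round(holdout_fraction * len(pairs)))
    return pairs[n_holdout:], pairs[:n_holdout]


if __name__ == "__main__":
    # Illustrative sizes only; not taken from the paper.
    train_pairs, holdout_pairs = split_node_pairs(6, holdout_fraction=0.2)
    print(len(train_pairs), "training pairs;", len(holdout_pairs), "held-out pairs")
```

In contrast to a node-level split, no node is removed, so edges among the training pairs remain available for all nodes.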

preprint · 2020 · arXiv

Network-Assisted Estimation for Large-dimensional Factor Model with Guaranteed Convergence Rate Improvement

Network structures are increasingly popular for capturing the intrinsic relationships among large-scale variables. In this paper we propose to improve the estimation accuracy for large-dimensional factor models when a network structure between individuals is observed. To fully exploit the prior network information, we construct two different penalties to regularize the factor loadings and shrink the idiosyncratic errors. Closed-form solutions are provided for the penalized optimization problems. Theoretical results demonstrate that the modified estimators achieve faster convergence rates and lower asymptotic mean squared errors when the underlying network structure among individuals is correct. An interesting finding is that even if the prior network is totally misleading, the proposed estimators perform no worse than conventional state-of-the-art methods. Furthermore, to facilitate practical application, we propose a computationally efficient data-driven approach to select the tuning parameters. We also provide an empirical criterion to determine the number of common factors. Simulation studies and an application to the S&P100 weekly return dataset illustrate the superiority and adaptivity of the new approach.

preprint · 2019 · arXiv

Minorization-Maximization-based Steepest Ascent for Large-scale Survival Analysis with Time-Varying Effects: Application to the National Kidney Transplant Dataset

The time-varying effects model is a flexible and powerful tool for modeling the dynamic changes of covariate effects. However, in survival analysis, its computational burden increases quickly as the sample size or the number of predictors grows. Traditional methods that perform well for moderate sample sizes and low-dimensional data do not scale to massive data. Analysis of national kidney transplant data, with a massive sample size and a large number of predictors, defies any existing statistical methods and software. In view of these difficulties, we propose a Minorization-Maximization-based steepest ascent procedure for estimating the time-varying effects. Leveraging the block structure formed by the basis expansions, the proposed procedure iteratively updates the optimal block-wise direction along which the approximate increase in the log-partial likelihood is maximized. The resulting estimates ensure the ascent property and serve as refinements of the previous step. The performance of the proposed method is examined by simulations and applications to the analysis of national kidney transplant data.