Researcher profile

Jingfei Zhang

Jingfei Zhang contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
8 works
0 followers
5 topics
4 close collaborators

Published work

8 published items

preprint, 2022, arXiv

Bias-correction and Test for Mark-point Dependence with Replicated Marked Point Processes

Mark-point dependence plays a critical role in research problems that can be fitted into the general framework of marked point processes. In this work, we focus on adjusting for mark-point dependence when estimating the mean and covariance functions of the mark process, given independent replicates of the marked point process. We assume that the mark process is a Gaussian process and the point process is a log-Gaussian Cox process, where the mark-point dependence is generated through the dependence between two latent Gaussian processes. Under this framework, naive local linear estimators that ignore the mark-point dependence can be severely biased. We show that this bias can be corrected using a local linear estimator of the cross-covariance function and establish uniform convergence rates of the bias-corrected estimators. Furthermore, we propose a test statistic for mark-point independence based on local linear estimators, which is shown to be asymptotically normal at the parametric $\sqrt{n}$ convergence rate. Model diagnostic tools are developed for key model assumptions and a robust functional permutation test is proposed for a more general class of mark-point processes. The effectiveness of the proposed methods is demonstrated using extensive simulations and applications to two real data examples.
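The local linear estimators named in this abstract build on ordinary local linear smoothing. As a minimal sketch of that generic building block only (it is not the bias-corrected estimator of the paper, and it ignores mark-point dependence entirely), the following fits a kernel-weighted line around a target point and returns its intercept. The Epanechnikov kernel, the function names, and the bandwidth value are illustrative assumptions.

```python
def epanechnikov(u: float) -> float:
    """Epanechnikov kernel (an assumed, common choice of kernel)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def local_linear(x, y, x0: float, bandwidth: float) -> float:
    """Local linear estimate of E[y | x = x0].

    Minimises sum_i K((x_i - x0) / h) * (y_i - a - b * (x_i - x0))**2
    over (a, b) and returns the intercept a.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, yi in zip(x, y):
        d = xi - x0
        w = epanechnikov(d / bandwidth)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * yi
        t1 += w * d * yi
    # Solve the 2x2 weighted normal equations for the intercept.
    det = s0 * s2 - s1 * s1
    if det <= 0.0:
        raise ValueError("too few distinct points inside the bandwidth")
    return (s2 * t0 - s1 * t1) / det


# Usage on noiseless linear data, where a local linear fit is exact.
xs = [i / 10 for i in range(11)]
ys = [2.0 * xi + 1.0 for xi in xs]
estimate = local_linear(xs, ys, x0=0.5, bandwidth=0.3)
```

Fitting a local line rather than a local constant is what lets the smoother reproduce linear trends exactly, as the usage lines illustrate.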

preprint, 2022, arXiv

High Dimensional Gaussian Graphical Regression Models with Covariates

Though Gaussian graphical models have been widely used in many scientific fields, relatively limited progress has been made to link graph structures to external covariates. We propose a Gaussian graphical regression model, which regresses both the mean and the precision matrix of a Gaussian graphical model on covariates. In the context of co-expression quantitative trait locus (QTL) studies, our method can determine how genetic variants and clinical conditions modulate the subject-level network structures, and recover both the population-level and subject-level gene networks. Our framework encourages sparsity of covariate effects on both the mean and the precision matrix. In particular for the precision matrix, we stipulate simultaneous sparsity, i.e., group sparsity and element-wise sparsity, on effective covariates and their effects on network edges, respectively. We establish variable selection consistency first in the case with known mean parameters and then in the more challenging case with unknown means depending on external covariates, and establish in both cases the $\ell_2$ convergence rates and the selection consistency of the estimated precision parameters. The utility and efficacy of our proposed method are demonstrated through simulation studies and an application to a co-expression QTL study with brain cancer patients.

preprint, 2022, arXiv

Multi-task Learning for Gaussian Graphical Regressions with High Dimensional Covariates

Gaussian graphical regression is a powerful tool that regresses the precision matrix of a Gaussian graphical model on covariates, permitting the numbers of response variables and covariates to far exceed the sample size. Model fitting is typically carried out via separate node-wise lasso regressions, ignoring the network-induced structure among these regressions. Consequently, the error rate is high, especially when the number of nodes is large. We propose a multi-task learning estimator for fitting Gaussian graphical regression models; we design a cross-task group sparsity penalty and a within-task element-wise sparsity penalty, which govern the sparsity of active covariates and their effects on the graph, respectively. For computation, we consider an efficient augmented Lagrangian algorithm, which solves subproblems with a semi-smooth Newton method. For theory, we show that the error rate of the multi-task learning based estimates improves substantially on that of the separate node-wise lasso estimates, because the cross-task penalty borrows information across tasks. To address the main challenge that the tasks are entangled in a complicated correlation structure, we establish a new tail probability bound for correlated heavy-tailed (sub-exponential) variables with an arbitrary correlation structure, a useful theoretical result in its own right. Finally, the utility of our method is demonstrated through simulations as well as an application to a gene co-expression network study with brain cancer patients.

preprint, 2022, arXiv

Statistical Inference of Cell-type Proportions Estimated from Bulk Expression Data

There is a growing interest in cell-type-specific analysis from bulk samples with a mixture of different cell types. A critical first step in such analyses is the accurate estimation of cell-type proportions in a bulk sample. Although many methods have been proposed recently, quantifying the uncertainties associated with the estimated cell-type proportions has not been well studied. Lack of consideration of these uncertainties can lead to missed or false findings in downstream analyses. In this article, we introduce a flexible statistical deconvolution framework that allows a general and subject-specific covariance of bulk gene expressions. Under this framework, we propose a decorrelated constrained least squares method called DECALS that estimates cell-type proportions as well as the sampling distribution of the estimates. Simulation studies demonstrate that DECALS can accurately quantify the uncertainties in the estimated proportions whereas other methods fail. Applying DECALS to analyze bulk gene expression data of post mortem brain samples from the ROSMAP and GTEx projects, we show that taking into account the uncertainties in the estimated cell-type proportions can lead to more accurate identifications of cell-type-specific differentially expressed genes and transcripts between different subject groups, such as between Alzheimer's disease patients and controls and between males and females.
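To make the "constrained least squares" ingredient concrete, here is a minimal sketch of plain deconvolution by least squares over the probability simplex. It is not DECALS: there is no decorrelation step and no sampling distribution for the estimates. The constraints (non-negative proportions that sum to one), the projected-gradient solver, the toy signature matrix, and all function names are assumptions chosen for illustration, since the abstract does not spell them out.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}."""
    u = sorted(v, reverse=True)
    running = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        running += uj
        candidate = (running - 1.0) / j
        if uj - candidate > 0.0:
            theta = candidate  # keep the threshold of the last valid index
    return [max(x - theta, 0.0) for x in v]


def constrained_least_squares(signature, bulk, iterations=2000):
    """Minimise 0.5 * ||signature @ beta - bulk||^2 over the simplex.

    signature: list of rows (genes) by columns (cell types), hypothetical input.
    bulk: list of bulk expression values, one per gene.
    """
    n_types = len(signature[0])
    # 1 / ||S||_F^2 is a safe step size: it never exceeds 1 / lambda_max(S^T S).
    step = 1.0 / sum(s * s for row in signature for s in row)
    beta = [1.0 / n_types] * n_types
    for _ in range(iterations):
        resid = [sum(r * b for r, b in zip(row, beta)) - y
                 for row, y in zip(signature, bulk)]
        grad = [sum(row[k] * e for row, e in zip(signature, resid))
                for k in range(n_types)]
        beta = project_to_simplex([b - step * g for b, g in zip(beta, grad)])
    return beta


# Toy example: a noiseless mixture of three hypothetical cell types.
S = [[10.0, 1.0, 1.0], [1.0, 8.0, 2.0], [2.0, 1.0, 9.0], [5.0, 5.0, 5.0]]
true_props = [0.5, 0.3, 0.2]
y = [sum(s * p for s, p in zip(row, true_props)) for row in S]
estimated = constrained_least_squares(S, y)
```

A point estimate like `estimated` carries no measure of uncertainty, which is the gap the abstract says DECALS is designed to address.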

preprint, 2021, arXiv

Latent Network Structure Learning from High Dimensional Multivariate Point Processes

Learning the latent network structure from large scale multivariate point process data is an important task in a wide range of scientific and business applications. For instance, we might wish to estimate the neuronal functional connectivity network based on spiking times recorded from a collection of neurons. To characterize the complex processes underlying the observed data, we propose a new and flexible class of nonstationary Hawkes processes that allow both excitatory and inhibitory effects. We estimate the latent network structure using an efficient sparse least squares estimation approach. Using a thinning representation, we establish concentration inequalities for the first and second order statistics of the proposed Hawkes process. Such theoretical results enable us to establish the non-asymptotic error bound and the selection consistency of the estimated parameters. Furthermore, we describe a least squares loss based statistic for testing if the background intensity is constant in time. We demonstrate the efficacy of our proposed method through simulation studies and an application to a neuron spike train data set.
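For readers unfamiliar with Hawkes processes, the sketch below simulates the simplest textbook case with Ogata's thinning algorithm, a standard simulation technique related to the thinning idea mentioned in the abstract. It is deliberately much narrower than the model class described there: one dimension, a constant background rate, an exponential kernel, and excitation only (no inhibition, no nonstationarity, no network). The parameter names `mu`, `alpha`, and `beta` and the function name are assumptions for illustration.

```python
import math
import random


def simulate_hawkes(mu: float, alpha: float, beta: float,
                    horizon: float, seed: int = 0):
    """Simulate event times on [0, horizon) by Ogata's thinning.

    Intensity: lambda(t) = mu + sum_{t_i < t} alpha * beta * exp(-beta * (t - t_i)),
    where alpha in [0, 1) is the expected number of offspring per event.
    """
    if mu <= 0.0 or beta <= 0.0 or not 0.0 <= alpha < 1.0:
        raise ValueError("need mu > 0, beta > 0 and 0 <= alpha < 1")
    rng = random.Random(seed)
    events = []
    t = 0.0
    excitation = 0.0  # self-exciting part of the intensity at the current time
    while True:
        # The intensity only decays between events, so its current value
        # is a valid upper bound until the next accepted event.
        bound = mu + excitation
        wait = rng.expovariate(bound)
        t += wait
        if t >= horizon:
            break
        excitation *= math.exp(-beta * wait)
        # Accept the candidate with probability lambda(t) / bound.
        if rng.random() * bound <= mu + excitation:
            events.append(t)
            excitation += alpha * beta
    return events


spikes = simulate_hawkes(mu=1.0, alpha=0.5, beta=2.0, horizon=50.0, seed=1)
```

Because the exponential kernel lets the excitation be updated recursively, each candidate point costs constant time instead of a sum over the full history.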

preprint, 2021, arXiv

Learning Human Activity Patterns using Clustered Point Processes with Active and Inactive States

Modeling event patterns is a central task in a wide range of disciplines. In applications such as studying human activity patterns, events often arrive clustered with sporadic and long periods of inactivity. Such heterogeneity in event patterns poses challenges for existing point process models. In this article, we propose a new class of clustered point processes that alternate between active and inactive states. The proposed model is flexible, highly interpretable, and can provide useful insights into event patterns. A composite likelihood approach and a composite EM estimation procedure are developed for efficient and numerically stable parameter estimation. We study both the computational and statistical properties of the estimator including convergence, consistency, and asymptotic normality. The proposed method is applied to Donald Trump's Twitter data to investigate if and how his behaviors evolved before, during, and after the presidential campaign. Additionally, we analyze large-scale social media data from Sina Weibo and identify interesting groups of users with distinct behaviors.

preprint, 2021, arXiv

Sparse Tensor Additive Regression

Tensors are becoming prevalent in modern applications such as medical imaging and digital marketing. In this paper, we propose a sparse tensor additive regression (STAR) that models a scalar response as a flexible nonparametric function of tensor covariates. The proposed model effectively exploits the sparse and low-rank structures in the tensor additive regression. We formulate the parameter estimation as a non-convex optimization problem, and propose an efficient penalized alternating minimization algorithm. We establish a non-asymptotic error bound for the estimator obtained from each iteration of the proposed algorithm, which reveals an interplay between the optimization error and the statistical rate of convergence. We demonstrate the efficacy of STAR through extensive comparative simulation studies, and an application to the click-through-rate prediction in online advertising.

preprint, 2020, arXiv

Detection of the number of principal components by extended AIC-type method

Estimating the number of principal components is one of the fundamental problems in many scientific fields such as signal processing (or the spiked covariance model). In this paper, we first demonstrate that, for fixed $p$, any penalty term of the form $k'(p-k'/2+1/2)C_n$ may lead to an asymptotically consistent estimator under the condition that $C_n\to\infty$ and $C_n/n\to 0$. We also extend our results to the case $n,p\to\infty$, with $p/n\to c>0$. In this case, for $k=o(n^{\frac{1}{3}})$, we first investigate the limiting laws for the leading eigenvalues of the sample covariance matrix $S_n$ under the condition that $\lambda_k>1+\sqrt{c}$. At low SNR, since the AIC tends to underestimate the number of signals $k$, the AIC should be re-defined in this case. As a natural extension of the AIC for fixed $p$, we propose the extended AIC (EAIC), i.e., the AIC-type method with tuning parameter $\gamma=\varphi(c)=1/2+\sqrt{1/c}-\log(1+\sqrt{c})/c$, and demonstrate that the EAIC-type method, i.e., the AIC-type method with tuning parameter $\gamma>\varphi(c)$, can select the number of signals $k$ consistently. In the following two cases, (1) $p$ fixed, $n\to\infty$, (2) $n,p\to\infty$ with $p/n\to 0$, if the AIC is defined as the degeneration of the EAIC in the case $n,p\to\infty$ with $p/n\to c>0$, i.e., $\gamma=\lim_{c\to 0^{+}}\varphi(c)=1$, then we have essentially demonstrated that, to achieve the consistency of the AIC-type method in the above two cases, $\gamma>1$ is required. Moreover, we show that the EAIC-type method is essentially tuning-free and outperforms the well-known KN estimator proposed in Kritchman and Nadler (2008) and the BCF estimator proposed in Bai, Choi and Fujikoshi (2018). Numerical studies indicate that the proposed method works well.
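The two closed-form quantities quoted in this abstract are easy to evaluate numerically. The short sketch below computes the tuning-parameter threshold as a function of $c$ and the penalty term $k'(p-k'/2+1/2)C_n$, and lets one check the stated limit of 1 as $c$ approaches zero from above. The function names `phi` and `penalty` are illustrative assumptions; only the formulas come from the abstract, and this is not the full selection procedure.

```python
import math


def phi(c: float) -> float:
    """Threshold 1/2 + sqrt(1/c) - log(1 + sqrt(c)) / c, for c > 0."""
    if c <= 0.0:
        raise ValueError("c must be positive")
    # log1p keeps the logarithm accurate when sqrt(c) is very small.
    return 0.5 + math.sqrt(1.0 / c) - math.log1p(math.sqrt(c)) / c


def penalty(k: int, p: int, c_n: float) -> float:
    """Penalty term k'(p - k'/2 + 1/2) * C_n for a candidate number k'."""
    return k * (p - k / 2 + 0.5) * c_n


# Evaluate the threshold for shrinking ratios c = p/n.
for c in (1.0, 0.1, 1e-4, 1e-8):
    print(f"c = {c:g}: threshold = {phi(c):.6f}")
```

A series expansion of the logarithm shows why the limit is 1: the two terms of order $c^{-1/2}$ cancel, leaving $1-\sqrt{c}/3$ plus higher-order terms.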