Researcher profile

Jiguo Cao

Jiguo Cao contributes to research discovery and scholarly infrastructure.

Published work

9 published item(s)

Preprint, 2022, arXiv

A Joint Estimation Approach to Sparse Additive Ordinary Differential Equations

Ordinary differential equations (ODEs) are widely used to characterize the dynamics of complex systems in real applications. In this article, we propose a novel joint estimation approach for generalized sparse additive ODEs where observations are allowed to be non-Gaussian. The new method is unified with existing collocation methods by considering the likelihood, ODE fidelity and sparse regularization simultaneously. We design a block coordinate descent algorithm for optimizing the non-convex and non-differentiable objective function. The global convergence of the algorithm is established. The simulation study and two applications demonstrate the superior performance of the proposed method in estimation and its improved performance in identifying the sparse structure.

Preprint, 2022, arXiv

Dynamical Modeling for non-Gaussian Data with High-dimensional Sparse Ordinary Differential Equations

Ordinary differential equations (ODEs) have been widely used for modeling complex dynamical systems. For high-dimensional ODE models where the number of differential equations is large, it remains challenging to estimate the ODE parameters and to identify the sparse structure of the ODE models. Most existing methods exploit the least-squares-based approach and are only applicable to Gaussian observations. However, as discrete data are ubiquitous in applications, it is of practical importance to develop dynamic modeling for non-Gaussian observations. New methods and algorithms are developed for both parameter estimation and sparse structure identification in high-dimensional linear ODE systems. First, the high-dimensional generalized profiling method is proposed as a likelihood-based approach with ODE fidelity and sparsity-inducing regularization, along with efficient computation based on parameter cascading. Second, two versions of the two-step collocation methods are extended to the non-Gaussian set-up by incorporating the iteratively reweighted least squares technique. Simulations show that the profiling procedure has excellent performance in latent process and derivative fitting and ODE parameter estimation, while the two-step collocation approach excels in identifying the sparse structure of the ODE system. The usefulness of the proposed methods is also demonstrated by analyzing three real datasets from Google Trends, stock market sectors, and yeast cell cycle studies.

Preprint, 2022, arXiv

Functional Nonlinear Learning

Using representations of functional data can be more convenient and beneficial in subsequent statistical models than direct observations. These representations, in a lower-dimensional space, extract and compress information from individual curves. The existing representation learning approaches in functional data analysis usually use linear mappings that parallel those from multivariate analysis, e.g., functional principal component analysis (FPCA). However, functions, as infinite-dimensional objects, sometimes have nonlinear structures that cannot be uncovered by linear mapping. Linear methods are even more limited when the functional data are multivariate. To address this, this paper proposes a functional nonlinear learning (FunNoL) method to sufficiently represent multivariate functional data in a lower-dimensional feature space. Furthermore, we merge a classification model to enrich the ability of the representations to predict curve labels. Hence, representations from FunNoL can be used for both curve reconstruction and classification. Additionally, we have endowed the proposed model with the ability to address the missing observation problem as well as to further denoise observations. The resulting representations are robust to observations that are locally disturbed by uncontrollable random noise. We apply the proposed FunNoL method to several real data sets and show that FunNoL can achieve better classifications than FPCA, especially in the multivariate functional data setting. Simulation studies have shown that FunNoL provides satisfactory curve classification and reconstruction regardless of data sparsity.

Preprint, 2022, arXiv

General P-Splines for Non-Uniform B-Splines

We proposed a new penalized B-splines estimator, the general P-spline, to accommodate non-uniform B-splines on unevenly spaced knots. It is a complement to Eilers and Marx's standard P-spline tailored for uniform B-splines on equidistant knots. At its core, we derived a novel general difference penalty that accounts for irregular knot spacing, while still being easy to compute and interpret. Both P-spline variants are useful for practical smoothing, because either one can produce a more satisfactory fit than the other, depending on the knot sequence being used and the data being analyzed. Therefore, practitioners should try out both before betting on either one, for which we have implemented the general P-spline in the R packages gps and gps.mgcv. The new general P-spline is closely related to the O'Sullivan spline (or O-spline) through a sandwich formula that links the general difference penalty to the derivative penalty. Though both penalties seem equally powerful in wiggliness control, given their mathematical association and statistical similarity, simulation studies show that the general P-spline either outperforms the O-spline in terms of mean squared error or performs equally well, making it a superior replacement for the O-spline.

Preprint, 2022, arXiv

Machine Learning Based Multimodal Neuroimaging Genomics Dementia Score for Predicting Future Conversion to Alzheimer's Disease

Background: The increasing availability of databases containing both magnetic resonance imaging (MRI) and genetic data allows researchers to utilize multimodal data to better understand the characteristics of dementia of Alzheimer's type (DAT). Objective: The goal of this study was to develop and analyze novel biomarkers that can help predict the development and progression of DAT. Methods: We used feature selection and an ensemble learning classifier to develop an image/genotype-based DAT score that represents a subject's likelihood of developing DAT in the future. Three feature types were used: MRI only, genetic only, and combined multimodal data. We used a novel data stratification method to better represent different stages of DAT. Using a pre-defined 0.5 threshold on DAT scores, we predicted whether or not a subject would develop DAT in the future. Results: Our results on the Alzheimer's Disease Neuroimaging Initiative (ADNI) database showed that dementia scores using genetic data could better predict future DAT progression for currently normal control subjects (Accuracy=0.857) compared to MRI (Accuracy=0.143), while MRI can better characterize subjects with stable mild cognitive impairment (Accuracy=0.614) compared to genetics (Accuracy=0.356). Combining MRI and genetic data showed improved classification performance in the remaining stratified groups. Conclusion: MRI and genetic data can contribute to DAT prediction in different ways. MRI data reflect anatomical changes in the brain, while genetic data can detect the risk of DAT progression prior to the symptomatic onset. Combining information from multimodal data in the right way can improve prediction performance.

Preprint, 2020, arXiv

A Bayesian Spatial Model for Imaging Genetics

We develop a Bayesian bivariate spatial model for multivariate regression analysis applicable to studies examining the influence of genetic variation on brain structure. Our model is motivated by an imaging genetics study of the Alzheimer's Disease Neuroimaging Initiative (ADNI), where the objective is to examine the association between images of volumetric and cortical thickness values summarizing the structure of the brain as measured by magnetic resonance imaging (MRI) and a set of 486 SNPs from 33 Alzheimer's Disease (AD) candidate genes obtained from 632 subjects. A bivariate spatial process model is developed to accommodate the correlation structures typically seen in structural brain imaging data. First, we allow for spatial correlation on a graph structure in the imaging phenotypes obtained from a neighbourhood matrix for measures on the same hemisphere of the brain. Second, we allow for correlation in the same measures obtained from different hemispheres (left/right) of the brain. We develop a mean-field variational Bayes algorithm and a Gibbs sampling algorithm to fit the model. We also incorporate Bayesian false discovery rate (FDR) procedures to select SNPs. We implement the methodology in a new release of the R package bgsmtr. We show that the new spatial model demonstrates superior performance over a standard model in our application. Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu).

Preprint, 2020, arXiv

FuncNN: An R Package to Fit Deep Neural Networks Using Generalized Input Spaces

Neural networks have excelled at regression and classification problems when the input space consists of scalar variables. As a result of this proficiency, several popular packages have been developed that allow users to easily fit these kinds of models. However, the methodology has excluded the use of functional covariates and to date, there exists no software that allows users to build deep learning models with this generalized input space. To the best of our knowledge, the functional neural network (FuncNN) library is the first such package in any programming language; the library has been developed for R and is built on top of the keras architecture. Throughout this paper, several functions are introduced that provide users an avenue to easily build models, generate predictions, and run cross-validations. A summary of the underlying methodology is also presented. The ultimate contribution is a package that provides a set of general modelling and diagnostic tools for data problems in which there exist both functional and scalar covariates.

Preprint, 2020, arXiv

Spectral Dynamic Causal Modelling of Resting-State fMRI: Relating Effective Brain Connectivity in the Default Mode Network to Genetics

We conduct an imaging genetics study to explore how effective brain connectivity in the default mode network (DMN) may be related to genetics within the context of Alzheimer's disease and mild cognitive impairment. We develop an analysis of longitudinal resting-state functional magnetic resonance imaging (rs-fMRI) and genetic data obtained from a sample of 111 subjects with a total of 319 rs-fMRI scans from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. A Dynamic Causal Model (DCM) is fit to the rs-fMRI scans to estimate effective brain connectivity within the DMN and related to a set of single nucleotide polymorphisms (SNPs) contained in an empirical disease-constrained set which is obtained out-of-sample from 663 ADNI subjects having only genome-wide data. We examine longitudinal data in both a 4-region and a 6-region network and relate longitudinal effective brain connectivity networks estimated using spectral DCM to SNPs using both linear mixed effect (LME) models and function-on-scalar regression (FSR). In the former case we implement a parametric bootstrap for testing SNP coefficients and make comparisons with p-values obtained from the chi-squared null distribution. We also implement a parametric bootstrap approach for testing regression functions in FSR and we make comparisons between p-values obtained from the parametric bootstrap and p-values obtained using the F-distribution with degrees-of-freedom based on Satterthwaite's approximation. In both networks we report on exploratory patterns of associations with relatively high ranks that exhibit stability to the differing assumptions made by FSR and LME.

Preprint, 2015, arXiv

A Smooth and Locally Sparse Estimator for Functional Linear Regression via Functional SCAD Penalty

In this paper, we propose a new regularization technique called "functional SCAD". We then combine this technique with the smoothing spline method to develop a smooth and locally sparse (i.e., zero on some sub-regions) estimator for the coefficient function in functional linear regression. The functional SCAD has a nice shrinkage property that enables our estimating procedure to identify the null sub-regions of the coefficient function without over-shrinking the non-zero values of the coefficient function. Additionally, the smoothness of our estimated coefficient function is regularized by a roughness penalty rather than by controlling the number of knots. Our method is more theoretically sound and is computationally simpler than the other available methods. An asymptotic analysis shows that our estimator is consistent and can identify the null region with probability tending to one. Furthermore, simulation studies show that our estimator has superior numerical performance. Finally, the practical merit of our method is demonstrated on two real applications.