Source author record

Didong Li

Didong Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher. Unclaimed source record

Machine Learning, math.DG, Methodology, math.ST, Statistics Theory, Computation, Genomics, math.NA, math.OC, math.PR, Numerical Analysis, Quantitative Methods, stat.OT

Catalog footprint

What is connected

10 works

13 topics

4 close collaborators

preprint, 2026, arXiv

On Statistical Inference for Rates of Change in Spatial Processes over Riemannian Manifolds

Statistical inference for spatial processes from partially realized or scattered data has seen voluminous developments in diverse areas ranging from environmental sciences to business and economics. Inference on the associated rates of change has seen some recent developments. The literature has been restricted to Euclidean domains, where inference is sought on directional derivatives, that is, rates along a chosen direction of interest, at arbitrary locations. Inference for higher-order rates, particularly directional curvature, has also proved useful in these settings. Modern spatial data often arise from non-Euclidean domains. This manuscript particularly considers spatial processes defined over compact Riemannian manifolds. We develop a comprehensive inferential framework for spatial rates of change for such processes over vector fields. In doing so, we formalize smoothness of process realizations and construct differential processes, namely the derivative and curvature processes. We derive conditions for kernels that ensure the existence of these processes and establish validity of the joint multivariate process consisting of the "parent" Gaussian process (GP) over the manifold and the associated differential processes. Predictive inference on these rates is devised conditioned on the realized process over the manifold. Manifolds arise as polyhedral meshes in practice. The success of our simulation experiments for assessing derivatives for processes observed over such meshes validates our theoretical findings. By enhancing our understanding of GPs on manifolds, this manuscript unlocks a variety of potential applications in machine learning and statistics where GPs have seen wide usage. We propose a fully model-based approach to inference on the differential processes arising from a spatial process from partially observed or realized data across scattered locations on a manifold.

preprint, 2024, arXiv

Contrastive linear regression

Contrastive dimension reduction methods have been developed for case-control study data to identify variation that is enriched in the foreground (case) data X relative to the background (control) data Y. Here, we develop contrastive regression for the setting when there is a response variable r associated with each foreground observation. This situation occurs frequently when, for example, the unaffected controls do not have a disease grade or intervention dosage but the affected cases have a disease grade or intervention dosage, as in autism severity, solid tumor stages, polyp sizes, or warfarin dosages. Our contrastive regression model captures shared low-dimensional variation between the predictors in the case and control groups, and then explains the case-specific response variables through the variance that remains in the predictors after shared variation is removed. We show that, in one single-nucleus RNA sequencing dataset on autism severity in postmortem brain samples from donors with and without autism and in another single-cell RNA sequencing dataset on cellular differentiation in chronic rhinosinusitis with and without nasal polyps, our contrastive linear regression performs feature ranking and identifies biologically informative predictors associated with response that cannot be identified using other approaches.

preprint, 2022, arXiv

Efficient Manifold and Subspace Approximations with Spherelets

In statistical dimensionality reduction, it is common to rely on the assumption that high dimensional data tend to concentrate near a lower dimensional manifold. There is a rich literature on approximating the unknown manifold, and on exploiting such approximations in clustering, data compression, and prediction. Most of the literature relies on linear or locally linear approximations. In this article, we propose a simple and general alternative, which instead uses spheres, an approach we refer to as spherelets. We develop spherical principal components analysis (SPCA), and provide theory on the convergence rate for global and local SPCA, while showing that spherelets can provide lower covering numbers and MSEs for many manifolds. Results relative to state-of-the-art competitors show gains in ability to accurately approximate manifolds with fewer components. Unlike most competitors, which simply output lower-dimensional features, our approach projects data onto the estimated manifold to produce fitted values that can be used for model assessment and cross validation. The methods are illustrated with applications to multiple data sets.
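
As a rough illustration of approximating data by a sphere rather than a linear subspace, the sketch below fits a sphere by ordinary algebraic least squares and projects points onto it to obtain fitted values. This is a generic textbook fit, not the SPCA estimator from the paper; the function names `fit_sphere` and `project_to_sphere` are hypothetical and NumPy is assumed to be available.

```python
import numpy as np

def fit_sphere(points):
    """Algebraic least-squares sphere fit (illustrative; not the paper's SPCA).

    Uses the identity ||x - c||^2 = r^2  <=>  2 x.c + (r^2 - ||c||^2) = ||x||^2,
    which is linear in the unknowns c and b = r^2 - ||c||^2.
    """
    X = np.asarray(points, dtype=float)
    A = np.hstack([2.0 * X, np.ones((X.shape[0], 1))])
    y = np.sum(X ** 2, axis=1)
    sol, *_ = np.linalg.lstsq(A, y, rcond=None)
    center = sol[:-1]
    radius = np.sqrt(sol[-1] + center @ center)
    return center, radius

def project_to_sphere(points, center, radius):
    """Map each point to the nearest point on the fitted sphere."""
    X = np.asarray(points, dtype=float)
    diff = X - center
    norms = np.linalg.norm(diff, axis=1, keepdims=True)
    return center + radius * diff / norms
```

The projected points play the role of fitted values, so a residual such as the mean squared distance between the data and their projections could be used for model assessment in the spirit the abstract describes.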

preprint, 2022, arXiv

From the Greene-Wu Convolution to Gradient Estimation over Riemannian Manifolds

Over a complete Riemannian manifold of finite dimension, Greene and Wu introduced a convolution, known as the Greene-Wu (GW) convolution. In this paper, we study properties of the GW convolution and apply it to non-Euclidean machine learning problems. In particular, we derive a new formula for how the curvature of the space affects the curvature of the function through the GW convolution. Also, following the study of the GW convolution, a new method for gradient estimation over Riemannian manifolds is introduced.

preprint, 2021, arXiv

Contrastive latent variable modeling with application to case-control sequencing experiments

High-throughput RNA-sequencing (RNA-seq) technologies are powerful tools for understanding cellular state. Often it is of interest to quantify and summarize changes in cell state that occur between experimental or biological conditions. Differential expression is typically assessed using univariate tests to measure gene-wise shifts in expression. However, these methods largely ignore changes in transcriptional correlation. Furthermore, there is a need to identify the low-dimensional structure of the gene expression shift to identify collections of genes that change between conditions. Here, we propose contrastive latent variable models designed for count data to create a richer portrait of differential expression in sequencing data. These models disentangle the sources of transcriptional variation in different conditions, in the context of an explicit model of variation at baseline. Moreover, we develop a model-based hypothesis testing framework that can test for global and gene subset-specific changes in expression. We test our model through extensive simulations and analyses with count-based gene expression data from perturbation and observational sequencing experiments. We find that our methods can effectively summarize and quantify complex transcriptional changes in case-control experimental sequencing data.

preprint, 2020, arXiv

Geodesic Distance Estimation with Spherelets

Many statistical and machine learning approaches rely on pairwise distances between data points. The choice of distance metric has a fundamental impact on performance of these procedures, raising questions about how to appropriately calculate distances. When data points are real-valued vectors, by far the most common choice is the Euclidean distance. This article is focused on the problem of how to better calculate distances taking into account the intrinsic geometry of the data, assuming data are concentrated near an unknown subspace or manifold. The appropriate geometric distance corresponds to the length of the shortest path along the manifold, which is the geodesic distance. When the manifold is unknown, it is challenging to accurately approximate the geodesic distance. Current algorithms are either highly complex, and hence often impractical to implement, or based on simple local linear approximations and shortest path algorithms that may have inadequate accuracy. We propose a simple and general alternative, which uses pieces of spheres, or spherelets, to locally approximate the unknown subspace and thereby estimate the geodesic distance through paths over spheres. Theory is developed showing lower error for many manifolds, with applications in clustering, conditional density estimation and mean regression. The conclusion is supported through multiple simulation examples and real data sets.
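
To make the gap between Euclidean and geodesic distance concrete, the sketch below computes the great-circle distance between two points on a sphere whose center and radius are assumed known. It only illustrates the "paths over spheres" idea; it is not the paper's estimator, which fits spherelets locally to an unknown manifold. The function name `sphere_geodesic` is hypothetical and NumPy is assumed.

```python
import numpy as np

def sphere_geodesic(x, y, center, radius):
    """Great-circle distance between x and y on a sphere (illustrative only).

    Points are first pushed radially onto the sphere, then the arc length
    radius * angle is returned.
    """
    center = np.asarray(center, dtype=float)
    u = np.asarray(x, dtype=float) - center
    v = np.asarray(y, dtype=float) - center
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    # Clip guards against round-off pushing the cosine outside [-1, 1].
    cos_angle = np.clip(u @ v, -1.0, 1.0)
    return radius * np.arccos(cos_angle)

# Antipodal points on the unit circle: the straight-line (Euclidean)
# distance is 2, while the path along the circle has length pi.
a, b = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
euclidean = np.linalg.norm(a - b)
geodesic = sphere_geodesic(a, b, center=[0.0, 0.0], radius=1.0)
```

The Euclidean distance always underestimates the along-manifold distance on a curved surface, which is the discrepancy a geodesic estimator is meant to correct.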

preprint, 2020, arXiv

Principal Ellipsoid Analysis (PEA): Efficient non-linear dimension reduction & clustering

Even with the rise in popularity of over-parameterized models, simple dimensionality reduction and clustering methods, such as PCA and k-means, are still routinely used in an amazing variety of settings. A primary reason is the combination of simplicity, interpretability and computational efficiency. The focus of this article is on improving upon PCA and k-means, by allowing non-linear relations in the data and more flexible cluster shapes, without sacrificing the key advantages. The key contribution is a new framework for Principal Elliptical Analysis (PEA), defining a simple and computationally efficient alternative to PCA that fits the best elliptical approximation through the data. We provide theoretical guarantees on the proposed PEA algorithm using Vapnik-Chervonenkis (VC) theory to show strong consistency and uniform concentration bounds. Toy experiments illustrate the performance of PEA, and the ability to adapt to non-linear structure and complex cluster shapes. In a rich variety of real data clustering applications, PEA is shown to do as well as k-means for simple datasets, while dramatically improving performance in more complex settings.

preprint, 2020, arXiv

Random Lie Brackets that Induce Torsion: A Model for Noisy Vector Fields

We define and study a random Lie bracket that induces torsion in expectation. Almost all stochastic analysis on manifolds has assumed parallel transport. Mathematically this assumption is very reasonable. However, in many applied geometry and graphics problems parallel transport is not achieved: the "change in coordinates" is not exact due to noise. We formulate a stochastic model on a manifold for which parallel transport does not hold and analyze the consequences of this model with respect to classic quantities studied in Riemannian geometry. We first define a stochastic Lie bracket that induces a stochastic covariant derivative. We then study the connection implied by the stochastic covariant derivative and note that the stochastic Lie bracket induces torsion. We then state the induced stochastic geodesic equations and a stochastic differential equation for parallel transport. We also derive the curvature tensors for our construction and a stochastic Laplace-Beltrami operator. We close with a discussion of the motivation and relevance of our construction.

preprint, 2014, arXiv

Principal bundles over statistical manifolds

In this paper, we introduce the concept of principal bundles on statistical manifolds. After necessary preliminaries on information geometry and principal bundles on manifolds, we study the $α$-structure of frame bundles over statistical manifolds with respect to $α$-connections, by giving geometric structures. The manifold of one-dimensional normal distributions appears in the end as an application and a concrete example.

preprint, 2014, arXiv

Riemannian Holonomy Groups of Statistical Manifolds

Normal distribution manifolds play essential roles in the theory of information geometry, as do holonomy groups in the classification of Riemannian manifolds. After some necessary preliminaries on information geometry and holonomy groups, it is shown that the corresponding Riemannian holonomy group of the $d$-dimensional normal distribution is $SO\left(\frac{d\left(d+3\right)}{2}\right)$, for all $d\in\mathbb{N}$. As a generalization to exponential families, a list of holonomy groups follows.
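
As a reading aid, and not part of the abstract, the argument of $SO(\cdot)$ matches the dimension of the parameter space of the $d$-dimensional normal family, which can be checked by counting the free parameters of the mean vector and of the symmetric covariance matrix (the symbols $\mu$ and $\Sigma$ are introduced here for illustration):

```latex
\dim\bigl\{\mathcal{N}(\mu,\Sigma)\bigr\}
  = \underbrace{d}_{\mu\in\mathbb{R}^{d}}
  + \underbrace{\frac{d(d+1)}{2}}_{\Sigma\ \text{symmetric}}
  = \frac{2d + d^{2} + d}{2}
  = \frac{d(d+3)}{2}.
```

For $d=1$ this gives a two-dimensional manifold (mean and variance), so the stated group reduces to $SO(2)$.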
