Source author record

Zhigang Yao

Zhigang Yao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

math.ST · Statistics Theory · Machine Learning · Computation · Methodology

Catalog footprint

What is connected

5 works

5 topics

4 close collaborators

Preprint · 2026 · arXiv

Curvature-driven manifold fitting under unbounded isotropic noise

Manifold fitting aims to reconstruct a low-dimensional manifold from high-dimensional data, a framework established by Fefferman et al. \cite{fefferman2020reconstruction,fefferman2021reconstruction}. This paper studies the recovery of a compact $C^3$ submanifold $\mathcal{M} \subset \mathbb{R}^D$ of dimension $d<D$ and positive reach $\tau$ from observations $Y = X + \xi$, where $X$ is uniformly distributed on $\mathcal{M}$ and $\xi \sim \mathcal{N}(0, \sigma^2 I_D)$ is isotropic Gaussian noise. To project any point $z$ in a tubular neighborhood $\Gamma$ of $\mathcal{M}$ onto $\mathcal{M}$, we construct a sample-based estimator $F:\Gamma\to\mathbb{R}^D$ from a normalized local kernel with the theoretically derived bandwidth $r = c_D\sigma$. With a sample size of $O(\sigma^{-3d-5})$, we establish, with high probability, the uniform asymptotic expansion \[ F(z) = \pi(z) + \frac{d}{2} H_{\pi(z)} \sigma^2 + O(\sigma^3), \qquad z \in \Gamma, \] where $\pi(z)$ is the projection of $z$ onto $\mathcal{M}$ and $H_{\pi(z)}$ is the mean curvature vector of $\mathcal{M}$ at $\pi(z)$. The resulting manifold $F(\Gamma)$ has reach bounded below by $c\tau$ for some constant $c>0$ and achieves a state-of-the-art Hausdorff distance of $O(\sigma^2)$ to $\mathcal{M}$. Numerical experiments confirm the quadratic decay of the reconstruction error and demonstrate the computational efficiency of the estimator $F$. Our work provides a curvature-driven framework for denoising and reconstructing manifolds with second-order accuracy.
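As a rough illustration of a normalized local kernel estimator of the form $F(z) = \sum_i w_i(z)\, y_i / \sum_i w_i(z)$ with bandwidth $r = c\sigma$, the sketch below applies a Gaussian kernel to samples from a noisy unit circle ($d = 1$, $D = 2$). The Gaussian kernel, the constant `c = 2.0` (a stand-in for $c_D$), and the helper names `kernel_project` and `noisy_circle` are assumptions for illustration only; the abstract does not specify the paper's kernel, and this plain average is not claimed to attain the stated $O(\sigma^2)$ expansion.

```python
import numpy as np


def kernel_project(z, Y, sigma, c=2.0):
    """Normalized local-kernel average F(z) = sum_i w_i y_i / sum_i w_i.

    Assumptions (not from the paper): Gaussian kernel, and c stands in
    for the constant c_D in the bandwidth r = c_D * sigma.
    """
    r = c * sigma                          # bandwidth proportional to the noise level
    d2 = np.sum((Y - z) ** 2, axis=1)      # squared distances from z to every sample
    w = np.exp(-0.5 * d2 / r**2)           # Gaussian kernel weights
    return w @ Y / w.sum()                 # weighted mean of the samples


def noisy_circle(n, sigma, seed=0):
    """Uniform samples on the unit circle plus isotropic Gaussian noise."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    X = np.column_stack([np.cos(theta), np.sin(theta)])
    return X + sigma * rng.standard_normal((n, 2))
```

For a query such as `np.array([1.1, 0.0])`, comparing `np.linalg.norm(kernel_project(z, Y, sigma))` with 1 shows how far the estimate remains from the circle.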

Preprint · 2022 · arXiv

Manifold Fitting in Ambient Space

In many modern applications, sample points are no longer real vectors in a real vector space but objects with much more complex structure, which can be represented as points in a space with an underlying geometric structure, namely a manifold. Manifold learning is an emerging field that aims to learn this underlying structure. It can be split into two main branches: dimension reduction and manifold fitting. With the aim of combining statistics and geometry, we address the problem of manifold fitting in the ambient space. Inspired by the relation between the eigenvalues of the Laplace-Beltrami operator and the geometry of a manifold, we aim to find a small set of points that preserves the geometry of the underlying manifold. Building on this relationship, we extend the idea of subsampling to sample points in high-dimensional space and employ the Moving Least Squares (MLS) approach to approximate the underlying manifold. We analyze the two core steps of the proposed method theoretically and also provide bounds for the MLS approach. Our simulation results and theoretical analysis demonstrate the superiority of our method in estimating the underlying manifold.
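The MLS step can be illustrated with a generic moving-least-squares projection for a curve in the plane: weight the samples around a query point, estimate a local tangent by weighted PCA, fit a weighted quadratic of the normal coordinate against the tangent coordinate, and evaluate that fit at the query. This is a minimal sketch under assumed choices (Gaussian weights with scale `h`, a degree-2 local polynomial, and the hypothetical name `mls_project`); it omits the Laplace-Beltrami-inspired subsampling step and is not the paper's exact MLS formulation.

```python
import numpy as np


def mls_project(z, pts, h):
    """Project a query point onto a curve in R^2 by a local quadratic MLS fit.

    Assumed choices: Gaussian weights with scale h and a degree-2 polynomial.
    """
    w = np.exp(-np.sum((pts - z) ** 2, axis=1) / h**2)     # locality weights
    mu = w @ pts / w.sum()                                  # weighted local centre
    centred = pts - mu
    cov = (w[:, None] * centred).T @ centred / w.sum()      # weighted covariance
    _, vecs = np.linalg.eigh(cov)                           # eigenvalues ascending
    normal, tangent = vecs[:, 0], vecs[:, -1]
    t = centred @ tangent                                   # tangent coordinates
    n = centred @ normal                                    # normal coordinates
    # Weighted least squares fit n ~ a + b*t + c*t^2 (rows scaled by sqrt(w)).
    A = np.column_stack([np.ones_like(t), t, t**2])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], n * sw, rcond=None)
    tz = (z - mu) @ tangent                                 # query's tangent coordinate
    nz = coef @ np.array([1.0, tz, tz**2])                  # fitted normal offset there
    return mu + tz * tangent + nz * normal
```

The quadratic term lets the local fit follow curvature, which a purely linear (local PCA) projection would miss.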

Preprint · 2014 · arXiv

A statistical approach to the inverse problem in magnetoencephalography

Magnetoencephalography (MEG) is an imaging technique that measures the magnetic field outside the human head produced by electrical activity inside the brain. The MEG inverse problem, identifying the locations of the electrical sources from the magnetic signal measurements, is ill-posed: there are infinitely many mathematically correct solutions. Common source localization methods assume that the sources do not vary with time and do not provide estimates of the variability of the fitted model. Here, we reformulate the MEG inverse problem by allowing time-varying locations for the sources and their electrical moments, and we model their time evolution using a state space model. Based on this predictive model, we address the inverse problem by finding the posterior source distribution given the multiple channels of observations at each time, rather than fitting fixed source parameters. Our new model is more realistic than common models and allows us to estimate the variation of the strength, orientation and position of the sources. We propose two new Monte Carlo methods based on sequential importance sampling. Unlike the usual MCMC sampling scheme, our new methods work in this setting without needing to tune a high-dimensional transition kernel, which is very costly. The dimensionality of the unknown parameters is extremely large, and the data are larger still. We use Parallel Virtual Machine (PVM) to speed up the computation.
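Sequential importance sampling for a state space model can be sketched with a bootstrap particle filter on a toy one-dimensional linear Gaussian model: particles are propagated through the state equation, weighted by the observation likelihood, and resampled. The model parameters (`phi`, `q`, `r`) and the helper names `simulate` and `bootstrap_filter` are hypothetical; the paper's MEG model has high-dimensional source locations and moments, and its two Monte Carlo methods are not reproduced here.

```python
import numpy as np


def simulate(T, phi=0.9, q=0.3, r=1.0, seed=1):
    """Toy state space model: x_t = phi x_{t-1} + N(0, q^2), y_t = x_t + N(0, r^2)."""
    rng = np.random.default_rng(seed)
    x = np.empty(T)
    prev = rng.normal(0.0, q / np.sqrt(1.0 - phi**2))  # stationary initial state
    for t in range(T):
        prev = phi * prev + rng.normal(0.0, q)
        x[t] = prev
    y = x + rng.normal(0.0, r, T)
    return x, y


def bootstrap_filter(y, n_particles=1000, phi=0.9, q=0.3, r=1.0, seed=0):
    """Sequential importance sampling with resampling; returns filtered means."""
    rng = np.random.default_rng(seed)
    particles = rng.normal(0.0, q / np.sqrt(1.0 - phi**2), n_particles)
    means = np.empty(len(y))
    for t, yt in enumerate(y):
        # Propose from the state transition (the prior) for each particle.
        particles = phi * particles + rng.normal(0.0, q, n_particles)
        # Importance weights are the observation likelihoods, computed in log space.
        logw = -0.5 * ((yt - particles) / r) ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means[t] = w @ particles
        # Multinomial resampling to limit weight degeneracy.
        particles = particles[rng.choice(n_particles, size=n_particles, p=w)]
    return means
```

In this bootstrap variant the proposal is the state transition itself, so there is no MCMC transition kernel to tune.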

Preprint · 2014 · arXiv

Partial Correlation Screening for Estimating Large Precision Matrices, with Applications to Classification

We propose Partial Correlation Screening (PCS) as a new row-by-row approach to estimating a large precision matrix $\Omega$. To estimate the $i$-th row of $\Omega$, $1 \leq i \leq p$, PCS uses a Screen step and a Clean step. In the Screen step, PCS recruits a (small) subset of indices using a stage-wise algorithm, where in each stage the algorithm updates the set of recruited indices by adding the index $j$ that has the largest (in magnitude) empirical partial correlation with $i$. In the Clean step, PCS re-investigates all recruited indices and uses them to reconstruct the $i$-th row of $\Omega$. PCS is computationally efficient and modest in memory use: to estimate a row of $\Omega$, it needs only a few rows (determined sequentially) of the empirical covariance matrix. This enables PCS to estimate a large precision matrix (e.g., $p=10K$) in a few minutes and opens the door to estimating much larger precision matrices. We use PCS for classification. Higher Criticism Thresholding (HCT) is a recent classifier that enjoys optimality, but to exploit its full potential in practice, one needs a good estimate of the precision matrix $\Omega$. Combining HCT with any approach to estimating $\Omega$ gives a new classifier: examples include HCT-PCS and HCT-glasso. We have applied HCT-PCS to two large microarray data sets ($p = 8K$ and $10K$) for classification, where it not only significantly outperforms HCT-glasso but is also competitive with the Support Vector Machine (SVM) and Random Forest (RF). The results suggest that PCS gives more useful estimates of $\Omega$ than the glasso. We set up a general theoretical framework and show that in a broad context, PCS fully recovers the support of $\Omega$ and HCT-PCS yields optimal classification behavior. Our proofs shed interesting light on the behavior of stage-wise procedures.
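The Screen and Clean steps for a single row can be sketched from an empirical covariance matrix `S`. The screen below greedily adds the index with the largest absolute empirical partial correlation with $i$ given the recruited set $A$, stopping once the best score falls below a threshold `t`. The Clean rule shown (drop recruited indices whose partial correlation given the other recruited indices falls below `t`) is an assumed simplification, as are the names `pcs_row` and `partial_corr`. The row is then rebuilt from the regression of $X_i$ on $X_A$ via $\Omega_{ii} = 1/\operatorname{Var}(X_i \mid X_A)$ and $\Omega_{iA} = -\beta\,\Omega_{ii}$, which recovers the true row when $A$ contains every nonzero off-diagonal position of row $i$.

```python
import numpy as np


def partial_corr(S, i, j, A):
    """Empirical partial correlation of i and j given the index list A."""
    B = [i, j] + list(A)
    P = np.linalg.inv(S[np.ix_(B, B)])     # only the small block {i, j} U A is used
    return -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])


def pcs_row(S, i, t=0.1, max_size=20):
    """Sketch of a Screen/Clean estimate of row i of the precision matrix."""
    p = S.shape[0]
    A = []
    # Screen: recruit, stage by stage, the index with the largest |partial corr|.
    while len(A) < max_size:
        cands = [j for j in range(p) if j != i and j not in A]
        if not cands:
            break
        scores = [abs(partial_corr(S, i, j, A)) for j in cands]
        k = int(np.argmax(scores))
        if scores[k] < t:
            break
        A.append(cands[k])
    # Clean (assumed form): keep indices still relevant given the others in A.
    A = [j for j in A if abs(partial_corr(S, i, j, [k for k in A if k != j])) >= t]
    # Reconstruct row i from the regression of X_i on X_A.
    row = np.zeros(p)
    if A:
        beta = np.linalg.solve(S[np.ix_(A, A)], S[A, i])
        resvar = S[i, i] - S[i, A] @ beta
    else:
        beta, resvar = np.array([]), S[i, i]
    row[i] = 1.0 / resvar
    row[A] = -beta / resvar
    return row
```

Each partial correlation uses only entries of `S` indexed by $\{i, j\} \cup A$, which is what keeps the per-row memory footprint small.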

Preprint · 2013 · arXiv

Optimal classification in sparse Gaussian graphic model

Consider a two-class classification problem where the number of features is much larger than the sample size. The features are masked by Gaussian noise with mean zero and covariance matrix $\Sigma$, where the precision matrix $\Omega=\Sigma^{-1}$ is unknown but presumably sparse. The useful features, also unknown, are sparse, and each contributes weakly to the classification decision (i.e., they are rare and weak). By obtaining a reasonably good estimate of $\Omega$, we formulate the setting as a linear regression model. We propose a two-stage classification method where we first select features by the method of Innovated Thresholding (IT) and then use the retained features and Fisher's LDA for classification. In this approach, a crucial problem is how to set the threshold of IT. We approach this problem by adapting the recent innovation of Higher Criticism Thresholding (HCT). We find that when useful features are rare and weak, the limiting behavior of HCT is essentially as good as that of the ideal threshold, i.e., the threshold one would choose if the underlying distribution of the signals were known. Somewhat surprisingly, when $\Omega$ is sufficiently sparse, its off-diagonal coordinates usually do not have a major influence on the classification decision. Compared to recent work in the case where $\Omega$ is the identity matrix [Proc. Natl. Acad. Sci. USA 105 (2008) 14790-14795; Philos. Trans. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 367 (2009) 4449-4470], the current setting is much more general and requires a new approach and much more sophisticated analysis. One key component of the analysis is the intimate relationship between HCT and Fisher's separation. Another is the tight large-deviation bounds for empirical processes for data with unconventional correlation structures, where graph theory on vertex coloring plays an important role.
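The feature-selection stage can be sketched as follows: apply an innovated transform to the vector of feature $z$-scores, choose the threshold by Higher Criticism, and keep the features that exceed it. The normalization of the innovated transform ($\Omega z$ divided coordinate-wise by $\sqrt{\omega_{jj}}$, which gives each coordinate unit variance when $z \sim \mathcal{N}(0, \Sigma)$, since $\operatorname{Cov}(\Omega z) = \Omega$), the particular HC form (two-sided $p$-values, maximization over the top `alpha0` fraction), and the function names are assumptions; the subsequent Fisher's LDA step is not shown.

```python
import math

import numpy as np


def innovate(z, Omega):
    """Innovated transform Omega @ z, rescaled to unit null variance (assumed form)."""
    return (Omega @ z) / np.sqrt(np.diag(Omega))


def hc_threshold(z, alpha0=0.2):
    """Higher Criticism threshold on |z| (one common form of HCT)."""
    p = len(z)
    absz = np.sort(np.abs(z))[::-1]                        # |z| in decreasing order
    # Two-sided p-values 2 * (1 - Phi(|z|)); increasing along absz.
    pvals = np.array([math.erfc(a / math.sqrt(2.0)) for a in absz])
    kmax = max(1, int(alpha0 * p))
    frac = np.arange(1, kmax + 1) / p
    hc = np.sqrt(p) * (frac - pvals[:kmax]) / np.sqrt(frac * (1.0 - frac))
    khat = int(np.argmax(hc)) + 1                          # number of features kept
    return absz[khat - 1]


def innovated_threshold_select(z, Omega, alpha0=0.2):
    """Indices of features whose innovated score reaches the HC threshold."""
    zt = innovate(z, Omega)
    t = hc_threshold(zt, alpha0)
    return np.flatnonzero(np.abs(zt) >= t)
```

Restricting the HC search to the top `alpha0` fraction of scores keeps the threshold away from the bulk of null features.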