Source author record

Wang Zhou

Wang Zhou appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

math.PR, math.ST, Statistics Theory, math.CV, Methodology, math-ph, math.MP, Computer Vision, cond-mat.other, Machine Learning, physics.ins-det

Catalog footprint

What is connected

26 works

11 topics

4 close collaborators


preprint · 2022 · arXiv

Factor Modelling for Clustering High-dimensional Time Series

We propose a new unsupervised learning method for clustering a large number of time series based on a latent factor structure. Each cluster is characterized by its own cluster-specific factors in addition to some common factors which impact all the time series concerned. Our setting also offers the flexibility that some time series may not belong to any cluster. Consistency with explicit convergence rates is established for the estimation of the common factors, the cluster-specific factors, and the latent clusters. Numerical illustration with both simulated data and a real data example is also reported. As a spin-off, the proposed new approach also significantly advances the statistical inference for the factor model of Lam and Yao (2012).

preprint · 2022 · arXiv

Testing Kronecker Product Covariance Matrices for High-dimensional Matrix-Variate Data

Kronecker product covariance structure provides an efficient way to model the inter-correlations of matrix-variate data. In this paper, we propose test statistics for the Kronecker product covariance matrix based on linear spectral statistics of renormalized sample covariance matrices. A central limit theorem is proved for the linear spectral statistics with explicit formulas for the mean and covariance functions, which fills a gap in the literature. We then theoretically justify that the proposed test statistics have well-controlled sizes and strong powers. To facilitate practical use, we further propose a bootstrap resampling algorithm to approximate the limiting distributions of the associated linear spectral statistics. Consistency of the bootstrap procedure is guaranteed under mild conditions. A more general model which allows for noise is also discussed. In the simulations, the empirical sizes of the proposed testing procedure and its bootstrapped version are close to the corresponding theoretical values, while the powers converge to one quickly as the dimension and sample size grow.

preprint · 2021 · arXiv

On eigenvalues of a high-dimensional spatial-sign covariance matrix

This paper investigates limiting properties of the eigenvalues of multivariate sample spatial-sign covariance matrices when both the number of variables and the sample size grow to infinity. The underlying p-variate populations are general enough to include the popular independent components model and the family of elliptical distributions. A first result of the paper establishes that the distribution of the eigenvalues converges to a deterministic limit that belongs to the family of generalized Marčenko-Pastur distributions. Furthermore, a new central limit theorem is established for a class of linear spectral statistics. We develop two applications of these results to robust statistics for a high-dimensional shape matrix. First, two statistics are proposed for testing sphericity. Next, a spectrum-corrected estimator using the sample spatial-sign covariance matrix is proposed. Simulation experiments show that in high dimension, the sample spatial-sign covariance matrix provides a valid and robust tool for mitigating the influence of outliers.

preprint · 2020 · arXiv

Lifelong Object Detection

Recent advances in object detection have benefited significantly from rapid developments in deep neural networks. However, neural networks suffer from the well-known issue of catastrophic forgetting, which makes continual or lifelong learning problematic. In this paper, we leverage the fact that new training classes arrive in a sequential manner and incrementally refine the model so that it additionally detects new object classes in the absence of previous training data. Specifically, we consider the representative object detector, Faster R-CNN, for both accurate and efficient prediction. To prevent abrupt performance degradation due to catastrophic forgetting, we propose to apply knowledge distillation on both the region proposal network and the region classification network, to retain the detection of previously trained classes. A pseudo-positive-aware sampling strategy is also introduced for distillation sample selection. We evaluate the proposed method on PASCAL VOC 2007 and MS COCO benchmarks and show competitive mAP and 6x inference speed improvement, which makes the approach more suitable for real-time applications. Our implementation will be publicly available.

preprint · 2020 · arXiv

Robust Covariance Estimation for High-dimensional Compositional Data with Application to Microbial Communities Analysis

Microbial communities analysis is drawing growing attention due to the rapid development of high-throughput sequencing techniques. The observed data have the following typical characteristics: they are high-dimensional, compositional (lying in a simplex), and may even be leptokurtic and highly skewed due to the existence of overly abundant taxa, which makes conventional correlation analysis infeasible for studying the co-occurrence and co-exclusion relationships between microbial taxa. In this article, we address the challenges of covariance estimation for this kind of data. Assuming the basis covariance matrix lies in a well-recognized class of sparse covariance matrices, we adopt a proxy matrix known in the literature as the centered log-ratio covariance matrix, which is approximately indistinguishable from the real basis covariance matrix as the dimensionality tends to infinity. We construct a Median-of-Means (MOM) estimator for the centered log-ratio covariance matrix and propose a thresholding procedure that is adaptive to the variability of individual entries. By imposing a much weaker finite fourth moment condition compared with the sub-Gaussianity condition in the literature, we derive the optimal rate of convergence under the spectral norm. In addition, we provide a theoretical guarantee on support recovery. The adaptive thresholding procedure of the MOM estimator is easy to implement and gains robustness when outliers or heavy-tailedness exist. Thorough simulation studies are conducted to show the advantages of the proposed procedure over some state-of-the-art methods. Finally, we apply the proposed method to analyze a microbiome dataset from the human gut. The R script for implementing the method is available at https://github.com/heyongstat/RCEC.
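The two ingredients named in this abstract, the centered log-ratio transform and a median-of-means estimate, can be sketched roughly as follows. The helper names, the block count, and the choice to take an entrywise median of block-wise sample covariances are assumptions made for illustration only; the paper's estimator and its adaptive thresholding step may differ, and the authors' own R code is at the URL above.

```python
import math
import random
import statistics

def clr(row):
    """Centered log-ratio transform of one composition (all parts > 0)."""
    logs = [math.log(v) for v in row]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def block_cov(block, i, j):
    """Sample covariance of coordinates i and j within one block of rows."""
    m = len(block)
    mean_i = sum(r[i] for r in block) / m
    mean_j = sum(r[j] for r in block) / m
    return sum((r[i] - mean_i) * (r[j] - mean_j) for r in block) / (m - 1)

def mom_cov_entry(rows, i, j, n_blocks):
    """Median of block-wise covariances (hypothetical helper, not the paper's code)."""
    size = len(rows) // n_blocks  # leftover rows are dropped; needs size >= 2
    blocks = [rows[b * size:(b + 1) * size] for b in range(n_blocks)]
    return statistics.median(block_cov(b, i, j) for b in blocks)

# Toy compositional data: 12 samples, 3 taxa, each row summing to 1.
rng = random.Random(0)
raw = [[rng.uniform(0.1, 1.0) for _ in range(3)] for _ in range(12)]
compositions = [[v / sum(r) for v in r] for r in raw]
z = [clr(r) for r in compositions]
entry = mom_cov_entry(z, 0, 1, n_blocks=3)
```

Taking a median across blocks is what limits the effect of a few extreme samples: an outlier can corrupt only the block it falls in.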

preprint · 2020 · arXiv

Statistical inference in massive datasets by empirical likelihood

In this paper, we propose a new statistical inference method for massive data sets, which is simple and efficient, combining the divide-and-conquer method with empirical likelihood. Compared with two popular methods (the bag of little bootstraps and the subsampled double bootstrap), our method makes full use of the data and reduces the computational burden. Extensive numerical studies and real data analysis demonstrate the effectiveness and flexibility of the proposed method. Furthermore, the asymptotic properties of our method are derived.

preprint · 2020 · arXiv

The rate of convergence of harmonic explorer to SLE4

Using the estimate of the difference between the discrete harmonic function and its corresponding continuous version we derive a rate of convergence of the Loewner driving function for the harmonic explorer to the Brownian motion with speed 4 on the real line. Based on this convergence rate, the derivative estimate for chordal $\mbox{SLE}_4$, and the estimate of tip structure modulus for harmonic explorer paths, we obtain an explicit power-law rate of convergence of the harmonic explorer paths to the trace of chordal $\mbox{SLE}_4$ in the supremum distance.

preprint · 2015 · arXiv

Convergence of the empirical spectral distribution function of Beta matrices

Let $\mathbf{B}_n=\mathbf {S}_n(\mathbf {S}_n+α_n\mathbf {T}_N)^{-1}$, where $\mathbf {S}_n$ and $\mathbf {T}_N$ are two independent sample covariance matrices with dimension $p$ and sample sizes $n$ and $N$, respectively. This is the so-called Beta matrix. In this paper, we focus on the limiting spectral distribution function and the central limit theorem of linear spectral statistics of $\mathbf {B}_n$. Especially, we do not require $\mathbf {S}_n$ or $\mathbf {T}_N$ to be invertible. Namely, we can deal with the case where $p>\max\{n,N\}$ and $p<n+N$. Therefore, our results cover many important applications which cannot be simply deduced from the corresponding results for multivariate $F$ matrices.

preprint · 2015 · arXiv

Spectral statistics of large dimensional Spearman's rank correlation matrix and its application

Let $\mathbf{Q}=(Q_1,\ldots,Q_n)$ be a random vector drawn from the uniform distribution on the set of all $n!$ permutations of $\{1,2,\ldots,n\}$. Let $\mathbf{Z}=(Z_1,\ldots,Z_n)$, where $Z_j$ is the mean zero variance one random variable obtained by centralizing and normalizing $Q_j$, $j=1,\ldots,n$. Assume that $\mathbf {X}_i,i=1,\ldots ,p$ are i.i.d. copies of $\frac{1}{\sqrt{p}}\mathbf{Z}$ and $X=X_{p,n}$ is the $p\times n$ random matrix with $\mathbf{X}_i$ as its $i$th row. Then $S_n=XX^*$ is called the $p\times n$ Spearman's rank correlation matrix which can be regarded as a high dimensional extension of the classical nonparametric statistic Spearman's rank correlation coefficient between two independent random variables. In this paper, we establish a CLT for the linear spectral statistics of this nonparametric random matrix model in the scenario of high dimension, namely, $p=p(n)$ and $p/n\to c\in(0,\infty)$ as $n\to\infty$. We propose a novel evaluation scheme to estimate the core quantity in Anderson and Zeitouni's cumulant method in [Ann. Statist. 36 (2008) 2553-2576] to bypass the so-called joint cumulant summability. In addition, we raise a two-step comparison approach to obtain the explicit formulae for the mean and covariance functions in the CLT. Relying on this CLT, we then construct a distribution-free statistic to test complete independence for components of random vectors. Owing to the nonparametric property, we can use this test on generally distributed random variables including the heavy-tailed ones.
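The construction described in this abstract can be written out directly. The sketch below is only an illustration of the definitions: the function names and the seed are hypothetical, and the standardization uses the fact that a single coordinate of a uniform random permutation of $\{1,\ldots,n\}$ has mean $(n+1)/2$ and variance $(n^2-1)/12$.

```python
import math
import random

def spearman_row(n, p, rng):
    """One row X_i = Z / sqrt(p), with Z the standardized random permutation."""
    q = list(range(1, n + 1))
    rng.shuffle(q)                 # uniform draw from the n! permutations
    mean = (n + 1) / 2             # mean of Q_j
    var = (n * n - 1) / 12         # variance of Q_j
    scale = math.sqrt(var * p)
    return [(v - mean) / scale for v in q]

def spearman_matrix(p, n, seed=0):
    """S_n = X X^* for a p x n matrix X with independent rows (illustrative helper)."""
    rng = random.Random(seed)
    x = [spearman_row(n, p, rng) for _ in range(p)]
    return [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(p)]
            for i in range(p)]

s = spearman_matrix(p=4, n=10)
```

With this normalization every diagonal entry equals $n/p$ exactly, because the squared standardized ranks in a row always sum to $n$; only the off-diagonal entries are random.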

preprint · 2015 · arXiv

The logarithmic law of random determinant

Consider the square random matrix $A_n=(a_{ij})_{n,n}$, where $\{a_{ij}:=a_{ij}^{(n)},i,j=1,\ldots,n\}$ is a collection of independent real random variables with means zero and variances one. Under the additional moment condition \[\sup_n\max_{1\leq i,j\leq n}\mathbb{E}a_{ij}^4<\infty,\] we prove Girko's logarithmic law of $\det A_n$ in the sense that as $n\rightarrow\infty$ \begin{eqnarray*}\frac{\log|\det A_n|-(1/2)\log(n-1)!}{\sqrt{(1/2)\log n}}\stackrel{d}{ \longrightarrow}N(0,1).\end{eqnarray*}
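To make the normalization in the displayed limit concrete, the following sketch evaluates the left-hand side for a single simulated matrix. It assumes standard Gaussian entries purely for convenience (the theorem covers more general entries), and the helper names are hypothetical. The centering uses `math.lgamma(n)`, which equals $\log (n-1)!$.

```python
import math
import random

def log_abs_det(a):
    """log|det a| via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]      # work on a copy
    n = len(a)
    total = 0.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[piv][c] == 0.0:
            return float("-inf")   # singular matrix
        a[c], a[piv] = a[piv], a[c]
        total += math.log(abs(a[c][c]))
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return total

def girko_statistic(a):
    """(log|det A_n| - (1/2) log (n-1)!) / sqrt((1/2) log n), for n >= 2."""
    n = len(a)
    center = 0.5 * math.lgamma(n)  # lgamma(n) = log((n-1)!)
    return (log_abs_det(a) - center) / math.sqrt(0.5 * math.log(n))

rng = random.Random(1)
n = 30
a = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
stat = girko_statistic(a)
```

Repeating the last three lines over many seeds and plotting a histogram of `stat` is one way to compare against the standard normal limit; since the normalization is of order $\sqrt{\log n}$, the approach to the limit can be expected to be slow.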

preprint · 2015 · arXiv

Universality for the largest eigenvalue of sample covariance matrices with general population

This paper is aimed at deriving the universality of the largest eigenvalue of a class of high-dimensional real or complex sample covariance matrices of the form $\mathcal{W}_N=Σ^{1/2}XX^*Σ^{1/2}$. Here, $X=(x_{ij})_{M,N}$ is an $M\times N$ random matrix with independent entries $x_{ij},1\leq i\leq M,1\leq j\leq N$ such that $\mathbb{E}x_{ij}=0$, $\mathbb{E}|x_{ij}|^2=1/N$. On dimensionality, we assume that $M=M(N)$ and $N/M\rightarrow d\in(0,\infty)$ as $N\rightarrow\infty$. For a class of general deterministic positive-definite $M\times M$ matrices $Σ$, under some additional assumptions on the distribution of $x_{ij}$'s, we show that the limiting behavior of the largest eigenvalue of $\mathcal{W}_N$ is universal, via pursuing a Green function comparison strategy raised in [Probab. Theory Related Fields 154 (2012) 341-407, Adv. Math. 229 (2012) 1435-1515] by Erdős, Yau and Yin for Wigner matrices and extended by Pillai and Yin [Ann. Appl. Probab. 24 (2014) 935-1001] to sample covariance matrices in the null case ($Σ=I$). Consequently, in the standard complex case ($\mathbb{E}x_{ij}^2=0$), combining this universality property and the results known for Gaussian matrices obtained by El Karoui in [Ann. Probab. 35 (2007) 663-714] (nonsingular case) and Onatski in [Ann. Appl. Probab. 18 (2008) 470-490] (singular case), we show that after an appropriate normalization the largest eigenvalue of $\mathcal{W}_N$ converges weakly to the type 2 Tracy-Widom distribution $\mathrm{TW}_2$. Moreover, in the real case, we show that when $Σ$ is spiked with a fixed number of subcritical spikes, the type 1 Tracy-Widom limit $\mathrm{TW}_1$ holds for the normalized largest eigenvalue of $\mathcal {W}_N$, which extends a result of Féral and Péché in [J. Math. Phys. 50 (2009) 073302] to the scenario of nondiagonal $Σ$ and more generally distributed $X$.

preprint · 2014 · arXiv

Canonical correlation coefficients of high-dimensional normal vectors: finite rank case

Consider a normal vector $\mathbf{z}=(\mathbf{x}',\mathbf{y}')'$, consisting of two sub-vectors $\mathbf{x}$ and $\mathbf{y}$ with dimensions $p$ and $q$ respectively. With $n$ independent observations of $\mathbf{z}$ at hand, we study the correlation between $\mathbf{x}$ and $\mathbf{y}$, from the perspective of the Canonical Correlation Analysis, under the high-dimensional setting: both $p$ and $q$ are proportional to the sample size $n$. In this paper, we focus on the case that $Σ_{\mathbf{x}\mathbf{y}}$ is of finite rank $k$, i.e. there are $k$ nonzero canonical correlation coefficients, whose squares are denoted by $r_1\geq\cdots\geq r_k>0$. Under the additional assumptions $(p+q)/n\to y\in (0,1)$ and $p/q\not\to 1$, we study the sample counterparts of $r_i,i=1,\ldots,k$, i.e. the largest $k$ eigenvalues of the sample canonical correlation matrix $S_{\mathbf{x}\mathbf{x}}^{-1}S_{\mathbf{x}\mathbf{y}}S_{\mathbf{y}\mathbf{y}}^{-1}S_{\mathbf{y}\mathbf{x}}$, namely $λ_1\geq\cdots\geq λ_k$. We show that there exists a threshold $r_c\in(0,1)$, such that for each $i\in\{1,\ldots,k\}$, when $r_i\leq r_c$, $λ_i$ converges almost surely to the right edge of the limiting spectral distribution of the sample canonical correlation matrix, denoted by $d_r$. When $r_i>r_c$, $λ_i$ possesses an almost sure limit in $(d_r,1]$, from which we can recover $r_i$ in turn, thus providing an estimate of the latter in the high-dimensional scenario.

preprint · 2014 · arXiv

Random conformal welding for finitely connected regions

Given a finitely connected region $Ω$ of the Riemann sphere whose complement consists of $m$ mutually disjoint closed disks $\bar{U}_j$, the random homeomorphism $h_j$ on the boundary component $\partial U_j$ is constructed using the exponential Gaussian free field. The existence and uniqueness of random conformal welding of $Ω$ with $h_j$ is established by investigating a non-uniformly elliptic Beltrami equation with a random complex dilatation. This generalizes the result of Astala, Jones, Kupiainen and Saksman to multiply connected domains.

preprint · 2014 · arXiv

Test of Independence for High-dimensional Random Vectors Based on Block Correlation Matrices

In this paper, we are concerned with the independence test for $k$ high-dimensional sub-vectors of a normal vector, with fixed positive integer $k$. A natural high-dimensional extension of the classical sample correlation matrix, namely block correlation matrix, is raised for this purpose. We then construct the so-called Schott type statistic as our test statistic, which turns out to be a particular linear spectral statistic of the block correlation matrix. Interestingly, the limiting behavior of the Schott type statistic can be figured out with the aid of the Free Probability Theory and the Random Matrix Theory. Specifically, we will bring the so-called real second order freeness for Haar distributed orthogonal matrices, derived in \cite{MP2013}, into the framework of this high-dimensional testing problem. Our test does not require the sample size to be larger than the total or any partial sum of the dimensions of the $k$ sub-vectors. Simulated results show the effect of the Schott type statistic, in contrast to those of the statistics proposed in \cite{JY2013} and \cite{JBZ2013}, is satisfactory. Real data analysis is also used to illustrate our method.

preprint · 2013 · arXiv

Conformal invariance of the exploration path in 2-d critical bond percolation in the square lattice

In this paper we present the proof of the convergence of the critical bond percolation exploration process on the square lattice to the trace of SLE$_{6}$. This is an important conjecture in mathematical physics and probability. The case of critical site percolation on the hexagonal lattice was established in the seminal work of Smirnov via proving Cardy's formula. Our proof uses a series of transformations and conditioning to construct a pair of paths: the $+\partial$CBP and the $-\partial$CBP. The convergence in the site percolation case on the hexagonal lattice allows us to obtain certain estimates on the scaling limit of the $+\partial$CBP and the $-\partial$CBP. By considering a path which is the concatenation of $+\partial$CBPs and $-\partial$CBPs in an alternating manner, we can prove the convergence in the case of bond percolation on the square lattice.

preprint · 2013 · arXiv

SLE curves and natural parametrization

Developing the theory of two-sided radial and chordal $\mathit{SLE}$, we prove that the natural parametrization on $\mathit{SLE}_κ$ curves is well defined for all $κ<8$. Our proof uses a two-interior-point local martingale.

preprint · 2013 · arXiv

Universality for a global property of the eigenvectors of Wigner matrices

Let $M_n$ be an $n\times n$ real (resp. complex) Wigner matrix and $U_nΛ_n U_n^*$ be its spectral decomposition. Set $(y_1,y_2,\ldots,y_n)^T=U_n^*x$, where $x=(x_1,x_2,\ldots,x_n)^T$ is a real (resp. complex) unit vector. Under the assumption that the elements of $M_n$ have 4 matching moments with those of GOE (resp. GUE), we show that the process $X_n(t)=\sqrt{\frac{βn}{2}}\sum_{i=1}^{\lfloor nt\rfloor}(|y_i|^2-\frac1n)$ converges weakly to the Brownian bridge for any $x$ such that $||x||_\infty\rightarrow 0$ as $n\rightarrow \infty$, where $β=1$ for the real case and $β=2$ for the complex case. Such a result indicates that the orthogonal (resp. unitary) matrices with columns being the eigenvectors of Wigner matrices are asymptotically Haar distributed on the orthogonal (resp. unitary) group from a certain perspective.
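The process $X_n(t)$ in this abstract is a simple partial sum once the coefficients $y_i$ are available. The sketch below takes the vector of $y_i$ as a given input rather than computing an eigendecomposition, and the function name is hypothetical.

```python
import math

def eigvec_process(y, t, beta=1):
    """X_n(t) = sqrt(beta*n/2) * sum_{i <= floor(n t)} (|y_i|^2 - 1/n).

    y    : coefficients of a unit vector in the eigenbasis (assumed given)
    beta : 1 for the real case, 2 for the complex case
    """
    n = len(y)
    k = math.floor(n * t)
    partial = sum(abs(v) ** 2 - 1.0 / n for v in y[:k])
    return math.sqrt(beta * n / 2.0) * partial

# Toy unit vector of coefficients (n = 4).
y = [0.6, 0.8, 0.0, 0.0]
half = eigvec_process(y, 0.5)   # sqrt(2) * ((0.36 - 0.25) + (0.64 - 0.25))
end = eigvec_process(y, 1.0)    # vanishes because |y|^2 sums to 1
```

The process starts and ends at zero for any unit vector, which is consistent with a Brownian bridge as the limiting object.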

preprint · 2012 · arXiv

Central limit theorem for partial linear eigenvalue statistics of Wigner matrices

In this paper, we study the complex Wigner matrices $M_n=\frac{1}{\sqrt{n}}W_n$ whose eigenvalues are typically in the interval $[-2,2]$. Let $λ_1\leq λ_2\leq\cdots\leq λ_n$ be the ordered eigenvalues of $M_n$. Under the assumption of four matching moments with the Gaussian Unitary Ensemble (GUE), for a test function $f$ that is 4 times continuously differentiable on an open interval including $[-2,2]$, we establish central limit theorems for two types of partial linear statistics of the eigenvalues. The first type is defined with a threshold $u$ in the bulk of the Wigner semicircle law as $\mathcal{A}_n[f; u]=\sum_{l=1}^nf(λ_l)\mathbf{1}_{\{λ_l\leq u\}}$. The second is $\mathcal{B}_n[f; k]=\sum_{l=1}^{k}f(λ_l)$ with positive integer $k=k_n$ such that $k/n\rightarrow y\in (0,1)$ as $n$ tends to infinity. Moreover, we derive a weak convergence result for a partial sum process constructed from $\mathcal{B}_n[f; \lfloor nt\rfloor]$.
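The two partial linear statistics differ only in how the eigenvalues are truncated: by a value threshold $u$ or by a count $k$. A minimal sketch, with hypothetical function names and a made-up list of eigenvalues standing in for the spectrum of $M_n$:

```python
def partial_stat_threshold(eigs, f, u):
    """A_n[f; u]: sum of f over the eigenvalues not exceeding u."""
    return sum(f(lam) for lam in eigs if lam <= u)

def partial_stat_count(eigs, f, k):
    """B_n[f; k]: sum of f over the k smallest eigenvalues."""
    return sum(f(lam) for lam in sorted(eigs)[:k])

# Illustrative eigenvalues only; not computed from a Wigner matrix.
eigs = [1.1, -0.5, 0.2, -1.5]
square = lambda x: x * x
a_val = partial_stat_threshold(eigs, square, 0.0)   # uses -1.5 and -0.5
b_val = partial_stat_count(eigs, square, 3)         # uses -1.5, -0.5, 0.2
```

When $u$ is at least the largest eigenvalue, or $k=n$, both reduce to the full linear eigenvalue statistic $\sum_l f(λ_l)$.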

preprint · 2012 · arXiv

Nonparametric estimate of spectral density functions of sample covariance matrices: A first step

The density function of the limiting spectral distribution of general sample covariance matrices is usually unknown. We propose to use kernel estimators which are proved to be consistent. A simulation study is also conducted to show the performance of the estimators.

preprint · 2011 · arXiv

A Note on Rate of Convergence in Probability to Semicircular Law

In the present paper, we prove that under the assumption of the finite sixth moment for elements of a Wigner matrix, the convergence rate of its empirical spectral distribution to the Wigner semicircular law in probability is $O(n^{-1/2})$ when the dimension $n$ tends to infinity.

preprint · 2011 · arXiv

Generalized four-point characterization method for resistive and capacitive contacts

In this paper, a four-point characterization method is developed for resistive samples connected to either resistive or capacitive contacts. Provided the circuit equivalent of the complete measurement system is known including coaxial cable and connector capacitances as well as source output and amplifier input impedances, a frequency range and capacitive scaling factor can be determined, whereby four-point characterization can be performed. The technique is demonstrated with a discrete element test sample over a wide frequency range using lock-in measurement techniques from 1 Hz - 100 kHz. The data fit well with a circuit simulation of the entire measurement system. A high impedance preamplifier input stage gives best results, since lock-in input impedances may differ from manufacturer specifications. The analysis presented here establishes the utility of capacitive contacts for four-point characterizations at low frequency.

preprint · 2011 · arXiv

New estimators of spectral distributions of Wigner matrices

We introduce kernel estimators for the semicircle law. In this first part of our general theory on the estimators, we prove consistency and conduct a simulation study to show the performance of the estimators. We also point out that Wigner's semicircle law for our new estimators and the classical empirical spectral distributions still holds when the elements of Wigner matrices do not have finite variances but are in the domain of attraction of the normal law.

preprint · 2011 · arXiv

Tracy-Widom law for the extreme eigenvalues of sample correlation matrices

Let the sample correlation matrix be $W=YY^T$, where $Y=(y_{ij})_{p,n}$ with $y_{ij}=x_{ij}/\sqrt{\sum_{j=1}^nx_{ij}^2}$. We assume $\{x_{ij}: 1\leq i\leq p, 1\leq j\leq n\}$ to be a collection of independent symmetric distributed random variables with sub-exponential tails. Moreover, for any $i$, we assume $x_{ij}, 1\leq j\leq n$ to be identically distributed. We assume $0<p<n$ and $p/n\rightarrow y$ with some $y\in(0,1)$ as $p,n\rightarrow\infty$. In this paper, we provide the Tracy-Widom law ($TW_1$) for both the largest and smallest eigenvalues of $W$. If $x_{ij}$ are i.i.d. standard normal, we can derive the $TW_1$ for both the largest and smallest eigenvalues of the matrix $\mathcal{R}=RR^T$, where $R=(r_{ij})_{p,n}$ with $r_{ij}=(x_{ij}-\bar x_i)/\sqrt{\sum_{j=1}^n(x_{ij}-\bar x_i)^2}$, $\bar x_i=n^{-1}\sum_{j=1}^nx_{ij}$.
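The two matrices in this abstract differ only in whether each row is centered before being scaled to unit norm. The sketch below builds both from the same simulated data; the function names are hypothetical and the Gaussian entries are chosen only for convenience.

```python
import math
import random

def normalize_rows(x):
    """y_ij = x_ij / sqrt(sum_j x_ij^2): scale each row to unit norm."""
    out = []
    for row in x:
        norm = math.sqrt(sum(v * v for v in row))
        out.append([v / norm for v in row])
    return out

def center_rows(x):
    """Subtract the row mean x-bar_i from every entry of row i."""
    out = []
    for row in x:
        mean = sum(row) / len(row)
        out.append([v - mean for v in row])
    return out

def gram(y):
    """Y Y^T for a p x n matrix stored as a list of rows."""
    p = len(y)
    return [[sum(a * b for a, b in zip(y[i], y[k])) for k in range(p)]
            for i in range(p)]

rng = random.Random(0)
p, n = 3, 8
x = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
w = gram(normalize_rows(x))               # W = Y Y^T, no centering
r = gram(normalize_rows(center_rows(x)))  # R R^T, the Pearson correlation matrix
```

Both matrices have unit diagonal by construction, so the Tracy-Widom statements concern how the off-diagonal entries move the extreme eigenvalues.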

preprint · 2011 · arXiv

Universality of sample covariance matrices: CLT of the smoothed empirical spectral distribution

A central limit theorem (CLT) for the smoothed empirical spectral distribution of sample covariance matrices is established. Moreover, CLTs for the smoothed quantiles of the Marčenko-Pastur law are also developed.

preprint · 2010 · arXiv

Central limit theorem of nonparametric estimate of spectral density functions of sample covariance matrices

A consistent kernel estimator of the limiting spectral distribution of general sample covariance matrices was introduced in Jing, Pan, Shao and Zhou (2010). The central limit theorem of the kernel estimator is proved in this paper.

preprint · 2010 · arXiv

Functional CLT for sample covariance matrices

Using Bernstein polynomial approximations, we prove the central limit theorem for linear spectral statistics of sample covariance matrices, indexed by a set of functions with continuous fourth order derivatives on an open interval including $[(1-\sqrt{y})^2,(1+\sqrt{y})^2]$, the support of the Marčenko-Pastur law. We also derive the explicit expressions for asymptotic mean and covariance functions.
