Source author record

Zongming Ma

Zongming Ma appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Topics: math.ST, Statistics Theory, Machine Learning, Methodology, Information Theory, math.IT, math.PR, Social and Information Networks, Data Structures and Algorithms, math.NA

Catalog footprint: 24 works, 10 topics, 4 close collaborators.

Preprint · 2023 · arXiv

Community Detection with Contextual Multilayer Networks

In this paper, we study community detection when we observe $m$ sparse networks and a high-dimensional covariate matrix, all encoding the same community structure among $n$ subjects. In the asymptotic regime where the number of features $p$ and the number of subjects $n$ grow proportionally, we derive an exact formula for the asymptotic minimum mean square error (MMSE) of estimating the common community structure in the balanced two-block case. The formula implies the necessity of integrating information from multiple data sources. Consequently, it induces a sharp phase-transition threshold between the regime where detection (i.e., weak recovery) is possible and the regime where no procedure performs better than a random guess. The asymptotic MMSE depends on the covariate signal-to-noise ratio in a more subtle way than the phase-transition threshold does. In the special case of $m=1$, our asymptotic MMSE formula complements the pioneering work of Deshpande et al. (2018), which found the sharp threshold for that case.

Preprint · 2022 · arXiv

Global and Individualized Community Detection in Inhomogeneous Multilayer Networks

In network applications, it has become increasingly common to obtain datasets in the form of multiple networks observed on the same set of subjects, where each network is obtained under a related but different experimental condition or application scenario. Such datasets can be modeled by multilayer networks, where each layer is a separate network while different layers are associated and share some common information. The present paper studies community detection in a stylized yet informative inhomogeneous multilayer network model. In our model, layers are generated by different stochastic block models, the community structures of which are (random) perturbations of a common global structure, while the connecting probabilities in different layers are not related. Focusing on the symmetric two-block case, we establish minimax rates for both global estimation of the common structure and individualized estimation of layer-wise community structures. Both minimax rates have sharp exponents. In addition, we provide an efficient algorithm that is simultaneously asymptotically minimax optimal for both estimation tasks under mild conditions. The optimal rates depend on the parity of the number of most informative layers, a phenomenon caused by inhomogeneity across layers. The method is extended to handle more than two, potentially asymmetric, communities. We demonstrate its effectiveness on both simulated examples and a real multi-modal single-cell dataset.
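As a rough illustration of why the number of layers can matter for a vote-based combination, the sketch below merges layer-wise two-block label estimates into a global estimate by per-node majority vote. It is a hypothetical simplification, not the paper's algorithm: it assumes labels are coded as +1/-1 and are already aligned across layers (no label switching), and the name `majority_vote` is invented for illustration. With an even number of layers a node can receive a tied vote, which the sketch reports as 0.

```python
def majority_vote(layer_labels):
    """Per-node majority vote over layers (hypothetical helper).

    layer_labels: list of layers, each a list of +1/-1 labels for the same
    n nodes, assumed already aligned across layers. Returns +1, -1, or 0
    for a tied vote, which can only happen with an even number of layers.
    """
    n = len(layer_labels[0])
    combined = []
    for i in range(n):
        total = sum(labels[i] for labels in layer_labels)
        combined.append((total > 0) - (total < 0))  # sign of the vote total
    return combined


# Three layers: an odd count, so no ties are possible.
print(majority_vote([[1, -1, 1], [1, 1, -1], [1, -1, -1]]))  # [1, -1, -1]
```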

Preprint · 2022 · arXiv

Nonconvex Matrix Completion with Linearly Parameterized Factors

Matrix completion techniques aim to impute a large portion of missing entries in a data matrix from a small portion of observed ones. In practical applications such as collaborative filtering, prior information and special structures are often employed to improve the accuracy of matrix completion. In this paper, we propose a unified nonconvex optimization framework for matrix completion with linearly parameterized factors. In particular, by introducing a condition referred to as Correlated Parametric Factorization, we can conduct a unified geometric analysis of the nonconvex objective by establishing uniform upper bounds for low-rank estimation resulting from any local minimum. Perhaps surprisingly, the condition of Correlated Parametric Factorization holds for important examples including subspace-constrained matrix completion and skew-symmetric matrix completion. The effectiveness of our unified nonconvex optimization method is also illustrated empirically by extensive numerical simulations.

Preprint · 2022 · arXiv

Sample canonical correlation coefficients of high-dimensional random vectors with finite rank correlations

Consider two random vectors $\widetilde{\mathbf x} \in \mathbb R^p$ and $\widetilde{\mathbf y} \in \mathbb R^q$ of the forms $\widetilde{\mathbf x}=A\mathbf z+\mathbf C_1^{1/2}\mathbf x$ and $\widetilde{\mathbf y}=B\mathbf z+\mathbf C_2^{1/2}\mathbf y$, where $\mathbf x\in \mathbb R^p$, $\mathbf y\in \mathbb R^q$ and $\mathbf z\in \mathbb R^r$ are independent vectors with i.i.d. entries of mean 0 and variance 1, $\mathbf C_1$ and $\mathbf C_2$ are $p \times p$ and $q\times q$ deterministic covariance matrices, and $A$ and $B$ are $p\times r$ and $q\times r$ deterministic matrices. With $n$ independent observations of $(\widetilde{\mathbf x},\widetilde{\mathbf y})$, we study the sample canonical correlations between $\widetilde{\mathbf x}$ and $\widetilde{\mathbf y}$. We consider the high-dimensional setting with finite rank correlations. Let $t_1\ge t_2 \ge \cdots\ge t_r$ be the squares of the nontrivial population canonical correlation coefficients, and let $\widetilde{\lambda}_1 \ge \widetilde{\lambda}_2 \ge \cdots \ge \widetilde{\lambda}_{p\wedge q}$ be the squares of the sample canonical correlation coefficients. If the entries of $\mathbf x$, $\mathbf y$ and $\mathbf z$ are i.i.d. Gaussian, then the following dichotomy has been shown in [7] for a fixed threshold $t_c \in(0, 1)$: for $1\le i \le r$, if $t_i < t_c$, then $\widetilde{\lambda}_i$ converges to the right edge $\lambda_+$ of the limiting eigenvalue spectrum of the sample canonical correlation matrix; if $t_i>t_c$, then $\widetilde{\lambda}_i$ converges to a deterministic limit $\theta_i \in (\lambda_+,1)$ determined by $t_i$. In this paper, we prove that these results hold universally under sharp fourth-moment conditions on the entries of $\mathbf x$ and $\mathbf y$. Moreover, we prove the results in full generality, in the sense that they also hold for near-degenerate $t_i$'s and for $t_i$'s that are close to the threshold $t_c$.

Preprint · 2020 · arXiv

Community detection in sparse latent space models

We show that a simple community detection algorithm originating in the stochastic block model literature achieves consistency, and even optimality, for a broad and flexible class of sparse latent space models. The class of models includes latent eigenmodels (arXiv:0711.1146). The community detection algorithm is based on spectral clustering followed by local refinement via normalized edge counting.
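The refinement step can be pictured with a short sketch: given an initial labeling (for example, from spectral clustering, which is not shown), each node is reassigned to the community to which it has the most edges, after dividing each edge count by that community's current size. This is a minimal sketch under assumptions made here: the graph is an adjacency dict of neighbor sets, labels are integers in `range(k)`, `refine_labels` is a hypothetical name, and the exact normalization used in the paper may differ.

```python
def refine_labels(adj, labels, k):
    """One pass of local refinement by normalized edge counting (sketch).

    adj: dict mapping each node to the set of its neighbors.
    labels: dict mapping each node to an initial community in range(k).
    Each node moves to the community maximizing
    (#edges to that community) / (current size of that community).
    """
    sizes = [0] * k
    for c in labels.values():
        sizes[c] += 1
    refined = {}
    for v, nbrs in adj.items():
        counts = [0] * k
        for u in nbrs:
            counts[labels[u]] += 1
        # Normalize by community size; empty communities score 0.
        scores = [counts[c] / sizes[c] if sizes[c] else 0.0 for c in range(k)]
        refined[v] = max(range(k), key=lambda c: scores[c])
    return refined


# Two disjoint triangles; node 5 starts in the wrong community.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
init = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 0}
print(refine_labels(adj, init, 2))  # {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
```

Without the division by community size, nodes 3 and 4 would see a tie, since each has one neighbor in each community; normalizing breaks the tie toward the smaller, denser community.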

Preprint · 2020 · arXiv

Efficient random graph matching via degree profiles

Random graph matching refers to recovering the underlying vertex correspondence between two random graphs with correlated edges; a prominent example is when the two random graphs are given by Erdős-Rényi graphs $G(n,\frac{d}{n})$. This can be viewed as an average-case and noisy version of the graph isomorphism problem. Under this model, the maximum likelihood estimator is equivalent to solving the intractable quadratic assignment problem. This work develops an $\tilde{O}(n d^2+n^2)$-time algorithm which perfectly recovers the true vertex correspondence with high probability, provided that the average degree is at least $d = Ω(\log^2 n)$ and the two graphs differ by at most $δ= O( \log^{-2}(n) )$ fraction of edges. For dense graphs and sparse graphs, this can be improved to $δ= O( \log^{-2/3}(n) )$ and $δ= O( \log^{-2}(d) )$ respectively, both in polynomial time. The methodology is based on appropriately chosen distance statistics of the degree profiles (empirical distribution of the degrees of neighbors). Before this work, the best known result achieves $δ=O(1)$ and $n^{o(1)} \leq d \leq n^c$ for some constant $c$ with an $n^{O(\log n)}$-time algorithm \cite{barak2018nearly} and $δ=\tilde O((d/n)^4)$ and $d = \tildeΩ(n^{4/5})$ with a polynomial-time algorithm \cite{dai2018performance}.
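The degree-profile idea can be sketched as follows: each vertex is summarized by the empirical distribution of its neighbors' degrees, and vertices in the two graphs are paired by comparing these distributions. The distance used here (Wasserstein-1 between the two neighbor-degree distributions) and the greedy nearest-profile assignment are stand-in assumptions, not the distance statistics or matching step of the paper, and all function names are hypothetical.

```python
from bisect import bisect_right


def degree_profile(adj, v):
    """Sorted degrees of v's neighbors: the empirical neighbor-degree distribution."""
    return sorted(len(adj[u]) for u in adj[v])


def w1_integer(a, b):
    """Wasserstein-1 distance between two nonempty empirical distributions on the integers.

    On integer support the CDFs are constant on [t, t + 1), so the integral
    of |F_a - F_b| reduces to a sum over integer points t.
    """
    a, b = sorted(a), sorted(b)
    lo, hi = min(a[0], b[0]), max(a[-1], b[-1])
    total = 0.0
    for t in range(lo, hi):
        fa = bisect_right(a, t) / len(a)
        fb = bisect_right(b, t) / len(b)
        total += abs(fa - fb)
    return total


def match_by_profiles(adj1, adj2):
    """Map each vertex of graph 1 to the vertex of graph 2 with the closest profile.

    Greedy and not necessarily one-to-one; every vertex needs at least one neighbor.
    """
    prof2 = {w: degree_profile(adj2, w) for w in adj2}
    matching = {}
    for v in adj1:
        p1 = degree_profile(adj1, v)
        matching[v] = min(prof2, key=lambda w: w1_integer(p1, prof2[w]))
    return matching


# Triangle 0-1-2 with a tail 2-3-4.
lollipop = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
print(match_by_profiles(lollipop, lollipop)[3])  # 3
```

Vertex 3 has a unique profile and is matched to itself. Vertices 2 and 4, by contrast, have only degree-2 neighbors, so their profiles are identical as distributions and this distance alone cannot separate them.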

Preprint · 2016 · arXiv

Adaptive Estimation in Two-way Sparse Reduced-rank Regression

This paper studies the problem of estimating a large coefficient matrix in a multiple response linear regression model when the coefficient matrix could be both of low rank and sparse in the sense that most nonzero entries concentrate on a few rows and columns. We are especially interested in high-dimensional settings where the number of predictors and/or response variables can be much larger than the number of observations. We propose a new estimation scheme, which achieves competitive numerical performance and at the same time allows fast computation. Moreover, we show that (a slight variant of) the proposed estimator achieves near-optimal non-asymptotic minimax rates of estimation under a collection of squared Schatten norm losses simultaneously, by providing both error bounds for the estimator and minimax lower bounds. The effectiveness of the proposed algorithm is also demonstrated on an in vivo calcium imaging dataset.

Preprint · 2016 · arXiv

Community Detection in Degree-Corrected Block Models

Community detection is a central problem of network data analysis. Given a network, the goal of community detection is to partition the network nodes into a small number of clusters, which could often help reveal interesting structures. The present paper studies community detection in Degree-Corrected Block Models (DCBMs). We first derive asymptotic minimax risks of the problem for a misclassification proportion loss under appropriate conditions. The minimax risks are shown to depend on degree-correction parameters, community sizes, and average within and between community connectivities in an intuitive and interpretable way. In addition, we propose a polynomial time algorithm to adaptively perform consistent and even asymptotically optimal community detection in DCBMs.

Preprint · 2016 · arXiv

Optimal Estimation and Rank Detection for Sparse Spiked Covariance Matrices

This paper considers sparse spiked covariance matrix models in the high-dimensional setting and studies the minimax estimation of the covariance matrix and the principal subspace as well as the minimax rank detection. The optimal rate of convergence for estimating the spiked covariance matrix under the spectral norm is established, which requires significantly different techniques from those for estimating other structured covariance matrices such as bandable or sparse covariance matrices. We also establish the minimax rate under the spectral norm for estimating the principal subspace, the primary object of interest in principal component analysis. In addition, the optimal rate for the rank detection boundary is obtained. This result also resolves the gap in a recent paper by Berthet and Rigollet [1] where the special case of rank one is considered.

Preprint · 2016 · arXiv

Sparse CCA: Adaptive Estimation and Computational Barriers

Canonical correlation analysis is a classical technique for exploring the relationship between two sets of variables. It has important applications in analyzing high-dimensional datasets originating from genomics, imaging and other fields. This paper considers adaptive minimax and computationally tractable estimation of leading sparse canonical coefficient vectors in high dimensions. First, we establish separate minimax estimation rates for canonical coefficient vectors of each set of random variables under no structural assumption on marginal covariance matrices. Second, we propose a computationally feasible estimator to attain the optimal rates adaptively under an additional sample size condition. Finally, we show that a sample size condition of this kind is needed for any randomized polynomial-time estimator to be consistent, assuming hardness of certain instances of the Planted Clique detection problem. The result is faithful to the Gaussian models used in the paper. As a byproduct, we obtain the first computational lower bounds for sparse PCA under the Gaussian single spiked covariance model.

Preprint · 2015 · arXiv

Achieving Optimal Misclassification Proportion in Stochastic Block Model

Community detection is a fundamental statistical problem in network data analysis. Many algorithms have been proposed to tackle this problem. Most of these algorithms are not guaranteed to achieve statistical optimality, while procedures that achieve information-theoretic limits for general parameter spaces are not computationally tractable. In this paper, we present a computationally feasible two-stage method that achieves optimal statistical performance in misclassification proportion for the stochastic block model under weak regularity conditions. Our two-stage procedure takes as its initializer any of a wide range of weakly consistent community detection procedures and then applies a generic refinement stage, which outputs a community assignment achieving the optimal misclassification proportion with high probability. The practical effectiveness of the new algorithm is demonstrated by competitive numerical results.
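The loss in question can be made concrete. A common definition of the misclassification proportion, assumed here, is the fraction of mislabeled nodes after the best relabeling of the estimated communities, because community labels are only identifiable up to a permutation. The brute-force sketch below (hypothetical function name) enumerates all $k!$ relabelings, so it is only meant for small $k$.

```python
from itertools import permutations


def misclassification_proportion(est, truth, k):
    """min over relabelings pi of the fraction of nodes with pi(est_i) != truth_i.

    est, truth: equal-length sequences of integer labels in range(k).
    Brute force over all k! permutations, so only practical for small k.
    """
    n = len(truth)
    best = n
    for perm in permutations(range(k)):
        errors = sum(perm[e] != t for e, t in zip(est, truth))
        best = min(best, errors)
    return best / n


# Same partition with swapped label names: zero loss.
print(misclassification_proportion([1, 1, 0, 0], [0, 0, 1, 1], 2))  # 0.0
```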

Preprint · 2015 · arXiv

Computational barriers in minimax submatrix detection

This paper studies the minimax detection of a small submatrix of elevated mean in a large matrix contaminated by additive Gaussian noise. To investigate the tradeoff between statistical performance and computational cost from a complexity-theoretic perspective, we consider a sequence of discretized models which are asymptotically equivalent to the Gaussian model. Under the hypothesis that the planted clique detection problem cannot be solved in randomized polynomial time when the clique size is of smaller order than the square root of the graph size, the following phase transition phenomenon is established: when the size of the large matrix $p\to\infty$, if the submatrix size $k=Θ(p^α)$ for any $α\in(0,{2}/{3})$, computational complexity constraints can incur a severe penalty on the statistical performance in the sense that any randomized polynomial-time test is minimax suboptimal by a polynomial factor in $p$; if $k=Θ(p^α)$ for any $α\in({2}/{3},1)$, minimax optimal detection can be attained within constant factors in linear time. Using Schatten norm loss as a representative example, we show that the hardness of attaining the minimax estimation rate can crucially depend on the loss function. Implications for the hardness of support recovery are also obtained.
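To give a feel for what a linear-time detection statistic looks like, the sketch below simulates the planted-submatrix model and computes one simple statistic: the sum of all entries divided by $p$. This is an illustrative assumption only; it is not claimed to be the test analyzed in the paper, and it is informative only when the mean shift $k^2\lambda/p$ is large compared with 1. The names `planted_matrix` and `total_sum_statistic` are hypothetical.

```python
import random


def planted_matrix(p, k, lam, noise_sd=1.0, seed=0):
    """p x p matrix with mean lam on a random k x k submatrix, 0 elsewhere,
    plus i.i.d. N(0, noise_sd^2) noise (hypothetical simulation helper)."""
    rng = random.Random(seed)
    rows = set(rng.sample(range(p), k))
    cols = set(rng.sample(range(p), k))
    return [
        [(lam if (i in rows and j in cols) else 0.0) + noise_sd * rng.gauss(0.0, 1.0)
         for j in range(p)]
        for i in range(p)
    ]


def total_sum_statistic(x):
    """Sum of all entries divided by p, computed in one pass over the p*p entries.

    With pure N(0, 1) noise it is exactly N(0, 1); a planted k x k block with
    mean lam shifts its mean by k*k*lam/p.
    """
    p = len(x)
    return sum(sum(row) for row in x) / p


# Noise-free check: the statistic equals k*k*lam/p = 9 * 2.0 / 10.
print(total_sum_statistic(planted_matrix(10, 3, 2.0, noise_sd=0.0)))  # 1.8
```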

Preprint · 2015 · arXiv

Discussion of "Frequentist coverage of adaptive nonparametric Bayesian credible sets"

Discussion of "Frequentist coverage of adaptive nonparametric Bayesian credible sets" by Szabó, van der Vaart and van Zanten [arXiv:1310.4489v5].

Preprint · 2015 · arXiv

Kernel Additive Principal Components

Additive principal components (APCs for short) are a nonlinear generalization of linear principal components. We focus on smallest APCs to describe additive nonlinear constraints that are approximately satisfied by the data. Thus APCs fit data with implicit equations that treat the variables symmetrically, as opposed to regression analyses, which fit data with explicit equations that treat the data asymmetrically by singling out a response variable. We propose a regularized data-analytic procedure for APC estimation using kernel methods. In contrast to existing approaches to APCs that are based on regularization through subspace restriction, kernel methods achieve regularization through shrinkage and therefore grant distinctive flexibility in APC estimation by allowing the use of infinite-dimensional function spaces for searching APC transformations while retaining computational feasibility. To connect population APCs and kernelized finite-sample APCs, we study kernelized population APCs and their associated eigenproblems, which eventually lead to the establishment of consistency of the estimated APCs. Lastly, we discuss an iterative algorithm for computing kernelized finite-sample APCs.

Preprint · 2015 · arXiv

Minimax estimation in sparse canonical correlation analysis

Canonical correlation analysis is a widely used multivariate statistical technique for exploring the relation between two sets of variables. This paper considers the problem of estimating the leading canonical correlation directions in high-dimensional settings. Recently, under the assumption that the leading canonical correlation directions are sparse, various procedures have been proposed for many high-dimensional applications involving massive data sets. However, little theoretical justification has been available in the literature. In this paper, we establish rate-optimal nonasymptotic minimax estimation with respect to an appropriate loss function for a wide range of model spaces. Two interesting phenomena are observed. First, the minimax rates are not affected by the presence of nuisance parameters, namely the covariance matrices of the two sets of random variables, even though they need to be estimated in the canonical correlation analysis problem. Second, we allow the presence of residual canonical correlation directions; however, they do not influence the minimax rates under a mild condition on the eigengap. A generalized sin-theta theorem and an empirical process bound for Gaussian quadratic forms under a rank constraint are used to establish the minimax upper bounds, which may be of independent interest.

Preprint · 2015 · arXiv

Optimal Rates of Convergence for Noisy Sparse Phase Retrieval via Thresholded Wirtinger Flow

This paper considers the noisy sparse phase retrieval problem: recovering a sparse signal $x \in \mathbb{R}^p$ from noisy quadratic measurements $y_j = (a_j' x )^2 + ε_j$, $j=1, \ldots, m$, with independent sub-exponential noise $ε_j$. The goals are to understand the effect of the sparsity of $x$ on the estimation precision and to construct a computationally feasible estimator to achieve the optimal rates. Inspired by the Wirtinger Flow [12] proposed for noiseless and non-sparse phase retrieval, a novel thresholded gradient descent algorithm is proposed and it is shown to adaptively achieve the minimax optimal rates of convergence over a wide range of sparsity levels when the $a_j$'s are independent standard Gaussian random vectors, provided that the sample size is sufficiently large compared to the sparsity of $x$.
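A single iteration of a thresholded gradient scheme for this model can be sketched as a gradient step on the squared-residual loss followed by coordinatewise thresholding. The loss normalization, the use of soft (rather than hard) thresholding, the fixed step size `mu`, and the fixed threshold `tau` are all assumptions made here; the paper's initialization and its data-driven threshold choices are not reproduced, and the function names are hypothetical. As in the abstract, $a_j'$ denotes the transpose of $a_j$.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def soft_threshold(z, tau):
    """Shrink z toward 0 by tau, returning 0 when |z| <= tau."""
    return math.copysign(max(abs(z) - tau, 0.0), z)


def twf_step(x, A, y, mu, tau):
    """One thresholded gradient step for f(x) = (1/(4m)) * sum_j ((a_j'x)^2 - y_j)^2.

    The gradient is (1/m) * sum_j ((a_j'x)^2 - y_j) * (a_j'x) * a_j.
    mu (step size) and tau (threshold) are fixed here for illustration.
    """
    m = len(y)
    grad = [0.0] * len(x)
    for a, yj in zip(A, y):
        ax = dot(a, x)
        r = (ax * ax - yj) * ax / m
        for i, ai in enumerate(a):
            grad[i] += r * ai
    return [soft_threshold(xi - mu * gi, tau) for xi, gi in zip(x, grad)]


A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x_true = [2.0, 0.0]
y = [dot(a, x_true) ** 2 for a in A]  # noiseless measurements
print(twf_step([1.0, 0.0], A, y, mu=0.1, tau=0.0))
```

At the true signal with noiseless data the gradient vanishes, so a step only applies the shrinkage; starting from `[1.0, 0.0]`, the step moves the first coordinate toward 2.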

Preprint · 2014 · arXiv

Rate Optimal Denoising of Simultaneously Sparse and Low Rank Matrices

We study minimax rates for denoising simultaneously sparse and low rank matrices in high dimensions. We show that an iterative thresholding algorithm achieves (near) optimal rates adaptively under mild conditions for a large class of loss functions. Numerical experiments on synthetic datasets also demonstrate the competitive performance of the proposed method.

Preprint · 2014 · arXiv

Sparse PCA: Optimal rates and adaptive estimation

Principal component analysis (PCA) is one of the most commonly used statistical procedures, with a wide range of applications. This paper considers both minimax and adaptive estimation of the principal subspace in the high-dimensional setting. Under mild technical conditions, we first establish the optimal rates of convergence for estimating the principal subspace, which are sharp with respect to all the parameters, thus providing a complete characterization of the difficulty of the estimation problem in terms of the convergence rate. The lower bound is obtained by calculating the local metric entropy and applying Fano's lemma. The rate-optimal estimator is constructed using aggregation, which, however, might not be computationally feasible. We then introduce an adaptive procedure for estimating the principal subspace which is fully data-driven and can be computed efficiently. It is shown that the estimator attains the optimal rates of convergence simultaneously over a large collection of parameter spaces. A key idea in our construction is a reduction scheme which reduces the sparse PCA problem to a high-dimensional multivariate regression problem. This method is potentially also useful for other related problems.

Preprint · 2013 · arXiv

Optimal hypothesis testing for high dimensional covariance matrices

This paper considers testing a covariance matrix $Σ$ in the high-dimensional setting where the dimension $p$ can be comparable to or much larger than the sample size $n$. The problem of testing the hypothesis $H_0:Σ=Σ_0$ for a given covariance matrix $Σ_0$ is studied from a minimax point of view. We first characterize the boundary that separates the testable region from the non-testable region by the Frobenius norm when the ratio of the dimension $p$ to the sample size $n$ is bounded. A test based on a $U$-statistic is introduced and is shown to be rate optimal over this asymptotic regime. Furthermore, it is shown that the power of this test uniformly dominates that of the corrected likelihood ratio test (CLRT) over the entire asymptotic regime under which the CLRT is applicable. The power of the $U$-statistic based test is also analyzed when $p/n$ is unbounded.
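A $U$-statistic of this kind can be sketched for the case $Σ_0 = I$, which covers a general known $Σ_0$ after transforming the data by $Σ_0^{-1/2}$. Assumptions made here: the observations have mean zero, and the pairwise kernel below is one natural unbiased estimator of $\|Σ - I\|_F^2$ under that assumption, since $E[(X_i'X_j)^2] = \mathrm{tr}(Σ^2)$ and $E[X_i'X_i] = \mathrm{tr}(Σ)$. The paper's exact statistic and the standardization needed to turn it into a level-$α$ test are not reproduced, and the function names are hypothetical.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def u_statistic(X):
    """Unbiased estimate of ||Sigma - I||_F^2 from mean-zero rows X[0..n-1], n >= 2.

    Averages h(X_i, X_j) = (X_i'X_j)^2 - (X_i'X_i + X_j'X_j) + p over pairs i < j,
    whose expectation is tr(Sigma^2) - 2 tr(Sigma) + p = ||Sigma - I||_F^2.
    """
    n, p = len(X), len(X[0])
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            cross = dot(X[i], X[j])
            total += cross * cross - (dot(X[i], X[i]) + dot(X[j], X[j])) + p
    return 2.0 * total / (n * (n - 1))


print(u_statistic([[1.0, 1.0], [1.0, 1.0]]))  # 2.0
```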

Preprint · 2013 · arXiv

Sparse principal component analysis and iterative thresholding

Principal component analysis (PCA) is a classical dimension reduction method which projects data onto the principal subspace spanned by the leading eigenvectors of the covariance matrix. However, it behaves poorly when the number of features p is comparable to, or even much larger than, the sample size n. In this paper, we propose a new iterative thresholding approach for estimating principal subspaces in the setting where the leading eigenvectors are sparse. Under a spiked covariance model, we find that the new approach recovers the principal subspace and leading eigenvectors consistently, and even optimally, in a range of high-dimensional sparse settings. Simulated examples also demonstrate its competitive performance.
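The thresholding idea can be sketched in its simplest rank-one form: repeatedly multiply the current vector by the sample covariance matrix, zero out small coordinates, and renormalize. This is a simplified stand-in, not the paper's procedure, which iterates on a whole subspace with data-driven thresholds and a specific initialization; here the threshold `tau`, the hard-thresholding rule, the starting vector, and the function name are assumptions.

```python
import math


def thresholded_power_iteration(S, v0, tau, iters=50):
    """Rank-one thresholded power method for a symmetric matrix S (sketch).

    Each step computes w = S v, sets entries with |w_i| <= tau to 0
    (hard thresholding), and renormalizes w to unit length.
    """
    v = list(v0)
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(len(v))) for i in range(len(S))]
        w = [wi if abs(wi) > tau else 0.0 for wi in w]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm == 0.0:
            raise ValueError("threshold removed every coordinate")
        v = [wi / norm for wi in w]
    return v


# A spiked matrix whose leading eigenvector is close to the first coordinate axis.
S = [[4.0, 0.1, 0.0], [0.1, 1.0, 0.1], [0.0, 0.1, 1.0]]
print(thresholded_power_iteration(S, [1.0, 1.0, 1.0], tau=0.3))  # [1.0, 0.0, 0.0]
```

Without thresholding the iteration would converge to the exact leading eigenvector, which has small nonzero entries in the last two coordinates; the threshold trades that small bias for an exactly sparse estimate.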

Preprint · 2013 · arXiv

Volume Ratio, Sparsity, and Minimaxity under Unitarily Invariant Norms

The current paper presents a novel machinery for studying non-asymptotic minimax estimation of high-dimensional matrices, which yields tight minimax rates for a large collection of loss functions in a variety of problems. Based on the convex geometry of finite-dimensional Banach spaces, we first develop a volume ratio approach for determining minimax estimation rates of unconstrained normal mean matrices under all squared unitarily invariant norm losses. In addition, we establish the minimax rates for estimating mean matrices with submatrix sparsity, where the sparsity constraint introduces an additional term in the rate whose dependence on the norm differs completely from the rate of the unconstrained problem. Moreover, the approach is applicable to the matrix completion problem under the low-rank constraint. The new method also extends beyond the normal mean model. In particular, it yields tight rates in covariance matrix estimation and Poisson rate matrix estimation problems for all unitarily invariant norms.

Preprint · 2012 · arXiv

Accuracy of the Tracy--Widom limits for the extreme eigenvalues in white Wishart matrices

The distributions of the largest and the smallest eigenvalues of a $p$-variate sample covariance matrix $S$ are of great importance in statistics. Focusing on the null case where $nS$ follows the standard Wishart distribution $W_p(I,n)$, we study the accuracy of their scaling limits under the setting: $n/p\rightarrow γ\in(0,\infty)$ as $n\rightarrow \infty$. The limits here are the orthogonal Tracy--Widom law and its reflection about the origin. With carefully chosen rescaling constants, the approximation to the rescaled largest eigenvalue distribution by the limit attains accuracy of order $O(\min(n,p)^{-2/3})$. If $γ>1$, the same order of accuracy is obtained for the smallest eigenvalue after incorporating an additional log transform. Numerical results show that the relative error of approximation at conventional significance levels is reduced by over 50% in rectangular and over 75% in 'thin' data matrix settings, even with $\min(n,p)$ as small as 2.
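For orientation, the classical centering and scaling constants of Johnstone (2001) for the largest eigenvalue $\ell_1$ of $nS \sim W_p(I,n)$ are recalled below. They are shown only as a reference point for what "rescaling constants" means here; the paper's own, more carefully chosen constants (and the log transform used for the smallest eigenvalue) are not reproduced.

```latex
\frac{\ell_1 - \mu_{np}}{\sigma_{np}} \xrightarrow{d} \mathrm{TW}_1,
\qquad
\mu_{np} = \left(\sqrt{n-1} + \sqrt{p}\right)^2,
\qquad
\sigma_{np} = \left(\sqrt{n-1} + \sqrt{p}\right)
              \left(\frac{1}{\sqrt{n-1}} + \frac{1}{\sqrt{p}}\right)^{1/3}.
```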

Preprint · 2012 · arXiv

Fast approach to the Tracy-Widom law at the edge of GOE and GUE

We study the rate of convergence for the largest eigenvalue distributions in the Gaussian unitary and orthogonal ensembles to their Tracy-Widom limits. We show that one can achieve an $O(N^{-2/3})$ rate with particular choices of the centering and scaling constants. The arguments here also shed light on more complicated cases of Laguerre and Jacobi ensembles, in both unitary and orthogonal versions. Numerical work shows that the suggested constants yield reasonable approximations, even for surprisingly small values of N.

Preprint · 2011 · arXiv

A Sparse SVD Method for High-dimensional Data

We present a new computational approach to approximating a large, noisy data table by a low-rank matrix with sparse singular vectors. The approximation is obtained from thresholded subspace iterations that produce the singular vectors simultaneously, rather than successively as in competing proposals. We introduce novel ways to estimate thresholding parameters which obviate the need for computationally expensive cross-validation. We also introduce a way to sparsely initialize the algorithm for computational savings that allow our algorithm to outperform the vanilla SVD on the full data table when the signal is sparse. A comparison with two existing sparse SVD methods suggests that our algorithm is computationally always faster and statistically always at least comparable to the better of the two competing algorithms.
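A rank-one version of thresholded alternating iterations can be sketched as follows: multiply by the data matrix to update the left vector, threshold and normalize, then do the same with the transpose for the right vector. This is a simplified stand-in under stated assumptions: the method described above produces several singular vectors simultaneously and estimates its thresholds from the data, whereas here the rank is one, the thresholds `tau_u` and `tau_v` are fixed by hand, hard thresholding is used, and all names are hypothetical.

```python
import math


def _hard(z, tau):
    """Zero out entries with |z_i| <= tau."""
    return [zi if abs(zi) > tau else 0.0 for zi in z]


def _normalize(z):
    norm = math.sqrt(sum(zi * zi for zi in z))
    if norm == 0.0:
        raise ValueError("all coordinates were thresholded to zero")
    return [zi / norm for zi in z]


def sparse_rank1_svd(X, v0, tau_u, tau_v, iters=50):
    """Return (u, d, v) with sparse unit vectors u, v and d = u' X v (sketch)."""
    v = _normalize(v0)
    n_rows, n_cols = len(X), len(X[0])
    for _ in range(iters):
        # Left update: u proportional to thresholded X v.
        u = _normalize(_hard([sum(xij * vj for xij, vj in zip(row, v)) for row in X], tau_u))
        # Right update: v proportional to thresholded X' u.
        v = _normalize(_hard([sum(X[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)], tau_v))
    d = sum(u[i] * X[i][j] * v[j] for i in range(n_rows) for j in range(n_cols))
    return u, d, v


X = [[3.0, 0.0, 0.0], [0.0, 0.2, 0.0], [0.0, 0.0, 0.1]]
u, d, v = sparse_rank1_svd(X, [1.0, 1.0, 1.0], tau_u=0.5, tau_v=0.5)
print(u, d, v)
```

In this toy matrix the small diagonal entries fall below the thresholds, so both singular vectors collapse onto the first coordinate and the estimated singular value is 3.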