Researcher profile

Zongming Ma

Zongming Ma contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging), Verification L1, Unclaimed author
6 works
0 followers
8 topics
4 close collaborators

Published work

6 published items

Preprint, 2023, arXiv

Community Detection with Contextual Multilayer Networks

In this paper, we study community detection when we observe $m$ sparse networks and a high-dimensional covariate matrix, all encoding the same community structure among $n$ subjects. In the asymptotic regime where the number of features $p$ and the number of subjects $n$ grow proportionally, we derive an exact formula for the asymptotic minimum mean square error (MMSE) for estimating the common community structure in the balanced two-block case. The formula implies the necessity of integrating information from multiple data sources. Consequently, it induces a sharp threshold of phase transition between the regime where detection (i.e., weak recovery) is possible and the regime where no procedure performs better than a random guess. The asymptotic MMSE depends on the covariate signal-to-noise ratio in a more subtle way than the phase transition threshold does. In the special case of $m=1$, our asymptotic MMSE formula complements the pioneering work of Deshpande et al. (2018), which found the sharp threshold when $m=1$.

Preprint, 2022, arXiv

Global and Individualized Community Detection in Inhomogeneous Multilayer Networks

In network applications, it has become increasingly common to obtain datasets in the form of multiple networks observed on the same set of subjects, where each network is obtained under a related but different experimental condition or application scenario. Such datasets can be modeled by multilayer networks, where each layer is a separate network while different layers are associated and share some common information. The present paper studies community detection in a stylized yet informative inhomogeneous multilayer network model. In our model, layers are generated by different stochastic block models, whose community structures are (random) perturbations of a common global structure, while the connecting probabilities in different layers are not related. Focusing on the symmetric two-block case, we establish minimax rates for both global estimation of the common structure and individualized estimation of layer-wise community structures. Both minimax rates have sharp exponents. In addition, we provide an efficient algorithm that is simultaneously asymptotically minimax optimal for both estimation tasks under mild conditions. The optimal rates depend on the parity of the number of most informative layers, a phenomenon that is caused by inhomogeneity across layers. The method is extended to handle multiple and potentially asymmetric community cases. We demonstrate its effectiveness on both simulated examples and a real multi-modal single-cell dataset.

Preprint, 2022, arXiv

Nonconvex Matrix Completion with Linearly Parameterized Factors

Matrix completion techniques aim to impute a large portion of missing entries in a data matrix from a small portion of observed ones. In practical applications, including collaborative filtering, prior information and special structures are usually employed to improve the accuracy of matrix completion. In this paper, we propose a unified nonconvex optimization framework for matrix completion with linearly parameterized factors. In particular, by introducing a condition referred to as Correlated Parametric Factorization, we can conduct a unified geometric analysis for the nonconvex objective by establishing uniform upper bounds for low-rank estimation resulting from any local minimum. Perhaps surprisingly, the condition of Correlated Parametric Factorization holds for important examples including subspace-constrained matrix completion and skew-symmetric matrix completion. The effectiveness of our unified nonconvex optimization method is also empirically illustrated by extensive numerical simulations.

Preprint, 2022, arXiv

Sample canonical correlation coefficients of high-dimensional random vectors with finite rank correlations

Consider two random vectors $\widetilde{\mathbf x} \in \mathbb R^p$ and $\widetilde{\mathbf y} \in \mathbb R^q$ of the forms $\widetilde{\mathbf x}=A\mathbf z+\mathbf C_1^{1/2}\mathbf x$ and $\widetilde{\mathbf y}=B\mathbf z+\mathbf C_2^{1/2}\mathbf y$, where $\mathbf x\in \mathbb R^p$, $\mathbf y\in \mathbb R^q$ and $\mathbf z\in \mathbb R^r$ are independent vectors with i.i.d. entries of mean 0 and variance 1, $\mathbf C_1$ and $\mathbf C_2$ are $p \times p$ and $q\times q$ deterministic covariance matrices, and $A$ and $B$ are $p\times r$ and $q\times r$ deterministic matrices. With $n$ independent observations of $(\widetilde{\mathbf x},\widetilde{\mathbf y})$, we study the sample canonical correlations between $\widetilde{\mathbf x}$ and $\widetilde{\mathbf y}$. We consider the high-dimensional setting with finite rank correlations. Let $t_1\ge t_2 \ge \cdots\ge t_r$ be the squares of the nontrivial population canonical correlation coefficients, and let $\widetilde{\lambda}_1 \ge\widetilde{\lambda}_2\ge\cdots\ge\widetilde{\lambda}_{p\wedge q}$ be the squares of the sample canonical correlation coefficients. If the entries of $\mathbf x$, $\mathbf y$ and $\mathbf z$ are i.i.d. Gaussian, then the following dichotomy has been shown in [7] for a fixed threshold $t_c \in(0, 1)$: for $1\le i \le r$, if $t_i < t_c$, then $\widetilde{\lambda}_i$ converges to the right edge $\lambda_+$ of the limiting eigenvalue spectrum of the sample canonical correlation matrix; if $t_i>t_c$, then $\widetilde{\lambda}_i$ converges to a deterministic limit $\theta_i \in (\lambda_+,1)$ determined by $t_i$. In this paper, we prove that these results hold universally under the sharp fourth moment conditions on the entries of $\mathbf x$ and $\mathbf y$. Moreover, we prove the results in full generality, in the sense that they also hold for near-degenerate $t_i$'s and for $t_i$'s that are close to the threshold $t_c$.
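The quantities in this abstract can be simulated directly. The following is a minimal sketch, not taken from the paper, that computes the squared sample canonical correlations $\widetilde{\lambda}_i$ from $n$ observations. It assumes $\mathbf C_1$ and $\mathbf C_2$ are identity matrices, Gaussian entries, and a rank-one signal ($r=1$); the function name and all parameter values are hypothetical choices for illustration.

```python
import numpy as np

def squared_sample_canonical_correlations(X, Y):
    """Squared sample canonical correlations of X (n x p) and Y (n x q).

    Rows are observations, assumed to have mean zero (no centering is applied).
    The values equal the squared singular values of Qx^T Qy, where Qx and Qy
    are orthonormal bases of the column spaces of X and Y.
    """
    qx, _ = np.linalg.qr(X)
    qy, _ = np.linalg.qr(Y)
    singular_values = np.linalg.svd(qx.T @ qy, compute_uv=False)
    # Clip guards against tiny floating-point excursions outside [0, 1].
    return np.clip(singular_values, 0.0, 1.0) ** 2

# Illustrative model (assumed values): x~ = A z + x, y~ = B z + y with C1 = C2 = I.
rng = np.random.default_rng(0)
n, p, q, r = 2000, 200, 100, 1
A = np.zeros((p, r)); A[0, 0] = 2.0   # hypothetical signal loading
B = np.zeros((q, r)); B[0, 0] = 2.0   # hypothetical signal loading
z = rng.standard_normal((n, r))
X = z @ A.T + rng.standard_normal((n, p))
Y = z @ B.T + rng.standard_normal((n, q))

lam = squared_sample_canonical_correlations(X, Y)
print("largest squared sample canonical correlations:", lam[:3])
```

With these assumed loadings the population squared canonical correlation is $t_1 = (4/5)^2 = 0.64$, so the leading $\widetilde{\lambda}_1$ is expected to separate from the remaining values, which is the outlier behavior the abstract describes for $t_i > t_c$.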

Preprint, 2020, arXiv

Community detection in sparse latent space models

We show that a simple community detection algorithm originating from the stochastic blockmodel literature achieves consistency, and even optimality, for a broad and flexible class of sparse latent space models. The class of models includes latent eigenmodels (arXiv:0711.1146). The community detection algorithm is based on spectral clustering followed by local refinement via normalized edge counting.
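The two-stage procedure named in this abstract can be illustrated with a small sketch. This is a simplified version under stated assumptions, not the paper's algorithm: it uses a plain balanced two-block stochastic block model rather than a latent space model, initializes from the sign of one eigenvector, and applies a single refinement pass. All function names and parameter values are hypothetical.

```python
import numpy as np

def sample_sbm(n, p_in, p_out, rng):
    """Sample a balanced two-block SBM adjacency matrix (n assumed even)."""
    labels = np.repeat([0, 1], n // 2)
    probs = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    adj = (upper | upper.T).astype(float)   # symmetric, zero diagonal
    return adj, labels

def spectral_init(adj):
    """Initial labels from the sign of the eigenvector of the second-largest eigenvalue."""
    _, eigvecs = np.linalg.eigh(adj)        # eigenvalues in ascending order
    return (eigvecs[:, -2] > 0).astype(int)

def refine(adj, labels, k=2):
    """Reassign each node to the community with the largest normalized edge count."""
    onehot = np.eye(k)[labels]              # n x k membership indicators
    counts = adj @ onehot                   # edges from each node into each community
    sizes = onehot.sum(axis=0)              # current community sizes
    return np.argmax(counts / np.maximum(sizes, 1), axis=1)

def accuracy(est, truth):
    """Fraction of correctly labeled nodes, up to swapping the two labels."""
    agree = np.mean(est == truth)
    return max(agree, 1 - agree)

rng = np.random.default_rng(0)
adj, truth = sample_sbm(200, 0.3, 0.05, rng)   # assumed edge probabilities
init = spectral_init(adj)
refined = refine(adj, init)
print("accuracy after spectral step:", accuracy(init, truth))
print("accuracy after refinement:  ", accuracy(refined, truth))
```

The refinement step divides each node's edge count into a community by that community's size, so a larger community does not win merely by offering more potential neighbors.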

Preprint, 2020, arXiv

Efficient random graph matching via degree profiles

Random graph matching refers to recovering the underlying vertex correspondence between two random graphs with correlated edges; a prominent example is when the two random graphs are given by Erdős-Rényi graphs $G(n,\frac{d}{n})$. This can be viewed as an average-case and noisy version of the graph isomorphism problem. Under this model, the maximum likelihood estimator is equivalent to solving the intractable quadratic assignment problem. This work develops an $\tilde{O}(n d^2+n^2)$-time algorithm which perfectly recovers the true vertex correspondence with high probability, provided that the average degree is at least $d = \Omega(\log^2 n)$ and the two graphs differ by at most $\delta = O( \log^{-2}(n) )$ fraction of edges. For dense graphs and sparse graphs, this can be improved to $\delta = O( \log^{-2/3}(n) )$ and $\delta = O( \log^{-2}(d) )$ respectively, both in polynomial time. The methodology is based on appropriately chosen distance statistics of the degree profiles (empirical distribution of the degrees of neighbors). Before this work, the best known results achieve $\delta=O(1)$ and $n^{o(1)} \leq d \leq n^c$ for some constant $c$ with an $n^{O(\log n)}$-time algorithm \cite{barak2018nearly} and $\delta=\tilde{O}((d/n)^4)$ and $d = \tilde{\Omega}(n^{4/5})$ with a polynomial-time algorithm \cite{dai2018performance}.
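The degree-profile idea can be sketched in a few lines. The example below is an illustration under assumptions, not the paper's algorithm: it uses the 1-Wasserstein distance between empirical neighbor-degree distributions as the distance statistic (the abstract does not specify which statistic is used), matches each vertex greedily to its nearest profile, and is tested only on a noise-free relabeled copy of a dense Erdős-Rényi graph. Function names and parameters are hypothetical.

```python
import numpy as np

def degree_profile_cdfs(adj, grid):
    """Row i is the empirical CDF, evaluated on `grid`, of the degrees of i's neighbors."""
    deg = adj.sum(axis=1)
    below = (deg[:, None] <= grid[None, :]).astype(float)   # below[j, g] = 1 if deg(j) <= grid[g]
    neighbor_counts = adj @ below                            # neighbors of i with degree <= grid[g]
    return neighbor_counts / np.maximum(deg, 1)[:, None]

def match_by_degree_profiles(adj_a, adj_b):
    """For each vertex of graph A, return the vertex of graph B with the closest degree profile."""
    n = adj_a.shape[0]
    grid = np.arange(n)                                      # possible degrees 0, ..., n-1
    cdf_a = degree_profile_cdfs(adj_a, grid)
    cdf_b = degree_profile_cdfs(adj_b, grid)
    # For integer-valued distributions, summing |F - G| over the integer grid
    # gives the 1-Wasserstein distance between the two profiles.
    dist = np.abs(cdf_a[:, None, :] - cdf_b[None, :, :]).sum(axis=2)
    return dist.argmin(axis=1)

rng = np.random.default_rng(0)
n = 60
upper = np.triu(rng.random((n, n)) < 0.5, k=1)
adj_a = (upper | upper.T).astype(float)
perm = rng.permutation(n)
adj_b = adj_a[np.ix_(perm, perm)]        # vertex i of B is vertex perm[i] of A
truth = np.argsort(perm)                 # vertex a of A corresponds to vertex truth[a] of B

match = match_by_degree_profiles(adj_a, adj_b)
print("fraction of vertices matched correctly:", np.mean(match == truth))
```

This sketch builds an $n \times n \times n$ array and so scales cubically in memory; it is meant only to make the notion of a degree profile concrete, not to reproduce the running time stated in the abstract.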