Source author record

Dong Xia

Dong Xia appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning math.ST Statistics Theory Information Theory math.IT Methodology math.OC Applications math.NT math.PR Social and Information Networks

Catalog footprint

What is connected

15 works

11 topics

4 close collaborators


Preprint · 2026 · arXiv

Prediction-powered Inference by Mixture of Experts

The rapidly expanding artificial intelligence (AI) industry has produced diverse yet powerful prediction tools, each with its own network architecture, training strategy, data-processing pipeline, and domain-specific strengths. These tools create new opportunities for semi-supervised inference, in which labeled data are limited and expensive to obtain, whereas unlabeled data are abundant and widely available. Given a collection of predictors, we treat them as a mixture of experts (MOE) and introduce an MOE-powered semi-supervised inference framework built upon prediction-powered inference (PPI). Motivated by the variance reduction principle underlying PPI, the proposed framework seeks the mixture of experts that achieves the smallest possible variance. Compared with standard PPI, the MOE-powered inference framework adapts to the unknown performance of individual predictors, benefits from their collective predictive power, and enjoys a best-expert guarantee. The framework is flexible and applies to mean estimation, linear regression, quantile estimation, and general M-estimation. We develop non-asymptotic theory for the MOE-powered inference framework and establish upper bounds on the coverage error of the resulting confidence intervals. Numerical experiments demonstrate the practical effectiveness of MOE-powered inference and corroborate our theoretical findings.
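For mean estimation, the variance-reduction idea can be sketched as follows. Assume the experts are combined linearly, f_w = Σ_k w_k f_k. The standard PPI estimate of E[Y] is then the mean of f_w over the unlabeled data plus the mean of the residuals Y - f_w over the labeled data. Its plug-in variance is a quadratic in w, so it can be minimized in closed form.

The following are illustrative assumptions, not the paper's exact procedure:
- the linear weighting;
- the hypothetical helper names `moe_ppi_mean` and `ppi_variance`;
- the normal-approximation interval.

Both w = 0 (the classical labeled-only mean) and every single-expert choice are feasible. The optimized plug-in variance is therefore never larger than theirs, which loosely mirrors the best-expert guarantee.

```python
import numpy as np
from statistics import NormalDist


def ppi_variance(w, y_lab, f_lab, f_unlab):
    """Plug-in variance of the PPI mean estimator built on the mixture f_w = f @ w."""
    return (np.var(f_unlab @ w, ddof=1) / len(f_unlab)
            + np.var(y_lab - f_lab @ w, ddof=1) / len(y_lab))


def moe_ppi_mean(y_lab, f_lab, f_unlab, alpha=0.1):
    """Hypothetical helper: PPI estimate of E[Y] with variance-minimising linear weights.

    y_lab:   (n,) labeled responses
    f_lab:   (n, K) predictions of K experts on the labeled inputs
    f_unlab: (N, K) predictions of the same K experts on the unlabeled inputs
    """
    n, K = f_lab.shape
    N = f_unlab.shape[0]
    # Sample covariances with ddof=1, matching ppi_variance exactly.
    C = np.cov(np.column_stack([f_lab, y_lab]), rowvar=False)
    sigma_lab, c = C[:K, :K], C[:K, K]
    sigma_unlab = np.atleast_2d(np.cov(f_unlab, rowvar=False))
    # ppi_variance(w) = w'(S_u/N + S_l/n)w - 2w'c/n + Var(Y)/n, a quadratic in w,
    # whose minimiser solves (S_u * n/N + S_l) w = c.
    w = np.linalg.solve(sigma_unlab * n / N + sigma_lab, c)
    # Mixture mean on unlabeled data plus the labeled residual correction.
    theta = np.mean(f_unlab @ w) + np.mean(y_lab - f_lab @ w)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * np.sqrt(ppi_variance(w, y_lab, f_lab, f_unlab))
    return theta, (theta - half, theta + half), w


# Synthetic illustration: three experts of different quality.
rng = np.random.default_rng(0)
x_lab, x_unlab = rng.normal(size=200), rng.normal(size=5000)
y_lab = 2 * x_lab + rng.normal(size=200)


def experts(x):
    # Columns: a good predictor, a mis-scaled predictor, and pure noise.
    return np.column_stack([2 * x + 0.5 * rng.normal(size=x.size),
                            x,
                            rng.normal(size=x.size)])


f_lab, f_unlab = experts(x_lab), experts(x_unlab)
theta, ci, w = moe_ppi_mean(y_lab, f_lab, f_unlab)
print(theta, ci, w)
```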

Preprint · 2022 · arXiv

Generalized Low-rank plus Sparse Tensor Estimation by Fast Riemannian Optimization

We investigate a generalized framework to estimate a latent low-rank plus sparse tensor. Here the low-rank tensor often captures the multi-way principal components, and the sparse tensor accounts for potential model mis-specifications or heterogeneous signals that the low-rank part cannot explain. The framework is flexible, covering both linear and non-linear models, and can easily handle continuous or categorical variables. We propose a fast algorithm that integrates Riemannian gradient descent with a novel gradient pruning procedure. Under suitable conditions, the algorithm converges linearly and can simultaneously estimate both the low-rank and sparse tensors. Statistical error bounds for the final estimates are established in terms of the gradient of the loss function. The error bounds are generally sharp under specific statistical models, e.g., robust tensor PCA and community detection in hypergraph networks with outlier vertices. Moreover, our method achieves non-trivial error bounds for heavy-tailed tensor PCA whenever the noise has a finite $2+\varepsilon$ moment. We apply our method to analyze the international trade flow dataset and the statistician hypergraph co-authorship network, both yielding new and interesting findings.
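For intuition, here is an order-2 (matrix) analogue under squared loss, assuming Y = L + S exactly. It alternates two steps:
- a unit-step gradient step on the low-rank part, retracted onto rank-r matrices by truncated SVD;
- a hard-thresholding step that keeps only the k largest residual entries as the sparse part.

This is a simplified stand-in chosen for illustration. It is plain alternating projection, not the paper's Riemannian gradient descent on tensors, and the hard-thresholding rule only loosely plays the role of gradient pruning. The helper names and the parameters `rank`, `n_sparse` and `n_iter` are hypothetical.

```python
import numpy as np


def truncated_svd(M, rank):
    """Best rank-`rank` approximation of M in Frobenius norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]


def keep_largest(M, k):
    """Zero out all but the k entries of M with the largest absolute value."""
    out = np.zeros_like(M)
    idx = np.unravel_index(np.argsort(np.abs(M), axis=None)[-k:], M.shape)
    out[idx] = M[idx]
    return out


def lowrank_plus_sparse(Y, rank, n_sparse, n_iter=100):
    """Hypothetical helper: split Y into L (rank <= rank) plus S (n_sparse nonzeros)."""
    L = np.zeros_like(Y)
    for _ in range(n_iter):
        # Sparse part: keep only the largest residual entries.
        S = keep_largest(Y - L, n_sparse)
        # The gradient of 0.5*||L + S - Y||_F^2 in L is L + S - Y, so a unit step
        # lands on Y - S, which is then retracted onto the rank-r matrices.
        L = truncated_svd(Y - S, rank)
    return L, S


rng = np.random.default_rng(1)
L_true = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 50))
S_true = np.zeros((50, 50))
pos = rng.choice(2500, size=50, replace=False)          # 2% corrupted entries
S_true.flat[pos] = 25 * rng.choice([-1.0, 1.0], size=50)
L_hat, S_hat = lowrank_plus_sparse(L_true + S_true, rank=2, n_sparse=50)
print(np.linalg.norm(L_hat - L_true) / np.linalg.norm(L_true))
```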

Preprint · 2022 · arXiv

Optimal Estimation and Computational Limit of Low-rank Gaussian Mixtures

Structural matrix-variate observations routinely arise in diverse fields such as multi-layer network analysis and brain image clustering. Data of this type have been extensively investigated with fruitful outcomes. Even so, fundamental questions such as their statistical optimality and computational limits remain largely under-explored. In this paper, we propose a low-rank Gaussian mixture model (LrMM) assuming that each matrix-valued observation has a planted low-rank structure. Minimax lower bounds for estimating the underlying low-rank matrix are established for a whole range of sample sizes and signal strengths. Under a minimal condition on signal strength, referred to as the information-theoretical limit or statistical limit, we prove the minimax optimality of a maximum likelihood estimator which, in general, is computationally infeasible. If the signal is stronger than a certain threshold, called the computational limit, we design a computationally fast estimator based on spectral aggregation and demonstrate its minimax optimality. Moreover, when the signal strength is below the computational limit, we provide evidence based on the low-degree likelihood ratio framework that no polynomial-time algorithm can consistently recover the underlying low-rank matrix. Our results reveal multiple phase transitions in the minimax error rates and a statistical-to-computational gap. Numerical experiments confirm our theoretical findings. We further showcase the merit of our spectral aggregation method on the worldwide food trading dataset.

Preprint · 2022 · arXiv

Provable Tensor-Train Format Tensor Completion by Riemannian Optimization

The tensor train (TT) format offers appealing advantages in handling structured high-order tensors. The past decade has seen wide application of TT-format tensors across diverse disciplines, among which tensor completion has drawn considerable attention. Numerous fast algorithms, including Riemannian gradient descent (RGrad), have been proposed for TT-format tensor completion. However, theoretical guarantees for these algorithms are largely missing or sub-optimal, partly due to the complicated and recursive algebraic operations in TT-format decomposition. Moreover, existing results established for tensors in other formats, for example Tucker and CP, are inapplicable because the algorithms for TT-format tensors are substantially different and more involved. In this paper, we provide, to the best of our knowledge, the first theoretical guarantees for the convergence of the RGrad algorithm for TT-format tensor completion, under a nearly optimal sample size condition. The RGrad algorithm converges linearly with a constant contraction rate that is free of the tensor condition number, without the need for re-conditioning. We also propose a novel approach, referred to as the sequential second-order moment method, to attain a warm initialization under a similar sample size requirement. As a byproduct, our result also significantly refines the prior analysis of the RGrad algorithm for matrix completion. Lastly, a statistically (near) optimal rate is derived for the RGrad algorithm when the observed entries are contaminated by random sub-Gaussian noise. Numerical experiments confirm our theoretical findings and showcase the computational speedup gained by the TT-format decomposition.
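The TT format itself can be illustrated with the classical TT-SVD construction, which applies sequential truncated SVDs to reshaped unfoldings. This is a swapped-in illustration of the format, not the RGrad completion algorithm studied here.

The sketch uses hypothetical helper names. It factors a full order-d array into cores G_k of shape (r_{k-1}, n_k, r_k), with r_0 = r_d = 1, and then contracts the cores back into the full array.

```python
import numpy as np


def tt_svd(T, max_rank):
    """Hypothetical helper: TT cores of T via sequential truncated SVDs (TT-SVD)."""
    dims = T.shape
    cores, r_prev = [], 1
    M = T.reshape(1, -1)
    for n_k in dims[:-1]:
        M = M.reshape(r_prev * n_k, -1)          # unfolding: (r_{k-1} n_k) x rest
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        r = min(max_rank, len(s))
        cores.append(U[:, :r].reshape(r_prev, n_k, r))
        M = s[:r, None] * Vt[:r]                 # pass the remainder to the next mode
        r_prev = r
    cores.append(M.reshape(r_prev, dims[-1], 1))
    return cores


def tt_full(cores):
    """Contract TT cores back into a full array."""
    out = cores[0]
    for G in cores[1:]:
        out = np.tensordot(out, G, axes=([-1], [0]))   # join neighbouring rank indices
    return out.reshape(out.shape[1:-1])


# A random order-4 tensor with TT ranks (2, 2, 2) is recovered exactly at max_rank=2.
rng = np.random.default_rng(2)
true_cores = [rng.normal(size=s) for s in [(1, 4, 2), (2, 5, 2), (2, 6, 2), (2, 3, 1)]]
X = tt_full(true_cores)
cores = tt_svd(X, max_rank=2)
print([G.shape for G in cores], np.linalg.norm(tt_full(cores) - X) / np.linalg.norm(X))
```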

Preprint · 2020 · arXiv

Community Detection for Hypergraph Networks via Regularized Tensor Power Iteration

To date, social network analysis has largely focused on pairwise interactions. The study of higher-order interactions, via a hypergraph network, brings new insights. We study community detection in a hypergraph network. A popular approach is to project the hypergraph to a graph and then apply community detection methods for graph networks, but we show that this approach may cause unwanted information loss. We propose a new method for community detection that operates directly on the hypergraph. At the heart of our method is a regularized higher-order orthogonal iteration (reg-HOOI) algorithm that computes an approximate low-rank decomposition of the network adjacency tensor. Compared with existing tensor decomposition methods such as HOSVD and vanilla HOOI, reg-HOOI yields better performance, especially when the hypergraph is sparse. Given the output of tensor decomposition, we then generalize the community detection method SCORE (Jin, 2015) from graph networks to hypergraph networks. We call our new method Tensor-SCORE. In theory, we introduce a degree-corrected block model for hypergraphs (hDCBM), and show that Tensor-SCORE yields consistent community detection for a wide range of network sparsity and degree heterogeneity. As a byproduct, we derive the rates of convergence for estimating the principal subspace by reg-HOOI under different initializations. These include the two new initialization methods we propose, a diagonal-removed HOSVD and a randomized graph projection. We apply our method to several real hypergraph networks and obtain encouraging results, suggesting that exploring higher-order interactions provides additional information not seen in graph representations.
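The information loss from projecting a hypergraph to a graph shows up already in a small example. The sketch builds the symmetric adjacency tensor of a 3-uniform hypergraph and its weighted graph projection. In the projection, A[i, j] is the number of hyperedges containing both i and j, which equals summing the tensor over its third mode.

The two six-node hypergraphs below were chosen for illustration. They share no hyperedge, yet every pair of nodes appears in exactly the same number of hyperedges. Any method that sees only the projection therefore cannot tell them apart. Helper names are hypothetical.

```python
import itertools

import numpy as np


def adjacency_tensor(hyperedges, n):
    """Symmetric n x n x n adjacency tensor of a 3-uniform hypergraph."""
    T = np.zeros((n, n, n), dtype=int)
    for e in hyperedges:
        for i, j, k in itertools.permutations(e):
            T[i, j, k] = 1
    return T


def graph_projection(T):
    """A[i, j] = number of hyperedges containing both i and j (sum over the third mode)."""
    return T.sum(axis=2)


H1 = [(0, 1, 2), (0, 3, 4), (1, 3, 5), (2, 4, 5)]
H2 = [(0, 1, 3), (0, 2, 4), (1, 2, 5), (3, 4, 5)]
T1, T2 = adjacency_tensor(H1, 6), adjacency_tensor(H2, 6)
print(np.array_equal(T1, T2))                                       # False
print(np.array_equal(graph_projection(T1), graph_projection(T2)))   # True
```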

Preprint · 2020 · arXiv

Community Detection on Mixture Multi-layer Networks via Regularized Tensor Decomposition

We study the problem of community detection in multi-layer networks, where pairs of nodes can be related in multiple modalities. We introduce a general framework, the mixture multi-layer stochastic block model (MMSBM), which includes many earlier models as special cases. We propose a tensor-based algorithm (TWIST) to reveal both the global and local memberships of nodes and the memberships of layers. We show that the TWIST procedure can accurately detect communities with small misclassification error as the number of nodes and/or the number of layers increases. Numerical studies confirm our theoretical findings. To the best of our knowledge, this is the first systematic study of mixture multi-layer networks using tensor decomposition. The method is applied to two real datasets, worldwide trading networks and malaria parasite gene networks, yielding new and interesting findings.

Preprint · 2020 · arXiv

Deterministic Zeckendorf Games

Zeckendorf proved that every positive integer can be written uniquely as the sum of non-adjacent Fibonacci numbers. We further explore a two-player Zeckendorf game introduced in Baird-Smith, Epstein, Flint, and Miller. Given a fixed integer $n$ and the initial decomposition $n = nF_1$, players alternate using moves related to the recurrence relation $F_{k+1} = F_k + F_{k-1}$, and the last player to move wins. We improve the upper bound on the number of moves possible and show that it is of the same order in $n$ as the lower bound; this improves on previous work by a logarithmic factor. The new upper bound is $3n - 3Z(n) - IZ(n) + 1$, and the existing lower bound is sharp at $n - Z(n)$ moves. Here $Z(n)$ is the number of terms in the Zeckendorf decomposition of $n$, and $IZ(n)$ is the sum of the indices in that decomposition. We also study four deterministic variants of the game, in which a fixed rule determines which available move is taken: Combine Largest, Split Largest, Combine Smallest and Split Smallest. We prove that Combine Largest and Split Largest realize the lower bound. Split Smallest has the largest number of moves over all possible games and is close to the new upper bound. For Combine Split games, the number of moves grows linearly with $n$.
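The quantities $Z(n)$ and $IZ(n)$ in these bounds are easy to compute with the greedy algorithm, which always produces the Zeckendorf decomposition. The sketch assumes the indexing $F_1 = 1$, $F_2 = 2$, $F_{k+1} = F_k + F_{k-1}$, so that the Fibonacci numbers are distinct. Under a different indexing, $IZ(n)$ and hence the upper bound would shift. The function names are hypothetical.

```python
def zeckendorf_indices(n):
    """Indices k of the Zeckendorf decomposition n = sum of F_k, assuming F_1 = 1, F_2 = 2."""
    fibs = [1, 2]
    while fibs[-1] + fibs[-2] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    indices = []
    for k in range(len(fibs), 0, -1):     # greedy: largest Fibonacci number first
        if fibs[k - 1] <= n:
            indices.append(k)
            n -= fibs[k - 1]
    return indices


def move_bounds(n):
    """(lower, upper) bounds on the number of moves: n - Z(n) and 3n - 3Z(n) - IZ(n) + 1."""
    idx = zeckendorf_indices(n)
    z, iz = len(idx), sum(idx)
    return n - z, 3 * n - 3 * z - iz + 1


print(zeckendorf_indices(10), move_bounds(10))   # [5, 2] (8, 18): 10 = F_5 + F_2 = 8 + 2
```

For n = 1 both bounds are 0, consistent with the fact that the starting position $1 \cdot F_1$ admits no move.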

Preprint · 2020 · arXiv

Statistical Inferences of Linear Forms for Noisy Matrix Completion

We introduce a flexible framework for making inferences about general linear forms of a large matrix based on noisy observations of a subset of its entries. In particular, under mild regularity conditions, we develop a universal procedure to construct asymptotically normal estimators of its linear forms. The procedure uses double-sample debiasing and low-rank projection, and applies whenever an entry-wise consistent estimator of the matrix is available. These estimators allow us to subsequently construct confidence intervals for, and test hypotheses about, the linear forms. Our proposal was motivated by a careful perturbation analysis of the empirical singular spaces under the noisy matrix completion model, which might be of independent interest. The practical merits of our proposed inference procedure are demonstrated on both simulated and real-world data examples.

Preprint · 2020 · arXiv

Tensor SVD: Statistical and Computational Limits

In this paper, we propose a general framework for tensor singular value decomposition (tensor SVD), focusing on the methodology and theory for extracting the hidden low-rank structure from high-dimensional tensor data. Comprehensive results are developed on both the statistical and computational limits of tensor SVD. This problem exhibits three different phases according to the signal-to-noise ratio (SNR). With strong SNR, we show that the classical higher-order orthogonal iteration achieves the minimax optimal rate of convergence in estimation. With weak SNR, the information-theoretical lower bound implies that consistent estimation is impossible in general. With moderate SNR, we show that non-convex maximum likelihood estimation provides an optimal solution, but at an NP-hard computational cost. Moreover, under the hardness hypothesis of hypergraphic planted clique detection, no polynomial-time algorithm performs consistently in general in this regime.
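The higher-order orthogonal iteration (HOOI) mentioned above can be sketched for an order-3 tensor in two stages:
- Initialization: set each loading matrix to the leading left singular vectors of the corresponding unfolding (HOSVD).
- Iteration: repeatedly update each loading from the unfolding of the tensor projected onto the other two current subspaces.

The helper names, the fixed iteration count and the synthetic strong-SNR example are assumptions made for illustration. No stopping rule or tuning from the paper is implied.

```python
import numpy as np


def unfold(T, mode):
    """Mode-`mode` matricization: rows are indexed by that mode."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)


def top_left_sv(M, r):
    """Leading r left singular vectors of M."""
    return np.linalg.svd(M, full_matrices=False)[0][:, :r]


def hooi(T, ranks, n_iter=10):
    """Hypothetical helper: HOSVD initialisation followed by HOOI updates (order 3)."""
    U = [top_left_sv(unfold(T, k), r) for k, r in enumerate(ranks)]
    for _ in range(n_iter):
        # Project onto the other two subspaces, then refresh this mode's loading.
        U[0] = top_left_sv(unfold(np.einsum('ijk,jb,kc->ibc', T, U[1], U[2]), 0), ranks[0])
        U[1] = top_left_sv(unfold(np.einsum('ijk,ia,kc->ajc', T, U[0], U[2]), 1), ranks[1])
        U[2] = top_left_sv(unfold(np.einsum('ijk,ia,jb->abk', T, U[0], U[1]), 2), ranks[2])
    return U


# Strong-SNR example: a 20 x 20 x 20 rank-(2, 2, 2) signal plus small Gaussian noise.
rng = np.random.default_rng(3)
Q = [np.linalg.qr(rng.normal(size=(20, 2)))[0] for _ in range(3)]
core = np.zeros((2, 2, 2))
core[0, 0, 0], core[1, 1, 1] = 50.0, 30.0
signal = np.einsum('abc,ia,jb,kc->ijk', core, *Q)
U_hat = hooi(signal + 0.01 * rng.normal(size=(20, 20, 20)), ranks=(2, 2, 2))
# Spectral-norm distance between the estimated and true projection matrices, per mode.
print([np.linalg.norm(u @ u.T - q @ q.T, 2) for u, q in zip(U_hat, Q)])
```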

Preprint · 2016 · arXiv

Estimation of low rank density matrices: bounds in Schatten norms and other distances

Let ${\mathcal S}_m$ be the set of all $m\times m$ density matrices (Hermitian positive semi-definite matrices of unit trace). Consider the problem of estimating an unknown density matrix $\rho\in {\mathcal S}_m$ based on the outcomes of $n$ measurements of observables $X_1,\dots, X_n\in {\mathbb H}_m$ (${\mathbb H}_m$ being the space of $m\times m$ Hermitian matrices) for a quantum system identically prepared $n$ times in state $\rho$. The outcomes $Y_1,\dots, Y_n$ of such measurements can be described by a trace regression model in which ${\mathbb E}_\rho(Y_j|X_j)={\rm tr}(\rho X_j)$, $j=1,\dots, n$. The design variables $X_1,\dots, X_n$ are often sampled at random from the uniform distribution on an orthonormal basis $\{E_1,\dots, E_{m^2}\}$ of ${\mathbb H}_m$ (such as the Pauli basis). The goal is to estimate the unknown density matrix $\rho$ based on the data $(X_1,Y_1), \dots, (X_n,Y_n)$. Let $$ \hat Z:=\frac{m^2}{n}\sum_{j=1}^n Y_j X_j $$ and let $\check \rho$ be the projection of $\hat Z$ onto the convex set ${\mathcal S}_m$ of density matrices. It is shown that, for the estimator $\check \rho$, the minimax lower bounds in classes of low-rank density matrices (established earlier) are attained up to logarithmic factors for all Schatten $p$-norm distances, $p\in [1,\infty]$, and for the Bures version of the quantum Hellinger distance. Moreover, for a slightly modified version of the estimator $\check \rho$, the same property also holds for the quantum relative entropy (Kullback-Leibler) distance between density matrices.
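The estimator $\check\rho$ can be sketched numerically for a single qubit ($m = 2$). The sketch assumes the projection is taken in Frobenius (Hilbert-Schmidt) norm. That projection keeps the eigenvectors of the symmetrized $\hat Z$ and projects its eigenvalues onto the probability simplex.

It also assumes the following measurement simulation:
- $X_j$ is drawn uniformly from the normalized Pauli basis $E_a = P_a/\sqrt{2}$;
- the outcome $Y_j$ is an eigenvalue $\pm 1/\sqrt{2}$ of $X_j$, with probabilities chosen so that ${\mathbb E}(Y_j|X_j) = {\rm tr}(\rho X_j)$.

Helper names are hypothetical.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of a real vector onto {p : p >= 0, sum(p) = 1}."""
    u = np.sort(v)[::-1]                        # decreasing order
    css = np.cumsum(u) - 1.0
    j = np.arange(1, len(v) + 1)
    last = np.nonzero(u - css / j > 0)[0][-1]   # last index kept in the support
    return np.maximum(v - css[last] / (last + 1), 0.0)


def project_to_density(Z):
    """Frobenius-norm projection of a Hermitian matrix onto the density matrices."""
    evals, evecs = np.linalg.eigh((Z + Z.conj().T) / 2)   # symmetrise against round-off
    return (evecs * project_to_simplex(evals)) @ evecs.conj().T


# Qubit example: identity and the three Pauli matrices, normalised to be orthonormal.
paulis = np.array([np.eye(2), [[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]])
basis = paulis / np.sqrt(2)                      # tr(E_a E_b) = delta_ab
rho = np.array([[0.9, 0.3], [0.3, 0.1]])         # a rank-one (pure) state
rng = np.random.default_rng(4)
m, n = 2, 20000
picks = rng.integers(4, size=n)                  # X_j uniform over the basis
# Measuring P_a / sqrt(2) gives +-1/sqrt(2) with P(+) = (1 + tr(rho P_a)) / 2,
# so E[Y_j | X_j] = tr(rho X_j) as in the trace regression model.
p_plus = (1 + np.einsum('ij,aji->a', rho, paulis).real) / 2
Y = np.where(rng.random(n) < p_plus[picks], 1.0, -1.0) / np.sqrt(2)
Z_hat = m**2 / n * np.einsum('j,jab->ab', Y, basis[picks])
rho_check = project_to_density(Z_hat)
print(np.round(rho_check, 3))
```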

Preprint · 2016 · arXiv

Optimal Estimation of Low Rank Density Matrices

Density matrices are positive semi-definite Hermitian matrices of unit trace that describe the state of a quantum system. The goal of the paper is to develop minimax lower bounds on error rates for estimating low-rank density matrices in trace regression models used in quantum state tomography (in particular, in the case of Pauli measurements). The bounds depend explicitly on the rank and other complexity parameters. Such bounds are established for several statistically relevant distances. These include quantum versions of the Kullback-Leibler divergence (relative entropy distance) and of the Hellinger distance (the so-called Bures distance), as well as Schatten $p$-norm distances. Sharp upper bounds and oracle inequalities for the least squares estimator with von Neumann entropy penalization are obtained, showing that the minimax lower bounds are attained (up to logarithmic factors) for these distances.

Preprint · 2015 · arXiv

Exploring Sparsity in Multi-class Linear Discriminant Analysis

Recent studies in the literature have paid much attention to sparsity in linear classification tasks. One motivation for imposing a sparsity assumption on the linear discriminant direction is to rule out noninformative features that contribute little to the classification problem. Most of this work has focused on binary classification. For multi-class data, previous studies recommended applying sparse linear discriminant analysis (LDA) to each pair of classes individually. However, further sparsity can be exploited. In this paper, an estimator of group LASSO type is proposed to take advantage of sparsity in multi-class data. It enjoys appealing non-asymptotic properties that allow for insignificant correlations among features. The estimator exhibits superior performance on both simulated and real data.

Preprint · 2015 · arXiv

Perturbation of linear forms of singular vectors under Gaussian noise

Let $A\in\mathbb{R}^{m\times n}$ be a matrix of rank $r$ with singular value decomposition (SVD) $A=\sum_{k=1}^r\sigma_k (u_k\otimes v_k)$. Here $\{\sigma_k, k=1,\ldots,r\}$ are the singular values of $A$ (arranged in non-increasing order), and $u_k\in {\mathbb R}^m, v_k\in {\mathbb R}^n, k=1,\ldots, r$ are the corresponding left and right orthonormal singular vectors. Let $\tilde{A}=A+X$ be a noisy observation of $A$, where $X\in\mathbb{R}^{m\times n}$ is a random matrix with i.i.d. Gaussian entries, $X_{ij}\sim\mathcal{N}(0,\tau^2)$. Consider its SVD $\tilde{A}=\sum_{k=1}^{m\wedge n}\tilde\sigma_k(\tilde{u}_k\otimes\tilde{v}_k)$ with singular values $\tilde\sigma_1\geq\ldots\geq\tilde\sigma_{m\wedge n}$ and singular vectors $\tilde{u}_k,\tilde{v}_k,k=1,\ldots, m\wedge n$. The goal of this paper is to develop sharp concentration bounds for the linear forms $\langle \tilde u_k,x\rangle, x\in {\mathbb R}^m$ and $\langle \tilde v_k,y\rangle, y\in {\mathbb R}^n$ of the perturbed (empirical) singular vectors when the singular values of $A$ are distinct. More generally, we develop concentration bounds for bilinear forms of the projection operators associated with the SVD. In particular, the results imply upper bounds of order $O\biggl(\sqrt{\frac{\log(m+n)}{m\vee n}}\biggr)$ (holding with high probability) on $$\max_{1\leq i\leq m}\big|\big\langle\tilde{u}_k-\sqrt{1+b_k}\,u_k,e_i^m\big\rangle\big|\ \ {\rm and} \ \ \max_{1\leq j\leq n}\big|\big\langle\tilde{v}_k-\sqrt{1+b_k}\,v_k,e_j^n\big\rangle\big|,$$ where $b_k$ are properly chosen constants characterizing the bias of the empirical singular vectors $\tilde u_k, \tilde v_k$, and $\{e_i^m,i=1,\ldots,m\}$, $\{e_j^n,j=1,\ldots,n\}$ are the canonical bases of $\mathbb{R}^m$ and ${\mathbb R}^n$, respectively.

Preprint · 2014 · arXiv

Optimal Schatten-q and Ky-Fan-k Norm Rate of Low Rank Matrix Estimation

In this paper, we consider low-rank matrix estimation using either the matrix-version Dantzig Selector $\hat{A}_\lambda^d$ or the matrix-version LASSO estimator $\hat{A}_\lambda^L$. We consider sub-Gaussian measurements, i.e., the measurements $X_1,\ldots,X_n\in\mathbb{R}^{m\times m}$ have i.i.d. sub-Gaussian entries. Suppose $\textrm{rank}(A_0)=r$. We prove that, when $n\geq Cm[r^2\vee r\log(m)\log(n)]$ for some $C>0$, both $\hat{A}_\lambda^d$ and $\hat{A}_\lambda^L$ attain optimal upper bounds (up to logarithmic terms) on the estimation accuracy under the spectral norm. By applying the metric entropy of Grassmann manifolds, we construct a (nearly) matching minimax lower bound for the estimation accuracy under the spectral norm. We also give upper bounds and matching minimax lower bounds (up to logarithmic terms) for the estimation accuracy under the Schatten-q norm for every $1\leq q\leq\infty$. As a direct corollary, we obtain both upper bounds and minimax lower bounds on the estimation accuracy under Ky-Fan-k norms for every $1\leq k\leq m$.

Preprint · 2011 · arXiv

Energy-Efficient Full Diversity Collaborative Unitary Space-Time Block Code Design via Unique Factorization of Signals

In this paper, a novel concept called a \textit{uniquely factorable constellation pair} (UFCP) is proposed for the systematic design of a noncoherent full diversity collaborative unitary space-time block code by normalizing two Alamouti codes. The target is a wireless communication system with two transmitter antennas and a single receiver antenna. It is proved that such a unitary UFCP code ensures the unique identification of both the channel coefficients and the transmitted signals in the noise-free case. It also ensures full diversity for the noncoherent maximum likelihood (ML) receiver in the noisy case. To further improve error performance, an optimal unitary UFCP code is designed by appropriately and uniquely factorizing a pair of energy-efficient cross quadrature amplitude modulation (QAM) constellations to maximize the coding gain subject to a transmission bit rate constraint. Based on a detailed investigation of the fractional coding gain function, the approach developed in this paper maximizes the coding gain in two ways. First, an energy scale is carefully designed to compress the three largest-energy points in the corners of the QAM constellations, which appear in the denominator of the objective. Second, a constellation triple forming two UFCPs is carefully designed, with one constellation collaborating with the other two. The aim is to make the accumulated minimum Euclidean distance along the two transmitter antennas in the numerator of the objective as large as possible. At the same time, the design avoids as many of the largest-energy corner points of the QAM constellations as possible to achieve the minimum of the numerator. In other words, the optimal coding gain is attained by intelligent constellation collaboration and efficient energy compression.
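The construction starts from the Alamouti code, a 2 x 2 space-time block spanning two time slots and two transmit antennas. Its key property is that its columns are orthogonal for any pair of symbols, so the Gram matrix is a multiple of the identity.

The sketch below checks that property for two square 16-QAM points. It assumes one common sign convention for the Alamouti matrix. It illustrates only the standard building block, not the UFCP design, the cross-QAM constellations or the energy scaling described above.

```python
import numpy as np


def alamouti(x1, x2):
    """Alamouti block (one common convention): rows are time slots, columns are antennas."""
    return np.array([[x1, x2],
                     [-np.conj(x2), np.conj(x1)]])


# Two symbols from the square 16-QAM grid {+-1, +-3} + 1j * {+-1, +-3} (illustrative).
x1, x2 = 3 + 1j, -1 + 3j
X = alamouti(x1, x2)
gram = X.conj().T @ X
print(gram)   # equals (|x1|^2 + |x2|^2) * I = 20 * I
```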
