Source author record

T. Tony Cai

T. Tony Cai appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

math.ST Statistics Theory Machine Learning Methodology Information Theory math.IT math.PR Applications Distributed, Parallel, and Cluster Computing math.NA math.OC

Catalog footprint

What is connected

50 works

11 topics

4 close collaborators

preprint · 2025 · arXiv

Nonparametric Bandits with Single-Index Rewards: Optimality and Adaptivity

Contextual bandits are a central framework for sequential decision-making, with applications ranging from recommendation systems to clinical trials. While nonparametric methods can flexibly model complex reward structures, they suffer from the curse of dimensionality. We address this challenge using a single-index model, which projects high-dimensional covariates onto a one-dimensional subspace while preserving nonparametric flexibility. We first develop a nonasymptotic theory for offline single-index regression for each arm, combining maximum rank correlation for index estimation with local polynomial regression. Building on this foundation, we propose a single-index bandit algorithm and establish its convergence rate. We further derive a matching lower bound, showing that the algorithm achieves minimax-optimal regret independent of the ambient dimension $d$, thereby overcoming the curse of dimensionality. We also establish an impossibility result for adaptation: without additional assumptions, no policy can adapt to unknown smoothness levels. Under a standard self-similarity condition, however, we construct a policy that remains minimax-optimal while automatically adapting to the unknown smoothness. Finally, as the dimension $d$ increases, our algorithm continues to achieve minimax-optimal regret, revealing a phase transition that characterizes the fundamental limits of single-index bandit learning.

preprint · 2022 · arXiv

Estimation and Inference with Proxy Data and its Genetic Applications

Existing high-dimensional statistical methods are largely established for analyzing individual-level data. In this work, we study estimation and inference for high-dimensional linear models where we only observe "proxy data", which include the marginal statistics and sample covariance matrix that are computed based on different sets of individuals. We develop a rate optimal method for estimation and inference for the regression coefficient vector and its linear functionals based on the proxy data. Moreover, we show the intrinsic limitations in the proxy-data based inference: the minimax optimal rate for estimation is slower than that in the conventional case where individual data are observed; the power for testing and multiple testing does not go to one as the signal strength goes to infinity. These interesting findings are illustrated through simulation studies and an analysis of a dataset concerning the genetic associations of hindlimb muscle weight in a mouse population.

preprint · 2022 · arXiv

On the Non-Asymptotic Concentration of Heteroskedastic Wishart-type Matrix

This paper focuses on the non-asymptotic concentration of heteroskedastic Wishart-type matrices. Suppose $Z$ is a $p_1$-by-$p_2$ random matrix with independent entries $Z_{ij} \sim N(0,σ_{ij}^2)$. We prove that the expected spectral norm of the Wishart matrix deviation, $\mathbb{E} \left\|ZZ^\top - \mathbb{E} ZZ^\top\right\|$, is upper bounded by
\begin{equation*}
(1+ε)\left\{2σ_Cσ_R + σ_C^2 + Cσ_Rσ_*\sqrt{\log(p_1 \wedge p_2)} + Cσ_*^2\log(p_1 \wedge p_2)\right\},
\end{equation*}
where $σ_C^2 := \max_j \sum_{i=1}^{p_1}σ_{ij}^2$, $σ_R^2 := \max_i \sum_{j=1}^{p_2}σ_{ij}^2$ and $σ_*^2 := \max_{i,j}σ_{ij}^2$. A minimax lower bound is developed that matches this upper bound. We then derive concentration inequalities, moments, and tail bounds for the heteroskedastic Wishart-type matrix under more general distributions, such as sub-Gaussian and heavy-tailed distributions. Next, we consider the cases where $Z$ has homoskedastic columns or rows (i.e., $σ_{ij} \approx σ_i$ or $σ_{ij} \approx σ_j$) and derive rate-optimal Wishart-type concentration bounds. Finally, we apply the developed tools to identify the sharp signal-to-noise ratio threshold for consistent clustering in the heteroskedastic clustering problem.
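
As a small illustration of the quantities in the bound above, the following sketch computes $σ_C$, $σ_R$ and $σ_*$ from a matrix of entrywise standard deviations, together with the constant-free leading part $2σ_Cσ_R + σ_C^2$. The function names are hypothetical, and the terms involving the unspecified constant $C$ and the factor $(1+ε)$ are deliberately left out.

```python
import math


def variance_proxies(sigma):
    """Return (sigma_C, sigma_R, sigma_star) for a p1-by-p2 list of lists,
    where sigma[i][j] is the standard deviation of Z_ij."""
    p1, p2 = len(sigma), len(sigma[0])
    # sigma_C^2: largest column sum of variances (sum over rows i, max over columns j)
    col_sums = [sum(sigma[i][j] ** 2 for i in range(p1)) for j in range(p2)]
    # sigma_R^2: largest row sum of variances (sum over columns j, max over rows i)
    row_sums = [sum(sigma[i][j] ** 2 for j in range(p2)) for i in range(p1)]
    sigma_star = max(abs(s) for row in sigma for s in row)
    return math.sqrt(max(col_sums)), math.sqrt(max(row_sums)), sigma_star


def leading_term(sigma):
    """Constant-free part of the bound: 2 * sigma_C * sigma_R + sigma_C^2."""
    sigma_C, sigma_R, _ = variance_proxies(sigma)
    return 2 * sigma_C * sigma_R + sigma_C ** 2


# Homoskedastic example with p1 = 2 rows and p2 = 3 columns.
example = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
print(variance_proxies(example), leading_term(example))
```

In the homoskedastic example every column sum of variances is 2 and every row sum is 3, so $σ_C = \sqrt{2}$ and $σ_R = \sqrt{3}$.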

preprint · 2022 · arXiv

Sparse Group Lasso: Optimal Sample Complexity, Convergence Rate, and Statistical Inference

We study sparse group Lasso for high-dimensional double sparse linear regression, where the parameter of interest is simultaneously element-wise and group-wise sparse. This problem is an important instance of the simultaneously structured model -- an actively studied topic in statistics and machine learning. In the noiseless case, matching upper and lower bounds on sample complexity are established for the exact recovery of sparse vectors and for stable estimation of approximately sparse vectors, respectively. In the noisy case, upper and matching minimax lower bounds for estimation error are obtained. We also consider the debiased sparse group Lasso and investigate its asymptotic property for the purpose of statistical inference. Finally, numerical studies are provided to support the theoretical results.

preprint · 2020 · arXiv

Distributed Gaussian Mean Estimation under Communication Constraints: Optimal Rates and Communication-Efficient Algorithms

We study distributed estimation of a Gaussian mean under communication constraints in a decision theoretical framework. Minimax rates of convergence, which characterize the tradeoff between the communication costs and statistical accuracy, are established in both the univariate and multivariate settings. Communication-efficient and statistically optimal procedures are developed. In the univariate case, the optimal rate depends only on the total communication budget, so long as each local machine has at least one bit. However, in the multivariate case, the minimax rate depends on the specific allocations of the communication budgets among the local machines. Although optimal estimation of a Gaussian mean is relatively simple in the conventional setting, it is quite involved under the communication constraints, both in terms of the optimal procedure design and lower bound argument. The techniques developed in this paper can be of independent interest. An essential step is the decomposition of the minimax estimation problem into two stages, localization and refinement. This critical decomposition provides a framework for both the lower bound analysis and optimal procedure design.

preprint · 2020 · arXiv

Optimal Permutation Recovery in Permuted Monotone Matrix Model

Motivated by recent research on quantifying bacterial growth dynamics based on genome assemblies, we consider a permuted monotone matrix model $Y=ΘΠ+Z$, where the rows represent different samples, the columns represent contigs in genome assemblies and the elements represent log-read counts after preprocessing steps and Guanine-Cytosine (GC) adjustment. In this model, $Θ$ is an unknown mean matrix with monotone entries for each row, $Π$ is a permutation matrix that permutes the columns of $Θ$, and $Z$ is a noise matrix. This paper studies the problem of estimation/recovery of $Π$ given the observed noisy matrix $Y$. We propose an estimator based on the best linear projection, which is shown to be minimax rate-optimal for both exact recovery, as measured by the 0-1 loss, and partial recovery, as quantified by the normalized Kendall's tau distance. Simulation studies demonstrate the superior empirical performance of the proposed estimator over alternative methods. We demonstrate the methods using a synthetic metagenomics dataset of 45 closely related bacterial species and a real metagenomic dataset to compare the bacterial growth dynamics between the responders and the non-responders of the IBD patients after 8 weeks of treatment.
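
The partial-recovery loss mentioned above can be illustrated with a short sketch of a normalized Kendall's tau distance between two orderings. The normalization by the number of pairs, $n(n-1)/2$, is an assumption made here for illustration; the paper may normalize differently, and the function name is hypothetical.

```python
from itertools import combinations


def normalized_kendall_tau_distance(perm_a, perm_b):
    """Fraction of item pairs ordered differently by the two permutations.

    Both arguments list the same distinct items, each in its own order.
    """
    n = len(perm_a)
    if n != len(perm_b):
        raise ValueError("permutations must have the same length")
    if n < 2:
        return 0.0
    position_in_b = {item: idx for idx, item in enumerate(perm_b)}
    # combinations() yields (x, y) with x before y in perm_a;
    # the pair is discordant when perm_b places y before x.
    discordant = sum(
        1 for x, y in combinations(perm_a, 2)
        if position_in_b[x] > position_in_b[y]
    )
    return discordant / (n * (n - 1) / 2)


print(normalized_kendall_tau_distance([0, 1, 2, 3], [1, 0, 2, 3]))
```

Identical orderings give distance 0 and a fully reversed ordering gives distance 1, so the value reads as the share of column pairs whose relative order was recovered incorrectly.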

preprint · 2020 · arXiv

Rate-Optimal Perturbation Bounds for Singular Subspaces with Applications to High-Dimensional Statistics

Perturbation bounds for singular spaces, in particular Wedin's $\sin Θ$ theorem, are a fundamental tool in many fields including high-dimensional statistics, machine learning, and applied mathematics. In this paper, we establish separate perturbation bounds, measured in both spectral and Frobenius $\sin Θ$ distances, for the left and right singular subspaces. Lower bounds, which show that the individual perturbation bounds are rate-optimal, are also given. The new perturbation bounds are applicable to a wide range of problems. In this paper, we consider in detail applications to low-rank matrix denoising and singular space estimation, high-dimensional clustering, and canonical correlation analysis (CCA). In particular, separate matching upper and lower bounds are obtained for estimating the left and right singular spaces. To the best of our knowledge, this is the first result that gives different optimal rates for the left and right singular spaces under the same perturbation. In addition to these problems, applications to other high-dimensional problems such as community detection in bipartite networks, multidimensional scaling, and cross-covariance matrix estimation are also discussed.

preprint · 2020 · arXiv

Transfer Learning for High-dimensional Linear Regression: Prediction, Estimation, and Minimax Optimality

This paper considers the estimation and prediction of a high-dimensional linear regression in the setting of transfer learning, using samples from the target model as well as auxiliary samples from different but possibly related regression models. When the set of "informative" auxiliary samples is known, an estimator and a predictor are proposed and their optimality is established. The optimal rates of convergence for prediction and estimation are faster than the corresponding rates without using the auxiliary samples. This implies that knowledge from the informative auxiliary samples can be transferred to improve the learning performance of the target problem. In the case that the set of informative auxiliary samples is unknown, we propose a data-driven procedure for transfer learning, called Trans-Lasso, and reveal its robustness to non-informative auxiliary samples and its efficiency in knowledge transfer. The proposed procedures are demonstrated in numerical studies and are applied to a dataset concerning the associations among gene expressions. It is shown that Trans-Lasso leads to improved performance in gene expression prediction in a target tissue by incorporating the data from multiple different tissues as auxiliary samples.

preprint · 2020 · arXiv

Two Robust Tools for Inference about Causal Effects with Invalid Instruments

Instrumental variables have been widely used to estimate the causal effect of a treatment on an outcome. Existing confidence intervals for causal effects based on instrumental variables assume that all of the putative instrumental variables are valid; a valid instrumental variable is a variable that affects the outcome only by affecting the treatment and is not related to unmeasured confounders. However, in practice, some of the putative instrumental variables are likely to be invalid. This paper presents two tools to conduct valid inference and tests in the presence of invalid instruments. First, we propose a simple and general approach to construct confidence intervals based on taking unions of well-known confidence intervals. Second, we propose a novel test for the null causal effect based on a collider bias. Our two proposals, especially when fused together, outperform traditional instrumental variable confidence intervals when invalid instruments are present, and can also be used as a sensitivity analysis when there is concern that instrumental variables assumptions are violated. The new approach is applied to a Mendelian randomization study on the causal effect of low-density lipoprotein on the incidence of cardiovascular diseases.

preprint · 2017 · arXiv

Weighted Message Passing and Minimum Energy Flow for Heterogeneous Stochastic Block Models with Side Information

We study the misclassification error for community detection in general heterogeneous stochastic block models (SBM) with noisy or partial label information. We establish a connection between the misclassification rate and the notion of minimum energy on the local neighborhood of the SBM. We develop an optimally weighted message passing algorithm to reconstruct labels for SBM based on the minimum energy flow and the eigenvectors of a certain Markov transition matrix. The general SBM considered in this paper allows for unequal-size communities, degree heterogeneity, and different connection probabilities among blocks. We focus on how to optimally weight the message passing to reduce the misclassification error.

preprint · 2016 · arXiv

A simple and robust confidence interval for causal effects with possibly invalid instruments

Instrumental variables have been widely used to estimate the causal effect of a treatment on an outcome. Existing confidence intervals for causal effects based on instrumental variables assume that all of the putative instrumental variables are valid; a valid instrumental variable is a variable that affects the outcome only by affecting the treatment and is not related to unmeasured confounders. However, in practice, some of the putative instrumental variables are likely to be invalid. This paper presents a simple and general approach to construct a confidence interval that is robust to possibly invalid instruments. The robust confidence interval has theoretical guarantees on having the correct coverage and can also be used to assess the sensitivity of inference when instrumental variables assumptions are violated. The paper also shows that the robust confidence interval outperforms traditional confidence intervals popular in instrumental variables literature when invalid instruments are present. The new approach is applied to a developmental economics study of the causal effect of income on food expenditures.

preprint · 2016 · arXiv

A Sparse PCA Approach to Clustering

We discuss a clustering method for the Gaussian mixture model based on the sparse principal component analysis (SPCA) method and compare it with the IF-PCA method. We also discuss the dependent case where the covariance matrix $Σ$ is not necessarily diagonal.

preprint · 2016 · arXiv

Accuracy Assessment for High-dimensional Linear Regression

This paper considers point and interval estimation of the $\ell_q$ loss of an estimator in high-dimensional linear regression with random design. We establish the minimax rate for estimating the $\ell_{q}$ loss and the minimax expected length of confidence intervals for the $\ell_{q}$ loss of rate-optimal estimators of the regression vector, including commonly used estimators such as Lasso, scaled Lasso, square-root Lasso and Dantzig Selector. Adaptivity of the confidence intervals for the $\ell_{q}$ loss is also studied. Both the setting of known identity design covariance matrix and known noise level and the setting of unknown design covariance matrix and unknown noise level are studied. The results reveal interesting and significant differences between estimating the $\ell_2$ loss and $\ell_q$ loss with $1\le q <2$ as well as between the two settings. New technical tools are developed to establish rate sharp lower bounds for the minimax estimation error and the expected length of minimax and adaptive confidence intervals for the $\ell_q$ loss. A significant difference between loss estimation and the traditional parameter estimation is that for loss estimation the constraint is on the performance of the estimator of the regression vector, but the lower bounds are on the difficulty of estimating its $\ell_q$ loss. The technical tools developed in this paper can also be of independent interest.

preprint · 2016 · arXiv

Global testing against sparse alternatives in time-frequency analysis

In this paper, an over-sampled periodogram higher criticism (OPHC) test is proposed for the global detection of sparse periodic effects in a complex-valued time series. An explicit minimax detection boundary is established between the rareness and weakness of the complex sinusoids hidden in the series. The OPHC test is shown to be asymptotically powerful in the detectable region. Numerical simulations illustrate and verify the effectiveness of the proposed test. Furthermore, the periodogram over-sampled by $O(\log N)$ is proven universally optimal in global testing for periodicities under a mild minimum separation condition.

preprint · 2016 · arXiv

Inference via Message Passing on Partially Labeled Stochastic Block Models

We study the community detection and recovery problem in partially labeled stochastic block models (SBM). We develop a fast linearized message-passing algorithm to reconstruct labels for SBM (with $n$ nodes, $k$ blocks, and intra- and inter-block connectivity $p,q$) when a proportion $δ$ of node labels is revealed. The signal-to-noise ratio ${\sf SNR}(n,k,p,q,δ)$ is shown to characterize the fundamental limitations of inference via local algorithms. On the one hand, when ${\sf SNR}>1$, the linearized message-passing algorithm provides a statistical inference guarantee with misclassification rate at most $\exp(-({\sf SNR}-1)/2)$, thus interpolating smoothly between strong and weak consistency. This exponential dependence improves upon the known error rate $({\sf SNR}-1)^{-1}$ in the literature on weak recovery. On the other hand, when ${\sf SNR}<1$ (for $k=2$) and ${\sf SNR}<1/4$ (for general growing $k$), we prove that local algorithms suffer an error rate of at least $\frac{1}{2} - \sqrt{δ\cdot {\sf SNR}}$, which is only slightly better than random guessing for small $δ$.
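
To make the comparison between the three rates concrete, the sketch below evaluates the expressions quoted in the abstract at a few values. It only does the arithmetic; the function names are hypothetical, and nothing here computes ${\sf SNR}$ from $(n,k,p,q,δ)$, whose definition is not given in the abstract.

```python
import math


def message_passing_upper_bound(snr):
    """exp(-(SNR - 1) / 2), the stated misclassification bound for SNR > 1."""
    if snr <= 1:
        raise ValueError("the upper bound is stated for SNR > 1")
    return math.exp(-(snr - 1) / 2)


def earlier_weak_recovery_rate(snr):
    """(SNR - 1)^(-1), the previously known rate quoted for comparison."""
    if snr <= 1:
        raise ValueError("the rate is stated for SNR > 1")
    return 1 / (snr - 1)


def local_algorithm_lower_bound(delta, snr):
    """1/2 - sqrt(delta * SNR), the stated lower bound below the threshold."""
    return 0.5 - math.sqrt(delta * snr)


for snr in (3, 11):
    print(snr, message_passing_upper_bound(snr), earlier_weak_recovery_rate(snr))
print(local_algorithm_lower_bound(0.01, 0.25))
```

At ${\sf SNR}=11$ the exponential bound is about 0.0067 against 0.1 for the polynomial rate, and with $δ=0.01$, ${\sf SNR}=0.25$ the lower bound is 0.45, close to the 0.5 of random guessing.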

preprint · 2016 · arXiv

Minimax Rate-optimal Estimation of High-dimensional Covariance Matrices with Incomplete Data

Missing data occur frequently in a wide range of applications. In this paper, we consider estimation of high-dimensional covariance matrices in the presence of missing observations under a general missing completely at random model in the sense that the missingness is not dependent on the values of the data. Based on incomplete data, estimators for bandable and sparse covariance matrices are proposed and their theoretical and numerical properties are investigated. Minimax rates of convergence are established under the spectral norm loss and the proposed estimators are shown to be rate-optimal under mild regularity conditions. Simulation studies demonstrate that the estimators perform well numerically. The methods are also illustrated through an application to data from four ovarian cancer studies. The key technical tools developed in this paper are of independent interest and potentially useful for a range of related problems in high-dimensional statistical inference with missing data.

preprint · 2016 · arXiv

On Detection and Structural Reconstruction of Small-World Random Networks

In this paper, we study detection and fast reconstruction of the celebrated Watts-Strogatz (WS) small-world random graph model (Watts and Strogatz, 1998), which aims to describe real-world complex networks that exhibit both high clustering and short average path length. The WS model with neighborhood size $k$ and rewiring probability $β$ can be viewed as a continuous interpolation between a deterministic ring lattice graph and the Erdős-Rényi random graph. We study both the computational and statistical aspects of detecting the deterministic ring lattice structure (or local geographical links, strong ties) in the presence of random connections (or long range links, weak ties), and of recovering it. The phase diagram in terms of $(k,β)$ is partitioned into several regions according to the difficulty of the problem. We propose distinct methods for the various regions.
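
The interpolation between a ring lattice and a random graph can be sketched with a minimal generator. This is one common rewiring variant, written with the standard library only, and it is an assumption that it matches the exact WS variant analyzed in the paper; the function names are hypothetical.

```python
import random


def ring_lattice(n, k):
    """Edges of a ring lattice: each node is linked to its k nearest neighbours."""
    if k % 2 or k >= n:
        raise ValueError("k must be even and smaller than n")
    edges = set()
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            edges.add((min(i, j), max(i, j)))
    return edges


def watts_strogatz(n, k, beta, seed=None):
    """Rewire each lattice edge with probability beta, keeping one endpoint fixed."""
    rng = random.Random(seed)
    edges = ring_lattice(n, k)
    for i, j in sorted(edges):  # snapshot of the original lattice edges
        if rng.random() < beta:
            # Candidate new endpoints: no self-loops, no duplicate edges.
            candidates = [
                v for v in range(n)
                if v != i and (min(i, v), max(i, v)) not in edges
            ]
            if candidates:
                new_j = rng.choice(candidates)
                edges.remove((i, j))
                edges.add((min(i, new_j), max(i, new_j)))
    return edges


graph = watts_strogatz(n=20, k=4, beta=0.2, seed=0)
print(len(graph))
```

With $β=0$ the output is exactly the ring lattice, and for any $β$ the number of edges stays at $nk/2$, so only the placement of edges changes as $β$ grows.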

preprint · 2016 · arXiv

Optimal Estimation of Co-heritability in High-dimensional Linear Models

Co-heritability is an important concept that characterizes the genetic associations within pairs of quantitative traits. There has been significant recent interest in estimating the co-heritability based on data from the genome-wide association studies (GWAS). This paper introduces two measures of co-heritability in the high-dimensional linear model framework, including the inner product of the two regression vectors and a normalized inner product by their lengths. Functional de-biased estimators (FDEs) are developed to estimate these two co-heritability measures. In addition, estimators of quadratic functionals of the regression vectors are proposed. Both theoretical and numerical properties of the estimators are investigated. In particular, minimax rates of convergence are established and the proposed estimators of the inner product, the quadratic functionals and the normalized inner product are shown to be rate-optimal. Simulation results show that the FDEs significantly outperform the naive plug-in estimates. The FDEs are also applied to analyze a yeast segregant data set with multiple traits to estimate heritability and co-heritability among the traits.

preprint · 2015 · arXiv

Computational and Statistical Boundaries for Submatrix Localization in a Large Noisy Matrix

The interplay between computational efficiency and statistical accuracy in high-dimensional inference has drawn increasing attention in the literature. In this paper, we study computational and statistical boundaries for submatrix localization. Given one observation of one or multiple non-overlapping signal submatrices (of magnitude $λ$ and size $k_m \times k_n$) contaminated with a noise matrix (of size $m \times n$), we establish two transition thresholds for the signal-to-noise ratio $λ/σ$ in terms of $m$, $n$, $k_m$, and $k_n$. The first threshold, $\sf SNR_c$, corresponds to the computational boundary. Below this threshold, it is shown that no polynomial time algorithm can succeed in identifying the submatrix, under the hidden clique hypothesis. We introduce adaptive linear time spectral algorithms that identify the submatrix with high probability when the signal strength is above the threshold $\sf SNR_c$. The second threshold, $\sf SNR_s$, captures the statistical boundary, below which no method can succeed with probability going to one in the minimax sense. The exhaustive search method successfully finds the submatrix above this threshold. The results show an interesting phenomenon: $\sf SNR_c$ is always significantly larger than $\sf SNR_s$, which implies an essential gap between statistical optimality and computational efficiency for submatrix localization.

preprint · 2015 · arXiv

Confidence Intervals for High-Dimensional Linear Regression: Minimax Rates and Adaptivity

Confidence sets play a fundamental role in statistical inference. In this paper, we consider confidence intervals for high dimensional linear regression with random design. We first establish the convergence rates of the minimax expected length for confidence intervals in the oracle setting where the sparsity parameter is given. The focus is then on the problem of adaptation to sparsity for the construction of confidence intervals. Ideally, an adaptive confidence interval should have its length automatically adjusted to the sparsity of the unknown regression vector, while maintaining a prespecified coverage probability. It is shown that such a goal is in general not attainable, except when the sparsity parameter is restricted to a small region over which the confidence intervals have the optimal length of the usual parametric rate. It is further demonstrated that the lack of adaptivity is not due to the conservativeness of the minimax framework, but is fundamentally caused by the difficulty of learning the bias accurately.

preprint · 2015 · arXiv

Geometric Inference for General High-Dimensional Linear Inverse Problems

This paper presents a unified geometric framework for the statistical analysis of a general ill-posed linear inverse model which includes as special cases noisy compressed sensing, sign vector recovery, trace regression, orthogonal matrix estimation, and noisy matrix completion. We propose computationally feasible convex programs for statistical inference including estimation, confidence intervals and hypothesis testing. A theoretical framework is developed to characterize the local estimation rate of convergence and to provide statistical inference guarantees. Our results are built based on the local conic geometry and duality. The difficulty of statistical inference is captured by the geometric characterization of the local tangent cone through the Gaussian width and Sudakov minoration estimate.

preprint · 2015 · arXiv

High-Dimensional Gaussian Copula Regression: Adaptive Estimation and Statistical Inference

We develop adaptive estimation and inference methods for high-dimensional Gaussian copula regression that achieve the same performance without the knowledge of the marginal transformations as that for high-dimensional linear regression. Using a Kendall's tau based covariance matrix estimator, an $\ell_1$ regularized estimator is proposed and a corresponding de-biased estimator is developed for the construction of the confidence intervals and hypothesis tests. Theoretical properties of the procedures are studied and the proposed estimation and inference methods are shown to be adaptive to the unknown monotone marginal transformations. Prediction of the response for a given value of the covariates is also considered. The procedures are easy to implement and perform well numerically. The methods are also applied to analyze the Communities and Crime Unnormalized Data from the UCI Machine Learning Repository.
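
A Kendall's tau based correlation estimate can be sketched as follows. The sketch uses the standard Gaussian copula relation $Σ_{jk} = \sin(\frac{π}{2} τ_{jk})$; it is an assumption that this is the transform behind the estimator named in the abstract, and the function names are hypothetical. Ties are counted as neither concordant nor discordant.

```python
import math
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length, at least 2")
    score = 0
    for i, j in combinations(range(n), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)


def copula_correlation(x, y):
    """Sine transform of Kendall's tau, sin(pi/2 * tau)."""
    return math.sin(math.pi / 2 * kendall_tau(x, y))


x = [0.1, 0.5, 0.9, 1.4, 2.0]
y = [1.2, 0.7, 1.9, 2.5, 2.1]
# A monotone transformation of x leaves the estimate unchanged.
x_transformed = [math.exp(v) for v in x]
print(copula_correlation(x, y), copula_correlation(x_transformed, y))
```

Because Kendall's tau depends only on ranks, the estimate is unchanged by any strictly increasing transformation of either variable, which is why the marginal transformations do not need to be known.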

preprint · 2015 · arXiv

Inference for High-dimensional Differential Correlation Matrices

Motivated by differential co-expression analysis in genomics, we consider in this paper estimation and testing of high-dimensional differential correlation matrices. An adaptive thresholding procedure is introduced and theoretical guarantees are given. The minimax rate of convergence is established and the proposed estimator is shown to be adaptively rate-optimal over collections of paired correlation matrices with approximately sparse differences. Simulation results show that the procedure significantly outperforms two other natural methods that are based on separate estimation of the individual correlation matrices. The procedure is also illustrated through an analysis of a breast cancer dataset, which provides evidence at the gene co-expression level that several genes, of which a subset has been previously verified, are associated with breast cancer. Hypothesis testing on the differential correlation matrices is also considered. A test, which is particularly well suited for testing against sparse alternatives, is introduced. In addition, other related problems, including estimation of a single sparse correlation matrix, estimation of the differential covariance matrices, and estimation of the differential cross-correlation matrices, are also discussed.

preprint · 2015 · arXiv

Optimal Estimation of A Quadratic Functional and Detection of Simultaneous Signals

Motivated by applications in genomics, this paper studies the problem of optimal estimation of a quadratic functional of two normal mean vectors, $Q(μ, θ) = \frac{1}{n}\sum_{i=1}^nμ_i^2θ_i^2$, with a particular focus on the case where both mean vectors are sparse. We propose optimal estimators of $Q(μ, θ)$ for different regimes and establish the minimax rates of convergence over a family of parameter spaces. The optimal rates exhibit interesting phase transitions in this family. The simultaneous signal detection problem is also considered under the minimax framework. It is shown that the proposed estimators for $Q(μ, θ)$ naturally lead to optimal testing procedures.
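
The functional itself is easy to evaluate when the mean vectors are known, as in the sketch below. This only computes $Q(μ, θ)$ from given vectors; it is not one of the paper's estimators, which work from noisy observations, and the function name is hypothetical.

```python
def quadratic_functional(mu, theta):
    """Q(mu, theta) = (1/n) * sum_i mu_i^2 * theta_i^2."""
    n = len(mu)
    if n == 0 or n != len(theta):
        raise ValueError("mu and theta must be non-empty and of equal length")
    return sum(m * m * t * t for m, t in zip(mu, theta)) / n


# Only coordinate 0 carries a signal in both vectors.
mu = [2.0, 0.0, 1.0, 0.0]
theta = [3.0, 5.0, 0.0, 0.0]
print(quadratic_functional(mu, theta))  # 9.0
```

A coordinate contributes only when both $μ_i$ and $θ_i$ are nonzero, so $Q(μ, θ) > 0$ exactly when the two sparse vectors share at least one signal location, which is the link to simultaneous signal detection.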

preprint · 2015 · arXiv

Optimal Rates of Convergence for Noisy Sparse Phase Retrieval via Thresholded Wirtinger Flow

This paper considers the noisy sparse phase retrieval problem: recovering a sparse signal $x \in \mathbb{R}^p$ from noisy quadratic measurements $y_j = (a_j' x )^2 + ε_j$, $j=1, \ldots, m$, with independent sub-exponential noise $ε_j$. The goals are to understand the effect of the sparsity of $x$ on the estimation precision and to construct a computationally feasible estimator to achieve the optimal rates. Inspired by the Wirtinger Flow [12] proposed for noiseless and non-sparse phase retrieval, a novel thresholded gradient descent algorithm is proposed and it is shown to adaptively achieve the minimax optimal rates of convergence over a wide range of sparsity levels when the $a_j$'s are independent standard Gaussian random vectors, provided that the sample size is sufficiently large compared to the sparsity of $x$.

preprint · 2015 · arXiv

Robust and computationally feasible community detection in the presence of arbitrary outlier nodes

Community detection, which aims to cluster $N$ nodes in a given graph into $r$ distinct groups based on the observed undirected edges, is an important problem in network data analysis. In this paper, the popular stochastic block model (SBM) is extended to the generalized stochastic block model (GSBM) that allows for adversarial outlier nodes, which are connected with the other nodes in the graph in an arbitrary way. Under this model, we introduce a procedure using convex optimization followed by the $k$-means algorithm with $k=r$. Both theoretical and numerical properties of the method are analyzed. A theoretical guarantee is given for the procedure to accurately detect the communities with a small misclassification rate under the setting where the number of clusters can grow with $N$. This theoretical result matches the best-known result in the literature for computationally feasible community detection in SBM without outliers. Numerical results show that our method is both computationally fast and robust to different kinds of outliers, while some popular computationally fast community detection algorithms, such as spectral clustering applied to adjacency matrices or graph Laplacians, may fail to retrieve the major clusters due to a small portion of outliers. We apply a slight modification of our method to a political blogs data set, showing that our method is competent in practice and comparable to existing computationally feasible methods in the literature. To the best of the authors' knowledge, our result is the first in the literature on clustering a fast-growing number of communities under the GSBM where a portion of arbitrary outlier nodes exist.

preprint · 2015 · arXiv

Structured Matrix Completion with Applications to Genomic Data Integration

Matrix completion has attracted significant recent attention in many fields including statistics, applied mathematics and electrical engineering. Current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design. Specifically, our proposed method aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix are observed. We provide theoretical justification for the proposed SMC method and derive a lower bound for the estimation error, which together establish the optimal rate of recovery over certain classes of approximately low-rank matrices. Simulation studies show that the method performs well in finite samples under a variety of configurations. The method is applied to integrate several ovarian cancer genomic studies with different extents of genomic measurements, which enables us to construct more accurate prediction rules for ovarian cancer survival.

preprint2014arXiv

Discussion: "A significance test for the lasso"

Discussion of "A significance test for the lasso" by Richard Lockhart, Jonathan Taylor, Ryan J. Tibshirani, Robert Tibshirani [arXiv:1301.7161].

preprint2014arXiv

Instrumental Variables Estimation with Some Invalid Instruments and its Application to Mendelian Randomization

Instrumental variables have been widely used for estimating the causal effect between exposure and outcome. Conventional estimation methods require complete knowledge about all the instruments' validity; a valid instrument must not have a direct effect on the outcome and not be related to unmeasured confounders. Often, this is impractical as highlighted by Mendelian randomization studies where genetic markers are used as instruments and complete knowledge about instruments' validity is equivalent to complete knowledge about the involved genes' functions. In this paper, we propose a method for estimation of causal effects when this complete knowledge is absent. It is shown that causal effects are identified and can be estimated as long as less than $50$% of instruments are invalid, without knowing which of the instruments are invalid. We also introduce conditions for identification when the 50% threshold is violated. A fast penalized $\ell_1$ estimation method, called sisVIVE, is introduced for estimating the causal effect without knowing which instruments are valid, with theoretical guarantees on its performance. The proposed method is demonstrated on simulated data and a real Mendelian randomization study concerning the effect of body mass index on health-related quality of life index. An R package \emph{sisVIVE} is available online.

preprint2014arXiv

Rate-Optimal Detection of Very Short Signal Segments

Motivated by a range of applications in engineering and genomics, we consider in this paper detection of very short signal segments in three settings: signals with known shape, arbitrary signals, and smooth signals. Optimal rates of detection are established for the three cases and rate-optimal detectors are constructed. The detectors are easily implementable and are based on scanning with linear and quadratic statistics. Our analysis reveals both similarities and differences in the strategy and fundamental difficulty of detection among these three settings.

preprint2014arXiv

ROP: Matrix recovery via rank-one projections

Estimation of low-rank matrices is of significant interest in a range of contemporary applications. In this paper, we introduce a rank-one projection model for low-rank matrix recovery and propose a constrained nuclear norm minimization method for stable recovery of low-rank matrices in the noisy case. The procedure is adaptive to the rank and robust against small perturbations. Both upper and lower bounds for the estimation accuracy under the Frobenius norm loss are obtained. The proposed estimator is shown to be rate-optimal under certain conditions. The estimator is easy to implement via convex programming and performs well numerically. The techniques and main results developed in the paper also have implications for other related statistical problems. An application to estimation of spiked covariance matrices from one-dimensional random projections is considered. The results demonstrate that it is still possible to accurately estimate the covariance matrix of a high-dimensional distribution based only on one-dimensional projections.

preprint2014arXiv

Sparse PCA: Optimal rates and adaptive estimation

Principal component analysis (PCA) is one of the most commonly used statistical procedures with a wide range of applications. This paper considers both minimax and adaptive estimation of the principal subspace in the high dimensional setting. Under mild technical conditions, we first establish the optimal rates of convergence for estimating the principal subspace which are sharp with respect to all the parameters, thus providing a complete characterization of the difficulty of the estimation problem in terms of the convergence rate. The lower bound is obtained by calculating the local metric entropy and an application of Fano's lemma. The rate optimal estimator is constructed using aggregation, which, however, might not be computationally feasible. We then introduce an adaptive procedure for estimating the principal subspace which is fully data driven and can be computed efficiently. It is shown that the estimator attains the optimal rates of convergence simultaneously over a large collection of the parameter spaces. A key idea in our construction is a reduction scheme which reduces the sparse PCA problem to a high-dimensional multivariate regression problem. This method is potentially also useful for other related problems.

preprint2013arXiv

A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion

We consider in this paper the problem of noisy 1-bit matrix completion under a general non-uniform sampling distribution using the max-norm as a convex relaxation for the rank. A max-norm constrained maximum likelihood estimate is introduced and studied. The rate of convergence for the estimate is obtained. Information-theoretical methods are used to establish a minimax lower bound under the general sampling model. The minimax upper and lower bounds together yield the optimal rate of convergence for the Frobenius norm loss. Computational algorithms and numerical performance are also discussed.

preprint2013arXiv

Adaptive confidence intervals for regression functions under shape constraints

Adaptive confidence intervals for regression functions are constructed under shape constraints of monotonicity and convexity. A natural benchmark is established for the minimum expected length of confidence intervals at a given function in terms of an analytic quantity, the local modulus of continuity. This bound depends not only on the function but also on the assumed function class. These benchmarks show that the constructed confidence intervals have near minimum expected length for each individual function, while maintaining a given coverage probability for functions within the class. Such adaptivity is much stronger than adaptive minimaxity over a collection of large parameter spaces.

preprint2013arXiv

Compressed Sensing and Affine Rank Minimization under Restricted Isometry

This paper establishes new restricted isometry conditions for compressed sensing and affine rank minimization. It is shown for compressed sensing that $δ_{k}^A+θ_{k,k}^A < 1$ guarantees the exact recovery of all $k$ sparse signals in the noiseless case through the constrained $\ell_1$ minimization. Furthermore, the upper bound 1 is sharp in the sense that for any $ε> 0$, the condition $δ_k^A + θ_{k, k}^A < 1+ε$ is not sufficient to guarantee such exact recovery using any recovery method. Similarly, for affine rank minimization, if $δ_{r}^\mathcal{M}+θ_{r,r}^\mathcal{M}< 1$ then all matrices with rank at most $r$ can be reconstructed exactly in the noiseless case via the constrained nuclear norm minimization; and for any $ε> 0$, $δ_r^\mathcal{M} +θ_{r,r}^\mathcal{M} < 1+ε$ does not ensure such exact recovery using any method. Moreover, in the noisy case the conditions $δ_{k}^A+θ_{k,k}^A < 1$ and $δ_{r}^\mathcal{M}+θ_{r,r}^\mathcal{M}< 1$ are also sufficient for the stable recovery of sparse signals and low-rank matrices respectively. Applications and extensions are also discussed.

preprint2013arXiv

Law of Log Determinant of Sample Covariance Matrix and Optimal Estimation of Differential Entropy for High-Dimensional Gaussian Distributions

Differential entropy and log determinant of the covariance matrix of a multivariate Gaussian distribution have many applications in coding, communications, signal processing and statistical inference. In this paper we consider in the high dimensional setting optimal estimation of the differential entropy and the log-determinant of the covariance matrix. We first establish a central limit theorem for the log determinant of the sample covariance matrix in the high dimensional setting where the dimension $p(n)$ can grow with the sample size $n$. An estimator of the differential entropy and the log determinant is then considered. Optimal rate of convergence is obtained. It is shown that in the case $p(n)/n \rightarrow 0$ the estimator is asymptotically sharp minimax. The ultra-high dimensional setting where $p(n) > n$ is also discussed.
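The link between the two quantities in this abstract is the standard closed form for the differential entropy of a $p$-dimensional Gaussian, stated here for reference (a textbook identity, not a formula quoted from the paper):

$$
h(X) = \frac{p}{2}\bigl(1 + \log(2\pi)\bigr) + \frac{1}{2}\log\det\Sigma, \qquad X \sim N_p(\mu, \Sigma).
$$

Since the first term is known once $p$ is fixed, estimating the differential entropy reduces to estimating $\log\det\Sigma$.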

preprint2013arXiv

Optimal hypothesis testing for high dimensional covariance matrices

This paper considers testing a covariance matrix $Σ$ in the high dimensional setting where the dimension $p$ can be comparable to or much larger than the sample size $n$. The problem of testing the hypothesis $H_0:Σ=Σ_0$ for a given covariance matrix $Σ_0$ is studied from a minimax point of view. We first characterize the boundary that separates the testable region from the non-testable region by the Frobenius norm when the ratio of the dimension $p$ to the sample size $n$ is bounded. A test based on a $U$-statistic is introduced and is shown to be rate optimal over this asymptotic regime. Furthermore, it is shown that the power of this test uniformly dominates that of the corrected likelihood ratio test (CLRT) over the entire asymptotic regime under which the CLRT is applicable. The power of the $U$-statistic based test is also analyzed when $p/n$ is unbounded.

preprint2013arXiv

Optimal rates of convergence for sparse covariance matrix estimation

This paper considers estimation of sparse covariance matrices and establishes the optimal rate of convergence under a range of matrix operator norm and Bregman divergence losses. A major focus is on the derivation of a rate sharp minimax lower bound. The problem exhibits new features that are significantly different from those that occur in the conventional nonparametric function estimation problems. Standard techniques fail to yield good results, and new tools are thus needed. We first develop a lower bound technique that is particularly well suited for treating "two-directional" problems such as estimating sparse covariance matrices. The result can be viewed as a generalization of Le Cam's method in one direction and Assouad's Lemma in another. This lower bound technique is of independent interest and can be used for other matrix estimation problems. We then establish a rate sharp minimax lower bound for estimating sparse covariance matrices under the spectral norm by applying the general lower bound technique. A thresholding estimator is shown to attain the optimal rate of convergence under the spectral norm. The results are then extended to the general matrix $\ell_w$ operator norms for $1\le w\le \infty$. In addition, we give a unified result on the minimax rate of convergence for sparse covariance matrix estimation under a class of Bregman divergence losses.
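The thresholding idea mentioned in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's estimator: the function name `threshold_covariance`, the constant `c`, and the universal threshold level $c\sqrt{\log p / n}$ are choices made for the example, and keeping the diagonal unthresholded is a common convention assumed here.

```python
import numpy as np

def threshold_covariance(X, c=2.0):
    """Hard-threshold the sample covariance of X (n rows = samples, p >= 2 columns).

    Assumption for illustration: a single universal threshold
    lam = c * sqrt(log(p) / n) is applied to off-diagonal entries.
    """
    n, p = X.shape
    S = np.cov(X, rowvar=False)            # p x p sample covariance matrix
    lam = c * np.sqrt(np.log(p) / n)       # assumed threshold level
    T = np.where(np.abs(S) >= lam, S, 0.0) # zero out small entries
    T[np.diag_indices(p)] = np.diag(S)     # always keep the variances
    return T

def spectral_norm(M):
    """Largest singular value, the loss under which the abstract states optimality."""
    return np.linalg.norm(M, 2)
```

A typical use would be `spectral_norm(threshold_covariance(X) - Sigma)` in a simulation where the true sparse `Sigma` is known.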

preprint2013arXiv

Sharp RIP Bound for Sparse Signal and Low-Rank Matrix Recovery

This paper establishes a sharp condition on the restricted isometry property (RIP) for both the sparse signal recovery and low-rank matrix recovery. It is shown that if the measurement matrix $A$ satisfies the RIP condition $δ_k^A<1/3$, then all $k$-sparse signals $β$ can be recovered exactly via the constrained $\ell_1$ minimization based on $y=Aβ$. Similarly, if the linear map $\cal M$ satisfies the RIP condition $δ_r^{\cal M}<1/3$, then all matrices $X$ of rank at most $r$ can be recovered exactly via the constrained nuclear norm minimization based on $b={\cal M}(X)$. Furthermore, in both cases it is not possible to do so in general when the condition does not hold. In addition, noisy cases are considered and oracle inequalities are given under the sharp RIP condition.

preprint2013arXiv

Sparse Representation of a Polytope and Recovery of Sparse Signals and Low-rank Matrices

This paper considers compressed sensing and affine rank minimization in both noiseless and noisy cases and establishes sharp restricted isometry conditions for sparse signal and low-rank matrix recovery. The analysis relies on a key technical tool which represents points in a polytope by convex combinations of sparse vectors. The technique is elementary yet leads to sharp results. It is shown that for any given constant $t\ge {4/3}$, in compressed sensing $δ_{tk}^A < \sqrt{(t-1)/t}$ guarantees the exact recovery of all $k$ sparse signals in the noiseless case through the constrained $\ell_1$ minimization, and similarly in affine rank minimization $δ_{tr}^\mathcal{M}< \sqrt{(t-1)/t}$ ensures the exact reconstruction of all matrices with rank at most $r$ in the noiseless case via the constrained nuclear norm minimization. Moreover, for any $ε>0$, $δ_{tk}^A<\sqrt{\frac{t-1}{t}}+ε$ is not sufficient to guarantee the exact recovery of all $k$-sparse signals for large $k$. A similar result also holds for matrix recovery. In addition, the conditions $δ_{tk}^A < \sqrt{(t-1)/t}$ and $δ_{tr}^\mathcal{M}< \sqrt{(t-1)/t}$ are also shown to be sufficient respectively for stable recovery of approximately sparse signals and low-rank matrices in the noisy case.
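To make the threshold $\sqrt{(t-1)/t}$ concrete, here is the arithmetic for two values of $t$ (a worked illustration; the choices $t = 4/3$ and $t = 2$ are examples, not cases singled out by the abstract):

$$
t = \tfrac{4}{3}: \ \sqrt{\frac{t-1}{t}} = \sqrt{\frac{1/3}{4/3}} = \frac{1}{2},
\qquad
t = 2: \ \sqrt{\frac{t-1}{t}} = \sqrt{\frac{1}{2}} = \frac{\sqrt{2}}{2} \approx 0.707 .
$$

So the condition reads $δ_{4k/3}^A < 1/2$ at the smallest admissible $t$ and $δ_{2k}^A < \sqrt{2}/2$ at $t = 2$, and the bound increases toward $1$ as $t \to \infty$.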

preprint2012arXiv

A reproducing kernel Hilbert space approach to functional linear regression

We study in this paper a smoothness regularization method for functional linear regression and provide a unified treatment for both the prediction and estimation problems. By developing a tool on simultaneous diagonalization of two positive definite kernels, we obtain sharper results on the minimax rates of convergence and show that smoothness regularized estimators achieve the optimal rates of convergence for both prediction and estimation under conditions weaker than those for the functional principal components based methods developed in the literature. Despite the generality of the method of regularization, we show that the procedure is easily implementable. Numerical results are obtained to illustrate the merits of the method and to demonstrate the theoretical developments.

preprint2012arXiv

Adaptive covariance matrix estimation through block thresholding

Estimation of large covariance matrices has drawn considerable recent attention, and the theoretical focus so far has mainly been on developing a minimax theory over a fixed parameter space. In this paper, we consider adaptive covariance matrix estimation where the goal is to construct a single procedure which is minimax rate optimal simultaneously over each parameter space in a large collection. A fully data-driven block thresholding estimator is proposed. The estimator is constructed by carefully dividing the sample covariance matrix into blocks and then simultaneously estimating the entries in a block by thresholding. The estimator is shown to be optimally rate adaptive over a wide range of bandable covariance matrices. A simulation study is carried out and shows that the block thresholding estimator performs well numerically. Some of the technical tools developed in this paper can also be of independent interest.

preprint2012arXiv

Estimating Sparse Precision Matrix: Optimal Rates of Convergence and Adaptive Estimation

The precision matrix is of significant importance in a wide range of applications in multivariate analysis. This paper considers adaptive minimax estimation of sparse precision matrices in the high dimensional setting. Optimal rates of convergence are established for a range of matrix norm losses. A fully data driven estimator based on adaptive constrained $\ell_1$ minimization is proposed and its rate of convergence is obtained over a collection of parameter spaces. The estimator, called ACLIME, is easy to implement and performs well numerically. A major step in establishing the minimax rate of convergence is the derivation of a rate-sharp lower bound. A "two-directional" lower bound technique is applied to obtain the minimax lower bound. The upper and lower bounds together yield the optimal rates of convergence for sparse precision matrix estimation and show that the ACLIME estimator is adaptively minimax rate optimal for a collection of parameter spaces and a range of matrix norm losses simultaneously.

preprint2012arXiv

Minimax and Adaptive Inference in Nonparametric Function Estimation

Since Stein's 1956 seminal paper, shrinkage has played a fundamental role in both parametric and nonparametric inference. This article discusses minimaxity and adaptive minimaxity in nonparametric function estimation. Three interrelated problems, function estimation under global integrated squared error, estimation under pointwise squared error, and nonparametric confidence intervals, are considered. Shrinkage is pivotal in the development of both the minimax theory and the adaptation theory. While the three problems are closely connected and the minimax theories bear some similarities, the adaptation theories are strikingly different. For example, in a sharp contrast to adaptive point estimation, in many common settings there do not exist nonparametric confidence intervals that adapt to the unknown smoothness of the underlying function. A concise account of these theories is given. The connections as well as differences among these problems are discussed and illustrated through examples.

preprint2012arXiv

Optimal Detection For Sparse Mixtures

Detection of sparse signals arises in a wide range of modern scientific studies. The focus so far has been mainly on Gaussian mixture models. In this paper, we consider the detection problem under a general sparse mixture model and obtain an explicit expression for the detection boundary. It is shown that the fundamental limit of detection is governed by the behavior of the log-likelihood ratio evaluated at an appropriate quantile of the null distribution. We also establish the adaptive optimality of the higher criticism procedure across all sparse mixtures satisfying certain mild regularity conditions. In particular, the general results obtained in this paper recover and extend in a unified manner the previously known results on sparse detection far beyond the conventional Gaussian model and other exponential families.
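For readers unfamiliar with the higher criticism procedure named in this abstract, the statistic can be sketched as below. This is a minimal illustration assuming the commonly used p-value form of the statistic; the function name `higher_criticism` and the search fraction `alpha0` are assumptions for the example, and the exact variant analyzed in the paper may differ.

```python
import numpy as np

def higher_criticism(pvalues, alpha0=0.5):
    """Higher criticism statistic computed from p-values.

    Assumed form: the maximum over the smallest alpha0 fraction of
    sorted p-values p_(i) of
        sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))).
    """
    p = np.sort(np.asarray(pvalues, dtype=float))
    n = p.size
    i = np.arange(1, n + 1)
    # Restrict to the lower alpha0 fraction and skip degenerate p-values.
    keep = (i <= max(1, int(alpha0 * n))) & (p > 0.0) & (p < 1.0)
    if not np.any(keep):
        raise ValueError("no usable p-values in the search range")
    i, p = i[keep], p[keep]
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1.0 - p))
    return float(np.max(hc))
```

Large values of the statistic indicate an excess of small p-values relative to the uniform null, which is the pattern a sparse mixture alternative produces.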

preprint2012arXiv

Optimal estimation of the mean function based on discretely sampled functional data: Phase transition

The problem of estimating the mean of random functions based on discretely sampled data arises naturally in functional data analysis. In this paper, we study optimal estimation of the mean function under both common and independent designs. Minimax rates of convergence are established and easily implementable rate-optimal estimators are introduced. The analysis reveals interesting and different phase transition phenomena in the two cases. Under the common design, the sampling frequency solely determines the optimal rate of convergence when it is relatively small and the sampling frequency has no effect on the optimal rate when it is large. On the other hand, under the independent design, the optimal rate of convergence is determined jointly by the sampling frequency and the number of curves when the sampling frequency is relatively small. When it is large, the sampling frequency has no effect on the optimal rate. Another interesting contrast between the two settings is that smoothing is necessary under the independent design, while, somewhat surprisingly, it is not essential under the common design.

preprint2011arXiv

Testing composite hypotheses, Hermite polynomials and optimal estimation of a nonsmooth functional

A general lower bound is developed for the minimax risk when estimating an arbitrary functional. The bound is based on testing two composite hypotheses and is shown to be effective in estimating the nonsmooth functional ${\frac{1}{n}}\sum|θ_i|$ from an observation $Y\sim N(θ,I_n)$. This problem exhibits some features that are significantly different from those that occur in estimating conventional smooth functionals. This is a setting where standard techniques fail to yield sharp results. A sharp minimax lower bound is established by applying the general lower bound technique based on testing two composite hypotheses. A key step is the construction of two special priors and bounding the chi-square distance between two normal mixtures. An estimator is constructed using approximation theory and Hermite polynomials and is shown to be asymptotically sharp minimax when the means are bounded by a given value $M$. It is shown that the minimax risk equals $β_*^2M^2({\frac{\log\log n}{\log n}})^2$ asymptotically, where $β_*$ is the Bernstein constant. The general techniques and results developed in the present paper can also be used to solve other related problems.

preprint2010arXiv

Nonparametric regression in exponential families

Most results in nonparametric regression theory are developed only for the case of additive noise. In such a setting many smoothing techniques including wavelet thresholding methods have been developed and shown to be highly adaptive. In this paper we consider nonparametric regression in exponential families with the main focus on the natural exponential families with a quadratic variance function, which include, for example, Poisson regression, binomial regression and gamma regression. We propose a unified approach of using a mean-matching variance stabilizing transformation to turn the relatively complicated problem of nonparametric regression in exponential families into a standard homoscedastic Gaussian regression problem. Then in principle any good nonparametric Gaussian regression procedure can be applied to the transformed data. To illustrate our general methodology, in this paper we use wavelet block thresholding to construct the final estimators of the regression function. The procedures are easily implementable. Both theoretical and numerical properties of the estimators are investigated. The estimators are shown to enjoy a high degree of adaptivity and spatial adaptivity with near-optimal asymptotic performance over a wide range of Besov spaces. The estimators also perform well numerically.

preprint2010arXiv

Optimal rates of convergence for covariance matrix estimation

The covariance matrix plays a central role in multivariate statistical analysis. Significant advances have been made recently on developing both theory and methodology for estimating large covariance matrices. However, a minimax theory has yet to be developed. In this paper we establish the optimal rates of convergence for estimating the covariance matrix under both the operator norm and Frobenius norm. It is shown that optimal procedures under the two norms are different and consequently matrix estimation under the operator norm is fundamentally different from vector estimation. The minimax upper bound is obtained by constructing a special class of tapering estimators and by studying their risk properties. A key step in obtaining the optimal rate of convergence is the derivation of the minimax lower bound. The technical analysis requires new ideas that are quite different from those used in the more conventional function/sequence estimation problems.

preprint2010arXiv

Optimal rates of convergence for estimating the null density and proportion of nonnull effects in large-scale multiple testing

An important estimation problem that is closely related to large-scale multiple testing is that of estimating the null density and the proportion of nonnull effects. A few estimators have been introduced in the literature; however, several important problems, including the evaluation of the minimax rate of convergence and the construction of rate-optimal estimators, remain open. In this paper, we consider optimal estimation of the null density and the proportion of nonnull effects. Both minimax lower and upper bounds are derived. The lower bound is established by a two-point testing argument, where at the core is the novel construction of two least favorable marginal densities $f_1$ and $f_2$. The density $f_1$ is heavy tailed both in the spatial and frequency domains and $f_2$ is a perturbation of $f_1$ such that the characteristic functions associated with $f_1$ and $f_2$ match each other in low frequencies. The minimax upper bound is obtained by constructing estimators which rely on the empirical characteristic function and Fourier analysis. The estimator is shown to be minimax rate optimal. Compared to existing methods in the literature, the proposed procedure not only provides more precise estimates of the null density and the proportion of the nonnull effects, but also yields more accurate results when used inside some multiple testing procedures which aim at controlling the False Discovery Rate (FDR). The procedure is easy to implement and numerical results are given.

50 published item(s)

Nonparametric Bandits with Single-Index Rewards: Optimality and Adaptivity

Estimation and Inference with Proxy Data and its Genetic Applications

On the Non-Asymptotic Concentration of Heteroskedastic Wishart-type Matrix

Sparse Group Lasso: Optimal Sample Complexity, Convergence Rate, and Statistical Inference

Distributed Gaussian Mean Estimation under Communication Constraints: Optimal Rates and Communication-Efficient Algorithms

Optimal Permutation Recovery in Permuted Monotone Matrix Model

Rate-Optimal Perturbation Bounds for Singular Subspaces with Applications to High-Dimensional Statistics

Transfer Learning for High-dimensional Linear Regression: Prediction, Estimation, and Minimax Optimality

Two Robust Tools for Inference about Causal Effects with Invalid Instruments

Weighted Message Passing and Minimum Energy Flow for Heterogeneous Stochastic Block Models with Side Information

A simple and robust confidence interval for causal effects with possibly invalid instruments

A Sparse PCA Approach to Clustering

Accuracy Assessment for High-dimensional Linear Regression

Global testing against sparse alternatives in time-frequency analysis

Inference via Message Passing on Partially Labeled Stochastic Block Models

Minimax Rate-optimal Estimation of High-dimensional Covariance Matrices with Incomplete Data

On Detection and Structural Reconstruction of Small-World Random Networks

Optimal Estimation of Co-heritability in High-dimensional Linear Models

Computational and Statistical Boundaries for Submatrix Localization in a Large Noisy Matrix

Confidence Intervals for High-Dimensional Linear Regression: Minimax Rates and Adaptivity

Geometric Inference for General High-Dimensional Linear Inverse Problems

High-Dimensional Gaussian Copula Regression: Adaptive Estimation and Statistical Inference

Inference for High-dimensional Differential Correlation Matrices

Optimal Estimation of A Quadratic Functional and Detection of Simultaneous Signals

Optimal Rates of Convergence for Noisy Sparse Phase Retrieval via Thresholded Wirtinger Flow

Robust and computationally feasible community detection in the presence of arbitrary outlier nodes

Structured Matrix Completion with Applications to Genomic Data Integration

Discussion: "A significance test for the lasso"

Instrumental Variables Estimation with Some Invalid Instruments and its Application to Mendelian Randomization

Rate-Optimal Detection of Very Short Signal Segments

ROP: Matrix recovery via rank-one projections

Sparse PCA: Optimal rates and adaptive estimation

A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion

Adaptive confidence intervals for regression functions under shape constraints

Compressed Sensing and Affine Rank Minimization under Restricted Isometry

Law of Log Determinant of Sample Covariance Matrix and Optimal Estimation of Differential Entropy for High-Dimensional Gaussian Distributions

Optimal hypothesis testing for high dimensional covariance matrices

Optimal rates of convergence for sparse covariance matrix estimation

Sharp RIP Bound for Sparse Signal and Low-Rank Matrix Recovery

Sparse Representation of a Polytope and Recovery of Sparse Signals and Low-rank Matrices

A reproducing kernel Hilbert space approach to functional linear regression

Adaptive covariance matrix estimation through block thresholding

Estimating Sparse Precision Matrix: Optimal Rates of Convergence and Adaptive Estimation

Minimax and Adaptive Inference in Nonparametric Function Estimation

Optimal Detection For Sparse Mixtures

Optimal estimation of the mean function based on discretely sampled functional data: Phase transition

Testing composite hypotheses, Hermite polynomials and optimal estimation of a nonsmooth functional

Nonparametric regression in exponential families

Optimal rates of convergence for covariance matrix estimation

Optimal rates of convergence for estimating the null density and proportion of nonnull effects in large-scale multiple testing