Researcher profile

Cun-Hui Zhang

Cun-Hui Zhang contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
9 works
0 followers
5 topics
4 close collaborators

Published work

9 published items

Preprint, 2022, arXiv

Decorrelated Local Linear Estimator: Inference for Non-linear Effects in High-dimensional Additive Models

Additive models play an essential role in studying non-linear relationships. Despite many recent advances in estimation, there is a lack of methods and theories for inference in high-dimensional additive models, including confidence interval construction and hypothesis testing. Motivated by inference for non-linear treatment effects, we consider the high-dimensional additive model and make inference for the derivative of the function of interest. We propose a novel decorrelated local linear estimator and establish its asymptotic normality. The main novelty is the construction of the decorrelation weights, which is instrumental in reducing the error inherited from estimating the nuisance functions in the high-dimensional additive model. We construct the confidence interval for the function derivative and conduct the related hypothesis testing. We demonstrate our proposed method through large-scale simulation studies and apply it to identify non-linear effects in the motif regression problem. Our proposed method is implemented in the R package DLL, available from CRAN.

Preprint, 2022, arXiv

Rank Determination in Tensor Factor Model

The factor model is an appealing and effective analytic tool for high-dimensional time series, with a wide range of applications in economics, finance and statistics. This paper develops two criteria for the determination of the number of factors for tensor factor models where the signal part of an observed tensor time series assumes a Tucker decomposition with the core tensor as the factor tensor. The task is to determine the dimensions of the core tensor. One of the proposed criteria is similar to information-based criteria of model selection, and the other is an extension of the approaches based on the ratios of consecutive eigenvalues often used in factor analysis for panel time series. Theoretical results, including sufficient conditions and convergence rates, are established. The results include the vector factor models as special cases, with additional convergence rates. Simulation studies show promising finite sample performance for the two criteria.
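The eigenvalue-ratio idea mentioned in this abstract can be illustrated in the vector factor model, which the abstract notes is a special case. The sketch below is not the paper's tensor criterion. It assumes an ordinary sample covariance matrix, picks the rank as the index that maximizes the ratio of consecutive eigenvalues, and relies on NumPy. The names `eigen_ratio_rank` and `k_max` and the simulated data are hypothetical.

```python
import numpy as np

def eigen_ratio_rank(X, k_max):
    """Estimate the number of factors from rows of X (time points by variables).

    Returns the k in 1..k_max that maximizes lambda_k / lambda_{k+1}, where
    lambda_1 >= lambda_2 >= ... are eigenvalues of the sample covariance.
    """
    Xc = X - X.mean(axis=0)                      # center each variable
    cov = Xc.T @ Xc / X.shape[0]                 # sample covariance matrix
    eig = np.linalg.eigvalsh(cov)[::-1]          # eigenvalues, descending
    ratios = eig[:k_max] / eig[1:k_max + 1]      # consecutive ratios
    return int(np.argmax(ratios)) + 1

# Simulated vector factor model: X_t = A f_t + noise, with 3 factors (assumed setup).
rng = np.random.default_rng(0)
T, p, r = 400, 20, 3
A = rng.normal(size=(p, r))                      # loading matrix
F = rng.normal(size=(T, r))                      # factor series
X = F @ A.T + 0.3 * rng.normal(size=(T, p))      # observed series
r_hat = eigen_ratio_rank(X, k_max=8)
print(r_hat)
```

With a strong signal and weak noise, the largest ratio falls at the gap between the last factor eigenvalue and the first noise eigenvalue, which is what the rule exploits.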

Preprint, 2021, arXiv

High-Order Statistical Functional Expansion and Its Application To Some Nonsmooth Problems

Let $\mathbf{x}_j = \boldsymbol{\theta} + \boldsymbol{\epsilon}_j$, $j=1,\ldots,n$, be observations of an unknown parameter $\boldsymbol{\theta}$ in a Euclidean or separable Hilbert space $\mathscr{H}$, where $\boldsymbol{\epsilon}_j$ are noises as random elements in $\mathscr{H}$ from a general distribution. We study the estimation of $f(\boldsymbol{\theta})$ for a given functional $f:\mathscr{H}\rightarrow \mathbb{R}$ based on $\mathbf{x}_j$'s. The key element of our approach is a new method which we call High-Order Degenerate Statistical Expansion. It leverages the use of classical multivariate Taylor expansion and degenerate $U$-statistic and yields an elegant explicit formula. In the univariate case of $\mathscr{H}=\mathbb{R}$, the formula expresses the error of the proposed estimator as a sum of order $k$ degenerate $U$-products of the noises with coefficient $f^{(k)}(\boldsymbol{\theta})/k!$ and an explicit remainder term in the form of the Riemann-Liouville integral as in the Taylor expansion around the true $\boldsymbol{\theta}$. For general $\mathscr{H}$, the formula expresses the estimation error in terms of the inner product of $f^{(k)}(\boldsymbol{\theta})/k!$ and the average of the tensor products of $k$ noises with distinct indices and a parallel extension of the remainder term from the univariate case. This makes the proposed method a natural statistical version of the classical Taylor expansion. The proposed estimator can be viewed as a jackknife estimator of an ideal degenerate expansion of $f(\cdot)$ around the true $\boldsymbol{\theta}$ with the degenerate $U$-product of the noises, and can be approximated by bootstrap. Thus, the jackknife, bootstrap and Taylor expansion approaches all converge to the proposed estimator. We develop risk bounds for the proposed estimator and a central limit theorem under a second moment condition (even in expansions of higher than the second order). We apply this new method to generalize several existing results with smooth and nonsmooth $f$ to universal $\boldsymbol{\epsilon}_j$'s with only minimum moment constraints.

Preprint, 2020, arXiv

Beyond Gaussian Approximation: Bootstrap for Maxima of Sums of Independent Random Vectors

The Bonferroni adjustment, or the union bound, is commonly used to study rate optimality properties of statistical methods in high-dimensional problems. However, in practice, the Bonferroni adjustment is overly conservative. Extreme value theory has been shown to provide more accurate multiplicity adjustments in a number of settings, but only on an ad hoc basis. Recently, Gaussian approximation has been used to justify bootstrap adjustments in large scale simultaneous inference in some general settings when $n \gg (\log p)^7$, where $p$ is the multiplicity of the inference problem and $n$ is the sample size. The thrust of this theory is the validity of the Gaussian approximation for maxima of sums of independent random vectors in high dimensions. In this paper, we reduce the sample size requirement to $n \gg (\log p)^5$ for the consistency of the empirical bootstrap and the multiplier/wild bootstrap in the Kolmogorov-Smirnov distance, possibly in the regime where the Gaussian approximation is not available. New comparison and anti-concentration theorems, which are of considerable interest in and of themselves, are developed because existing ones, interwoven with Gaussian approximation, are no longer applicable.
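To make the contrast between the Bonferroni adjustment and a bootstrap adjustment concrete, here is a minimal sketch of a Gaussian multiplier bootstrap for the maximum of normalized coordinate sums. It is an illustration under assumed settings (equicorrelated Gaussian data, a one-sided maximum, hypothetical function names), not the procedure or the regime analyzed in the paper.

```python
import math
import random
from statistics import NormalDist

def simulate(n, p, rho, rng):
    """n rows of p equicorrelated N(0,1) coordinates (pairwise correlation rho)."""
    rows = []
    for _ in range(n):
        w = rng.gauss(0.0, 1.0)  # shared component that induces the correlation
        rows.append([math.sqrt(rho) * w + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0)
                     for _ in range(p)])
    return rows

def center(rows):
    """Subtract the column means."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]

def multiplier_bootstrap_crit(rows, alpha, n_boot, rng):
    """(1 - alpha) quantile of max_j n^{-1/2} sum_i e_i (X_ij - mean_j), e_i ~ N(0,1)."""
    xc = center(rows)
    n, p = len(xc), len(xc[0])
    maxima = []
    for _ in range(n_boot):
        sums = [0.0] * p
        for row in xc:
            e = rng.gauss(0.0, 1.0)  # one multiplier per observation
            for j in range(p):
                sums[j] += e * row[j]
        maxima.append(max(sums) / math.sqrt(n))
    maxima.sort()
    return maxima[math.ceil((1 - alpha) * n_boot) - 1]

def bonferroni_crit(rows, alpha):
    """Union-bound critical value: z_{1 - alpha/p} times the largest sample sd."""
    xc = center(rows)
    n, p = len(xc), len(xc[0])
    max_sd = max(math.sqrt(sum(r[j] ** 2 for r in xc) / n) for j in range(p))
    return NormalDist().inv_cdf(1 - alpha / p) * max_sd

rng = random.Random(0)
data = simulate(n=100, p=30, rho=0.9, rng=rng)
boot = multiplier_bootstrap_crit(data, alpha=0.05, n_boot=400, rng=rng)
bonf = bonferroni_crit(data, alpha=0.05)
print(boot, bonf)
```

Because the bootstrap draws reuse the observed rows, they inherit the correlation between coordinates, so with strongly correlated coordinates the bootstrap critical value is typically well below the Bonferroni one.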

Preprint, 2020, arXiv

Extreme Eigenvalues of Nonlinear Correlation Matrices with Applications to Additive Models

The maximum correlation of functions of a pair of random variables is an important measure of stochastic dependence. It is known that this maximum nonlinear correlation is identical to the absolute value of the Pearson correlation for a pair of Gaussian random variables or a pair of finite sums of iid random variables. This paper extends these results to pairwise Gaussian vectors and processes, nested sums of iid random variables, and permutation symmetric functions of sub-groups of iid random variables. It also discusses applications to additive regression models.
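The Gaussian case stated in this abstract can be checked numerically. The sketch below simulates a bivariate Gaussian pair with Pearson correlation 0.6 and compares the correlation of the raw variables with the correlations of their squares and cubes. The choice of transforms, the sample size and the helper `pearson` are assumptions made for illustration. For a standard bivariate Gaussian pair the correlation of the squares equals the squared Pearson correlation, so neither transform should beat the linear one.

```python
import math
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

rng = random.Random(1)
rho, n = 0.6, 40000
xs, ys = [], []
for _ in range(n):
    x = rng.gauss(0.0, 1.0)
    y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0)  # corr(x, y) = rho
    xs.append(x)
    ys.append(y)

corr_linear = pearson(xs, ys)                                       # close to rho
corr_square = pearson([x * x for x in xs], [y * y for y in ys])     # close to rho**2
corr_cube = pearson([x ** 3 for x in xs], [y ** 3 for y in ys])     # below rho
print(corr_linear, corr_square, corr_cube)
```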

Preprint, 2020, arXiv

Factor Models for High-Dimensional Tensor Time Series

Large tensor (multi-dimensional array) data are now routinely collected in a wide range of applications, due to modern data collection capabilities. Often such observations are taken over time, forming tensor time series. In this paper we present a factor model approach for analyzing high-dimensional dynamic tensor time series and multi-category dynamic transport networks. Two estimation procedures along with their theoretical properties and simulation results are presented. Two applications are used to illustrate the model and its interpretations.

Preprint, 2020, arXiv

Isotonic Regression in Multi-Dimensional Spaces and Graphs

In this paper we study minimax and adaptation rates in general isotonic regression. For uniform deterministic and random designs in $[0,1]^d$ with $d\ge 2$ and $N(0,1)$ noise, the minimax rate for the $\ell_2$ risk is known to be bounded from below by $n^{-1/d}$ when the unknown mean function $f$ is nondecreasing and its range is bounded by a constant, while the least squares estimator (LSE) is known to nearly achieve the minimax rate up to a factor $(\log n)^\gamma$ where $n$ is sample size, $\gamma = 4$ in the lattice design and $\gamma = \max\{9/2, (d^2+d+1)/2 \}$ in the random design. Moreover, the LSE is known to achieve the adaptation rate $(K/n)^{-2/d}\{1\vee \log(n/K)\}^{2\gamma}$ when $f$ is piecewise constant on $K$ hyperrectangles in a partition of $[0,1]^d$. Due to the minimax theorem, the LSE is identical on every design point to both the max-min and min-max estimators over all upper and lower sets containing the design point. This motivates our consideration of estimators which lie in-between the max-min and min-max estimators over possibly smaller classes of upper and lower sets, including a subclass of block estimators. Under a $q$-th moment condition on the noise, we develop $\ell_q$ risk bounds for such general estimators for isotonic regression on graphs. For uniform deterministic and random designs in $[0,1]^d$ with $d\ge 3$, our $\ell_2$ risk bound for the block estimator matches the minimax rate $n^{-1/d}$ when the range of $f$ is bounded and achieves the near parametric adaptation rate $(K/n)\{1\vee\log(n/K)\}^{d}$ when $f$ is $K$-piecewise constant. Furthermore, the block estimator possesses the following oracle property in variable selection: When $f$ depends on only a subset $S$ of variables, the $\ell_2$ risk of the block estimator automatically achieves up to a poly-logarithmic factor the minimax rate based on the oracular knowledge of $S$.
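The identity between the LSE and the max-min and min-max estimators is easiest to see in one dimension, where the upper sets containing point $i$ are the index ranges starting at some $u \le i$ and the lower sets are the ranges ending at some $v \ge i$. The sketch below covers only this one-dimensional special case, not the multi-dimensional block estimator of the paper. It compares the pool-adjacent-violators solution with the max-min and min-max formulas over averages of $y_u,\ldots,y_v$. The function names are hypothetical.

```python
def pava(y):
    """Isotonic least squares fit on a line via pool-adjacent-violators."""
    blocks = []  # each block holds [sum, count]
    for value in y:
        blocks.append([value, 1])
        # Merge while the previous block mean exceeds the last block mean.
        while len(blocks) > 1 and (blocks[-2][0] / blocks[-2][1]
                                   > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit

def max_min_and_min_max(y):
    """Max-min and min-max estimators over averages of y[u..v] with u <= i <= v."""
    n = len(y)
    prefix = [0.0]
    for value in y:
        prefix.append(prefix[-1] + value)

    def avg(u, v):  # mean of y[u], ..., y[v], inclusive
        return (prefix[v + 1] - prefix[u]) / (v - u + 1)

    max_min = [max(min(avg(u, v) for v in range(i, n)) for u in range(i + 1))
               for i in range(n)]
    min_max = [min(max(avg(u, v) for u in range(i + 1)) for v in range(i, n))
               for i in range(n)]
    return max_min, min_max

y = [1.0, 3.0, 2.0, 4.0, 3.5, 5.0]
lse = pava(y)  # [1.0, 2.5, 2.5, 3.75, 3.75, 5.0]
max_min, min_max = max_min_and_min_max(y)
print(lse, max_min, min_max)
```

All three lists agree on this input: the two violating pairs (3.0, 2.0) and (4.0, 3.5) are each replaced by their average.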

Preprint, 2020, arXiv

Second order Stein: SURE for SURE and other applications in high-dimensional inference

Stein's formula states that a random variable of the form $z^\top f(z) - \text{div} f(z)$ is mean-zero for functions $f$ with integrable gradient. Here, $\text{div} f$ is the divergence of the function $f$ and $z$ is a standard normal vector. This paper aims to propose a Second Order Stein formula to characterize the variance of such random variables for all functions $f(z)$ with square integrable gradient, and to demonstrate the usefulness of this formula in various applications. In the Gaussian sequence model, a consequence of Stein's formula is Stein's Unbiased Risk Estimate (SURE), an unbiased estimate of the mean squared risk for almost any estimator $\hat\mu$ of the unknown mean. A first application of the Second Order Stein formula is an Unbiased Risk Estimate for SURE itself (SURE for SURE): an unbiased estimate providing information about the squared distance between SURE and the squared estimation error of $\hat\mu$. SURE for SURE has a simple form as a function of the data and is applicable to all $\hat\mu$ with square integrable gradient, e.g. the Lasso and the Elastic Net. In addition to SURE for SURE, the following applications are developed: (1) Upper bounds on the risk of SURE when the estimation target is the mean squared error; (2) Confidence regions based on SURE; (3) Oracle inequalities satisfied by SURE-tuned estimates; (4) An upper bound on the variance of the size of the model selected by the Lasso; (5) Explicit expressions of SURE for SURE for the Lasso and the Elastic-Net; (6) In the linear model, a general semi-parametric scheme to de-bias a differentiable initial estimator for inference of a low-dimensional projection of the unknown $\beta$, with a characterization of the variance after de-biasing; and (7) An accuracy analysis of a Gaussian Monte Carlo scheme to approximate the divergence of functions $f: \mathbb{R}^n\to \mathbb{R}^n$.
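The first-order facts this abstract starts from can be checked by simulation. The sketch below takes $f$ to be coordinatewise soft thresholding, an assumed example whose divergence is the number of coordinates exceeding the threshold. It estimates the mean of $z^\top f(z) - \text{div} f(z)$ and compares the average of SURE with the average squared error in a Gaussian sequence model with unit noise variance. It does not implement the Second Order Stein formula or SURE for SURE. The mean vector, threshold and function names are hypothetical.

```python
import random

def soft(x, lam):
    """Soft thresholding of a scalar at level lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def stein_residual(z, lam):
    """z^T f(z) - div f(z) for coordinatewise soft thresholding f."""
    inner = sum(zi * soft(zi, lam) for zi in z)
    divergence = sum(1 for zi in z if abs(zi) > lam)
    return inner - divergence

def sure_soft(y, lam):
    """SURE for soft thresholding: ||mu_hat - y||^2 + 2 div(mu_hat) - n."""
    n = len(y)
    fit_gap = sum(min(abs(yi), lam) ** 2 for yi in y)
    divergence = sum(1 for yi in y if abs(yi) > lam)
    return fit_gap + 2 * divergence - n

rng = random.Random(2)
mu = [3.0, -2.0, 0.0, 0.0, 0.0]   # assumed true mean vector
lam, reps = 1.0, 40000
resid_sum = sure_sum = loss_sum = 0.0
for _ in range(reps):
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    y = [m + zi for m, zi in zip(mu, z)]
    resid_sum += stein_residual(z, lam)
    sure_sum += sure_soft(y, lam)
    loss_sum += sum((soft(yi, lam) - m) ** 2 for yi, m in zip(y, mu))

mean_residual = resid_sum / reps   # should be near 0
mean_sure = sure_sum / reps        # should be near mean_loss
mean_loss = loss_sum / reps
print(mean_residual, mean_sure, mean_loss)
```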

Preprint, 2015, arXiv

Lasso adjustments of treatment effect estimates in randomized experiments

We provide a principled way for investigators to analyze randomized experiments when the number of covariates is large. Investigators often use linear multivariate regression to analyze randomized experiments instead of simply reporting the difference of means between treatment and control groups. Their aim is to reduce the variance of the estimated treatment effect by adjusting for covariates. If there are a large number of covariates relative to the number of observations, regression may perform poorly because of overfitting. In such cases, the Lasso may be helpful. We study the resulting Lasso-based treatment effect estimator under the Neyman-Rubin model of randomized experiments. We present theoretical conditions that guarantee that the estimator is more efficient than the simple difference-of-means estimator, and we provide a conservative estimator of the asymptotic variance, which can yield tighter confidence intervals than the difference-of-means estimator. Simulation and data examples show that Lasso-based adjustment can be advantageous even when the number of covariates is less than the number of observations. Specifically, a variant using Lasso for selection and OLS for estimation performs particularly well, and it chooses a smoothing parameter based on combined performance of Lasso and OLS.
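A rough sketch of covariate adjustment with the Lasso in a randomized experiment follows. It uses one common form of regression adjustment: each arm's outcome mean is shifted by the Lasso-predicted effect of that arm's covariate imbalance relative to the full sample. This form, the coordinate-descent solver, the fixed penalty `lam` and the simulated data are all assumptions made for illustration, and the sketch relies on NumPy. It does not implement the variance estimator or the Lasso-plus-OLS variant described in the abstract.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1 (no intercept)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]             # remove coordinate j from the fit
            rho = X[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]             # put the updated coordinate back
    return beta

def adjusted_arm_mean(X_arm, y_arm, x_bar_all, lam):
    """Arm mean of y, corrected for the arm's covariate imbalance."""
    x_bar_arm = X_arm.mean(axis=0)
    beta = lasso_cd(X_arm - x_bar_arm, y_arm - y_arm.mean(), lam)
    return y_arm.mean() - (x_bar_arm - x_bar_all) @ beta

def ate_estimates(X, y, treated, lam):
    """Return (difference in means, Lasso-adjusted estimate) of the treatment effect."""
    x_bar = X.mean(axis=0)
    diff_means = y[treated].mean() - y[~treated].mean()
    adjusted = (adjusted_arm_mean(X[treated], y[treated], x_bar, lam)
                - adjusted_arm_mean(X[~treated], y[~treated], x_bar, lam))
    return diff_means, adjusted

# Simulated experiment: 200 units, 50 covariates, 3 of them predictive (assumed setup).
rng = np.random.default_rng(3)
n, p, true_effect = 200, 50, 2.0
X = rng.normal(size=(n, p))
treated = np.zeros(n, dtype=bool)
treated[rng.permutation(n)[: n // 2]] = True       # complete randomization
y = X[:, :3] @ np.array([3.0, 3.0, 3.0]) + true_effect * treated + rng.normal(size=n)
diff_means, adjusted = ate_estimates(X, y, treated, lam=0.2)
print(diff_means, adjusted)
```

A single draw cannot show an efficiency gain. Repeating the randomization many times and comparing the spread of the two estimates would be the natural next check.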