Researcher profile

Qiyang Han

Qiyang Han contributes to research discovery and scholarly infrastructure.

Published work

7 published items

Preprint, 2022, arXiv

Noisy linear inverse problems under convex constraints: Exact risk asymptotics in high dimensions

In the standard Gaussian linear measurement model $Y=Xμ_0+ξ\in \mathbb{R}^m$ with a fixed noise level $σ>0$, we consider the problem of estimating the unknown signal $μ_0$ under a convex constraint $μ_0 \in K$, where $K$ is a closed convex set in $\mathbb{R}^n$. We show that the risk of the natural convex constrained least squares estimator (LSE) $\hatμ(σ)$ can be characterized exactly in high dimensional limits, by that of the convex constrained LSE $\hatμ_K^{\mathsf{seq}}$ in the corresponding Gaussian sequence model at a different noise level. The characterization holds (uniformly) for risks in the maximal regime that ranges from constant order all the way down to essentially the parametric rate, as long as a certain necessary non-degeneracy condition is satisfied for $\hatμ(σ)$. The precise risk characterization reveals a fundamental difference between noiseless (or low noise limit) and noisy linear inverse problems in terms of the sample complexity for signal recovery. A concrete example is given by the isotonic regression problem: while exact recovery of a general monotone signal requires $m\gg n^{1/3}$ samples in the noiseless setting, consistent signal recovery in the noisy setting requires as few as $m\gg \log n$ samples. Such a discrepancy occurs when the low-noise and high-noise risk behaviors of $\hatμ_K^{\mathsf{seq}}$ differ significantly. In statistical language, this occurs when $\hatμ_K^{\mathsf{seq}}$ estimates $0$ at a faster 'adaptation rate' than the slower 'worst-case rate' for general signals. Several other examples, including non-negative least squares and generalized Lasso (in constrained forms), are also worked out to demonstrate the concrete applicability of the theory in problems of different types.
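For the isotonic example, the sequence-model LSE $\hatμ_K^{\mathsf{seq}}$ is the Euclidean projection of the observation onto the cone of nondecreasing vectors, which the pool-adjacent-violators algorithm computes. The following is a minimal Python sketch under that assumption. The names `isotonic_lse` and `sequence_risk` are hypothetical and do not come from the paper, and the Monte Carlo risk is only a rough numerical proxy for the quantities the abstract characterizes exactly.

```python
import numpy as np

def isotonic_lse(y):
    """Project y onto nondecreasing sequences via pool adjacent violators."""
    blocks = []  # each block is [sum of pooled values, number of pooled values]
    for value in np.asarray(y, dtype=float):
        blocks.append([value, 1])
        # Pool the last two blocks while the earlier mean exceeds the later mean.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    # Every coordinate in a block is fitted by that block's mean.
    return np.concatenate([np.full(c, s / c) for s, c in blocks])

def sequence_risk(mu, sigma=1.0, reps=200, seed=0):
    """Monte Carlo estimate of E||isotonic_lse(mu + sigma * eps) - mu||^2."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    errors = [
        np.sum((isotonic_lse(mu + sigma * rng.standard_normal(mu.size)) - mu) ** 2)
        for _ in range(reps)
    ]
    return float(np.mean(errors))
```

Calling `sequence_risk(np.zeros(n))` and `sequence_risk(np.linspace(0.0, V, n))` for some hypothetical total variation `V` gives a quick way to compare the risk at the zero signal with the risk at a strictly increasing signal, which is the contrast between adaptation and worst-case rates that the abstract describes.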

Preprint, 2022, arXiv

Universality of regularized regression estimators in high dimensions

The Convex Gaussian Min-Max Theorem (CGMT) has emerged as a prominent theoretical tool for analyzing the precise stochastic behavior of various statistical estimators in the so-called high dimensional proportional regime, where the sample size and the signal dimension are of the same order. However, a well-recognized limitation of the existing CGMT machinery lies in its stringent requirement of exact Gaussianity of the design matrix, which renders the resulting precise high dimensional asymptotics largely a specific Gaussian theory in various important statistical models. This paper provides a structural universality framework for a broad class of regularized regression estimators that is particularly compatible with the CGMT machinery. In particular, we show that with a good enough $\ell_\infty$ bound for the regression estimator $\hatμ_A$, any 'structural property' that can be detected via the CGMT for $\hatμ_G$ (under a standard Gaussian design $G$) also holds for $\hatμ_A$ under a general design $A$ with independent entries. As a proof of concept, we demonstrate our new universality framework in three key examples of regularized regression estimators: the Ridge, Lasso and regularized robust regression estimators, where new universality properties of risk asymptotics and/or distributions of regression estimators and other related quantities are proved. As a major statistical implication of the Lasso universality results, we validate inference procedures using the degrees-of-freedom adjusted debiased Lasso under general design and error distributions. We also provide a counterexample, showing that universality properties for regularized regression estimators do not extend to general isotropic designs.
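A small simulation can make the Ridge case of this universality claim concrete. The sketch below is an illustration only, not the paper's argument: it compares the average Ridge estimation error under a Gaussian design with that under a Rademacher design with independent entries. The function names, the normalization of the design by the square root of the sample size, the all-ones signal, and every numerical setting are assumptions chosen for the example.

```python
import numpy as np

def ridge_risk(design, mu0, noise, lam):
    """Per-coordinate squared error of Ridge for y = design @ mu0 + noise."""
    dim = design.shape[1]
    y = design @ mu0 + noise
    # Ridge estimator: solve (A^T A + lam * I) mu = A^T y.
    mu_hat = np.linalg.solve(design.T @ design + lam * np.eye(dim), design.T @ y)
    return float(np.mean((mu_hat - mu0) ** 2))

def average_risk(sampler, n_samples=400, dim=200, lam=1.0, sigma=1.0,
                 reps=20, seed=0):
    """Average Ridge risk over independent draws of the design and the noise."""
    rng = np.random.default_rng(seed)
    mu0 = np.ones(dim)  # hypothetical signal, chosen only for illustration
    risks = []
    for _ in range(reps):
        # Entries are i.i.d. with mean 0 and variance 1 / n_samples.
        design = sampler(rng, (n_samples, dim)) / np.sqrt(n_samples)
        noise = sigma * rng.standard_normal(n_samples)
        risks.append(ridge_risk(design, mu0, noise, lam))
    return float(np.mean(risks))

def gaussian(rng, shape):
    return rng.standard_normal(shape)

def rademacher(rng, shape):
    return rng.choice([-1.0, 1.0], size=shape)
```

Under these assumptions, `average_risk(gaussian)` and `average_risk(rademacher)` should be close when the sample size and dimension are both moderately large, which is the behavior the universality result predicts for designs with independent entries.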

Preprint, 2021, arXiv

Multiplier U-processes: sharp bounds and applications

The theory of multiplier empirical processes has been one of the central topics in the development of the classical theory of empirical processes, due to its wide applicability to various statistical problems. In this paper, we develop theory and tools for studying multiplier $U$-processes, a natural higher-order generalization of the multiplier empirical processes. To this end, we develop a multiplier inequality that quantifies the moduli of continuity of the multiplier $U$-process in terms of those of the (decoupled) symmetrized $U$-process. The new inequality finds a variety of applications including (i) multiplier and bootstrap central limit theorems for $U$-processes, (ii) general theory for bootstrap $M$-estimators based on $U$-statistics, and (iii) theory for $M$-estimation under general complex sampling designs, again based on $U$-statistics.

Preprint, 2021, arXiv

Oracle posterior contraction rates under hierarchical priors

We offer a general Bayes theoretic framework to derive posterior contraction rates under a hierarchical prior design: the first-step prior serves to assess the model selection uncertainty, and the second-step prior quantifies the prior belief on the strength of the signals within the model chosen from the first step. In particular, we establish non-asymptotic oracle posterior contraction rates under (i) a local Gaussianity condition on the log likelihood ratio of the statistical experiment, (ii) a local entropy condition on the dimensionality of the models, and (iii) a sufficient mass condition on the second-step prior near the best approximating signal for each model. The first-step prior can be designed generically. The posterior distribution enjoys Gaussian tail behavior and therefore the resulting posterior mean also satisfies an oracle inequality, automatically serving as an adaptive point estimator in a frequentist sense. Model mis-specification is allowed in these oracle rates. The local Gaussianity condition serves as a unified attempt of non-asymptotic Gaussian quantification of the experiments, and can be easily verified in various experiments considered in [GvdV07a] and beyond. The general results are applied in various problems including: (i) trace regression, (ii) shape-restricted isotonic/convex regression, (iii) high-dimensional partially linear regression, (iv) covariance matrix estimation in the sparse factor model, (v) detection of non-smooth polytopal image boundary, and (vi) intensity estimation in a Poisson point process model. These new results serve either as theoretical justification of practical prior proposals in the literature, or as an illustration of the generic construction scheme of a (nearly) minimax adaptive estimator for a complicated experiment.

Preprint, 2021, arXiv

Set structured global empirical risk minimizers are rate optimal in general dimensions

Entropy integrals are widely used as a powerful empirical process tool to obtain upper bounds for the rates of convergence of global empirical risk minimizers (ERMs), in standard settings such as density estimation and regression. The upper bound for the convergence rates thus obtained typically matches the minimax lower bound when the entropy integral converges, but admits a strict gap compared to the lower bound when it diverges. Birgé and Massart [BM93] provided a striking example showing that such a gap is real with the entropy structure alone: for a variant of the natural Hölder class with low regularity, the global ERM actually converges at the rate predicted by the entropy integral, which substantially deviates from the lower bound. The counter-example has spawned a long-standing negative position on the use of global ERMs in the regime where the entropy integral diverges, as they are heuristically believed to converge at a sub-optimal rate in a variety of models. The present paper demonstrates that this gap can be closed if the models admit a certain degree of 'set structure' in addition to the entropy structure. In other words, the global ERMs in such set structured models will indeed be rate-optimal, matching the lower bound even when the entropy integral diverges. The models with set structures we investigate include (i) image and edge estimation, (ii) binary classification, (iii) multiple isotonic regression, (iv) $s$-concave density estimation, all in general dimensions when the entropy integral diverges. Here set structures are interpreted broadly in the sense that the complexity of the underlying models can be essentially captured by the size of the empirical process over a certain class of measurable sets, for which matching upper and lower bounds are obtained to facilitate the derivation of sharp convergence rates for the associated global ERMs.

Preprint, 2020, arXiv

Inference for local parameters in convexity constrained models

We consider the problem of inference for local parameters of a convex regression function $f_0: [0,1] \to \mathbb{R}$ based on observations from a standard nonparametric regression model, using the convex least squares estimator (LSE) $\widehat{f}_n$. For $x_0 \in (0,1)$, the local parameters include the pointwise function value $f_0(x_0)$, the pointwise derivative $f_0'(x_0)$, and the anti-mode (i.e., the smallest minimizer) of $f_0$. The existing limiting distribution of the estimation error $(\widehat{f}_n(x_0) - f_0(x_0), \widehat{f}_n'(x_0) - f_0'(x_0) )$ depends on the unknown second derivative $f_0''(x_0)$, and is therefore not directly applicable for inference. To circumvent this impasse, we show that the following locally normalized errors (LNEs) enjoy pivotal limiting behavior: Let $[\widehat{u}(x_0), \widehat{v}(x_0)]$ be the maximal interval containing $x_0$ where $\widehat{f}_n$ is linear. Then, under standard conditions, $$\binom{ \sqrt{n(\widehat{v}(x_0)-\widehat{u}(x_0))}(\widehat{f}_n(x_0)-f_0(x_0)) }{ \sqrt{n(\widehat{v}(x_0)-\widehat{u}(x_0))^3}(\widehat{f}_n'(x_0)-f_0'(x_0))} \rightsquigarrow σ\cdot \binom{\mathbb{L}^{(0)}_2}{\mathbb{L}^{(1)}_2},$$ where $n$ is the sample size, $σ$ is the standard deviation of the errors, and $\mathbb{L}^{(0)}_2, \mathbb{L}^{(1)}_2$ are universal random variables. This asymptotically pivotal LNE theory instantly yields a simple tuning-free procedure for constructing CIs with asymptotically exact coverage and optimal length for $f_0(x_0)$ and $f_0'(x_0)$. We also construct an asymptotically pivotal LNE for the anti-mode of $f_0$, and its limiting distribution does not even depend on $σ$. These asymptotically pivotal LNE theories are further extended to other convexity/concavity constrained models (e.g., log-concave density estimation) for which a limit distribution theory is available for problem-specific estimators.
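The pivotal limit for the first coordinate can be inverted into a confidence interval for $f_0(x_0)$ by elementary algebra. The sketch below shows only that inversion step. It assumes the caller already has the fitted value $\widehat{f}_n(x_0)$, the endpoints $\widehat{u}(x_0)$ and $\widehat{v}(x_0)$ of the linear piece, a value (in practice an estimate) of $σ$, and lower and upper quantiles `q_lo` and `q_hi` of $\mathbb{L}^{(0)}_2$. Those quantiles are not given in the abstract and are treated here as user-supplied inputs. The function name is hypothetical.

```python
import math

def lne_confidence_interval(f_hat_x0, u_hat, v_hat, n, sigma, q_lo, q_hi):
    """Invert sqrt(n (v - u)) (f_hat(x0) - f0(x0)) ~ sigma * L into a CI for f0(x0).

    q_lo and q_hi are assumed quantiles of the limit variable L with
    P(q_lo <= L <= q_hi) equal to the desired coverage level.
    """
    if not (v_hat > u_hat and n > 0 and sigma > 0 and q_lo < q_hi):
        raise ValueError("need v_hat > u_hat, n > 0, sigma > 0 and q_lo < q_hi")
    # Data-driven local scale replacing the unknown second derivative.
    scale = sigma / math.sqrt(n * (v_hat - u_hat))
    # q_lo <= (f_hat - f0) / scale <= q_hi  rearranges to the bounds below.
    return (f_hat_x0 - q_hi * scale, f_hat_x0 - q_lo * scale)
```

The interval width is proportional to $σ/\sqrt{n(\widehat{v}(x_0)-\widehat{u}(x_0))}$, so no smoothing or tuning parameter enters, consistent with the tuning-free description in the abstract.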

Preprint, 2020, arXiv

On a phase transition in general order spline regression

In the Gaussian sequence model $Y= θ_0 + \varepsilon$ in $\mathbb{R}^n$, we study the fundamental limit of approximating the signal $θ_0$ by a class $Θ(d,d_0,k)$ of (generalized) splines with free knots. Here $d$ is the degree of the spline, $d_0$ is the order of differentiability at each inner knot, and $k$ is the maximal number of pieces. We show that, given any integer $d\geq 0$ and $d_0\in\{-1,0,\ldots,d-1\}$, the minimax rate of estimation over $Θ(d,d_0,k)$ exhibits the following phase transition: $$\inf_{\widetilde{θ}}\sup_{θ\in Θ(d,d_0,k)}\mathbb{E}_θ\|\widetilde{θ} - θ\|^2 \asymp_d \begin{cases} k\log\log(16n/k), & 2\leq k\leq k_0,\\ k\log(en/k), & k \geq k_0+1. \end{cases}$$ The transition boundary $k_0$, which takes the form $\lfloor (d+1)/(d-d_0) \rfloor + 1$, demonstrates the critical role of the regularity parameter $d_0$ in the separation between a faster $\log \log(16n)$ and a slower $\log(en)$ rate. We further show that, once an additional '$d$-monotonicity' shape constraint is imposed (including monotonicity for $d = 0$ and convexity for $d=1$), the above phase transition is eliminated and the faster $k\log\log(16n/k)$ rate can be achieved for all $k$. These results provide theoretical support for developing $\ell_0$-penalized (shape-constrained) spline regression procedures as useful alternatives to $\ell_1$- and $\ell_2$-penalized ones.
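The transition boundary and the two rate regimes stated above are easy to tabulate. The sketch below evaluates $k_0 = \lfloor (d+1)/(d-d_0) \rfloor + 1$ and the order of the minimax rate, ignoring the constants hidden in $\asymp_d$. The function names are hypothetical, and the restriction $k \leq n$ is an assumption added so that the iterated logarithm is well defined.

```python
import math

def transition_boundary(d, d0):
    """k_0 = floor((d + 1) / (d - d0)) + 1 for d >= 0 and d0 in {-1, ..., d - 1}."""
    if d < 0 or not (-1 <= d0 <= d - 1):
        raise ValueError("need d >= 0 and -1 <= d0 <= d - 1")
    return (d + 1) // (d - d0) + 1

def minimax_rate_order(n, k, d, d0):
    """Order of the minimax rate over splines with at most k pieces (constants dropped)."""
    if k < 2 or k > n:
        raise ValueError("the displayed rate is stated for k >= 2; k <= n is assumed here")
    if k <= transition_boundary(d, d0):
        return k * math.log(math.log(16 * n / k))  # faster log-log regime
    return k * math.log(math.e * n / k)            # slower log regime
```

For instance, piecewise constant signals ($d=0$, $d_0=-1$) give $k_0 = 2$, while continuous piecewise linear signals ($d=1$, $d_0=0$) give $k_0 = 3$, so higher regularity at the knots extends the range of $k$ with the faster rate.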