Source author record

Hang Deng

Hang Deng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

math.ST, Methodology, Statistics Theory, cond-mat.mtrl-sci, Machine Learning

Catalog footprint

What is connected

5 works

5 topics

4 close collaborators

preprint, 2022, arXiv

Machine Learning in Heterogeneous Porous Materials

The "Workshop on Machine learning in heterogeneous porous materials" brought together international scientific communities of applied mathematics, porous media, and material sciences with experts in the areas of heterogeneous materials, machine learning (ML) and applied mathematics to identify how ML can advance materials research. Within the scope of ML and materials research, the goal of the workshop was to discuss the state of the art in each community, promote crosstalk and accelerate multi-disciplinary collaborative research, and identify challenges and opportunities. As a result, four topic areas were identified: (1) ML in predicting materials properties, and discovery and design of novel materials; (2) ML in porous and fractured media and time-dependent phenomena; (3) multi-scale modeling in heterogeneous porous materials via ML; and (4) discovery of materials constitutive laws and new governing equations. This workshop was part of the AmeriMech Symposium series sponsored by the National Academies of Sciences, Engineering and Medicine and the U.S. National Committee on Theoretical and Applied Mechanics.

preprint, 2020, arXiv

Beyond Gaussian Approximation: Bootstrap for Maxima of Sums of Independent Random Vectors

The Bonferroni adjustment, or the union bound, is commonly used to study rate optimality properties of statistical methods in high-dimensional problems. However, in practice, the Bonferroni adjustment is overly conservative. Extreme value theory has been shown to provide more accurate multiplicity adjustments in a number of settings, but only on an ad hoc basis. Recently, Gaussian approximation has been used to justify bootstrap adjustments in large-scale simultaneous inference in some general settings when $n \gg (\log p)^7$, where $p$ is the multiplicity of the inference problem and $n$ is the sample size. The thrust of this theory is the validity of the Gaussian approximation for maxima of sums of independent random vectors in high dimensions. In this paper, we reduce the sample size requirement to $n \gg (\log p)^5$ for the consistency of the empirical bootstrap and the multiplier/wild bootstrap in the Kolmogorov-Smirnov distance, possibly in the regime where the Gaussian approximation is not available. New comparison and anti-concentration theorems, which are of considerable interest in and of themselves, are developed because existing ones interwoven with Gaussian approximation are no longer applicable.
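To make the contrast between the Bonferroni adjustment and a bootstrap adjustment concrete, the following is a minimal sketch of an empirical bootstrap critical value for the maximum of centered, scaled column sums. It is an illustration only, not the paper's procedure: the simulated data, the unit-variance equicorrelated design (`rho`), the function name `empirical_bootstrap_quantile`, and all tuning values are assumptions made for the example.

```python
import math
import random
from statistics import NormalDist


def empirical_bootstrap_quantile(rows, alpha, n_boot, rng):
    """(1 - alpha) quantile of the bootstrapped max of centered, scaled sums."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    stats = []
    for _ in range(n_boot):
        sums = [0.0] * p
        # Resample n rows with replacement and center at the sample means.
        for i in rng.choices(range(n), k=n):
            row = rows[i]
            for j in range(p):
                sums[j] += row[j] - means[j]
        stats.append(max(sums) / math.sqrt(n))
    stats.sort()
    return stats[math.ceil((1 - alpha) * n_boot) - 1]


# Hypothetical data: p strongly correlated coordinates, each with unit variance.
rng = random.Random(0)
n, p, alpha, rho = 80, 30, 0.05, 0.9
rows = []
for _ in range(n):
    common = rng.gauss(0.0, 1.0)
    rows.append([math.sqrt(rho) * common + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0)
                 for _ in range(p)])

boot_crit = empirical_bootstrap_quantile(rows, alpha, n_boot=300, rng=rng)
# One-sided Bonferroni critical value for p unit-variance coordinates.
bonf_crit = NormalDist().inv_cdf(1 - alpha / p)
print(f"bootstrap: {boot_crit:.3f}  Bonferroni: {bonf_crit:.3f}")
```

Because the bootstrap resamples whole rows, it picks up the dependence between coordinates, which the union bound ignores. In a strongly correlated design like this one, the bootstrap critical value is therefore expected to fall below the Bonferroni value.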

preprint, 2020, arXiv

Inference for local parameters in convexity constrained models

We consider the problem of inference for local parameters of a convex regression function $f_0: [0,1] \to \mathbb{R}$ based on observations from a standard nonparametric regression model, using the convex least squares estimator (LSE) $\widehat{f}_n$. For $x_0 \in (0,1)$, the local parameters include the pointwise function value $f_0(x_0)$, the pointwise derivative $f_0'(x_0)$, and the anti-mode (i.e., the smallest minimizer) of $f_0$. The existing limiting distribution of the estimation error $(\widehat{f}_n(x_0) - f_0(x_0), \widehat{f}_n'(x_0) - f_0'(x_0) )$ depends on the unknown second derivative $f_0''(x_0)$, and is therefore not directly applicable for inference. To circumvent this impasse, we show that the following locally normalized errors (LNEs) enjoy pivotal limiting behavior: Let $[\widehat{u}(x_0), \widehat{v}(x_0)]$ be the maximal interval containing $x_0$ where $\widehat{f}_n$ is linear. Then, under standard conditions, $$\binom{ \sqrt{n(\widehat{v}(x_0)-\widehat{u}(x_0))}(\widehat{f}_n(x_0)-f_0(x_0)) }{ \sqrt{n(\widehat{v}(x_0)-\widehat{u}(x_0))^3}(\widehat{f}_n'(x_0)-f_0'(x_0))} \rightsquigarrow σ\cdot \binom{\mathbb{L}^{(0)}_2}{\mathbb{L}^{(1)}_2},$$ where $n$ is the sample size, $σ$ is the standard deviation of the errors, and $\mathbb{L}^{(0)}_2, \mathbb{L}^{(1)}_2$ are universal random variables. This asymptotically pivotal LNE theory instantly yields a simple tuning-free procedure for constructing CIs with asymptotically exact coverage and optimal length for $f_0(x_0)$ and $f_0'(x_0)$. We also construct an asymptotically pivotal LNE for the anti-mode of $f_0$, and its limiting distribution does not even depend on $σ$. These asymptotically pivotal LNE theories are further extended to other convexity/concavity constrained models (e.g., log-concave density estimation) for which a limit distribution theory is available for problem-specific estimators.

preprint, 2020, arXiv

Isotonic Regression in Multi-Dimensional Spaces and Graphs

In this paper we study minimax and adaptation rates in general isotonic regression. For uniform deterministic and random designs in $[0,1]^d$ with $d\ge 2$ and $N(0,1)$ noise, the minimax rate for the $\ell_2$ risk is known to be bounded from below by $n^{-1/d}$ when the unknown mean function $f$ is nondecreasing and its range is bounded by a constant, while the least squares estimator (LSE) is known to nearly achieve the minimax rate up to a factor $(\log n)^γ$ where $n$ is sample size, $γ= 4$ in the lattice design and $γ= \max\{9/2, (d^2+d+1)/2 \}$ in the random design. Moreover, the LSE is known to achieve the adaptation rate $(K/n)^{-2/d}\{1\vee \log(n/K)\}^{2γ}$ when $f$ is piecewise constant on $K$ hyperrectangles in a partition of $[0,1]^d$. Due to the minimax theorem, the LSE is identical on every design point to both the max-min and min-max estimators over all upper and lower sets containing the design point. This motivates our consideration of estimators which lie in-between the max-min and min-max estimators over possibly smaller classes of upper and lower sets, including a subclass of block estimators. Under a $q$-th moment condition on the noise, we develop $\ell_q$ risk bounds for such general estimators for isotonic regression on graphs. For uniform deterministic and random designs in $[0,1]^d$ with $d\ge 3$, our $\ell_2$ risk bound for the block estimator matches the minimax rate $n^{-1/d}$ when the range of $f$ is bounded and achieves the near parametric adaptation rate $(K/n)\{1\vee\log(n/K)\}^{d}$ when $f$ is $K$-piecewise constant. Furthermore, the block estimator possesses the following oracle property in variable selection: When $f$ depends on only a subset $S$ of variables, the $\ell_2$ risk of the block estimator automatically achieves up to a poly-logarithmic factor the minimax rate based on the oracular knowledge of $S$.
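As a rough illustration of an estimator built from max-min operations over blocks, the sketch below computes, at each point $x$ of a two-dimensional lattice, the maximum over lower corners $u \le x$ of the minimum over upper corners $v \ge x$ of the average response on the rectangle $[u, v]$. This block max-min form is an assumption about one member of the block-estimator family and should be checked against the paper; the function name `block_max_min` and the simulated data are hypothetical.

```python
import random


def block_max_min(y):
    """Block max-min estimate on an n1 x n2 lattice (y is a list of lists)."""
    n1, n2 = len(y), len(y[0])
    # prefix[i][j] holds the sum of y over rows < i and columns < j.
    prefix = [[0.0] * (n2 + 1) for _ in range(n1 + 1)]
    for i in range(n1):
        for j in range(n2):
            prefix[i + 1][j + 1] = (y[i][j] + prefix[i][j + 1]
                                    + prefix[i + 1][j] - prefix[i][j])

    def block_mean(a, b, c, d):
        # Average of y over the rectangle with corners (a, b) and (c, d).
        total = (prefix[c + 1][d + 1] - prefix[a][d + 1]
                 - prefix[c + 1][b] + prefix[a][b])
        return total / ((c - a + 1) * (d - b + 1))

    est = [[0.0] * n2 for _ in range(n1)]
    for i in range(n1):
        for j in range(n2):
            # Max over lower corners u <= x of the min over upper corners v >= x.
            est[i][j] = max(
                min(block_mean(a, b, c, d)
                    for c in range(i, n1) for d in range(j, n2))
                for a in range(i + 1) for b in range(j + 1)
            )
    return est


# Hypothetical example: nondecreasing truth f(i, j) = i + j plus N(0, 1) noise.
rng = random.Random(2)
size = 5
y = [[i + j + rng.gauss(0.0, 1.0) for j in range(size)] for i in range(size)]
est = block_max_min(y)
```

The brute-force loops are meant for small lattices only. By construction the output is nondecreasing in each coordinate, and noiseless monotone input is returned unchanged.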

preprint, 2020, arXiv

Slightly Conservative Bootstrap for Maxima of Sums

We study the bootstrap for the maxima of the sums of independent random variables, a problem of high relevance to many applications in modern statistics. Since the consistency of the bootstrap was justified by Gaussian approximation in Chernozhukov et al. (2013), quite a few attempts have been made to sharpen the error bound for the bootstrap and reduce the sample size requirement for bootstrap consistency. In this paper, we show that the sample size requirement can be dramatically improved when we make the inference slightly conservative, that is, when we inflate the bootstrap quantile $t_α^*$ by a small fraction, e.g. by $1\%$ to $1.01\,t^*_α$. This simple procedure yields error bounds for the coverage probability of the conservative bootstrap at as fast a rate as $\sqrt{(\log p)/n}$ under suitable conditions, so that the sample size requirement can be reduced to $\log p \ll n$ and the overall convergence rate is nearly parametric. Furthermore, we improve the error bound for the coverage probability of the standard non-conservative bootstrap to $[(\log (np))^3 (\log p)^2/n]^{1/4}$ under general assumptions on the data. These results are established for the empirical bootstrap and the multiplier bootstrap with third-moment matching. An improved coherent Lindeberg interpolation method, originally proposed in Deng and Zhang (2017), is developed to derive sharper comparison bounds, especially for the maxima.
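The inflation step can be sketched in a few lines. The example below computes a multiplier bootstrap quantile $t^*_α$ for the maximum of centered, scaled sums and then multiplies it by $1.01$. It is a sketch under stated assumptions, not the paper's implementation: Mammen's two-point weights are used as one common choice of multipliers with mean 0, variance 1 and third moment 1, and the skewed simulated data, the function names and the tuning values are hypothetical.

```python
import math
import random

SQRT5 = math.sqrt(5.0)


def mammen_weight(rng):
    """Two-point multiplier with mean 0, variance 1 and third moment 1."""
    if rng.random() < (SQRT5 + 1) / (2 * SQRT5):
        return -(SQRT5 - 1) / 2
    return (SQRT5 + 1) / 2


def multiplier_bootstrap_quantile(rows, alpha, n_boot, rng):
    """(1 - alpha) quantile of max_j n^{-1/2} sum_i w_i (X_ij - mean_j)."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in rows]
    stats = []
    for _ in range(n_boot):
        sums = [0.0] * p
        for row in centered:
            w = mammen_weight(rng)  # one multiplier per observation
            for j in range(p):
                sums[j] += w * row[j]
        stats.append(max(sums) / math.sqrt(n))
    stats.sort()
    return stats[math.ceil((1 - alpha) * n_boot) - 1]


# Hypothetical skewed data: centered exponential entries.
rng = random.Random(1)
n, p, alpha = 60, 20, 0.05
rows = [[rng.expovariate(1.0) - 1.0 for _ in range(p)] for _ in range(n)]

t_star = multiplier_bootstrap_quantile(rows, alpha, n_boot=300, rng=rng)
t_conservative = 1.01 * t_star  # inflate the bootstrap quantile by 1%
print(f"t* = {t_star:.3f}, conservative threshold = {t_conservative:.3f}")
```

A test would then reject when the observed maximum exceeds `t_conservative` rather than `t_star`. The only change from the standard procedure is the final multiplication.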