Source author record

Kengo Kato

Kengo Kato appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

math.ST Statistics Theory math.PR Methodology econ.EM Machine Learning

Catalog footprint

What is connected

19works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2025arXiv

Large deviations for dynamical Schrödinger problems

We establish large deviations for dynamical Schrödinger problems driven by perturbed Brownian motions when the noise parameter tends to zero. Our results show that Schrödinger bridges charge exponentially small masses outside the support of the limiting law that agrees with the optimal solution to the dynamical Monge-Kantorovich optimal transport problem. Our proofs build on mixture representations of Schrödinger bridges and establishing exponential continuity of Brownian bridges with respect to the initial and terminal points.

preprint2022arXiv

High-dimensional Data Bootstrap

This article reviews recent progress in high-dimensional bootstrap. We first review high-dimensional central limit theorems for distributions of sample mean vectors over the rectangles, bootstrap consistency results in high dimensions, and key techniques used to establish those results. We then review selected applications of high-dimensional bootstrap: construction of simultaneous confidence sets for high-dimensional vector parameters, multiple hypothesis testing via stepdown, post-selection inference, intersection bounds for partially identified parameters, and inference on best policies in policy evaluation. Finally, we also comment on a couple of future research directions.

preprint2022arXiv

Improved Central Limit Theorem and bootstrap approximations in high dimensions

This paper deals with the Gaussian and bootstrap approximations to the distribution of the max statistic in high dimensions. This statistic takes the form of the maximum over components of the sum of independent random vectors and its distribution plays a key role in many high-dimensional econometric problems. Using a novel iterative randomized Lindeberg method, the paper derives new bounds for the distributional approximation errors. These new bounds substantially improve upon existing ones and simultaneously allow for a larger class of bootstrap methods.

preprint2022arXiv

Limit distribution theory for smooth $p$-Wasserstein distances

The Wasserstein distance is a metric on a space of probability measures that has seen a surge of applications in statistics, machine learning, and applied mathematics. However, statistical aspects of Wasserstein distances are bottlenecked by the curse of dimensionality, whereby the number of data points needed to accurately estimate them grows exponentially with dimension. Gaussian smoothing was recently introduced as a means to alleviate the curse of dimensionality, giving rise to a parametric convergence rate in any dimension, while preserving the Wasserstein metric and topological structure. To facilitate valid statistical inference, in this work, we develop a comprehensive limit distribution theory for the empirical smooth Wasserstein distance. The limit distribution results leverage the functional delta method after embedding the domain of the Wasserstein distance into a certain dual Sobolev space, characterizing its Hadamard directional derivative for the dual Sobolev norm, and establishing weak convergence of the smooth empirical process in the dual space. To estimate the distributional limits, we also establish consistency of the nonparametric bootstrap. Finally, we use the limit distribution theory to study applications to generative modeling via minimum distance estimation with the smooth Wasserstein distance, showing asymptotic normality of optimal solutions for the quadratic cost.

preprint2022arXiv

Limit Distribution Theory for the Smooth 1-Wasserstein Distance with Applications

The smooth 1-Wasserstein distance (SWD) $W_1^σ$ was recently proposed as a means to mitigate the curse of dimensionality in empirical approximation while preserving the Wasserstein structure. Indeed, SWD exhibits parametric convergence rates and inherits the metric and topological structure of the classic Wasserstein distance. Motivated by the above, this work conducts a thorough statistical study of the SWD, including a high-dimensional limit distribution result for empirical $W_1^σ$, bootstrap consistency, concentration inequalities, and Berry-Esseen type bounds. The derived nondegenerate limit stands in sharp contrast with the classic empirical $W_1$, for which a similar result is known only in the one-dimensional case. We also explore asymptotics and characterize the limit distribution when the smoothing parameter $σ$ is scaled with $n$, converging to $0$ at a sufficiently slow rate. The dimensionality of the sampled distribution enters empirical SWD convergence bounds only through the prefactor (i.e., the constant). We provide a sharp characterization of this prefactor's dependence on the smoothing parameter and the intrinsic dimension. This result is then used to derive new empirical convergence rates for classic $W_1$ in terms of the intrinsic dimension. As applications of the limit distribution theory, we study two-sample testing and minimum distance estimation (MDE) under $W_1^σ$. We establish asymptotic validity of SWD testing, while for MDE, we prove measurability, almost sure convergence, and limit distributions for optimal estimators and their corresponding $W_1^σ$ error. Our results suggest that the SWD is well suited for high-dimensional statistical learning and inference.

preprint2022arXiv

Statistical inference with regularized optimal transport

Optimal transport (OT) is a versatile framework for comparing probability measures, with many applications to statistics, machine learning, and applied mathematics. However, OT distances suffer from computational and statistical scalability issues to high dimensions, which motivated the study of regularized OT methods like slicing, smoothing, and entropic penalty. This work establishes a unified framework for deriving limit distributions of empirical regularized OT distances, semiparametric efficiency of the plug-in empirical estimator, and bootstrap consistency. We apply the unified framework to provide a comprehensive statistical treatment of: (i) average- and max-sliced $p$-Wasserstein distances, for which several gaps in existing literature are closed; (ii) smooth distances with compactly supported kernels, the analysis of which is motivated by computational considerations; and (iii) entropic OT, for which our method generalizes existing limit distribution results and establishes, for the first time, efficiency and bootstrap consistency. While our focus is on these three regularized OT distances as applications, the flexibility of the proposed framework renders it applicable to broad classes of functionals beyond these examples.

preprint2021arXiv

Linear programming approach to nonparametric inference under shape restrictions: with an application to regression kink designs

We develop a novel method of constructing confidence bands for nonparametric regression functions under shape constraints. This method can be implemented via a linear programming, and it is thus computationally appealing. We illustrate a usage of our proposed method with an application to the regression kink design (RKD). Econometric analyses based on the RKD often suffer from wide confidence intervals due to slow convergence rates of nonparametric derivative estimators. We demonstrate that economic models and structures motivate shape restrictions, which in turn contribute to shrinking the confidence interval for an analysis of the causal effects of unemployment insurance benefits on unemployment durations.

preprint2020arXiv

Limit Distribution for Smooth Total Variation and $χ^2$-Divergence in High Dimensions

Statistical divergences are ubiquitous in machine learning as tools for measuring discrepancy between probability distributions. As these applications inherently rely on approximating distributions from samples, we consider empirical approximation under two popular $f$-divergences: the total variation (TV) distance and the $χ^2$-divergence. To circumvent the sensitivity of these divergences to support mismatch, the framework of Gaussian smoothing is adopted. We study the limit distributions of $\sqrt{n}δ_{\mathsf{TV}}(P_n\ast\mathcal{N},P\ast\mathcal{N})$ and $nχ^2(P_n\ast\mathcal{N}\|P\ast\mathcal{N})$, where $P_n$ is the empirical measure based on $n$ independently and identically distributed (i.i.d.) observations from $P$, $\mathcal{N}_σ:=\mathcal{N}(0,σ^2\mathrm{I}_d)$, and $\ast$ stands for convolution. In arbitrary dimension, the limit distributions are characterized in terms of Gaussian process on $\mathbb{R}^d$ with covariance operator that depends on $P$ and the isotropic Gaussian density of parameter $σ$. This, in turn, implies optimality of the $n^{-1/2}$ expected value convergence rates recently derived for $δ_{\mathsf{TV}}(P_n\ast\mathcal{N},P\ast\mathcal{N})$ and $χ^2(P_n\ast\mathcal{N}\|P\ast\mathcal{N})$. These strong statistical guarantees promote empirical approximation under Gaussian smoothing as a potent framework for learning and inference based on high-dimensional data.

preprint2020arXiv

Multiway Cluster Robust Double/Debiased Machine Learning

This paper investigates double/debiased machine learning (DML) under multiway clustered sampling environments. We propose a novel multiway cross fitting algorithm and a multiway DML estimator based on this algorithm. We also develop a multiway cluster robust standard error formula. Simulations indicate that the proposed procedure has favorable finite sample performance. Applying the proposed method to market share data for demand analysis, we obtain larger two-way cluster robust standard errors than non-robust ones.

preprint2016arXiv

Central Limit Theorems and Bootstrap in High Dimensions

This paper derives central limit and bootstrap theorems for probabilities that sums of centered high-dimensional random vectors hit hyperrectangles and sparsely convex sets. Specifically, we derive Gaussian and bootstrap approximations for probabilities $\Pr(n^{-1/2}\sum_{i=1}^n X_i\in A)$ where $X_1,\dots,X_n$ are independent random vectors in $\mathbb{R}^p$ and $A$ is a hyperrectangle, or, more generally, a sparsely convex set, and show that the approximation error converges to zero even if $p=p_n\to \infty$ as $n \to \infty$ and $p \gg n$; in particular, $p$ can be as large as $O(e^{Cn^c})$ for some constants $c,C>0$. The result holds uniformly over all hyperrectangles, or more generally, sparsely convex sets, and does not require any restriction on the correlation structure among coordinates of $X_i$. Sparsely convex sets are sets that can be represented as intersections of many convex sets whose indicator functions depend only on a small subset of their arguments, with hyperrectangles being a special case.

preprint2015arXiv

Empirical and multiplier bootstraps for suprema of empirical processes of increasing complexity, and related Gaussian couplings

We derive strong approximations to the supremum of the non-centered empirical process indexed by a possibly unbounded VC-type class of functions by the suprema of the Gaussian and bootstrap processes. The bounds of these approximations are non-asymptotic, which allows us to work with classes of functions whose complexity increases with the sample size. The construction of couplings is not of the Hungarian type and is instead based on the Slepian-Stein methods and Gaussian comparison inequalities. The increasing complexity of classes of functions and non-centrality of the processes make the results useful for applications in modern nonparametric statistics (Giné and Nickl, 2015), in particular allowing us to study the power properties of nonparametric tests using Gaussian and bootstrap approximations.

preprint2014arXiv

Anti-concentration and honest, adaptive confidence bands

Modern construction of uniform confidence bands for nonparametric densities (and other functions) often relies on the classical Smirnov-Bickel-Rosenblatt (SBR) condition; see, for example, Giné and Nickl [Probab. Theory Related Fields 143 (2009) 569-596]. This condition requires the existence of a limit distribution of an extreme value type for the supremum of a studentized empirical process (equivalently, for the supremum of a Gaussian process with the same covariance function as that of the studentized empirical process). The principal contribution of this paper is to remove the need for this classical condition. We show that a considerably weaker sufficient condition is derived from an anti-concentration property of the supremum of the approximating Gaussian process, and we derive an inequality leading to such a property for separable Gaussian processes. We refer to the new condition as a generalized SBR condition. Our new result shows that the supremum does not concentrate too fast around any value. We then apply this result to derive a Gaussian multiplier bootstrap procedure for constructing honest confidence bands for nonparametric density estimators (this result can be applied in other nonparametric problems as well). An essential advantage of our approach is that it applies generically even in those cases where the limit distribution of the supremum of the studentized empirical process does not exist (or is unknown). This is of particular importance in problems where resolution levels or other tuning parameters have been chosen in a data-driven fashion, which is needed for adaptive constructions of the confidence bands. Finally, of independent interest is our introduction of a new, practical version of Lepski's method, which computes the optimal, nonconservative resolution levels via a Gaussian multiplier bootstrap method.

preprint2014arXiv

Comparison and anti-concentration bounds for maxima of Gaussian random vectors

Slepian and Sudakov-Fernique type inequalities, which compare expectations of maxima of Gaussian random vectors under certain restrictions on the covariance matrices, play an important role in probability theory, especially in empirical process and extreme value theories. Here we give explicit comparisons of expectations of smooth functions and distribution functions of maxima of Gaussian random vectors without any restriction on the covariance matrices. We also establish an anti-concentration inequality for the maximum of a Gaussian random vector, which derives a useful upper bound on the Lévy concentration function for the Gaussian maximum. The bound is dimension-free and applies to vectors with arbitrary covariance matrices. This anti-concentration inequality plays a crucial role in establishing bounds on the Kolmogorov distance between maxima of Gaussian random vectors. These results have immediate applications in mathematical statistics. As an example of application, we establish a conditional multiplier central limit theorem for maxima of sums of independent random vectors where the dimension of the vectors is possibly much larger than the sample size.

preprint2014arXiv

Estimation and inference for linear panel data models under misspecification when both $n$ and $T$ are large

This paper considers fixed effects (FE) estimation for linear panel data models under possible model misspecification when both the number of individuals, $n$, and the number of time periods, $T$, are large. We first clarify the probability limit of the FE estimator and argue that this probability limit can be regarded as a pseudo-true parameter. We then establish the asymptotic distributional properties of the FE estimator around the pseudo-true parameter when $n$ and $T$ jointly go to infinity. Notably, we show that the FE estimator suffers from the incidental parameters bias of which the top order is $O(T^{-1})$, and even after the incidental parameters bias is completely removed, the rate of convergence of the FE estimator depends on the degree of model misspecification and is either $(nT)^{-1/2}$ or $n^{-1/2}$. Second, we establish asymptotically valid inference on the (pseudo-true) parameter. Specifically, we derive the asymptotic properties of the clustered covariance matrix (CCM) estimator and the cross section bootstrap, and show that they are robust to model misspecification. This establishes a rigorous theoretical ground for the use of the CCM estimator and the cross section bootstrap when model misspecification and the incidental parameters bias (in the coefficient estimate) are present. We conduct Monte Carlo simulations to evaluate the finite sample performance of the estimators and inference methods, together with a simple application to the unemployment dynamics in the U.S.

preprint2014arXiv

Gaussian approximation of suprema of empirical processes

This paper develops a new direct approach to approximating suprema of general empirical processes by a sequence of suprema of Gaussian processes, without taking the route of approximating whole empirical processes in the sup-norm. We prove an abstract approximation theorem applicable to a wide variety of statistical problems, such as construction of uniform confidence bands for functions. Notably, the bound in the main approximation theorem is nonasymptotic and the theorem allows for functions that index the empirical process to be unbounded and have entropy divergent with the sample size. The proof of the approximation theorem builds on a new coupling inequality for maxima of sums of random vectors, the proof of which depends on an effective use of Stein's method for normal approximation, and some new empirical process techniques. We study applications of this approximation theorem to local and series empirical processes arising in nonparametric estimation via kernel and series methods, where the classes of functions change with the sample size and are non-Donsker. Importantly, our new technique is able to prove the Gaussian approximation for the supremum type statistics under weak regularity conditions, especially concerning the bandwidth and the number of series functions, in those examples.

preprint2013arXiv

Estimation in functional linear quantile regression

This paper studies estimation in functional linear quantile regression in which the dependent variable is scalar while the covariate is a function, and the conditional quantile for each fixed quantile index is modeled as a linear functional of the covariate. Here we suppose that covariates are discretely observed and sampling points may differ across subjects, where the number of measurements per subject increases as the sample size. Also, we allow the quantile index to vary over a given subset of the open unit interval, so the slope function is a function of two variables: (typically) time and quantile index. Likewise, the conditional quantile function is a function of the quantile index and the covariate. We consider an estimator for the slope function based on the principal component basis. An estimator for the conditional quantile function is obtained by a plug-in method. Since the so-constructed plug-in estimator not necessarily satisfies the monotonicity constraint with respect to the quantile index, we also consider a class of monotonized estimators for the conditional quantile function. We establish rates of convergence for these estimators under suitable norms, showing that these rates are optimal in a minimax sense under some smoothness assumptions on the covariance kernel of the covariate and the slope function. Empirical choice of the cutoff level is studied by using simulations.

preprint2013arXiv

Quasi-Bayesian analysis of nonparametric instrumental variables models

This paper aims at developing a quasi-Bayesian analysis of the nonparametric instrumental variables model, with a focus on the asymptotic properties of quasi-posterior distributions. In this paper, instead of assuming a distributional assumption on the data generating process, we consider a quasi-likelihood induced from the conditional moment restriction, and put priors on the function-valued parameter. We call the resulting posterior quasi-posterior, which corresponds to ``Gibbs posterior'' in the literature. Here we focus on priors constructed on slowly growing finite-dimensional sieves. We derive rates of contraction and a nonparametric Bernstein-von Mises type result for the quasi-posterior distribution, and rates of convergence for the quasi-Bayes estimator defined by the posterior expectation. We show that, with priors suitably chosen, the quasi-posterior distribution (the quasi-Bayes estimator) attains the minimax optimal rate of contraction (convergence, resp.). These results greatly sharpen the previous related work.

preprint2013arXiv

Two-step estimation of high dimensional additive models

This paper investigates the two-step estimation of a high dimensional additive regression model, in which the number of nonparametric additive components is potentially larger than the sample size but the number of significant additive components is sufficiently small. The approach investigated consists of two steps. The first step implements the variable selection, typically by the group Lasso, and the second step applies the penalized least squares estimation with Sobolev penalties to the selected additive components. Such a procedure is computationally simple to implement and, in our numerical experiments, works reasonably well. Despite its intuitive nature, the theoretical properties of this two-step procedure have to be carefully analyzed, since the effect of the first step variable selection is random, and generally it may contain redundant additive components and at the same time miss significant additive components. This paper derives a generic performance bound on the two-step estimation procedure allowing for these situations, and studies in detail the overall performance when the first step variable selection is implemented by the group Lasso.

preprint2011arXiv

Group Lasso for high dimensional sparse quantile regression models

This paper studies the statistical properties of the group Lasso estimator for high dimensional sparse quantile regression models where the number of explanatory variables (or the number of groups of explanatory variables) is possibly much larger than the sample size while the number of variables in "active" groups is sufficiently small. We establish a non-asymptotic bound on the $\ell_{2}$-estimation error of the estimator. This bound explains situations under which the group Lasso estimator is potentially superior/inferior to the $\ell_{1}$-penalized quantile regression estimator in terms of the estimation error. We also propose a data-dependent choice of the tuning parameter to make the method more practical, by extending the original proposal of Belloni and Chernozhukov (2011) for the $\ell_{1}$-penalized quantile regression estimator. As an application, we analyze high dimensional additive quantile regression models. We show that under a set of suitable regularity conditions, the group Lasso estimator can attain the convergence rate arbitrarily close to the oracle rate. Finally, we conduct simulations experiments to examine our theoretical results.

Kengo Kato

What is connected

Connect this record

See the researcher in context

Building this map preview

19 published item(s)

Large deviations for dynamical Schrödinger problems

High-dimensional Data Bootstrap

Improved Central Limit Theorem and bootstrap approximations in high dimensions

Limit distribution theory for smooth $p$-Wasserstein distances

Limit Distribution Theory for the Smooth 1-Wasserstein Distance with Applications

Statistical inference with regularized optimal transport

Linear programming approach to nonparametric inference under shape restrictions: with an application to regression kink designs

Limit Distribution for Smooth Total Variation and $χ^2$-Divergence in High Dimensions

Multiway Cluster Robust Double/Debiased Machine Learning

Central Limit Theorems and Bootstrap in High Dimensions

Empirical and multiplier bootstraps for suprema of empirical processes of increasing complexity, and related Gaussian couplings

Anti-concentration and honest, adaptive confidence bands

Comparison and anti-concentration bounds for maxima of Gaussian random vectors

Estimation and inference for linear panel data models under misspecification when both $n$ and $T$ are large

Gaussian approximation of suprema of empirical processes

Estimation in functional linear quantile regression

Quasi-Bayesian analysis of nonparametric instrumental variables models

Two-step estimation of high dimensional additive models

Group Lasso for high dimensional sparse quantile regression models