Source author record

Adityanand Guntuboyina

Adityanand Guntuboyina appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Catalog footprint

What is connected

18 works
11 topics
4 close collaborators

Published work

18 published item(s)

preprint, 2026, arXiv

What Functions Does XGBoost Learn?

This paper establishes a rigorous theoretical foundation for the function class implicitly learned by XGBoost, bridging the gap between its empirical success and our theoretical understanding. We introduce an infinite-dimensional function class $\mathcal{F}^{d, s}_{\infty-\text{ST}}$ that extends finite ensembles of bounded-depth regression trees, together with a complexity measure $V^{d, s}_{\infty-\text{XGB}}(\cdot)$ that generalizes the $L^1$ regularization penalty used in XGBoost. We show that every optimizer of the XGBoost objective is also an optimizer of an equivalent penalized regression problem over $\mathcal{F}^{d, s}_{\infty-\text{ST}}$ with penalty $V^{d, s}_{\infty-\text{XGB}}(\cdot)$, providing an interpretation of XGBoost as implicitly targeting a broader function class. We also develop a smoothness-based interpretation of $\mathcal{F}^{d, s}_{\infty-\text{ST}}$ and $V^{d, s}_{\infty-\text{XGB}}(\cdot)$ in terms of Hardy--Krause variation. We prove that the least squares estimator over $\{f \in \mathcal{F}^{d, s}_{\infty-\text{ST}}: V^{d, s}_{\infty-\text{XGB}}(f) \le V\}$ achieves a nearly minimax-optimal rate of convergence $n^{-2/3} (\log n)^{4(\min(s, d) - 1)/3}$, thereby avoiding the curse of dimensionality. Our results provide the first rigorous characterization of the function space underlying XGBoost, clarify its connection to classical notions of variation, and identify an important open problem: whether the XGBoost algorithm itself achieves minimax optimality over this class.

preprint, 2023, arXiv

Multivariate, Heteroscedastic Empirical Bayes via Nonparametric Maximum Likelihood

Multivariate, heteroscedastic errors complicate statistical inference in many large-scale denoising problems. Empirical Bayes is attractive in such settings, but standard parametric approaches rest on assumptions about the form of the prior distribution which can be hard to justify and which introduce unnecessary tuning parameters. We extend the nonparametric maximum likelihood estimator (NPMLE) for Gaussian location mixture densities to allow for multivariate, heteroscedastic errors. NPMLEs estimate an arbitrary prior by solving an infinite-dimensional, convex optimization problem; we show that this convex optimization problem can be tractably approximated by a finite-dimensional version. The empirical Bayes posterior means based on an NPMLE have low regret, meaning they closely target the oracle posterior means one would compute with the true prior in hand. We prove an oracle inequality implying that the empirical Bayes estimator performs at nearly the optimal level (up to logarithmic factors) for denoising without prior knowledge. We provide finite-sample bounds on the average Hellinger accuracy of an NPMLE for estimating the marginal densities of the observations. We also demonstrate the adaptive and nearly-optimal properties of NPMLEs for deconvolution. We apply our method to two denoising problems in astronomy, constructing a fully data-driven color-magnitude diagram of 1.4 million stars in the Milky Way and investigating the distribution of 19 chemical abundance ratios for 27 thousand stars in the red clump. We also apply our method to hierarchical linear models, illustrating the advantages of nonparametric shrinkage of regression coefficients on an education data set and on a microarray data set.
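The finite-dimensional approximation mentioned in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes univariate observations (the paper treats the multivariate case), restricts the prior to a fixed grid of atoms, and fits the mixing weights with plain EM iterations rather than a dedicated convex solver. The names `npmle_em`, `posterior_means`, and the example grid are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def npmle_em(xs, sds, grid, iters=200):
    """Approximate NPMLE of the prior: weights on fixed grid atoms, fitted by EM.

    xs[i] is observed with known noise standard deviation sds[i]
    (heteroscedastic errors). Assumes the grid covers the range of the data.
    """
    m = len(grid)
    w = [1.0 / m] * m
    # lik[i][j] = density of x_i if its latent mean were grid[j].
    lik = [[normal_pdf(x, a, s) for a in grid] for x, s in zip(xs, sds)]
    for _ in range(iters):
        new_w = [0.0] * m
        for row in lik:
            denom = sum(wj * lij for wj, lij in zip(w, row))
            for j in range(m):
                # Posterior probability that observation i came from atom j.
                new_w[j] += w[j] * row[j] / denom
        w = [v / len(xs) for v in new_w]
    return w, lik

def posterior_means(w, lik, grid):
    """Empirical Bayes posterior mean of each latent mean under the fitted prior."""
    out = []
    for row in lik:
        denom = sum(wj * lij for wj, lij in zip(w, row))
        out.append(sum(wj * lij * a for wj, lij, a in zip(w, row, grid)) / denom)
    return out

xs = [-2.1, -1.9, 2.0, 2.2, 1.8]
sds = [0.5, 1.0, 0.5, 0.7, 1.2]
grid = [-3.0 + 0.5 * k for k in range(13)]  # atoms from -3 to 3
weights, lik = npmle_em(xs, sds, grid)
denoised = posterior_means(weights, lik, grid)
```

Each posterior mean is a weighted average of grid atoms, so the denoised values always lie within the grid range. A finer grid gives a closer approximation at higher cost.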

preprint, 2020, arXiv

Covariance estimation with nonnegative partial correlations

We study the problem of high-dimensional covariance estimation under the constraint that the partial correlations are nonnegative. The sign constraints dramatically simplify estimation: the Gaussian maximum likelihood estimator is well defined with only two observations regardless of the number of variables. We analyze its performance in the setting where the dimension may be much larger than the sample size. We establish that the estimator is both high-dimensionally consistent and minimax optimal in the symmetrized Stein loss. We also prove a negative result which shows that the sign-constraints can introduce substantial bias for estimating the top eigenvalue of the covariance matrix.

preprint, 2020, arXiv

Multivariate extensions of isotonic regression and total variation denoising via entire monotonicity and Hardy-Krause variation

We consider the problem of nonparametric regression when the covariate is $d$-dimensional, where $d \geq 1$. In this paper we introduce and study two nonparametric least squares estimators (LSEs) in this setting---the entirely monotonic LSE and the constrained Hardy-Krause variation LSE. We show that these two LSEs are natural generalizations of univariate isotonic regression and univariate total variation denoising, respectively, to multiple dimensions. We discuss the characterization and computation of these two LSEs obtained from $n$ data points. We provide a detailed study of their risk properties under the squared error loss and fixed uniform lattice design. We show that the finite sample risk of these LSEs is always bounded from above by $n^{-2/3}$ modulo logarithmic factors depending on $d$; thus these nonparametric LSEs avoid the curse of dimensionality to some extent. We also prove nearly matching minimax lower bounds. Further, we illustrate that these LSEs are particularly useful in fitting rectangular piecewise constant functions. Specifically, we show that the risk of the entirely monotonic LSE is almost parametric (at most $1/n$ up to logarithmic factors) when the true function is well-approximable by a rectangular piecewise constant entirely monotone function with not too many constant pieces. A similar result is also shown to hold for the constrained Hardy-Krause variation LSE for a simple subclass of rectangular piecewise constant functions. We believe that the proposed LSEs yield a novel approach to estimating multivariate functions using convex optimization that avoids the curse of dimensionality to some extent.

preprint, 2020, arXiv

On Suboptimality of Least Squares with Application to Estimation of Convex Bodies

We develop a technique for establishing lower bounds on the sample complexity of Least Squares (or, Empirical Risk Minimization) for large classes of functions. As an application, we settle an open problem regarding optimality of Least Squares in estimating a convex set from noisy support function measurements in dimension $d\geq 6$. Specifically, we establish that Least Squares is minimax sub-optimal, and achieves a rate of $\tilde{Θ}_d(n^{-2/(d-1)})$ whereas the minimax rate is $Θ_d(n^{-4/(d+3)})$.

preprint, 2016, arXiv

Adaptation in log-concave density estimation

The log-concave maximum likelihood estimator of a density on the real line based on a sample of size $n$ is known to attain the minimax optimal rate of convergence of $O(n^{-4/5})$ with respect to, e.g., squared Hellinger distance. In this paper, we show that it also enjoys attractive adaptation properties, in the sense that it achieves a faster rate of convergence when the logarithm of the true density is $k$-affine (i.e.\ made up of $k$ affine pieces), provided $k$ is not too large. Our results use two different techniques: the first relies on a new Marshall's inequality for log-concave density estimation, and reveals that when the true density is close to log-linear on its support, the log-concave maximum likelihood estimator can achieve the parametric rate of convergence in total variation distance. Our second approach depends on local bracketing entropy methods, and allows us to prove a sharp oracle inequality, which implies in particular that the rate of convergence with respect to various global loss functions, including Kullback--Leibler divergence, is $O\bigl(\frac{k}{n}\log^{5/4} n\bigr)$ when the true density is log-concave and its logarithm is close to $k$-affine.

preprint, 2016, arXiv

On Bayes Risk Lower Bounds

This paper provides a general technique for lower bounding the Bayes risk of statistical estimation, applicable to arbitrary loss functions and arbitrary prior distributions. A lower bound on the Bayes risk not only serves as a lower bound on the minimax risk, but also characterizes the fundamental limit of any estimator given the prior knowledge. Our bounds are based on the notion of $f$-informativity, which is a function of the underlying class of probability measures and the prior. Application of our bounds requires upper bounds on the $f$-informativity, thus we derive new upper bounds on $f$-informativity which often lead to tight Bayes risk lower bounds. Our technique leads to generalizations of a variety of classical minimax bounds (e.g., generalized Fano's inequality). Our Bayes risk lower bounds can be directly applied to several concrete estimation problems, including Gaussian location models, generalized linear models, and principal component analysis for spiked covariance models. To further demonstrate the applications of our Bayes risk lower bounds to machine learning problems, we present two new theoretical results: (1) a precise characterization of the minimax risk of learning spherical Gaussian mixture models under the smoothed analysis framework, and (2) lower bounds for the Bayes risk under a natural prior for both the prediction and estimation errors for high-dimensional sparse linear regression under an improper learning setting.

preprint, 2016, arXiv

Stochastically Transitive Models for Pairwise Comparisons: Statistical and Computational Issues

There are various parametric models for analyzing pairwise comparison data, including the Bradley-Terry-Luce (BTL) and Thurstone models, but their reliance on strong parametric assumptions is limiting. In this work, we study a flexible model for pairwise comparisons, under which the probabilities of outcomes are required only to satisfy a natural form of stochastic transitivity. This class includes parametric models such as the BTL and Thurstone models as special cases, but is considerably more general. We provide various examples of models in this broader stochastically transitive class for which classical parametric models provide poor fits. Despite this greater flexibility, we show that the matrix of probabilities can be estimated at the same rate as in standard parametric models. On the other hand, unlike in the BTL and Thurstone models, computing the minimax-optimal estimator in the stochastically transitive model is non-trivial, and we explore various computationally tractable alternatives. We show that a simple singular value thresholding algorithm is statistically consistent but does not achieve the minimax rate. We then propose and study algorithms that achieve the minimax rate over interesting sub-classes of the full stochastically transitive class. We complement our theoretical results with thorough numerical simulations.
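To make the relationship between the BTL model and stochastic transitivity concrete, here is a minimal sketch. It assumes the "natural form of stochastic transitivity" is the strong form: whenever $P_{ij} \ge 1/2$ and $P_{jk} \ge 1/2$, we require $P_{ik} \ge \max(P_{ij}, P_{jk})$. The abstract does not spell out the definition, so this reading is an assumption, and the function names are hypothetical.

```python
def btl_matrix(weights):
    """P[i][j] = probability that item i beats item j under the BTL model."""
    return [[wi / (wi + wj) for wj in weights] for wi in weights]

def is_strongly_transitive(P, tol=1e-12):
    """Check strong stochastic transitivity of a pairwise probability matrix."""
    n = len(P)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # If i tends to beat j and j tends to beat k, then i must
                # beat k at least as reliably as either of those comparisons.
                if P[i][j] >= 0.5 and P[j][k] >= 0.5:
                    if P[i][k] < max(P[i][j], P[j][k]) - tol:
                        return False
    return True

btl = btl_matrix([4.0, 2.0, 1.0])

# Transitive but not BTL: BTL odds multiply, so P[0][1] = P[1][2] = 0.9
# would force P[0][2] = 81/82 rather than 0.9.
flat = [[0.5, 0.9, 0.9],
        [0.1, 0.5, 0.9],
        [0.1, 0.1, 0.5]]

# A rock-paper-scissors cycle violates transitivity.
cycle = [[0.5, 0.9, 0.1],
         [0.1, 0.5, 0.9],
         [0.9, 0.1, 0.5]]
```

The `flat` matrix is an instance of the broader class that no choice of BTL weights reproduces exactly.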

preprint, 2015, arXiv

Adaptive estimation of planar convex sets

In this paper, we consider adaptive estimation of an unknown planar compact, convex set from noisy measurements of its support function on a uniform grid. Both the problem of estimating the support function at a point and that of estimating the convex set are studied. Data-driven adaptive estimators are proposed and their optimality properties are established. For pointwise estimation, it is shown that the estimator optimally adapts to every compact, convex set instead of a collection of large parameter spaces as in the conventional minimax theory of nonparametric estimation. For set estimation, the estimators adaptively achieve the optimal rate of convergence. In both these problems, our analysis makes no smoothness assumptions on the unknown sets.

preprint, 2015, arXiv

On matrix estimation under monotonicity constraints

We consider the problem of estimating an unknown $n_1 \times n_2$ matrix $\boldsymbol{θ}^*$ from noisy observations under the constraint that $\boldsymbol{θ}^*$ is nondecreasing in both rows and columns. We consider the least squares estimator (LSE) in this setting and study its risk properties. We show that the worst case risk of the LSE is $n^{-1/2}$, up to multiplicative logarithmic factors, where $n = n_1 n_2$ and that the LSE is minimax rate optimal (up to logarithmic factors). We further prove that for some special $\boldsymbol{θ}^*$, the risk of the LSE could be much smaller than $n^{-1/2}$; in fact, it could even be parametric i.e., $n^{-1}$ up to logarithmic factors. Such parametric rates occur when the number of "rectangular" blocks of $\boldsymbol{θ}^*$ is bounded from above by a constant. We derive, as a consequence, an interesting adaptation property of the LSE which we term variable adaptation -- the LSE performs as well as the oracle estimator when estimating a matrix that is constant along each row/column. Our proofs borrow ideas from empirical process theory and convex geometry and are of independent interest.

preprint, 2015, arXiv

On risk bounds in isotonic and other shape restricted regression problems

We consider the problem of estimating an unknown $θ\in {\mathbb{R}}^n$ from noisy observations under the constraint that $θ$ belongs to certain convex polyhedral cones in ${\mathbb{R}}^n$. Under this setting, we prove bounds for the risk of the least squares estimator (LSE). The obtained risk bound behaves differently depending on the true sequence $θ$ which highlights the adaptive behavior of the LSE. As special cases of our general result, we derive risk bounds for the LSE in univariate isotonic and convex regression. We study the risk bound in isotonic regression in greater detail: we show that the isotonic LSE converges at a whole range of rates from $\log n/n$ (when $θ$ is constant) to $n^{-2/3}$ (when $θ$ is uniformly increasing in a certain sense). We argue that the bound presents a benchmark for the risk of any estimator in isotonic regression by proving nonasymptotic local minimax lower bounds. We prove an analogue of our bound for model misspecification where the true $θ$ is not necessarily nondecreasing.
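For readers unfamiliar with the isotonic LSE discussed in this abstract, the following sketch computes it with the classical pool adjacent violators algorithm. The abstract does not describe any computational method, so this is a standard technique supplied for illustration, and the function name `isotonic_lse` is hypothetical.

```python
def isotonic_lse(y):
    """Least squares fit of a nondecreasing sequence to y (pool adjacent violators)."""
    # Each block stores [sum of values, count]; its fitted value is sum / count.
    blocks = []
    for value in y:
        blocks.append([float(value), 1])
        # Merge backwards while adjacent block means violate monotonicity.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fit = []
    for total, count in blocks:
        fit.extend([total / count] * count)
    return fit

fit = isotonic_lse([1, 3, 2, 4])  # the violating pair (3, 2) is pooled to 2.5
```

The fit is piecewise constant, with one constant piece per merged block.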

preprint, 2014, arXiv

Global risk bounds and adaptation in univariate convex regression

We consider the problem of nonparametric estimation of a convex regression function $ϕ_0$. We study the risk of the least squares estimator (LSE) under the natural squared error loss. We show that the risk is always bounded from above by $n^{-4/5}$ modulo logarithmic factors while being much smaller when $ϕ_0$ is well-approximable by a piecewise affine convex function with not too many affine pieces (in which case, the risk is at most $1/n$ up to logarithmic factors). On the other hand, when $ϕ_0$ has curvature, we show that no estimator can have risk smaller than a constant multiple of $n^{-4/5}$ in a very strong sense by proving a "local" minimax lower bound. We also study the case of model misspecification where we show that the LSE exhibits the same global behavior provided the loss is measured from the closest convex projection of the true regression function. In the process of deriving our risk bounds, we prove new results for the metric entropy of local neighborhoods of the space of univariate convex functions. These results, which may be of independent interest, demonstrate the non-uniform nature of the space of univariate convex functions in sharp contrast to classical function spaces based on smoothness constraints.

preprint, 2014, arXiv

On the impossibility of constructing good population mean estimators in a realistic Respondent Driven Sampling model

Current methods for population mean estimation from data collected by Respondent Driven Sampling (RDS) are based on the Horvitz-Thompson estimator together with a set of assumptions on the sampling model under which the inclusion probabilities can be determined from the information contained in the data. In this paper, we argue that such a set of assumptions is too simplistic to be realistic and that under realistic sampling models, the situation is far more complicated. Specifically, we study a realistic RDS sampling model that is motivated by a real world RDS dataset. We show that, for this model, the inclusion probabilities, which are necessary for the application of the Horvitz-Thompson estimator, cannot be determined by the information in the sample alone. An implication is that, unless additional information about the underlying population network is obtained, it is hopeless to conceive of a general theory of population mean estimation from current RDS data.

preprint, 2013, arXiv

Sharp Inequalities for $f$-divergences

$f$-divergences are a general class of divergences between probability measures which include as special cases many commonly used divergences in probability, mathematical statistics and information theory such as Kullback-Leibler divergence, chi-squared divergence, squared Hellinger distance, total variation distance etc. In this paper, we study the problem of maximizing or minimizing an $f$-divergence between two probability measures subject to a finite number of constraints on other $f$-divergences. We show that these infinite-dimensional optimization problems can all be reduced to optimization problems over small finite dimensional spaces which are tractable. Our results lead to a comprehensive and unified treatment of the problem of obtaining sharp inequalities between $f$-divergences. We demonstrate that many of the existing results on inequalities between $f$-divergences can be obtained as special cases of our results and we also improve on some existing non-sharp inequalities.
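As a concrete illustration of the special cases listed in this abstract, the sketch below evaluates several $f$-divergences for discrete distributions using $D_f(P\|Q) = \sum_i q_i f(p_i/q_i)$, and checks two well-known inequalities between them numerically. It assumes all $q_i > 0$ and uses the unnormalized convention $\sum_i (\sqrt{p_i} - \sqrt{q_i})^2$ for squared Hellinger distance; other texts include a factor of $1/2$. The function names are hypothetical.

```python
import math

def f_divergence(p, q, f):
    """D_f(P||Q) = sum_i q_i * f(p_i / q_i), assuming every q_i > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def kl_f(t):
    # Kullback-Leibler divergence (in nats); 0 * log 0 is taken to be 0.
    return t * math.log(t) if t > 0 else 0.0

def chi2_f(t):
    # Chi-squared divergence.
    return (t - 1.0) ** 2

def hellinger_f(t):
    # Squared Hellinger distance, unnormalized convention.
    return (math.sqrt(t) - 1.0) ** 2

def tv_f(t):
    # Total variation distance (half the L1 distance).
    return abs(t - 1.0) / 2.0

p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]

kl = f_divergence(p, q, kl_f)
chi2 = f_divergence(p, q, chi2_f)
hell2 = f_divergence(p, q, hellinger_f)
tv = f_divergence(p, q, tv_f)

# Pinsker's inequality: TV^2 <= KL / 2.
pinsker_holds = tv ** 2 <= kl / 2.0
# KL is bounded above by the chi-squared divergence.
kl_below_chi2 = kl <= chi2
```

Neither inequality checked here is sharp in general. The paper's subject is the sharp versions of such bounds.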

preprint, 2012, arXiv

Covering Numbers for Convex Functions

In this paper we study the covering numbers of the space of convex and uniformly bounded functions in multiple dimensions. We find optimal upper and lower bounds for the $ε$-covering number of $\mathcal{C}([a, b]^d, B)$, in the $L_p$-metric, $1 \le p < \infty$, in terms of the relevant constants, where $d \geq 1$, $a < b \in \mathbb{R}$, $B>0$, and $\mathcal{C}([a,b]^d, B)$ denotes the set of all convex functions on $[a, b]^d$ that are uniformly bounded by $B$. We summarize previously known results on covering numbers for convex functions and also provide alternate proofs of some known results. Our results have direct implications in the study of rates of convergence of empirical minimization procedures as well as optimal convergence rates in the numerous convexity constrained function estimation problems.

preprint, 2011, arXiv

Lower bounds for the minimax risk using $f$-divergences and applications

Lower bounds involving $f$-divergences between the underlying probability measures are proved for the minimax risk in estimation problems. Our proofs just use simple convexity facts. Special cases and straightforward corollaries of our bounds include well known inequalities for establishing minimax lower bounds such as Fano's inequality, Pinsker's inequality and inequalities based on global entropy conditions. Two applications are provided: a new minimax lower bound for the reconstruction of convex bodies from noisy support function measurements and a different proof of a recent minimax lower bound for the estimation of a covariance matrix.