Researcher profile

Bodhisattva Sen

Bodhisattva Sen contributes to research discovery and scholarly infrastructure.

Published work

20 published items

preprint, 2023, arXiv

Multivariate, Heteroscedastic Empirical Bayes via Nonparametric Maximum Likelihood

Multivariate, heteroscedastic errors complicate statistical inference in many large-scale denoising problems. Empirical Bayes is attractive in such settings, but standard parametric approaches rest on assumptions about the form of the prior distribution which can be hard to justify and which introduce unnecessary tuning parameters. We extend the nonparametric maximum likelihood estimator (NPMLE) for Gaussian location mixture densities to allow for multivariate, heteroscedastic errors. NPMLEs estimate an arbitrary prior by solving an infinite-dimensional, convex optimization problem; we show that this convex optimization problem can be tractably approximated by a finite-dimensional version. The empirical Bayes posterior means based on an NPMLE have low regret, meaning they closely target the oracle posterior means one would compute with the true prior in hand. We prove an oracle inequality implying that the empirical Bayes estimator performs at nearly the optimal level (up to logarithmic factors) for denoising without prior knowledge. We provide finite-sample bounds on the average Hellinger accuracy of an NPMLE for estimating the marginal densities of the observations. We also demonstrate the adaptive and nearly-optimal properties of NPMLEs for deconvolution. We apply our method to two denoising problems in astronomy, constructing a fully data-driven color-magnitude diagram of 1.4 million stars in the Milky Way and investigating the distribution of 19 chemical abundance ratios for 27 thousand stars in the red clump. We also apply our method to hierarchical linear models, illustrating the advantages of nonparametric shrinkage of regression coefficients on an education data set and on a microarray data set.
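The finite-dimensional approximation described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the setting is reduced to one dimension, the prior is restricted to a user-supplied grid of atoms, and the mixing weights are fitted with plain EM rather than a dedicated convex solver. The names `npmle_grid` and `posterior_means` are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def npmle_grid(obs, sds, grid, n_iter=200):
    """Fit mixing weights on a fixed grid of atoms by EM (illustrative only).

    obs[i] is modeled as theta_i + noise with known standard deviation sds[i],
    so each observation may have its own noise level (heteroscedastic errors).
    """
    n, m = len(obs), len(grid)
    # lik[i][j]: density of obs[i] if its latent mean were grid[j]
    lik = [[normal_pdf(obs[i], grid[j], sds[i]) for j in range(m)]
           for i in range(n)]
    w = [1.0 / m] * m  # start from uniform weights
    for _ in range(n_iter):
        new_w = [0.0] * m
        for i in range(n):
            denom = sum(w[j] * lik[i][j] for j in range(m))
            for j in range(m):
                # posterior probability that observation i came from atom j
                new_w[j] += w[j] * lik[i][j] / denom
        w = [v / n for v in new_w]
    return w, lik

def posterior_means(w, lik, grid):
    """Empirical Bayes posterior mean of each latent theta_i under the fitted prior."""
    out = []
    for row in lik:
        denom = sum(wj * lij for wj, lij in zip(w, row))
        out.append(sum(wj * lij * g for wj, lij, g in zip(w, row, grid)) / denom)
    return out
```

The grid should cover the range of the observations; otherwise the likelihood of an observation can underflow to zero for every atom and the division fails. Each posterior mean is a weighted average of grid atoms, so the denoised values always lie within the grid range.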

preprint, 2022, arXiv

Permuted and Unlinked Monotone Regression in $\mathbb{R}^d$: an approach based on mixture modeling and optimal transport

Suppose that we have a regression problem with response variable Y in $\mathbb{R}^d$ and predictor X in $\mathbb{R}^d$, for $d \geq 1$. In permuted or unlinked regression we have access to separate unordered data on X and Y, as opposed to data on (X,Y)-pairs in usual regression. So far in the literature the case $d=1$ has received attention, see e.g., the recent papers by Rigollet and Weed [Information & Inference, 8, 619--717] and Balabdaoui et al. [J. Mach. Learn. Res., 22(172), 1--60]. In this paper, we consider the general multivariate setting with $d \geq 1$. We show that the notion of cyclical monotonicity of the regression function is sufficient for identification and estimation in the permuted/unlinked regression model. We study permutation recovery in the permuted regression setting and develop a computationally efficient and easy-to-use algorithm for denoising based on the Kiefer-Wolfowitz [Ann. Math. Statist., 27, 887--906] nonparametric maximum likelihood estimator and techniques from the theory of optimal transport. We provide explicit upper bounds on the associated mean squared denoising error for Gaussian noise. As in previous work on the case $d = 1$, the permuted/unlinked setting involves slow (logarithmic) rates of convergence rooted in the underlying deconvolution problem. Numerical studies corroborate our theoretical analysis and show that the proposed approach performs at least on par with the methods in the aforementioned prior work in the case $d = 1$ while achieving substantial reductions in terms of computational complexity.

preprint, 2021, arXiv

Semiparametric Efficiency in Convexity Constrained Single Index Model

We consider estimation and inference in a single index regression model with an unknown convex link function. We introduce a convex and Lipschitz constrained least squares estimator (CLSE) for both the parametric and the nonparametric components given independent and identically distributed observations. We prove the consistency and find the rates of convergence of the CLSE when the errors are assumed to have only $q \ge 2$ moments and are allowed to depend on the covariates. When $q\ge 5$, we establish $n^{-1/2}$-rate of convergence and asymptotic normality of the estimator of the parametric component. Moreover, the CLSE is proved to be semiparametrically efficient if the errors happen to be homoscedastic. We develop and implement a numerically stable and computationally fast algorithm to compute our proposed estimator in the R package simest. We illustrate our methodology through extensive simulations and data analysis. Finally, our proof of efficiency is geometric and provides a general framework that can be used to prove efficiency of estimators in a wide variety of semiparametric models even when they do not satisfy the efficient score equation directly.

preprint, 2021, arXiv

Tracing birth properties of stars with abundance clustering

To understand the formation and evolution of the Milky Way disk, we must connect its current properties to its past. We explore hydrodynamical cosmological simulations to investigate how the chemical abundances of stars might be linked to their origins. Using hierarchical clustering of abundance measurements in two Milky Way-like simulations with distributed and steady star formation histories, we find that abundance clusters of stars comprise different groups in birth place ($R_\text{birth}$) and time (age). Simulating observational abundance errors (0.05 dex), we find that to trace discrete groups of ($R_\text{birth}$, age) requires a large vector of abundances. Using 15-element abundances (Fe, O, Mg, S, Si, C, P, Mn, Ne, Al, N, V, Ba, Cr, Co), up to $\approx$ 10 clusters can be defined with $\approx$ 25% overlap in ($R_\text{birth}$, age). We build a simple model to show that it is possible to infer a star's age and $R_\text{birth}$ from abundances with precisions of $\pm$0.06 Gyr and $\pm$1.17 kpc respectively. We find that abundance clustering is ineffective for a third simulation, where low-$α$ stars form distributed in the disc and early high-$α$ stars form more rapidly in clumps that sink towards the galactic center as their constituent stars evolve to enrich the interstellar medium. However, this formation path leads to large age-dispersions across the [$α$/Fe]-[Fe/H] plane, which is inconsistent with the Milky Way's observed properties. We conclude that abundance clustering is a promising approach toward charting the history of our Galaxy.

preprint, 2020, arXiv

Inference for local parameters in convexity constrained models

We consider the problem of inference for local parameters of a convex regression function $f_0: [0,1] \to \mathbb{R}$ based on observations from a standard nonparametric regression model, using the convex least squares estimator (LSE) $\widehat{f}_n$. For $x_0 \in (0,1)$, the local parameters include the pointwise function value $f_0(x_0)$, the pointwise derivative $f_0'(x_0)$, and the anti-mode (i.e., the smallest minimizer) of $f_0$. The existing limiting distribution of the estimation error $(\widehat{f}_n(x_0) - f_0(x_0), \widehat{f}_n'(x_0) - f_0'(x_0) )$ depends on the unknown second derivative $f_0''(x_0)$, and is therefore not directly applicable for inference. To circumvent this impasse, we show that the following locally normalized errors (LNEs) enjoy pivotal limiting behavior: Let $[\widehat{u}(x_0), \widehat{v}(x_0)]$ be the maximal interval containing $x_0$ where $\widehat{f}_n$ is linear. Then, under standard conditions, $$\binom{ \sqrt{n(\widehat{v}(x_0)-\widehat{u}(x_0))}(\widehat{f}_n(x_0)-f_0(x_0)) }{ \sqrt{n(\widehat{v}(x_0)-\widehat{u}(x_0))^3}(\widehat{f}_n'(x_0)-f_0'(x_0))} \rightsquigarrow σ\cdot \binom{\mathbb{L}^{(0)}_2}{\mathbb{L}^{(1)}_2},$$ where $n$ is the sample size, $σ$ is the standard deviation of the errors, and $\mathbb{L}^{(0)}_2, \mathbb{L}^{(1)}_2$ are universal random variables. This asymptotically pivotal LNE theory instantly yields a simple tuning-free procedure for constructing CIs with asymptotically exact coverage and optimal length for $f_0(x_0)$ and $f_0'(x_0)$. We also construct an asymptotically pivotal LNE for the anti-mode of $f_0$, and its limiting distribution does not even depend on $σ$. These asymptotically pivotal LNE theories are further extended to other convexity/concavity constrained models (e.g., log-concave density estimation) for which a limit distribution theory is available for problem-specific estimators.

preprint, 2020, arXiv

Multivariate extensions of isotonic regression and total variation denoising via entire monotonicity and Hardy-Krause variation

We consider the problem of nonparametric regression when the covariate is $d$-dimensional, where $d \geq 1$. In this paper we introduce and study two nonparametric least squares estimators (LSEs) in this setting: the entirely monotonic LSE and the constrained Hardy-Krause variation LSE. We show that these two LSEs are natural generalizations of univariate isotonic regression and univariate total variation denoising, respectively, to multiple dimensions. We discuss the characterization and computation of these two LSEs obtained from $n$ data points. We provide a detailed study of their risk properties under the squared error loss and fixed uniform lattice design. We show that the finite sample risk of these LSEs is always bounded from above by $n^{-2/3}$ modulo logarithmic factors depending on $d$; thus these nonparametric LSEs avoid the curse of dimensionality to some extent. We also prove nearly matching minimax lower bounds. Further, we illustrate that these LSEs are particularly useful in fitting rectangular piecewise constant functions. Specifically, we show that the risk of the entirely monotonic LSE is almost parametric (at most $1/n$ up to logarithmic factors) when the true function is well-approximable by a rectangular piecewise constant entirely monotone function with not too many constant pieces. A similar result is also shown to hold for the constrained Hardy-Krause variation LSE for a simple subclass of rectangular piecewise constant functions. We believe that the proposed LSEs yield a novel approach to estimating multivariate functions using convex optimization that avoids the curse of dimensionality to some extent.

preprint, 2020, arXiv

Tracing the assembly of the Milky Way's disk through abundance clustering

A major goal in the field of galaxy formation is to understand the formation of the Milky Way's disk. The first step toward doing this is to empirically describe its present state. We use the new high-dimensional dataset of 19 abundances from 27,135 red clump APOGEE stars to examine the distribution of clusters defined using abundances. We explore different dimensionality reduction techniques and implement a non-parametric agglomerative hierarchical clustering method. We see that groups defined using abundances are spatially separated, as a function of age. Furthermore, the abundance groups represent different distributions in the [Fe/H]-age plane. Ordering our clusters by age reveals patterns suggestive of the sequence of chemical enrichment in the disk over time. Our results indicate that a promising avenue to trace the details of the disk's assembly is via a full interpretation of the empirical connections we report.

preprint, 2014, arXiv

Global risk bounds and adaptation in univariate convex regression

We consider the problem of nonparametric estimation of a convex regression function $ϕ_0$. We study the risk of the least squares estimator (LSE) under the natural squared error loss. We show that the risk is always bounded from above by $n^{-4/5}$ modulo logarithmic factors while being much smaller when $ϕ_0$ is well-approximable by a piecewise affine convex function with not too many affine pieces (in which case, the risk is at most $1/n$ up to logarithmic factors). On the other hand, when $ϕ_0$ has curvature, we show that no estimator can have risk smaller than a constant multiple of $n^{-4/5}$ in a very strong sense by proving a "local" minimax lower bound. We also study the case of model misspecification where we show that the LSE exhibits the same global behavior provided the loss is measured from the closest convex projection of the true regression function. In the process of deriving our risk bounds, we prove new results for the metric entropy of local neighborhoods of the space of univariate convex functions. These results, which may be of independent interest, demonstrate the non-uniform nature of the space of univariate convex functions in sharp contrast to classical function spaces based on smoothness constraints.

preprint, 2014, arXiv

On Testing Independence and Goodness-of-fit in Linear Models

We consider a linear regression model and propose an omnibus test to simultaneously check the assumption of independence between the error and the predictor variables, and the goodness-of-fit of the parametric model. Our approach is based on testing for independence between the residual obtained from the parametric fit and the predictor using the Hilbert-Schmidt independence criterion (Gretton et al., 2008). The proposed method requires no user-defined regularization, is simple to compute, based merely on pairwise distances between points in the sample, and is consistent against all alternatives. We develop distribution theory for the proposed test statistic, both under the null and the alternative hypotheses, and devise a bootstrap scheme to approximate its null distribution. We prove the consistency of the bootstrap scheme. A simulation study shows that our method has better power than its main competitors. Two real datasets are analyzed to demonstrate the scope and usefulness of our method.
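To make the idea of testing independence between residuals and predictor concrete, here is a rough sketch of a Hilbert-Schmidt independence criterion statistic computed on least squares residuals. It is not the paper's procedure: the single-predictor fit, the Gaussian kernel with bandwidth 1, and the biased (V-statistic) form of the estimate are all assumptions chosen for brevity, the bootstrap calibration is omitted, and the function names are hypothetical.

```python
import math

def ols_residuals(x, y):
    """Residuals from a simple least squares line fit of y on a single predictor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [yi - intercept - slope * xi for xi, yi in zip(x, y)]

def centered_gram(v, bandwidth=1.0):
    """Doubly centered Gaussian kernel matrix built from pairwise distances.

    The Gaussian kernel and its bandwidth are illustrative assumptions.
    """
    n = len(v)
    k = [[math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in v] for a in v]
    row = [sum(r) / n for r in k]  # row means equal column means (k is symmetric)
    total = sum(row) / n
    return [[k[i][j] - row[i] - row[j] + total for j in range(n)] for i in range(n)]

def hsic(u, v):
    """Biased empirical HSIC: average elementwise product of centered Gram matrices."""
    n = len(u)
    ku, kv = centered_gram(u), centered_gram(v)
    return sum(ku[i][j] * kv[i][j] for i in range(n) for j in range(n)) / n ** 2
```

For data that follow the fitted line exactly, the residuals are constant and the statistic is numerically zero. When the true regression function is curved, the residuals depend on the predictor and the statistic is clearly positive, which is the behavior an omnibus test exploits.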

preprint, 2014, arXiv

Testing against a linear regression model using ideas from shape-restricted estimation

A formal likelihood ratio hypothesis test for the validity of a parametric regression function is proposed, using a large-dimensional, nonparametric double cone alternative. For example, the test against a constant function uses the alternative of increasing or decreasing regression functions, and the test against a linear function uses the convex or concave alternative. The proposed test is exact, unbiased and the critical value is easily computed. The power of the test increases to one as the sample size increases, under very mild assumptions, even when the alternative is mis-specified. That is, the power of the test converges to one for any true regression function that deviates (in a non-degenerate way) from the parametric null hypothesis. We also formulate tests for the linear versus partial linear model, and consider the special case of the additive model. Simulations show that our procedure performs consistently well when compared with other methods. Although the alternative fit is non-parametric, no tuning parameters are involved.

preprint, 2013, arXiv

Bootstrapping a Change-Point Cox Model for Survival Data

This paper investigates the (in)-consistency of various bootstrap methods for making inference on a change-point in time in the Cox model with right censored survival data. A criterion is established for the consistency of any bootstrap method. It is shown that the usual nonparametric bootstrap is inconsistent for the maximum partial likelihood estimation of the change-point. A new model-based bootstrap approach is proposed and its consistency established. Simulation studies are carried out to assess the performance of various bootstrap schemes.

preprint, 2013, arXiv

Model Based Bootstrap Methods for Interval Censored Data

We investigate the performance of model based bootstrap methods for constructing point-wise confidence intervals around the survival function with interval censored data. We show that bootstrapping from the nonparametric maximum likelihood estimator of the survival function is inconsistent for both the current status and case 2 interval censoring models. A model based smoothed bootstrap procedure is proposed and shown to be consistent. In addition, simulation studies are conducted to illustrate the (in)-consistency of the bootstrap methods. Our conclusions in the interval censoring model would extend more generally to estimators in regression models that exhibit non-standard rates of convergence.

preprint, 2012, arXiv

Bootstrap confidence intervals for isotonic estimators in a stereological problem

Let $\mathbf{X}=(X_1,X_2,X_3)$ be a spherically symmetric random vector of which only $(X_1,X_2)$ can be observed. We focus attention on estimating F, the distribution function of the squared radius $Z:=X_1^2+X_2^2+X_3^2$, from a random sample of $(X_1,X_2)$. Such a problem arises in astronomy where $(X_1,X_2,X_3)$ denotes the three dimensional position of a star in a galaxy but we can only observe the projected stellar positions $(X_1,X_2)$. We consider isotonic estimators of F and derive their limit distributions. The results are nonstandard with a rate of convergence $\sqrt{n/{\log n}}$. The isotonized estimators of F have exactly half the limiting variance when compared to naive estimators, which do not incorporate the shape constraint. We consider the problem of constructing point-wise confidence intervals for F, state sufficient conditions for the consistency of a bootstrap procedure, and show that the conditions are met by the conventional bootstrap method (generating samples from the empirical distribution function).

preprint, 2012, arXiv

Covering Numbers for Convex Functions

In this paper we study the covering numbers of the space of convex and uniformly bounded functions in multiple dimensions. We find optimal upper and lower bounds for the $ε$-covering number of $\mathcal{C}([a, b]^d, B)$, in the $L_p$-metric, $1 \le p < \infty$, in terms of the relevant constants, where $d \geq 1$, $a < b \in \mathbb{R}$, $B>0$, and $\mathcal{C}([a,b]^d, B)$ denotes the set of all convex functions on $[a, b]^d$ that are uniformly bounded by $B$. We summarize previously known results on covering numbers for convex functions and also provide alternate proofs of some known results. Our results have direct implications in the study of rates of convergence of empirical minimization procedures as well as optimal convergence rates in the numerous convexity constrained function estimation problems.

preprint, 2011, arXiv

A continuous mapping theorem for the smallest argmax functional

This paper introduces a version of the argmax continuous mapping theorem that applies to M-estimation problems in which the objective functions converge to a limiting process with multiple maximizers. The concept of the smallest maximizer of a function in the d-dimensional Skorohod space is introduced and its main properties are studied. The resulting continuous mapping theorem is applied to three problems arising in change-point regression analysis. Some of the results proved in connection to the d-dimensional Skorohod space are also of independent interest.

preprint, 2011, arXiv

Change-point in stochastic design regression and the bootstrap

In this paper we study the consistency of different bootstrap procedures for constructing confidence intervals (CIs) for the unique jump discontinuity (change-point) in an otherwise smooth regression function in a stochastic design setting. This problem exhibits nonstandard asymptotics and we argue that the standard bootstrap procedures in regression fail to provide valid confidence intervals for the change-point. We propose a version of smoothed bootstrap, illustrate its remarkable finite sample performance in our simulation study, and prove the consistency of the procedure. The $m$ out of $n$ bootstrap procedure is also considered and shown to be consistent. We also provide sufficient conditions for any bootstrap procedure to be consistent in this scenario.

preprint, 2011, arXiv

Threshold estimation based on a p-value framework in dose-response and regression settings

We use p-values to identify the threshold level at which a regression function takes off from its baseline value, a problem motivated by applications in toxicological and pharmacological dose-response studies and environmental statistics. We study the problem in two sampling settings: one where multiple responses can be obtained at a number of different covariate-levels and the other the standard regression setting involving a limited number of response values at each covariate. Our procedure involves testing the hypothesis that the regression function is at its baseline at each covariate value and then computing the potentially approximate p-value of the test. An estimate of the threshold is obtained by fitting a piecewise constant function with a single jump discontinuity, otherwise known as a stump, to these observed p-values, as they behave in markedly different ways on the two sides of the threshold. The estimate is shown to be consistent and its finite sample properties are studied through simulations. Our approach is computationally simple and extends to the estimation of the baseline value of the regression function, heteroscedastic errors and to time-series. It is illustrated on some real data applications.

preprint, 2010, arXiv

Fractals with point impact in functional linear regression

This paper develops a point impact linear regression model in which the trajectory of a continuous stochastic process, when evaluated at a sensitive time point, is associated with a scalar response. The proposed model complements and is more interpretable than the functional linear regression approach that has become popular in recent years. The trajectories are assumed to have fractal (self-similar) properties in common with a fractional Brownian motion with an unknown Hurst exponent. Bootstrap confidence intervals based on the least-squares estimator of the sensitive time point are developed. Misspecification of the point impact model by a functional linear model is also investigated. Non-Gaussian limit distributions and rates of convergence determined by the Hurst exponent play an important role.

preprint, 2010, arXiv

Inconsistency of bootstrap: The Grenander estimator

In this paper, we investigate the (in)-consistency of different bootstrap methods for constructing confidence intervals in the class of estimators that converge at rate $n^{1/3}$. The Grenander estimator, the nonparametric maximum likelihood estimator of an unknown nonincreasing density function $f$ on $[0,\infty)$, is a prototypical example. We focus on this example and explore different approaches to constructing bootstrap confidence intervals for $f(t_0)$, where $t_0\in(0,\infty)$ is an interior point. We find that the bootstrap estimate, when generating bootstrap samples from the empirical distribution function $\mathbb{F}_n$ or its least concave majorant $\tilde{F}_n$, does not have any weak limit in probability. We provide a set of sufficient conditions for the consistency of any bootstrap method in this example and show that bootstrapping from a smoothed version of $\tilde{F}_n$ leads to strongly consistent estimators. The $m$ out of $n$ bootstrap method is also shown to be consistent while generating samples from $\mathbb{F}_n$ and $\tilde{F}_n$.
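For readers unfamiliar with the Grenander estimator, a minimal sketch of its computation may help: the estimate is the left derivative of the least concave majorant of the empirical distribution function, so it is piecewise constant with jumps at a subset of the order statistics. The code below assumes distinct, strictly positive observations; the function name `grenander` and the stack-based hull scan are illustrative choices rather than anything specified in the abstract, and none of the bootstrap procedures are implemented.

```python
def grenander(sample):
    """Grenander estimate of a nonincreasing density on [0, inf).

    Returns (knots, slopes): the estimate equals slopes[k] on the interval
    (knots[k], knots[k + 1]]. Assumes distinct, strictly positive observations.
    """
    xs = sorted(sample)
    n = len(xs)
    # Vertices of the empirical distribution function, starting at the origin.
    pts = [(0.0, 0.0)] + [(x, (i + 1) / n) for i, x in enumerate(xs)]
    hull = []
    for p in pts:
        # Drop the last hull vertex while it lies on or below the chord
        # joining the vertex before it to the new point p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (y2 - y1) * (p[0] - x1) <= (p[1] - y1) * (x2 - x1):
                hull.pop()
            else:
                break
        hull.append(p)
    knots = [h[0] for h in hull]
    slopes = [(hull[k + 1][1] - hull[k][1]) / (hull[k + 1][0] - hull[k][0])
              for k in range(len(hull) - 1)]
    return knots, slopes
```

The slopes are nonincreasing by construction, and because the majorant runs from the origin to the largest observation at height one, the fitted step function integrates to one.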

preprint, 2010, arXiv

Threshold estimation based on a P-value framework

We use p-values as a discrepancy criterion for identifying the threshold value at which a regression function takes off from its baseline value -- a problem that is motivated by applications in omics experiments, systems engineering, pharmacological dose-response studies and astronomy. In this paper, we study the problem in a controlled sampling setting, where multiple responses, discrete or continuous, can be obtained at a number of different covariate-levels. Our procedure involves testing the hypothesis that the regression function is at its baseline at each covariate value using the sampled responses at that value and then computing the p-value of the test. An estimate of the threshold is provided by fitting a stump, i.e., a piecewise constant function with a single jump discontinuity, to the observed p-values, since the corresponding p-values behave in markedly different ways on different sides of the threshold. The estimate is shown to be consistent, as both the number of covariate values and the number of responses sampled at each value become large, and its finite sample properties are studied through an extensive simulation study. Our approach is computationally simple and can also be used to estimate the baseline value of the regression function. The procedure is illustrated on two motivating real data applications. Extensions to multiple thresholds are also briefly investigated.
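The two-step procedure in this abstract (a p-value at each covariate level, then a stump fitted to those p-values) can be sketched roughly as follows. Several details are assumptions added for illustration: the test is a one-sided z-test with a known baseline and known noise standard deviation, and the stump levels are fixed at 1/2 and 0, motivated by p-values being uniform (mean 1/2) while the function is at baseline and close to zero beyond the threshold. The names `z_pvalue` and `fit_stump` are hypothetical.

```python
import math

def z_pvalue(responses, baseline, sd):
    """One-sided z-test p-value for H0: mean response equals the baseline.

    Assumes a known baseline and a known noise standard deviation sd.
    """
    m = len(responses)
    z = math.sqrt(m) * (sum(responses) / m - baseline) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of N(0, 1)

def fit_stump(levels, pvals):
    """Least squares fit of a stump with level 1/2 on the left and 0 on the right.

    Returns the largest covariate level classified as baseline,
    or None if no level is classified as baseline.
    """
    order = sorted(range(len(levels)), key=lambda j: levels[j])
    xs = [levels[j] for j in order]
    ps = [pvals[j] for j in order]
    # k = number of leftmost levels assigned to the baseline side.
    best_k = 0
    cost = sum(p * p for p in ps)  # k = 0: every level on the right side
    best_cost = cost
    for k in range(1, len(ps) + 1):
        p = ps[k - 1]
        # Moving level k from the right side to the left side changes its
        # contribution from p^2 to (p - 1/2)^2.
        cost += (p - 0.5) ** 2 - p * p
        if cost < best_cost:
            best_k, best_cost = k, cost
    return xs[best_k - 1] if best_k > 0 else None
```

As a usage example, `fit_stump([1, 2, 3, 4, 5, 6, 7], [0.4, 0.6, 0.55, 0.45, 0.01, 0.02, 0.0])` returns `4`, the last level before the p-values collapse toward zero.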