Source author record

Rajarshi Mukherjee

Rajarshi Mukherjee appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

math.ST, Statistics Theory, Machine Learning, Methodology, Artificial Intelligence, Computation, Information Theory, math.IT, math.NA

Catalog footprint

What is connected

14 works

9 topics

4 close collaborators

preprint · 2026 · arXiv

Asymptotic Inference for Constrained Regression

We consider statistical inference in high-dimensional regression problems under affine constraints on the parameter space. The theoretical study of this is motivated by the study of genetic determinants of diseases, such as diabetes, using external information from mediating protein expression levels. Specifically, we develop rigorous methods for estimating genetic effects on diabetes-related continuous outcomes when these associations are constrained based on external information about genetic determinants of proteins, and genetic relationships between proteins and the outcome of interest. In this regard, we discuss multiple candidate estimators and study their theoretical properties, sharp large sample optimality, and numerical qualities under a high-dimensional proportional asymptotic framework.

preprint · 2022 · arXiv

On Statistical Inference with High Dimensional Sparse CCA

We consider asymptotically exact inference on the leading canonical correlation directions and strengths between two high dimensional vectors under sparsity restrictions. In this regard, our main contribution is the development of a loss function, based on which, one can operationalize a one-step bias-correction on reasonable initial estimators. Our analytic results in this regard are adaptive over suitable structural restrictions of the high dimensional nuisance parameters, which, in this set-up, correspond to the covariance matrices of the variables of interest. We further supplement the theoretical guarantees behind our procedures with extensive numerical studies.

preprint · 2022 · arXiv

On the Existence of Universal Lottery Tickets

The lottery ticket hypothesis conjectures the existence of sparse subnetworks of large randomly initialized deep neural networks that can be successfully trained in isolation. Recent work has experimentally observed that some of these tickets can be practically reused across a variety of tasks, hinting at some form of universality. We formalize this concept and theoretically prove that not only do such universal tickets exist but they also do not require further training. Our proofs introduce a couple of technical innovations related to pruning for strong lottery tickets, including extensions of subset sum results and a strategy to leverage higher amounts of depth. Our explicit sparse constructions of universal function families might be of independent interest, as they highlight representational benefits induced by univariate convolutional architectures.

preprint · 2022 · arXiv

Sparse Uniformity Testing

In this paper we consider the uniformity testing problem for high-dimensional discrete distributions (multinomials) under sparse alternatives. More precisely, we derive sharp detection thresholds for testing, based on $n$ samples, whether a discrete distribution supported on $d$ elements differs from the uniform distribution only in $s$ (out of the $d$) coordinates and is $\varepsilon$-far (in total variation distance) from uniformity. Our results reveal various interesting phase transitions which depend on the interplay of the sample size $n$ and the signal strength $\varepsilon$ with the dimension $d$ and the sparsity level $s$. For instance, if the sample size is less than a threshold (which depends on $d$ and $s$), then all tests are asymptotically powerless, irrespective of the magnitude of the signal strength. On the other hand, if the sample size is above the threshold, then the detection boundary undergoes a further phase transition depending on the signal strength. Here, a $\chi^2$-type test attains the detection boundary in the dense regime, whereas in the sparse regime a Bonferroni correction of two maximum-type tests and a version of the Higher Criticism test is optimal up to sharp constants. These results combined provide a complete description of the phase diagram for the sparse uniformity testing problem across all regimes of the parameters $n$, $d$, and $s$. One of the challenges in dealing with multinomials is that the parameters are always constrained to lie in the simplex. This results in the aforementioned two-layered phase transition, a new phenomenon which does not arise in classical high-dimensional sparse testing problems.
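To give a concrete sense of a $\chi^2$-type statistic for uniformity, the sketch below computes the classical Pearson statistic against the uniform distribution on $d$ elements. This is the textbook statistic, offered only as an illustration; it is not claimed to be the exact test analyzed in the paper, and the function name `pearson_uniformity_statistic` and the sample data are hypothetical.

```python
from collections import Counter


def pearson_uniformity_statistic(samples, d):
    """Classical Pearson chi-square statistic against Uniform{0, ..., d-1}.

    Illustrative only: the abstract refers to a "chi^2-type" test, and this
    textbook form is an assumption, not the paper's exact construction.
    """
    n = len(samples)
    if n == 0 or d <= 0:
        raise ValueError("need at least one sample and d >= 1")
    expected = n / d  # expected count per cell under uniformity
    counts = Counter(samples)
    # Sum over all d cells, including cells that were never observed.
    return sum((counts.get(k, 0) - expected) ** 2 / expected for k in range(d))


# Hypothetical data: 8 draws from a support of size d = 4.
balanced = [0, 1, 2, 3, 0, 1, 2, 3]
skewed = [0, 0, 0, 0, 0, 0, 1, 2]

stat_balanced = pearson_uniformity_statistic(balanced, d=4)
stat_skewed = pearson_uniformity_statistic(skewed, d=4)
print(stat_balanced, stat_skewed)
```

A larger statistic indicates stronger departure from uniformity; calibrating a rejection threshold in the high-dimensional sparse setting is the subject of the paper and is not attempted here.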

preprint · 2022 · arXiv

Towards a Unified Framework for Uncertainty-aware Nonlinear Variable Selection with Theoretical Guarantees

We develop a simple and unified framework for nonlinear variable selection that incorporates uncertainty in the prediction function and is compatible with a wide range of machine learning models (e.g., tree ensembles, kernel methods, neural networks). In particular, for a learned nonlinear model $f(\mathbf{x})$, we consider quantifying the importance of an input variable $\mathbf{x}^j$ using the integrated partial derivative $\Psi_j = \Vert \frac{\partial}{\partial \mathbf{x}^j} f(\mathbf{x})\Vert^2_{P_\mathcal{X}}$. We then (1) provide a principled approach for quantifying variable selection uncertainty by deriving its posterior distribution, and (2) show that the approach is generalizable even to non-differentiable models such as tree ensembles. Rigorous Bayesian nonparametric theorems are derived to guarantee the posterior consistency and asymptotic uncertainty of the proposed approach. Extensive simulations and experiments on healthcare benchmark datasets confirm that the proposed algorithm outperforms existing classic and recent variable selection methods.
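The integrated partial derivative can be approximated numerically. The sketch below assumes that the norm $\Vert \cdot \Vert^2_{P_\mathcal{X}}$ denotes the squared $L_2$ norm under the input distribution, replaces it with an average over observed inputs, and uses central finite differences for the derivative. The function name, the toy model `f`, and the data are hypothetical, and the posterior-based uncertainty quantification described in the abstract is not reproduced.

```python
def integrated_partial_derivative(f, X, j, h=1e-5):
    """Empirical estimate of Psi_j = E[(d f / d x_j)^2].

    Assumptions (not from the source): the expectation is replaced by an
    average over the rows of X, and the partial derivative is approximated
    by a central finite difference with step h.
    """
    if not X:
        raise ValueError("X must contain at least one input point")
    total = 0.0
    for x in X:
        up, down = list(x), list(x)
        up[j] += h
        down[j] -= h
        grad_j = (f(up) - f(down)) / (2 * h)
        total += grad_j ** 2
    return total / len(X)


def f(x):
    # Hypothetical fitted model: depends on x[0] and x[1], ignores x[2].
    return 3.0 * x[0] + x[1] ** 2


X = [[0.0, 1.0, 5.0], [1.0, -1.0, 2.0], [2.0, 0.5, -3.0]]
psi = [integrated_partial_derivative(f, X, j) for j in range(3)]
print(psi)
```

In this toy setting the unused third input receives an importance of zero, which is the behavior a variable selection rule based on $\Psi_j$ would rely on.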

preprint · 2021 · arXiv

On Estimation of $L_{r}$-Norms in Gaussian White Noise Models

We provide a complete picture of asymptotically minimax estimation of $L_r$-norms (for any $r\ge 1$) of the mean in Gaussian white noise model over Nikolskii-Besov spaces. In this regard, we complement the work of Lepski, Nemirovski and Spokoiny (1999), who considered the cases of $r=1$ (with poly-logarithmic gap between upper and lower bounds) and $r$ even (with asymptotically sharp upper and lower bounds) over Hölder spaces. We additionally consider the case of asymptotically adaptive minimax estimation and demonstrate a difference between even and non-even $r$ in terms of an investigator's ability to produce asymptotically adaptive minimax estimators without paying a penalty.

preprint · 2021 · arXiv

Semi-Supervised Off Policy Reinforcement Learning

Reinforcement learning (RL) has shown great success in estimating sequential treatment strategies which take into account patient heterogeneity. However, health-outcome information, which is used as the reward for reinforcement learning methods, is often not well coded but rather embedded in clinical notes. Extracting precise outcome information is a resource-intensive task, so most of the available well-annotated cohorts are small. To address this issue, we propose a semi-supervised learning (SSL) approach that efficiently leverages a small labeled dataset with the true outcome observed and a large unlabeled dataset with outcome surrogates. In particular, we propose a semi-supervised, efficient approach to Q-learning and doubly robust off-policy value estimation. Generalizing SSL to sequential treatment regimes brings interesting challenges: (1) the feature distribution for Q-learning is unknown, as it includes previous outcomes; (2) the surrogate variables we leverage in the modified SSL framework are predictive of the outcome but not informative about the optimal policy or value function. We provide theoretical results for our Q-function and value function estimators to understand to what degree efficiency can be gained from SSL. Our method is at least as efficient as the supervised approach and, moreover, safe, as it is robust to mis-specification of the imputation models.

preprint · 2020 · arXiv

On Minimax Exponents of Sparse Testing

We consider exact asymptotics of the minimax risk for global testing against sparse alternatives in the context of high dimensional linear regression. Our results characterize the leading order behavior of this minimax risk in several regimes, uncovering new phase transitions in its behavior. This complements a vast literature characterizing asymptotic consistency in this problem, and provides a useful benchmark, against which the performance of specific tests may be compared. Finally, we provide some preliminary evidence that popular sparsity adaptive procedures might be sub-optimal in terms of the minimax risk.

preprint · 2020 · arXiv

On nearly assumption-free tests of nominal confidence interval coverage for causal parameters estimated by machine learning

For many causal effect parameters of interest, doubly robust machine learning (DRML) estimators $\hat\psi_{1}$ are the state-of-the-art, incorporating the good prediction performance of machine learning; the decreased bias of doubly robust estimators; and the analytic tractability and bias reduction of sample splitting with cross fitting. Nonetheless, even in the absence of confounding by unmeasured factors, the nominal $(1 - \alpha)$ Wald confidence interval $\hat\psi_{1} \pm z_{\alpha/2} \widehat{\mathsf{se}} [\hat\psi_{1}]$ may still undercover even in large samples, because the bias of $\hat\psi_{1}$ may be of the same or even larger order than its standard error of order $n^{-1/2}$. In this paper, we introduce essentially assumption-free tests that (i) can falsify the null hypothesis that the bias of $\hat\psi_{1}$ is of smaller order than its standard error, (ii) can provide an upper confidence bound on the true coverage of the Wald interval, and (iii) are valid under the null under no smoothness/sparsity assumptions on the nuisance parameters. The tests, which we refer to as Assumption Free Empirical Coverage Tests (AFECTs), are based on a U-statistic that estimates part of the bias of $\hat\psi_{1}$.
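The undercoverage described above can be illustrated with a short calculation. The sketch below computes the Wald interval and, under the simplifying assumptions that the estimator is exactly normal with a known standard error and a fixed bias, the resulting true coverage $\Phi(z_{\alpha/2} - b/\mathsf{se}) - \Phi(-z_{\alpha/2} - b/\mathsf{se})$, where $b$ is the bias. The function names and numeric inputs are hypothetical, and this is not the AFECT procedure itself.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def wald_interval(estimate, std_error, alpha=0.05):
    """Nominal (1 - alpha) Wald interval: estimate +/- z_{alpha/2} * se."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    if std_error < 0:
        raise ValueError("std_error must be non-negative")
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


def coverage_under_bias(bias_over_se, alpha=0.05):
    """True coverage of the Wald interval when the estimator is exactly
    N(psi + bias, se^2) with known se (an idealized assumption)."""
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(z - bias_over_se) - _STD_NORMAL.cdf(-z - bias_over_se)


lo, hi = wald_interval(estimate=1.0, std_error=0.1)
cov_unbiased = coverage_under_bias(0.0)
cov_biased = coverage_under_bias(1.0)  # bias equal to one standard error
print(lo, hi, cov_unbiased, cov_biased)
```

With zero bias the coverage equals the nominal level, while a bias of the same order as the standard error pulls it visibly below nominal, which is the failure mode the tests are designed to detect.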

preprint · 2020 · arXiv

Rejoinder: On nearly assumption-free tests of nominal confidence interval coverage for causal parameters estimated by machine learning

This is the rejoinder to the discussion by Kennedy, Balakrishnan and Wasserman on the paper "On nearly assumption-free tests of nominal confidence interval coverage for causal parameters estimated by machine learning" published in Statistical Science.

preprint · 2016 · arXiv

Asymptotic Normality of Scrambled Geometric Net Quadrature

In a very recent work, Basu and Owen (2015) propose the use of scrambled geometric nets in numerical integration when the domain is a product of $s$ arbitrary spaces of dimension $d$ having a certain partitioning constraint. It was shown that for a class of smooth functions, the integral estimate has variance $O( n^{-1 -2/d} (\log n)^{s-1})$ for scrambled geometric nets, compared to $O(n^{-1})$ for ordinary Monte Carlo. The main idea of this paper is to build on the work by Loh (2003) to show that the scrambled geometric net estimate has an asymptotic normal distribution for certain smooth functions defined on products of suitable subsets of $\mathbb{R}^d$.

preprint · 2016 · arXiv

Lepski's Method and Adaptive Estimation of Nonlinear Integral Functionals of Density

We study the adaptive minimax estimation of non-linear integral functionals of a density and extend the results obtained for linear and quadratic functionals to general functionals. The typical rate optimal non-adaptive minimax estimators of "smooth" non-linear functionals are higher order U-statistics. Since Lepski's method requires tight control of tails of such estimators, we bypass such calculations by a modification of Lepski's method which is applicable in such situations. As a necessary ingredient, we also provide a method to control higher order moments of minimax estimators of cubic integral functionals. Following a standard constrained risk inequality method, we also show the optimality of our adaptation rates.

preprint · 2016 · arXiv

Optimal Adaptive Inference in Random Design Binary Regression

We construct confidence sets for the regression function in nonparametric binary regression with an unknown design density. These confidence sets are adaptive in $L^2$ loss over a continuous class of Sobolev type spaces. Adaptation holds in the smoothness of the regression function, over the maximal parameter spaces where adaptation is possible, provided the design density is smooth enough. We identify two key regimes: one where adaptation is possible, and one where some critical regions must be removed. We address related questions about goodness of fit testing and adaptive estimation of relevant parameters.

preprint · 2015 · arXiv

Hypothesis testing for high-dimensional sparse binary regression

In this paper, we study the detection boundary for minimax hypothesis testing in the context of high-dimensional, sparse binary regression models. Motivated by genetic sequencing association studies for rare variant effects, we investigate the complexity of the hypothesis testing problem when the design matrix is sparse. We observe a new phenomenon in the behavior of detection boundary which does not occur in the case of Gaussian linear regression. We derive the detection boundary as a function of two components: a design matrix sparsity index and signal strength, each of which is a function of the sparsity of the alternative. For any alternative, if the design matrix sparsity index is too high, any test is asymptotically powerless irrespective of the magnitude of signal strength. For binary design matrices with the sparsity index that is not too high, our results are parallel to those in the Gaussian case. In this context, we derive detection boundaries for both dense and sparse regimes. For the dense regime, we show that the generalized likelihood ratio is rate optimal; for the sparse regime, we propose an extended Higher Criticism Test and show it is rate optimal and sharp. We illustrate the finite sample properties of the theoretical results using simulation studies.

14 published items

Asymptotic Inference for Constrained Regression

On Statistical Inference with High Dimensional Sparse CCA

On the Existence of Universal Lottery Tickets

Sparse Uniformity Testing

Towards a Unified Framework for Uncertainty-aware Nonlinear Variable Selection with Theoretical Guarantees

On Estimation of $L_{r}$-Norms in Gaussian White Noise Models

Semi-Supervised Off Policy Reinforcement Learning

On Minimax Exponents of Sparse Testing

On nearly assumption-free tests of nominal confidence interval coverage for causal parameters estimated by machine learning

Rejoinder: On nearly assumption-free tests of nominal confidence interval coverage for causal parameters estimated by machine learning

Asymptotic Normality of Scrambled Geometric Net Quadrature

Lepski's Method and Adaptive Estimation of Nonlinear Integral Functionals of Density

Optimal Adaptive Inference in Random Design Binary Regression

Hypothesis testing for high-dimensional sparse binary regression