Source author record

Jason M. Klusowski

Jason M. Klusowski appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Computation math.ST Methodology

Catalog footprint

What is connected

8works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Classification Imbalance as Transfer Learning

Classification imbalance arises when one class is much rarer than the other. We frame this setting as transfer learning under label (prior) shift between an imbalanced source distribution induced by the observed data and a balanced target distribution under which performance is evaluated. Within this framework, we study a family of oversampling procedures that augment the training data by generating synthetic samples from an estimated minority-class distribution to roughly balance the classes, among which the celebrated SMOTE algorithm is a canonical example. We show that the excess risk decomposes into the rate achievable under balanced training (as if the data had been drawn from the balanced target distribution) and an additional term, the cost of transfer, which quantifies the discrepancy between the estimated and true minority-class distributions. In particular, we show that the cost of transfer for SMOTE dominates that of bootstrapping (random oversampling) in moderately high dimensions, suggesting that we should expect bootstrapping to have better performance than SMOTE in general. We corroborate these findings with experimental evidence. More broadly, our results provide guidance for choosing among augmentation strategies for imbalanced classification.

preprint2026arXiv

High-Dimensional Statistics: Reflections on Progress and Open Problems

Over the past two decades, the field of high-dimensional statistics has experienced substantial progress, driven largely by technological advances that have dramatically reduced the cost and effort for data collection and storage across a broad range of domains, including biology, medicine, astronomy, and the social and environmental sciences. Modern datasets are increasingly complex, often exhibiting rich dependency, heterogeneity, and other features that challenge traditional statistical methods. In response, high-dimensional statistics has evolved to address more sophisticated estimation and inference problems. This evolution has, in turn, fostered deep connections with and contributions to a wide range of research areas, including optimization, concentration of measure, random matrix theory, information theory, and theoretical computer science. Given the rapid pace of recent developments in high-dimensional statistics, our goal is to synthesize representative advances, highlight common themes and open problems, and point to important works that offer entry points into the field.

preprint2026arXiv

Revisiting Randomization in Greedy Model Search

Feature subsampling is a core component of random forests and other ensemble methods. While recent theory suggests that this randomization acts solely as a variance reduction mechanism analogous to ridge regularization, these results largely rely on base learners optimized via ordinary least squares. We investigate the effects of feature subsampling on greedy forward selection, a model that better captures the adaptive nature of decision trees. Assuming an orthogonal design, we prove that ensembling with feature subsampling can reduce both bias and variance, contrasting with the pure variance reduction of convex base learners. More precisely, we show that both the training error and degrees of freedom can be non-monotonic in the subsampling rate, breaking the analogy with standard shrinkage methods like the lasso or ridge regression. Furthermore, we characterize the exact asymptotic behavior of the estimator, showing that it adaptively reweights OLS coefficients based on their rank, with weights that are well-approximated by a logistic function. These results elucidate the distinct role of algorithmic randomization when interleaved with greedy optimization.

preprint2025arXiv

Stochastic Gradient Descent for Nonparametric Additive Regression

This paper introduces an iterative algorithm for training nonparametric additive models that enjoys favorable memory storage and computational requirements. The algorithm can be viewed as the functional counterpart of stochastic gradient descent, applied to the coefficients of a truncated basis expansion of the component functions. We show that the resulting estimator satisfies an oracle inequality that allows for model mis-specification. In the well-specified setting, by choosing the learning rate carefully across three distinct stages of training, we demonstrate that its risk is minimax optimal in terms of the dependence on both the dimensionality of the data and the size of the training sample. Unlike past work, we also provide polynomial convergence rates even when the covariates do not have full support on their domain.

preprint2021arXiv

Good Classifiers are Abundant in the Interpolating Regime

Within the machine learning community, the widely-used uniform convergence framework has been used to answer the question of how complex, over-parameterized models can generalize well to new data. This approach bounds the test error of the worst-case model one could have fit to the data, but it has fundamental limitations. Inspired by the statistical mechanics approach to learning, we formally define and develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers from several model classes. We apply our method to compute this distribution for several real and synthetic datasets, with both linear and random feature classification models. We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model on the same datasets, indicating that "bad" classifiers are extremely rare. We provide theoretical results in a simple setting in which we characterize the full asymptotic distribution of test errors, and we show that these indeed concentrate around a value $\varepsilon^*$, which we also identify exactly. We then formalize a more general conjecture supported by our empirical findings. Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice, and that approaches based on the statistical mechanics of learning may offer a promising alternative.

preprint2020arXiv

Analyzing CART

Decision trees with binary splits are popularly constructed using Classification and Regression Trees (CART) methodology. For binary classification and regression models, this approach recursively divides the data into two near-homogenous daughter nodes according to a split point that maximizes the reduction in sum of squares error (the impurity) along a particular variable. This paper aims to study the bias and adaptive properties of regression trees constructed with CART. In doing so, we derive an interesting connection between the bias and the mean decrease in impurity (MDI) measure of variable importance---a tool widely used for model interpretability---defined as the sum of impurity reductions over all non-terminal nodes in the tree. In particular, we show that the probability content of a terminal subnode for a variable is small when the MDI for that variable is large and that this relationship is exponential---confirming theoretically that decision trees with CART have small bias and are adaptive to signal strength and direction. Finally, we apply these individual tree bounds to tree ensembles and show consistency of Breiman's random forests. The context is surprisingly general and applies to a wide variety of multivariable data generating distributions and regression functions. The main technical tool is an exact characterization of the conditional probability content of the daughter nodes arising from an optimal split, in terms of the partial dependence function and reduction in impurity.

preprint2020arXiv

Sharp Analysis of a Simple Model for Random Forests

Random forests have become an important tool for improving accuracy in regression and classification problems since their inception by Leo Breiman in 2001. In this paper, we revisit a historically important random forest model originally proposed by Breiman in 2004 and later studied by Gérard Biau in 2012, where a feature is selected at random and the splits occurs at the midpoint of the node along the chosen feature. If the regression function is Lipschitz and depends only on a small subset of $ S $ out of $ d $ features, we show that, given access to $ n $ observations and properly tuned split probabilities, the mean-squared prediction error is $ O((n(\log n)^{(S-1)/2})^{-\frac{1}{S\log2+1}}) $. This positively answers an outstanding question of Biau about whether the rate of convergence for this random forest model could be improved. Furthermore, by a refined analysis of the approximation and estimation errors for linear models, we show that this rate cannot be improved in general. Finally, we generalize our analysis and improve extant prediction error bounds for another random forest model in which each tree is constructed from subsampled data and the splits are performed at the empirical median along a chosen feature.

preprint2016arXiv

Statistical Guarantees for Estimating the Centers of a Two-component Gaussian Mixture by EM

Recently, a general method for analyzing the statistical accuracy of the EM algorithm has been developed and applied to some simple latent variable models [Balakrishnan et al. 2016]. In that method, the basin of attraction for valid initialization is required to be a ball around the truth. Using Stein's Lemma, we extend these results in the case of estimating the centers of a two-component Gaussian mixture in $d$ dimensions. In particular, we significantly expand the basin of attraction to be the intersection of a half space and a ball around the origin. If the signal-to-noise ratio is at least a constant multiple of $ \sqrt{d\log d} $, we show that a random initialization strategy is feasible.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint