Researcher profile

Giles Hooker

Giles Hooker contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
10works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

10 published item(s)

preprint2026arXiv

Estimating Implicit Regularization in Deep Learning

Deep learning systems are known to exhibit implicit regularization (alt. implicit bias), favoring simple solutions instead of merely minimizing the loss function. In some cases, we can analytically derive the implicit regularization -- connecting it to an equivalent penalty that augments the learning objective. However, modern deep learning systems are complex, carrying modifications to the training procedure and architecture (e.g. early stopping, minibatching, dropout) whose effects are not always directly interpretable. Although estimating the resulting implicit regularization could aid theorists in algorithm design and practitioners in interpreting their hyperparameter choices, this problem has received little direct attention. It is also tractable: regularization makes weight updates deviate from loss gradients, promising a signal for identifying implicit bias. Here we provide gradient matching methods that can be used to empirically estimate the implicit regularization. Our method works on networks with known regularization, recovering popular explicit penalties like $\ell_1$ and $\ell_2$. It also replicates known implicit effects, like the quadratic weight penalty induced by early stopping in gradient descent, demonstrating that it can be used to test theories of implicit regularization. Crucially, because our method is empirical, it can handle implicit regularization in arbitrary networks. We demonstrate this use by characterizing the effects of dropout in deep networks, showing implicit $\ell_2$ effects in this popular method. Our work shows that practitioners can use gradient matching to understand regularization in networks with implicit biases that are too complicated to derive analytically.

preprint2022arXiv

The Infinitesimal Jackknife and Combinations of Models

The Infinitesimal Jackknife is a general method for estimating variances of parametric models, and more recently also for some ensemble methods. In this paper we extend the Infinitesimal Jackknife to estimate the covariance between any two models. This can be used to quantify uncertainty for combinations of models, or to construct test statistics for comparing different models or ensembles of models fitted using the same training dataset. Specific examples in this paper use boosted combinations of models like random forests and M-estimators. We also investigate its application on neural networks and ensembles of XGBoost models. We illustrate the efficacy of variance estimates through extensive simulations and its application to the Beijing Housing data, and demonstrate the theoretical consistency of the Infinitesimal Jackknife covariance estimate.

preprint2021arXiv

Bridging Breiman's Brook: From Algorithmic Modeling to Statistical Learning

In 2001, Leo Breiman wrote of a divide between "data modeling" and "algorithmic modeling" cultures. Twenty years later this division feels far more ephemeral, both in terms of assigning individuals to camps, and in terms of intellectual boundaries. We argue that this is largely due to the "data modelers" incorporating algorithmic methods into their toolbox, particularly driven by recent developments in the statistical understanding of Breiman's own Random Forest methods. While this can be simplistically described as "Breiman won", these same developments also expose the limitations of the prediction-first philosophy that he espoused, making careful statistical analysis all the more important. This paper outlines these exciting recent developments in the random forest literature which, in our view, occurred as a result of a necessary blending of the two ways of thinking Breiman originally described. We also ask what areas statistics and statisticians might currently overlook.

preprint2021arXiv

Generalised Boosted Forests

This paper extends recent work on boosting random forests to model non-Gaussian responses. Given an exponential family $\mathbb{E}[Y|X] = g^{-1}(f(X))$ our goal is to obtain an estimate for $f$. We start with an MLE-type estimate in the link space and then define generalised residuals from it. We use these residuals and some corresponding weights to fit a base random forest and then repeat the same to obtain a boost random forest. We call the sum of these three estimators a \textit{generalised boosted forest}. We show with simulated and real data that both the random forest steps reduces test-set log-likelihood, which we treat as our primary metric. We also provide a variance estimator, which we can obtain with the same computational cost as the original estimate itself. Empirical experiments on real-world data and simulations demonstrate that the methods can effectively reduce bias, and that confidence interval coverage is conservative in the bulk of the covariate distribution.

preprint2020arXiv

$V$-statistics and Variance Estimation

This paper develops a general framework for analyzing asymptotics of $V$-statistics. Previous literature on limiting distribution mainly focuses on the cases when $n \to \infty$ with fixed kernel size $k$. Under some regularity conditions, we demonstrate asymptotic normality when $k$ grows with $n$ by utilizing existing results for $U$-statistics. The key in our approach lies in a mathematical reduction to $U$-statistics by designing an equivalent kernel for $V$-statistics. We also provide a unified treatment on variance estimation for both $U$- and $V$-statistics by observing connections to existing methods and proposing an empirically more accurate estimator. Ensemble methods such as random forests, where multiple base learners are trained and aggregated for prediction purposes, serve as a running example throughout the paper because they are a natural and flexible application of $V$-statistics.

preprint2020arXiv

Boosting Random Forests to Reduce Bias; One-Step Boosted Forest and its Variance Estimate

In this paper we propose using the principle of boosting to reduce the bias of a random forest prediction in the regression setting. From the original random forest fit we extract the residuals and then fit another random forest to these residuals. We call the sum of these two random forests a \textit{one-step boosted forest}. We show with simulated and real data that the one-step boosted forest has a reduced bias compared to the original random forest. The paper also provides a variance estimate of the one-step boosted forest by an extension of the infinitesimal Jackknife estimator. Using this variance estimate we can construct prediction intervals for the boosted forest and we show that they have good coverage probabilities. Combining the bias reduction and the variance estimate we show that the one-step boosted forest has a significant reduction in predictive mean squared error and thus an improvement in predictive performance. When applied on datasets from the UCI database, one-step boosted forest performs better than random forest and gradient boosting machine algorithms. Theoretically we can also extend such a boosting process to more than one step and the same principles outlined in this paper can be used to find variance estimates for such predictors. Such boosting will reduce bias even further but it risks over-fitting and also increases the computational burden.

preprint2020arXiv

Purifying Interaction Effects with the Functional ANOVA: An Efficient Algorithm for Recovering Identifiable Additive Models

Models which estimate main effects of individual variables alongside interaction effects have an identifiability challenge: effects can be freely moved between main effects and interaction effects without changing the model prediction. This is a critical problem for interpretability because it permits "contradictory" models to represent the same function. To solve this problem, we propose pure interaction effects: variance in the outcome which cannot be represented by any smaller subset of features. This definition has an equivalence with the Functional ANOVA decomposition. To compute this decomposition, we present a fast, exact algorithm that transforms any piecewise-constant function (such as a tree-based model) into a purified, canonical representation. We apply this algorithm to Generalized Additive Models with interactions trained on several datasets and show large disparity, including contradictions, between the effects before and after purification. These results underscore the need to specify data distributions and ensure identifiability before interpreting model parameters.

preprint2020arXiv

Selecting the Derivative of a Functional Covariate in Scalar-on-Function Regression

This paper presents tests to formally choose between regression models using different derivatives of a functional covariate in scalar-on-function regression. We demonstrate that for linear regression, models using different derivatives can be nested within a model that includes point-impact effects at the end-points of the observed functions. Contrasts can then be employed to test the specification of different derivatives. When nonlinear regression models are defined, we apply a $J$ test to determine the statistical significance of the nonlinear structure between a functional covariate and a scalar response. The finite-sample performance of these methods is verified in simulation, and their practical application is demonstrated using a chemometric data set.

preprint2020arXiv

Tree Space Prototypes: Another Look at Making Tree Ensembles Interpretable

Ensembles of decision trees perform well on many problems, but are not interpretable. In contrast to existing approaches in interpretability that focus on explaining relationships between features and predictions, we propose an alternative approach to interpret tree ensemble classifiers by surfacing representative points for each class -- prototypes. We introduce a new distance for Gradient Boosted Tree models, and propose new, adaptive prototype selection methods with theoretical guarantees, with the flexibility to choose a different number of prototypes in each class. We demonstrate our methods on random forests and gradient boosted trees, showing that the prototypes can perform as well as or even better than the original tree ensemble when used as a nearest-prototype classifier. In a user study, humans were better at predicting the output of a tree ensemble classifier when using prototypes than when using Shapley values, a popular feature attribution method. Hence, prototypes present a viable alternative to feature-based explanations for tree ensembles.

preprint2020arXiv

Unbiased Measurement of Feature Importance in Tree-Based Methods

We propose a modification that corrects for split-improvement variable importance measures in Random Forests and other tree-based methods. These methods have been shown to be biased towards increasing the importance of features with more potential splits. We show that by appropriately incorporating split-improvement as measured on out of sample data, this bias can be corrected yielding better summaries and screening tools.