Source author record

Gavin Brown

Gavin Brown appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, Unclaimed source record

Machine Learning; math.AG; Information Theory; math.IT; math.CO; Applications; Artificial Intelligence; Computation and Language; Computer Science and Game Theory; Computer Vision; Distributed, Parallel, and Cluster Computing; Programming Languages

Catalog footprint

What is connected

19 works

12 topics

4 close collaborators

preprint, 2026, arXiv

Self-Improvement for Fast, High-Quality Plan Generation

Generative models trained on synthetic plan data are a promising approach to generalized planning. Recent work has focused on finding any valid plan, rather than a high-quality solution. We address the challenge of producing high-quality plans, a computationally hard problem, in sub-exponential time. First, we demonstrate that, given optimal data, a decoder-only transformer can generate high-quality plans for unseen problem instances. Second, we show how to self-improve an initial model trained on sub-optimal data. Each round of self-improvement combines multiple model calls with graph search to generate improved plans, which are then used for model fine-tuning. An experimental study on four domains (Blocksworld, Logistics, Labyrinth, and Sokoban) shows on average a 30% reduction in plan length over the source symbolic planner, with over 80% of plans being optimal where the optimum is known. Plan quality is further improved by inference-time search. The model's latency scales sub-exponentially, in contrast to the satisficing and optimal symbolic planners to which we compare. Together, these results suggest that self-improvement with generative models offers a scalable approach for high-quality plan generation.

preprint, 2022, arXiv

Bias-Variance Decompositions for Margin Losses

We introduce a novel bias-variance decomposition for a range of strictly convex margin losses, including the logistic loss (minimized by the classic LogitBoost algorithm), as well as the squared margin loss and canonical boosting loss. Furthermore, we show that, for all strictly convex margin losses, the expected risk decomposes into the risk of a "central" model and a term quantifying variation in the functional margin with respect to variations in the training data. These decompositions provide a diagnostic tool for practitioners to understand model overfitting/underfitting, and have implications for additive ensemble models -- for example, when our bias-variance decomposition holds, there is a corresponding "ambiguity" decomposition, which can be used to quantify model diversity.

preprint, 2022, arXiv

Kawamata boundedness for Fano threefolds and the Graded Ring Database

We explain an effective Kawamata boundedness result for Mori-Fano 3-folds. In particular, we describe a list of 39,550 possible Hilbert series of semistable Mori-Fano 3-folds, with examples to explain its meaning, its relationship to known classifications and the wealth of more general Fano 3-folds it contains, as well as its application to the ongoing classification of Fano 3-folds.

preprint, 2022, arXiv

Performative Prediction in a Stateful World

Deployed supervised machine learning models make predictions that interact with and influence the world. This phenomenon is called performative prediction by Perdomo et al. (ICML 2020). It is an ongoing challenge to understand the influence of such predictions as well as design tools so as to control that influence. We propose a theoretical framework where the response of a target population to the deployed classifier is modeled as a function of the classifier and the current state (distribution) of the population. We show necessary and sufficient conditions for convergence to an equilibrium of two retraining algorithms, repeated risk minimization and a lazier variant. Furthermore, convergence is near an optimal classifier. We thus generalize results of Perdomo et al., whose performativity framework does not assume any dependence on the state of the target population. A particular phenomenon captured by our model is that of distinct groups that acquire information and resources at different rates to be able to respond to the latest deployed classifier. We study this phenomenon theoretically and empirically.

preprint, 2022, arXiv

Strong Memory Lower Bounds for Learning Natural Models

We give lower bounds on the amount of memory required by one-pass streaming algorithms for solving several natural learning problems. In a setting where examples lie in $\{0,1\}^d$ and the optimal classifier can be encoded using $\kappa$ bits, we show that algorithms which learn using a near-minimal number of examples, $\tilde{O}(\kappa)$, must use $\tilde{\Omega}(d\kappa)$ bits of space. Our space bounds match the dimension of the ambient space of the problem's natural parametrization, even when it is quadratic in the size of examples and the final classifier. For instance, in the setting of $d$-sparse linear classifiers over degree-2 polynomial features, for which $\kappa=\Theta(d\log d)$, our space lower bound is $\tilde{\Omega}(d^2)$. Our bounds degrade gracefully with the stream length $N$, generally having the form $\tilde{\Omega}\left(d\kappa\cdot\frac{\kappa}{N}\right)$. Bounds of the form $\Omega(d\kappa)$ were known for learning parity and other problems defined over finite fields. Bounds that apply in a narrow range of sample sizes are also known for linear regression. Ours are the first such bounds for problems of the type commonly seen in recent learning applications that apply for a large range of input sizes.
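As a quick consistency check on the bounds quoted in this abstract, the substitution below shows how the general form reduces to the headline cases. It is a worked reading under the abstract's own parameters (with logarithmic factors absorbed by the tilde notation), not a derivation taken from the paper; the choice $N=\kappa$ is an assumption standing in for the "near-minimal number of examples".

```latex
% Assumption: evaluate the general bound at stream length N = \kappa,
% i.e. the near-minimal sample regime, ignoring logarithmic factors.
\[
  \tilde{\Omega}\!\left(d\kappa\cdot\frac{\kappa}{N}\right)\Bigg|_{N=\kappa}
  = \tilde{\Omega}(d\kappa),
  \qquad
  \kappa=\Theta(d\log d)
  \;\Longrightarrow\;
  d\kappa=\Theta\!\left(d^{2}\log d\right)=\tilde{\Omega}\!\left(d^{2}\right).
\]
```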

preprint, 2022, arXiv

Toric Sarkisov links of toric Fano varieties

We explain a web of Sarkisov links that overlies the classification of Fano weighted projective spaces in dimensions 3 and 4, extending results of Prokhorov.

preprint, 2020, arXiv

Better Boosting with Bandits for Online Learning

Probability estimates generated by boosting ensembles are poorly calibrated because of the margin maximization nature of the algorithm. The outputs of the ensemble need to be properly calibrated before they can be used as probability estimates. In this work, we demonstrate that online boosting is also prone to producing distorted probability estimates. In batch learning, calibration is achieved by reserving part of the training data for training the calibrator function. In the online setting, a decision needs to be made on each round: should the new example(s) be used to update the parameters of the ensemble or those of the calibrator? We resolve this decision with the aid of bandit optimization algorithms. We demonstrate superior performance to uncalibrated and naively calibrated online boosting ensembles in terms of probability estimation. Our proposed mechanism can be easily adapted to other tasks (e.g., cost-sensitive classification) and is robust to the choice of hyperparameters of both the calibrator and the ensemble.

preprint, 2020, arXiv

Margin Maximization as Lossless Maximal Compression

The ultimate goal of a supervised learning algorithm is to produce models constructed on the training data that can generalize well to new examples. In classification, functional margin maximization -- correctly classifying as many training examples as possible with maximal confidence -- has been known to construct models with good generalization guarantees. This work gives an information-theoretic interpretation of a margin maximizing model on a noiseless training dataset as one that achieves lossless maximal compression of said dataset -- i.e., extracts from the features all the useful information for predicting the label and no more. The connection offers new insights on generalization in supervised machine learning, showing margin maximization as a special case (that of classification) of a more general principle and explaining the success and potential limitations of popular learning algorithms like gradient boosting. We support our observations with theoretical arguments and empirical evidence and identify interesting directions for future work.

preprint, 2020, arXiv

To Ensemble or Not Ensemble: When does End-To-End Training Fail?

End-to-End training (E2E) is becoming more and more popular to train complex Deep Network architectures. An interesting question is whether this trend will continue: are there any clear failure cases for E2E training? We study this question in depth, for the specific case of E2E training an ensemble of networks. Our strategy is to blend the gradient smoothly in between two extremes: from independent training of the networks, up to full E2E training. We find clear failure cases, where over-parameterized models cannot be trained E2E. A surprising result is that the optimum can sometimes lie in between the two, neither an ensemble nor an E2E system. The work also uncovers links to Dropout, and raises questions around the nature of ensemble diversity and multi-branch networks.

preprint, 2016, arXiv

Ranking Biomarkers Through Mutual Information

We study information theoretic methods for ranking biomarkers. In clinical trials there are two, closely related, types of biomarkers: predictive and prognostic, and disentangling them is a key challenge. Our first step is to phrase biomarker ranking in terms of optimizing an information theoretic quantity. This formalization of the problem will enable us to derive rankings of predictive/prognostic biomarkers, by estimating different, high dimensional, conditional mutual information terms. To estimate these terms, we suggest efficient low dimensional approximations, and we derive an empirical Bayes estimator, which is suitable for small or sparse datasets. Finally, we introduce a new visualisation tool that captures the prognostic and the predictive strength of a set of biomarkers. We believe this representation will prove to be a powerful tool in biomarker discovery.

preprint, 2015, arXiv

Boosting Java Performance using GPGPUs

Heterogeneous programming is becoming the norm as a way to achieve better performance by running portions of code on the most appropriate hardware resource. Currently, significant engineering efforts are undertaken in order to enable existing programming languages to perform heterogeneous execution, mainly on GPUs. In this paper we describe Jacc, an experimental framework which allows developers to program GPGPUs directly from Java. By using the Jacc framework, developers have the ability to add GPGPU support into their applications with minimal code refactoring. To simplify the development of GPGPU applications we allow developers to model heterogeneous code using two key abstractions: tasks, which encapsulate all the information needed to execute code on a GPGPU; and task graphs, which capture the inter-task control-flow of the application. Using this information the Jacc runtime is able to automatically handle data movement and synchronization between the host and the GPGPU, eliminating the need for explicitly managing disparate memory spaces. In order to generate highly parallel GPGPU code, Jacc provides developers with the ability to decorate key aspects of their code using annotations. The compiler, in turn, exploits this information in order to automatically generate code without requiring additional code refactoring. Finally, we demonstrate the advantages of Jacc, both in terms of programmability and performance, by evaluating it against existing Java frameworks. Experimental results show an average performance speedup of 32x and a 4.4x code decrease across eight evaluated benchmarks on an NVIDIA Tesla K20m GPU.

preprint, 2015, arXiv

Diptych varieties. II: Polar varieties

Part I introduced diptych varieties $V_{ABLM}$ and gave a rigorous construction of them in the case $d,e\ge 2$ and $de>4$. Here we prove the existence of $V_{ABLM}$ in all the cases with $de\le 4$. At the same time we construct some classes of interesting quasihomogeneous spaces for groups such as $\mathrm{GL}(2)\times\mathbb{G}_m^r$ based on the algebra of polars.

preprint, 2015, arXiv

Modular Autoencoders for Ensemble Feature Extraction

We introduce the concept of a Modular Autoencoder (MAE), capable of learning a set of diverse but complementary representations from unlabelled data, that can later be used for supervised tasks. The learning of the representations is controlled by a trade-off parameter, and we show on six benchmark datasets that the optimum lies between two extremes (a set of smaller, independent autoencoders each with low capacity, versus a single monolithic encoding) and outperforms an appropriate baseline. In the present paper we explore the special case of linear MAE, and derive an SVD-based algorithm which converges several orders of magnitude faster than gradient descent.

preprint, 2015, arXiv

Polarized Calabi-Yau 3-folds in codimension 4

We construct Calabi-Yau 3-folds as orbifolds embedded in weighted projective space in codimension 4. For each Hilbert series that is realised, there are at least two different components of Calabi-Yau 3-folds.

preprint, 2013, arXiv

ManTIME: Temporal expression identification and normalization in the TempEval-3 challenge

This paper describes a temporal expression identification and normalization system, ManTIME, developed for the TempEval-3 challenge. The identification phase combines the use of conditional random fields along with a post-processing identification pipeline, whereas the normalization phase is carried out using NorMA, an open-source rule-based temporal normalizer. We investigate the performance variation with respect to different feature types. Specifically, we show that the use of WordNet-based features in the identification task negatively affects the overall performance, and that there is no statistically significant difference in using gazetteers, shallow parsing and propositional noun phrases labels on top of the morphological features. On the test data, the best run achieved 0.95 (P), 0.85 (R) and 0.90 (F1) in the identification phase. Normalization accuracies are 0.84 (type attribute) and 0.77 (value attribute). Surprisingly, the use of the silver data (alone or in addition to the gold annotated ones) does not improve the performance.

preprint, 2013, arXiv

Maps of toric varieties in Cox coordinates

The Cox ring provides a coordinate system on a toric variety analogous to the homogeneous coordinate ring of projective space. Rational maps between projective spaces are described using polynomials in the coordinate ring, and we generalise this to toric varieties, providing a unified description of arbitrary rational maps between toric varieties in terms of their Cox coordinates. Introducing formal roots of polynomials is necessary even in the simplest examples.

preprint, 2012, arXiv

Diptych varieties. I

We present a new class of affine Gorenstein 6-folds obtained by smoothing the 1-dimensional singular locus of a reducible affine toric surface; their existence is established using explicit methods in toric geometry and serial use of Kustin-Miller Gorenstein unprojection. These varieties have applications as key varieties in constructing other varieties, including local models of Mori flips of Type A.

preprint, 2012, arXiv

Seven new champion linear codes

We exhibit seven linear codes exceeding the current best known minimum distance d for their dimension k and block length n. Each code is defined over F_8, and their invariants [n,k,d] are given by [49,13,27], [49,14,26], [49,16,24], [49,17,23], [49,19,21], [49,25,16] and [49,26,15]. Our method includes an exhaustive search of all monomial evaluation codes generated by points in the [0,5]x[0,5] lattice square.
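The parameters quoted in this abstract can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the usual toric-code setup in which codewords are evaluations at the points of the torus $(F_8^*)^2$ (which would account for the block length 49), and it checks each listed $[n,k,d]$ against the Singleton bound $d \le n - k + 1$. The function names are hypothetical and nothing here reproduces the authors' search.

```python
# Listed [n, k, d] parameters, copied from the abstract above.
CODES = [
    (49, 13, 27), (49, 14, 26), (49, 16, 24), (49, 17, 23),
    (49, 19, 21), (49, 25, 16), (49, 26, 15),
]


def torus_block_length(q: int, dim: int = 2) -> int:
    """Number of points of (F_q^*)^dim, the assumed evaluation set."""
    return (q - 1) ** dim


def singleton_gap(n: int, k: int, d: int) -> int:
    """Distance below the Singleton bound d <= n - k + 1 (0 means MDS)."""
    return (n - k + 1) - d


if __name__ == "__main__":
    print("block length over F_8:", torus_block_length(8))
    for n, k, d in CODES:
        gap = singleton_gap(n, k, d)
        # A negative gap would violate the Singleton bound.
        print(f"[{n},{k},{d}] rate={k / n:.3f} singleton_gap={gap}")
```

A non-negative gap for every entry only confirms that the parameters are admissible; whether a code beats the best previously known minimum distance has to be checked against published code tables.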

preprint, 2012, arXiv

Small polygons and toric codes

We describe two different approaches to making systematic classifications of plane lattice polygons, and recover the toric codes they generate, over small fields, where these match or exceed the best known minimum distance. This includes a [36,19,12]-code over F_7 whose minimum distance 12 exceeds that of all previously known codes.
