Source author record

Danica J. Sutherland

Danica J. Sutherland appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, Methodology, Artificial Intelligence, astro-ph.CO, Computer Vision, math.ST, Statistics Theory, Applications, astro-ph.IM, Neural and Evolutionary Computing

Catalog footprint

What is connected

21 works

10 topics

4 close collaborators

preprint · 2022 · arXiv

Better Supervisory Signals by Observing Learning Paths

Better-supervised models might have better performance. In this paper, we first clarify what makes for good supervision for a classification problem, and then explain two existing label refining methods, label smoothing and knowledge distillation, in terms of our proposed criterion. To further answer why and how better supervision emerges, we observe the learning path, i.e., the trajectory of the model's predictions during training, for each training sample. We find that the model can spontaneously refine "bad" labels through a "zig-zag" learning path, which occurs on both toy and real datasets. Observing the learning path not only provides a new perspective for understanding knowledge distillation, overfitting, and learning dynamics, but also reveals that the supervisory signal of a teacher network can be very unstable near the best points in training on real tasks. Inspired by this, we propose a new knowledge distillation scheme, Filter-KD, which improves downstream classification performance in various settings.

preprint · 2022 · arXiv

Making Look-Ahead Active Learning Strategies Feasible with Neural Tangent Kernels

We propose a new method for approximating active learning acquisition strategies that are based on retraining with hypothetically-labeled candidate data points. Although this is usually infeasible with deep networks, we use the neural tangent kernel to approximate the result of retraining, and prove that this approximation works asymptotically even in an active learning setup -- approximating "look-ahead" selection criteria with far less computation required. This also enables us to conduct sequential active learning, i.e. updating the model in a streaming regime, without needing to retrain the model with SGD after adding each new data point. Moreover, our querying strategy, which better understands how the model's predictions will change by adding new data points in comparison to the standard ("myopic") criteria, beats other look-ahead strategies by large margins, and achieves equal or better performance compared to state-of-the-art methods on several benchmark datasets in pool-based active learning.

preprint · 2022 · arXiv

Meta Two-Sample Testing: Learning Kernels for Testing with Limited Data

Modern kernel-based two-sample tests have shown great success in distinguishing complex, high-dimensional distributions with appropriate learned kernels. Previous work has demonstrated that this kernel learning procedure succeeds, assuming a considerable number of observed samples from each distribution. In realistic scenarios with very limited numbers of data samples, however, it can be challenging to identify a kernel powerful enough to distinguish complex distributions. We address this issue by introducing the problem of meta two-sample testing (M2ST), which aims to exploit (abundant) auxiliary data on related tasks to find an algorithm that can quickly identify a powerful test on new target tasks. We propose two specific algorithms for this task: a generic scheme which improves over baselines and a more tailored approach which performs even better. We provide both theoretical justification and empirical evidence that our proposed meta-testing schemes out-perform learning kernel-based tests directly from scarce observations, and identify when such schemes will be successful.

preprint · 2022 · arXiv

Object Discovery via Contrastive Learning for Weakly Supervised Object Detection

Weakly Supervised Object Detection (WSOD) is a task that detects objects in an image using a model trained only on image-level annotations. Current state-of-the-art models benefit from self-supervised instance-level supervision, but since weak supervision does not include count or location information, the most common "argmax" labeling method often ignores many instances of objects. To alleviate this issue, we propose a novel multiple instance labeling method called object discovery. We further introduce a new contrastive loss under weak supervision where no instance-level information is available for sampling, called weakly supervised contrastive loss (WSCL). WSCL aims to construct a credible similarity threshold for object discovery by leveraging consistent features for embedding vectors in the same class. As a result, we achieve new state-of-the-art results on MS-COCO 2014 and 2017 as well as PASCAL VOC 2012, and competitive results on PASCAL VOC 2007.

preprint · 2022 · arXiv

Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds, and Benign Overfitting

We consider interpolation learning in high-dimensional linear regression with Gaussian data, and prove a generic uniform convergence guarantee on the generalization error of interpolators in an arbitrary hypothesis class in terms of the class's Gaussian width. Applying the generic bound to Euclidean norm balls recovers the consistency result of Bartlett et al. (2020) for minimum-norm interpolators, and confirms a prediction of Zhou et al. (2020) for near-minimal-norm interpolators in the special case of Gaussian data. We demonstrate the generality of the bound by applying it to the simplex, obtaining a novel consistency result for minimum ℓ1-norm interpolators (basis pursuit). Our results show how norm-based generalization bounds can explain and be used to analyze benign overfitting, at least in some settings.

21 published item(s)

Better Supervisory Signals by Observing Learning Paths

Making Look-Ahead Active Learning Strategies Feasible with Neural Tangent Kernels

Meta Two-Sample Testing: Learning Kernels for Testing with Limited Data

Object Discovery via Contrastive Learning for Weakly Supervised Object Detection

Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds, and Benign Overfitting

A Machine Learning Approach for Dynamical Mass Measurements of Galaxy Clusters

Bayesian Approaches to Distribution Regression

Deep Mean Maps

Demystifying MMD GANs

Does Invariant Risk Minimization Capture Invariance?

Efficient and principled score estimation with Nyström kernel exponential families

Fixing an error in Caponnetto and de Vito (2007)

Generative Models and Model Criticism via Optimized Maximum Mean Discrepancy

Kernels on Sample Sets via Nonparametric Divergence Estimates

Learning deep kernels for exponential family densities

Learning Deep Kernels for Non-Parametric Two-Sample Tests

Linear-time Learning on Distributions with Approximate Kernel Embeddings

On gradient regularizers for MMD GANs

On Uniform Convergence and Low-Norm Interpolation Learning

The Role of Machine Learning in the Next Decade of Cosmology

Understanding the 2016 US Presidential Election using ecological inference and distribution regression with census microdata