Source author record

Ekin D. Cubuk

Ekin D. Cubuk appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning, Computer Vision, cond-mat.soft, cond-mat.mtrl-sci, cond-mat.dis-nn, physics.comp-ph

Catalog footprint

What is connected

11 works

6 topics

4 close collaborators

Preprint, 2022, arXiv

No One Representation to Rule Them All: Overlapping Features of Training Methods

Despite being able to capture a range of features of the data, high-accuracy models trained with supervision tend to make similar predictions. This seemingly implies that high-performing models share similar biases regardless of training methodology, which would limit ensembling benefits and render low-accuracy models as having little practical use. Against this backdrop, recent work has developed quite different training techniques, such as large-scale contrastive learning, yielding competitively high accuracy on generalization and robustness benchmarks. This motivates us to revisit the assumption that models necessarily learn similar functions. We conduct a large-scale empirical study of models across hyper-parameters, architectures, frameworks, and datasets. We find that model pairs that diverge more in training methodology display categorically different generalization behavior, producing increasingly uncorrelated errors. We show these models specialize in subdomains of the data, leading to higher ensemble performance: with just 2 models (each with ImageNet accuracy ~76.5%), we can create ensembles with 83.4% (+7% boost). Surprisingly, we find that even significantly low-accuracy models can be used to improve high-accuracy models. Finally, we show diverging training methodologies yield representations that capture overlapping (but not supersetting) feature sets which, when combined, lead to increased downstream performance.
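The ensembling claim in this abstract can be illustrated with a minimal sketch. It assumes that an ensemble is formed by averaging the two models' class probabilities, which is one common choice; the abstract does not say how the ensembles were built. The data and the helper names (`argmax`, `ensemble_predict`, `accuracy`) are hypothetical.

```python
from typing import List, Sequence


def argmax(values: Sequence[float]) -> int:
    """Return the index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def ensemble_predict(probs_a: List[List[float]], probs_b: List[List[float]]) -> List[int]:
    """Average two models' class probabilities per example, then take the argmax.

    Assumption: probability averaging stands in for whatever ensembling
    scheme the paper actually used.
    """
    preds = []
    for pa, pb in zip(probs_a, probs_b):
        avg = [(x + y) / 2.0 for x, y in zip(pa, pb)]
        preds.append(argmax(avg))
    return preds


def accuracy(preds: List[int], labels: List[int]) -> float:
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data: 4 examples, 2 classes. Each model is wrong on a
# different example, so their errors are uncorrelated.
labels = [0, 1, 0, 1]
probs_a = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6], [0.1, 0.9]]  # wrong on the third example
probs_b = [[0.8, 0.2], [0.6, 0.4], [0.9, 0.1], [0.3, 0.7]]  # wrong on the second example

preds_a = [argmax(p) for p in probs_a]
preds_b = [argmax(p) for p in probs_b]
preds_ens = ensemble_predict(probs_a, probs_b)
```

In this toy case each model alone scores 0.75, while the averaged ensemble gets every example right, because the confident model outvotes the mistaken one on each disputed example. Models with identical errors would gain nothing from averaging.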

Preprint, 2020, arXiv

A Fourier Perspective on Model Robustness in Computer Vision

Achieving robustness to distributional shift is a longstanding and challenging goal of computer vision. Data augmentation is a commonly used approach for improving robustness; however, robustness gains are typically not uniform across corruption types. Indeed, increasing performance in the presence of random noise is often met with reduced performance on other corruptions such as contrast change. Understanding when and why these sorts of trade-offs occur is a crucial step towards mitigating them. Towards this end, we investigate recently observed trade-offs caused by Gaussian data augmentation and adversarial training. We find that both methods improve robustness to corruptions that are concentrated in the high frequency domain while reducing robustness to corruptions that are concentrated in the low frequency domain. This suggests that one way to mitigate these trade-offs via data augmentation is to use a more diverse set of augmentations. Towards this end we observe that AutoAugment, a recently proposed data augmentation policy optimized for clean accuracy, achieves state-of-the-art robustness on the CIFAR-10-C benchmark.

Preprint, 2020, arXiv

Affinity and Diversity: Quantifying Mechanisms of Data Augmentation

Though data augmentation has become a standard component of deep neural network training, the underlying mechanism behind the effectiveness of these techniques remains poorly understood. In practice, augmentation policies are often chosen using heuristics of either distribution shift or augmentation diversity. Inspired by these, we seek to quantify how data augmentation improves model generalization. To this end, we introduce interpretable and easy-to-compute measures: Affinity and Diversity. We find that augmentation performance is predicted not by either of these alone but by jointly optimizing the two.

Preprint, 2020, arXiv

AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty

Modern deep neural networks can achieve high accuracy when the training distribution and test distribution are identically distributed, but this assumption is frequently violated in practice. When the train and test distributions are mismatched, accuracy can plummet. Currently there are few techniques that improve robustness to unforeseen data shifts encountered during deployment. In this work, we propose a technique to improve the robustness and uncertainty estimates of image classifiers. We propose AugMix, a data processing technique that is simple to implement, adds limited computational overhead, and helps models withstand unforeseen corruptions. AugMix significantly improves robustness and uncertainty measures on challenging image classification benchmarks, closing the gap between previous methods and the best possible performance in some cases by more than half.

Preprint, 2020, arXiv

Kohn-Sham equations as regularizer: building prior knowledge into machine-learned physics

Including prior knowledge is important for effective machine learning models in physics, and is usually achieved by explicitly adding loss terms or constraints on model architectures. Prior knowledge embedded in the physics computation itself rarely draws attention. We show that solving the Kohn-Sham equations when training neural networks for the exchange-correlation functional provides an implicit regularization that greatly improves generalization. Two separations suffice for learning the entire one-dimensional H$_2$ dissociation curve within chemical accuracy, including the strongly correlated region. Our models also generalize to unseen types of molecules and overcome self-interaction error.

Preprint, 2020, arXiv

Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation

Supervised learning in large discriminative models is a mainstay for modern computer vision. Such an approach necessitates investing in large-scale human-annotated datasets for achieving state-of-the-art results. In turn, the efficacy of supervised learning may be limited by the size of the human-annotated dataset. This limitation is particularly notable for image segmentation tasks, where the expense of human annotation is especially large, yet large amounts of unlabeled data may exist. In this work, we ask if we may leverage semi-supervised learning in unlabeled video sequences and extra images to improve the performance on urban scene segmentation, simultaneously tackling semantic, instance, and panoptic segmentation. The goal of this work is to avoid the construction of sophisticated, learned architectures specific to label propagation (e.g., patch matching and optical flow). Instead, we simply predict pseudo-labels for the unlabeled data and train subsequent models with both human-annotated and pseudo-labeled data. The procedure is iterated several times. As a result, our Naive-Student model, trained with such simple yet effective iterative semi-supervised learning, attains state-of-the-art results on all three Cityscapes benchmarks, reaching the performance of 67.8% PQ, 42.6% AP, and 85.2% mIOU on the test set. We view this work as a notable step towards building a simple procedure to harness unlabeled video sequences and extra images to surpass state-of-the-art performance on core computer vision tasks.
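The iterative pseudo-labeling loop described in this abstract can be sketched on a toy problem. This is only an illustration under stated assumptions: a one-dimensional nearest-centroid classifier stands in for the segmentation network, and every name and data value below is hypothetical rather than taken from the paper.

```python
from typing import List, Tuple


def fit_centroids(xs: List[float], ys: List[int]) -> Tuple[float, float]:
    """Fit a toy two-class model: the mean of each class's 1-D inputs.

    Assumes both classes are present in ys.
    """
    c0 = [x for x, y in zip(xs, ys) if y == 0]
    c1 = [x for x, y in zip(xs, ys) if y == 1]
    return sum(c0) / len(c0), sum(c1) / len(c1)


def predict(centroids: Tuple[float, float], xs: List[float]) -> List[int]:
    """Assign each input to the class with the nearer centroid."""
    m0, m1 = centroids
    return [0 if abs(x - m0) <= abs(x - m1) else 1 for x in xs]


def self_train(labeled_x: List[float], labeled_y: List[int],
               unlabeled_x: List[float], rounds: int = 3) -> Tuple[float, float]:
    """Iterate: pseudo-label the unlabeled data, then retrain on both sets."""
    centroids = fit_centroids(labeled_x, labeled_y)  # initial model, labeled data only
    for _ in range(rounds):
        pseudo_y = predict(centroids, unlabeled_x)   # predict pseudo-labels
        centroids = fit_centroids(labeled_x + unlabeled_x,
                                  labeled_y + pseudo_y)  # retrain on the union
    return centroids


# Hypothetical toy data: two labeled points and six unlabeled points.
labeled_x, labeled_y = [0.0, 4.0], [0, 1]
unlabeled_x = [0.5, 1.0, 1.5, 3.0, 3.5, 5.0]
centroids = self_train(labeled_x, labeled_y, unlabeled_x)
```

Here the centroids move from the two labeled points toward the means of the clusters found in the unlabeled data, which is the sense in which the unlabeled examples refine the model.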

Preprint, 2020, arXiv

ReMixMatch: Semi-Supervised Learning with Distribution Alignment and Augmentation Anchoring

We improve the recently-proposed "MixMatch" semi-supervised learning algorithm by introducing two new techniques: distribution alignment and augmentation anchoring. Distribution alignment encourages the marginal distribution of predictions on unlabeled data to be close to the marginal distribution of ground-truth labels. Augmentation anchoring feeds multiple strongly augmented versions of an input into the model and encourages each output to be close to the prediction for a weakly-augmented version of the same input. To produce strong augmentations, we propose a variant of AutoAugment which learns the augmentation policy while the model is being trained. Our new algorithm, dubbed ReMixMatch, is significantly more data-efficient than prior work, requiring between $5\times$ and $16\times$ less data to reach the same accuracy. For example, on CIFAR-10 with 250 labeled examples we reach $93.73\%$ accuracy (compared to MixMatch's accuracy of $93.58\%$ with $4{,}000$ examples) and a median accuracy of $84.92\%$ with just four labels per class. We make our code and data open-source at https://github.com/google-research/remixmatch.
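Distribution alignment, as described in this abstract, can be sketched as follows. The sketch assumes a rescale-then-renormalize scheme: each prediction on unlabeled data is multiplied by the ratio of the label marginal to a running average of the model's predictions. The abstract does not spell out the exact update, and the function names and numbers are hypothetical.

```python
from typing import List


def normalize(values: List[float]) -> List[float]:
    """Rescale non-negative values so they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def align_distribution(pred: List[float],
                       label_marginal: List[float],
                       running_pred_mean: List[float]) -> List[float]:
    """Pull one prediction toward the label marginal.

    Assumption: classes the model over-predicts on average are scaled down,
    under-predicted classes are scaled up, and the result is renormalized.
    """
    scaled = [q * p / m for q, p, m in zip(pred, label_marginal, running_pred_mean)]
    return normalize(scaled)


# Hypothetical numbers: labels are balanced, but the model has been
# predicting class 0 about 80% of the time on unlabeled data.
label_marginal = [0.5, 0.5]
running_pred_mean = [0.8, 0.2]
pred = [0.6, 0.4]
aligned = align_distribution(pred, label_marginal, running_pred_mean)
```

With these numbers the aligned prediction shifts weight from the over-predicted class 0 to class 1, while still summing to 1.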

Preprint, 2020, arXiv

Unifying framework for strong and fragile liquids via machine learning: a study of liquid silica

The fragility of a glassforming liquid characterizes how rapidly its relaxation dynamics slow down with cooling. The viscosity of strong liquids follows an Arrhenius law with a temperature-independent barrier height to rearrangements responsible for relaxation, whereas fragile liquids experience a much faster increase in their dynamics, suggesting a barrier height that increases with decreasing temperature. Strong glassformers are typically network glasses, while fragile glassformers are typically molecular or hard-sphere-like. As a result of these differences at the microscopic level, strong and fragile glassformers are usually treated separately from a theoretical point of view. Silica is the archetypal strong glassformer at low temperatures, but also exhibits a mysterious strong-to-fragile crossover at higher temperatures. Here we show that softness, a structure-based machine learned parameter that has previously been applied to fragile glassformers provides a useful description of model liquid silica in the strong and fragile regimes, and through the strong-to-fragile crossover. Just as for fragile glassformers, the relationship between softness and dynamics is invariant and Arrhenius in all regimes, but the average softness changes with temperature. The strong-to-fragile crossover in silica is not due to a sudden, qualitative change in structure, but can be explained by a simple Arrhenius form with a continuously and linearly changing local structure. Our results unify the study of liquid silica under a single simple conceptual picture.
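The strong and fragile behaviors contrasted in this abstract can be written in the standard Arrhenius form. The symbols are assumptions chosen for illustration and do not appear in the abstract: $\eta$ is the viscosity, $\eta_0$ a prefactor, $\Delta E$ the barrier height, $k_B$ Boltzmann's constant, and $T$ the temperature.

```latex
% Strong liquid: the barrier height is independent of temperature.
\eta(T) = \eta_0 \exp\!\left(\frac{\Delta E}{k_B T}\right)

% Fragile liquid: the barrier height grows as the temperature falls.
\eta(T) = \eta_0 \exp\!\left(\frac{\Delta E(T)}{k_B T}\right),
\qquad \frac{d\,\Delta E}{dT} < 0
```

In the first form $\ln \eta$ is linear in $1/T$; in the second it curves upward on cooling, which is the faster-than-Arrhenius slowdown the abstract attributes to fragile liquids.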

Preprint, 2015, arXiv

A structural approach to relaxation in glassy liquids

When a liquid freezes, a change in the local atomic structure marks the transition to the crystal. When a liquid is cooled to form a glass, however, no noticeable structural change marks the glass transition. Indeed, characteristic features of glassy dynamics that appear below an onset temperature, T_0, are qualitatively captured by mean field theory, which assumes uniform local structure at all temperatures. Even studies of more realistic systems have found only weak correlations between structure and dynamics. This raises the question: is structure important to glassy dynamics in three dimensions? Here, we answer this question affirmatively by using machine learning methods to identify a new field, that we call softness, which characterizes local structure and is strongly correlated with rearrangement dynamics. We find that the onset of glassy dynamics at T_0 is marked by the onset of correlations between softness (i.e. structure) and dynamics. Moreover, we use softness to construct a simple model of slow glassy relaxation that is in excellent agreement with our simulation results, showing that a theory of the evolution of softness in time would constitute a theory of glassy dynamics.

Preprint, 2015, arXiv

High-Temperature Quantum Anomalous Hall Effect in n-p Codoped Topological Insulators

The quantum anomalous Hall effect (QAHE) is a fundamental quantum transport phenomenon that manifests as a quantized transverse conductance in response to a longitudinally applied electric field in the absence of an external magnetic field, and promises to have immense application potentials in future dissipation-less quantum electronics. Here we present a novel kinetic pathway to realize the QAHE at high temperatures by $n$-$p$ codoping of three-dimensional topological insulators. We provide proof-of-principle numerical demonstration of this approach using vanadium-iodine (V-I) codoped Sb$_2$Te$_3$ and demonstrate that, strikingly, even at low concentrations of $\sim$2\% V and $\sim$1\% I, the system exhibits a quantized Hall conductance, the tell-tale hallmark of QAHE, at temperatures of at least $\sim$ 50 Kelvin, which is three orders of magnitude higher than the typical temperatures at which it has been realized so far. The proposed approach is conceptually general and may shed new light in experimental realization of high-temperature QAHE.

Preprint, 2014, arXiv

Identifying structural flow defects in disordered solids using machine learning methods

We use machine learning methods on local structure to identify flow defects - or regions susceptible to rearrangement - in jammed and glassy systems. We apply this method successfully to two disparate systems: a two dimensional experimental realization of a granular pillar under compression, and a Lennard-Jones glass in both two and three dimensions above and below its glass transition temperature. We also identify characteristics of flow defects that differentiate them from the rest of the sample. Our results show it is possible to discern subtle structural features responsible for heterogeneous dynamics observed across a broad range of disordered materials.
