Source author record

Aleksander Madry

Aleksander Madry appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Data Structures and Algorithms Computer Vision Computer Science and Game Theory Robotics Computational Complexity cs.CY Discrete Mathematics Neural and Evolutionary Computing Neurons and Cognition

Catalog footprint

What is connected

23works

10topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2023arXiv

User Strategization and Trustworthy Algorithms

Many human-facing algorithms -- including those that power recommender systems or hiring decision tools -- are trained on data provided by their users. The developers of these algorithms commonly adopt the assumption that the data generating process is exogenous: that is, how a user reacts to a given prompt (e.g., a recommendation or hiring suggestion) depends on the prompt and not on the algorithm that generated it. For example, the assumption that a person's behavior follows a ground-truth distribution is an exogeneity assumption. In practice, when algorithms interact with humans, this assumption rarely holds because users can be strategic. Recent studies document, for example, TikTok users changing their scrolling behavior after learning that TikTok uses it to curate their feed, and Uber drivers changing how they accept and cancel rides in response to changes in Uber's algorithm. Our work studies the implications of this strategic behavior by modeling the interactions between a user and their data-driven platform as a repeated, two-player game. We first find that user strategization can actually help platforms in the short term. We then show that it corrupts platforms' data and ultimately hurts their ability to make counterfactual decisions. We connect this phenomenon to user trust, and show that designing trustworthy algorithms can go hand in hand with accurate estimation. Finally, we provide a formalization of trustworthiness that inspires potential interventions.

preprint2022arXiv

A Data-Based Perspective on Transfer Learning

It is commonly believed that in transfer learning including more pre-training data translates into better performance. However, recent evidence suggests that removing data from the source dataset can actually help too. In this work, we take a closer look at the role of the source dataset's composition in transfer learning and present a framework for probing its impact on downstream performance. Our framework gives rise to new capabilities such as pinpointing transfer learning brittleness as well as detecting pathologies such as data-leakage and the presence of misleading examples in the source dataset. In particular, we demonstrate that removing detrimental datapoints identified by our framework improves transfer learning performance from ImageNet on a variety of target tasks. Code is available at https://github.com/MadryLab/data-transfer

preprint2022arXiv

Adversarially trained neural representations may already be as robust as corresponding biological neural representations

Visual systems of primates are the gold standard of robust perception. There is thus a general belief that mimicking the neural representations that underlie those systems will yield artificial visual systems that are adversarially robust. In this work, we develop a method for performing adversarial visual attacks directly on primate brain activity. We then leverage this method to demonstrate that the above-mentioned belief might not be well founded. Specifically, we report that the biological neurons that make up visual systems of primates exhibit susceptibility to adversarial perturbations that is comparable in magnitude to existing (robustly trained) artificial neural networks.

preprint2022arXiv

Combining Diverse Feature Priors

To improve model generalization, model designers often restrict the features that their models use, either implicitly or explicitly. In this work, we explore the design space of leveraging such feature priors by viewing them as distinct perspectives on the data. Specifically, we find that models trained with diverse sets of feature priors have less overlapping failure modes, and can thus be combined more effectively. Moreover, we demonstrate that jointly training such models on additional (unlabeled) data allows them to correct each other's mistakes, which, in turn, leads to better generalization and resilience to spurious correlations. Code available at https://github.com/MadryLab/copriors

preprint2022arXiv

Datamodels: Predicting Predictions from Training Data

We present a conceptual framework, datamodeling, for analyzing the behavior of a model class in terms of the training data. For any fixed "target" example $x$, training set $S$, and learning algorithm, a datamodel is a parameterized function $2^S \to \mathbb{R}$ that for any subset of $S' \subset S$ -- using only information about which examples of $S$ are contained in $S'$ -- predicts the outcome of training a model on $S'$ and evaluating on $x$. Despite the potential complexity of the underlying process being approximated (e.g., end-to-end training and evaluation of deep neural networks), we show that even simple linear datamodels can successfully predict model outputs. We then demonstrate that datamodels give rise to a variety of applications, such as: accurately predicting the effect of dataset counterfactuals; identifying brittle predictions; finding semantically similar examples; quantifying train-test leakage; and embedding data into a well-behaved and feature-rich representation space. Data for this paper (including pre-computed datamodels as well as raw predictions from four million trained deep neural networks) is available at https://github.com/MadryLab/datamodels-data .

preprint2022arXiv

When does Bias Transfer in Transfer Learning?

Using transfer learning to adapt a pre-trained "source model" to a downstream "target task" can dramatically increase performance with seemingly no downside. In this work, we demonstrate that there can exist a downside after all: bias transfer, or the tendency for biases of the source model to persist even after adapting the model to the target class. Through a combination of synthetic and natural experiments, we show that bias transfer both (a) arises in realistic settings (such as when pre-training on ImageNet or other standard datasets) and (b) can occur even when the target dataset is explicitly de-biased. As transfer-learned models are increasingly deployed in the real world, our work highlights the importance of understanding the limitations of pre-trained source models. Code is available at https://github.com/MadryLab/bias-transfer

preprint2021arXiv

On Distinctive Properties of Universal Perturbations

We identify properties of universal adversarial perturbations (UAPs) that distinguish them from standard adversarial perturbations. Specifically, we show that targeted UAPs generated by projected gradient descent exhibit two human-aligned properties: semantic locality and spatial invariance, which standard targeted adversarial perturbations lack. We also demonstrate that UAPs contain significantly less signal for generalization than standard adversarial perturbations -- that is, UAPs leverage non-robust features to a smaller extent than standard adversarial perturbations.

preprint2020arXiv

A Closer Look at Deep Policy Gradients

We study how the behavior of deep policy gradient algorithms reflects the conceptual framework motivating their development. To this end, we propose a fine-grained analysis of state-of-the-art methods based on key elements of this framework: gradient estimation, value prediction, and optimization landscapes. Our results show that the behavior of deep policy gradient algorithms often deviates from what their motivating framework would predict: the surrogate objective does not match the true reward landscape, learned value estimators fail to fit the true value function, and gradient estimates poorly correlate with the "true" gradient. The mismatch between predicted and empirical behavior we uncover highlights our poor understanding of current methods, and indicates the need to move beyond current benchmark-centric evaluation methods.

preprint2020arXiv

BREEDS: Benchmarks for Subpopulation Shift

We develop a methodology for assessing the robustness of models to subpopulation shift---specifically, their ability to generalize to novel data subpopulations that were not observed during training. Our approach leverages the class structure underlying existing datasets to control the data subpopulations that comprise the training and test distributions. This enables us to synthesize realistic distribution shifts whose sources can be precisely controlled and characterized, within existing large-scale datasets. Applying this methodology to the ImageNet dataset, we create a suite of subpopulation shift benchmarks of varying granularity. We then validate that the corresponding shifts are tractable by obtaining human baselines for them. Finally, we utilize these benchmarks to measure the sensitivity of standard model architectures as well as the effectiveness of off-the-shelf train-time robustness interventions. Code and data available at https://github.com/MadryLab/BREEDS-Benchmarks .

preprint2020arXiv

From ImageNet to Image Classification: Contextualizing Progress on Benchmarks

Building rich machine learning datasets in a scalable manner often necessitates a crowd-sourced data collection pipeline. In this work, we use human studies to investigate the consequences of employing such a pipeline, focusing on the popular ImageNet dataset. We study how specific design choices in the ImageNet creation process impact the fidelity of the resulting dataset---including the introduction of biases that state-of-the-art models exploit. Our analysis pinpoints how a noisy data collection pipeline can lead to a systematic misalignment between the resulting benchmark and the real-world task it serves as a proxy for. Finally, our findings emphasize the need to augment our current model training and evaluation toolkit to take such misalignments into account. To facilitate further research, we release our refined ImageNet annotations at https://github.com/MadryLab/ImageNetMultiLabel.

preprint2020arXiv

Identifying Statistical Bias in Dataset Replication

Dataset replication is a useful tool for assessing whether improvements in test accuracy on a specific benchmark correspond to improvements in models' ability to generalize reliably. In this work, we present unintuitive yet significant ways in which standard approaches to dataset replication introduce statistical bias, skewing the resulting observations. We study ImageNet-v2, a replication of the ImageNet dataset on which models exhibit a significant (11-14%) drop in accuracy, even after controlling for a standard human-in-the-loop measure of data quality. We show that after correcting for the identified statistical bias, only an estimated $3.6\% \pm 1.5\%$ of the original $11.7\% \pm 1.0\%$ accuracy drop remains unaccounted for. We conclude with concrete recommendations for recognizing and avoiding bias in dataset replication. Code for our study is publicly available at http://github.com/MadryLab/dataset-replication-analysis .

preprint2020arXiv

Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO

We study the roots of algorithmic progress in deep policy gradient algorithms through a case study on two popular algorithms: Proximal Policy Optimization (PPO) and Trust Region Policy Optimization (TRPO). Specifically, we investigate the consequences of "code-level optimizations:" algorithm augmentations found only in implementations or described as auxiliary details to the core algorithm. Seemingly of secondary importance, such optimizations turn out to have a major impact on agent behavior. Our results show that they (a) are responsible for most of PPO's gain in cumulative reward over TRPO, and (b) fundamentally change how RL methods function. These insights show the difficulty and importance of attributing performance gains in deep reinforcement learning. Code for reproducing our results is available at https://github.com/MadryLab/implementation-matters .

preprint2020arXiv

Noise or Signal: The Role of Image Backgrounds in Object Recognition

We assess the tendency of state-of-the-art object recognition models to depend on signals from image backgrounds. We create a toolkit for disentangling foreground and background signal on ImageNet images, and find that (a) models can achieve non-trivial accuracy by relying on the background alone, (b) models often misclassify images even in the presence of correctly classified foregrounds--up to 87.5% of the time with adversarially chosen backgrounds, and (c) more accurate models tend to depend on backgrounds less. Our analysis of backgrounds brings us closer to understanding which correlations machine learning models use, and how they determine models' out of distribution performance.

preprint2020arXiv

The Two Regimes of Deep Network Training

Learning rate schedule has a major impact on the performance of deep learning models. Still, the choice of a schedule is often heuristical. We aim to develop a precise understanding of the effects of different learning rate schedules and the appropriate way to select them. To this end, we isolate two distinct phases of training, the first, which we refer to as the "large-step" regime, exhibits a rather poor performance from an optimization point of view but is the primary contributor to model generalization; the latter, "small-step" regime exhibits much more "convex-like" optimization behavior but used in isolation produces models that generalize poorly. We find that by treating these regimes separately-and em specializing our training algorithm to each one of them, we can significantly simplify learning rate schedules.

preprint2016arXiv

Computing Maximum Flow with Augmenting Electrical Flows

We present an $\tilde{O}\left(m^{\frac{10}{7}}U^{\frac{1}{7}}\right)$-time algorithm for the maximum $s$-$t$ flow problem and the minimum $s$-$t$ cut problem in directed graphs with $m$ arcs and largest integer capacity $U$. This matches the running time of the $\tilde{O}\left((mU)^{\frac{10}{7}}\right)$-time algorithm of Mądry (FOCS 2013) in the unit-capacity case, and improves over it, as well as over the $\tilde{O}\left(m \sqrt{n} \log U\right)$-time algorithm of Lee and Sidford (FOCS 2014), whenever $U$ is moderately large and the graph is sufficiently sparse. By well-known reductions, this also gives similar running time improvements for the maximum-cardinality bipartite $b$-matching problem. One of the advantages of our algorithm is that it is significantly simpler than the ones presented in Madry (FOCS 2013) and Lee and Sidford (FOCS 2014). In particular, these algorithms employ a sophisticated interior-point method framework, while our algorithm is cast directly in the classic augmenting path setting that almost all the combinatorial maximum flow algorithms use. At a high level, the presented algorithm takes a primal dual approach in which each iteration uses electrical flows computations both to find an augmenting $s$-$t$ flow in the current residual graph and to update the dual solution. We show that by maintain certain careful coupling of these primal and dual solutions we are always guaranteed to make significant progress.

preprint2016arXiv

Negative-Weight Shortest Paths and Unit Capacity Minimum Cost Flow in $\tilde{O}(m^{10/7} \log W)$ Time

In this paper, we study a set of combinatorial optimization problems on weighted graphs: the shortest path problem with negative weights, the weighted perfect bipartite matching problem, the unit-capacity minimum-cost maximum flow problem and the weighted perfect bipartite $b$-matching problem under the assumption that $\Vert b\Vert_1=O(m)$. We show that each one of these four problems can be solved in $\tilde{O}(m^{10/7}\log W)$ time, where $W$ is the absolute maximum weight of an edge in the graph, which gives the first in over 25 years polynomial improvement in their sparse-graph time complexity. At a high level, our algorithms build on the interior-point method-based framework developed by Madry (FOCS 2013) for solving unit-capacity maximum flow problem. We develop a refined way to analyze this framework, as well as provide new variants of the underlying preconditioning and perturbation techniques. Consequently, we are able to extend the whole interior-point method-based approach to make it applicable in the weighted graph regime.

preprint2013arXiv

Navigating Central Path with Electrical Flows: from Flows to Matchings, and Back

We present an $\tilde{O}(m^{10/7})=\tilde{O}(m^{1.43})$-time algorithm for the maximum s-t flow and the minimum s-t cut problems in directed graphs with unit capacities. This is the first improvement over the sparse-graph case of the long-standing $O(m \min(\sqrt{m},n^{2/3}))$ time bound due to Even and Tarjan [EvenT75]. By well-known reductions, this also establishes an $\tilde{O}(m^{10/7})$-time algorithm for the maximum-cardinality bipartite matching problem. That, in turn, gives an improvement over the celebrated celebrated $O(m \sqrt{n})$ time bound of Hopcroft and Karp [HK73] whenever the input graph is sufficiently sparse.

preprint2012arXiv

Runtime Guarantees for Regression Problems

We study theoretical runtime guarantees for a class of optimization problems that occur in a wide variety of inference problems. these problems are motivated by the lasso framework and have applications in machine learning and computer vision. Our work shows a close connection between these problems and core questions in algorithmic graph theory. While this connection demonstrates the difficulties of obtaining runtime guarantees, it also suggests an approach of using techniques originally developed for graph algorithms. We then show that most of these problems can be formulated as a grouped least squares problem, and give efficient algorithms for this formulation. Our algorithms rely on routines for solving quadratic minimization problems, which in turn are equivalent to solving linear systems. Finally we present some experimental results on applying our approximation algorithm to image processing problems.

preprint2011arXiv

A Polylogarithmic-Competitive Algorithm for the k-Server Problem

We give the first polylogarithmic-competitive randomized online algorithm for the $k$-server problem on an arbitrary finite metric space. In particular, our algorithm achieves a competitive ratio of O(log^3 n log^2 k log log n) for any metric space on n points. Our algorithm improves upon the deterministic (2k-1)-competitive algorithm of Koutsoupias and Papadimitriou [J.ACM'95] whenever n is sub-exponential in k.

preprint2010arXiv

Electrical Flows, Laplacian Systems, and Faster Approximation of Maximum Flow in Undirected Graphs

We introduce a new approach to computing an approximately maximum s-t flow in a capacitated, undirected graph. This flow is computed by solving a sequence of electrical flow problems. Each electrical flow is given by the solution of a system of linear equations in a Laplacian matrix, and thus may be approximately computed in nearly-linear time. Using this approach, we develop the fastest known algorithm for computing approximately maximum s-t flows. For a graph having n vertices and m edges, our algorithm computes a (1-ε)-approximately maximum s-t flow in time \tilde{O}(mn^{1/3} ε^{-11/3}). A dual version of our approach computes a (1+ε)-approximately minimum s-t cut in time \tilde{O}(m+n^{4/3}\eps^{-8/3}), which is the fastest known algorithm for this problem as well. Previously, the best dependence on m and n was achieved by the algorithm of Goldberg and Rao (J. ACM 1998), which can be used to compute approximately maximum s-t flows in time \tilde{O}(m\sqrt{n}ε^{-1}), and approximately minimum s-t cuts in time \tilde{O}(m+n^{3/2}ε^{-3}).

preprint2010arXiv

Fast Approximation Algorithms for Cut-based Problems in Undirected Graphs

We present a general method of designing fast approximation algorithms for cut-based minimization problems in undirected graphs. In particular, we develop a technique that given any such problem that can be approximated quickly on trees, allows approximating it almost as quickly on general graphs while only losing a poly-logarithmic factor in the approximation guarantee. To illustrate the applicability of our paradigm, we focus our attention on the undirected sparsest cut problem with general demands and the balanced separator problem. By a simple use of our framework, we obtain poly-logarithmic approximation algorithms for these problems that run in time close to linear. The main tool behind our result is an efficient procedure that decomposes general graphs into simpler ones while approximately preserving the cut-flow structure. This decomposition is inspired by the cut-based graph decomposition of Räcke that was developed in the context of oblivious routing schemes, as well as, by the construction of the ultrasparsifiers due to Spielman and Teng that was employed to preconditioning symmetric diagonally-dominant matrices.

preprint2010arXiv

Faster Approximation Schemes for Fractional Multicommodity Flow Problems via Dynamic Graph Algorithms

We combine the work of Garg and Konemann, and Fleischer with ideas from dynamic graph algorithms to obtain faster (1-eps)-approximation schemes for various versions of the multicommodity flow problem. In particular, if eps is moderately small and the size of every number used in the input instance is polynomially bounded, the running times of our algorithms match - up to poly-logarithmic factors and some provably optimal terms - the Omega(mn) flow-decomposition barrier for single-commodity flow.

preprint2008arXiv

A 7/9 - Approximation Algorithm for the Maximum Traveling Salesman Problem

We give a 7/9 - Approximation Algorithm for the Maximum Traveling Salesman Problem.

Aleksander Madry

What is connected

Connect this record

See the researcher in context

Building this map preview

23 published item(s)

User Strategization and Trustworthy Algorithms

A Data-Based Perspective on Transfer Learning

Adversarially trained neural representations may already be as robust as corresponding biological neural representations

Combining Diverse Feature Priors

Datamodels: Predicting Predictions from Training Data

When does Bias Transfer in Transfer Learning?

On Distinctive Properties of Universal Perturbations

A Closer Look at Deep Policy Gradients

BREEDS: Benchmarks for Subpopulation Shift

From ImageNet to Image Classification: Contextualizing Progress on Benchmarks

Identifying Statistical Bias in Dataset Replication

Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO

Noise or Signal: The Role of Image Backgrounds in Object Recognition

The Two Regimes of Deep Network Training

Computing Maximum Flow with Augmenting Electrical Flows

Negative-Weight Shortest Paths and Unit Capacity Minimum Cost Flow in $\tilde{O}(m^{10/7} \log W)$ Time

Navigating Central Path with Electrical Flows: from Flows to Matchings, and Back

Runtime Guarantees for Regression Problems

A Polylogarithmic-Competitive Algorithm for the k-Server Problem

Electrical Flows, Laplacian Systems, and Faster Approximation of Maximum Flow in Undirected Graphs

Fast Approximation Algorithms for Cut-based Problems in Undirected Graphs

Faster Approximation Schemes for Fractional Multicommodity Flow Problems via Dynamic Graph Algorithms

A 7/9 - Approximation Algorithm for the Maximum Traveling Salesman Problem