Researcher profile

Atul Prakash

Atul Prakash contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2022arXiv

GRAPHITE: Generating Automatic Physical Examples for Machine-Learning Attacks on Computer Vision Systems

This paper investigates an adversary's ease of attack in generating adversarial examples for real-world scenarios. We address three key requirements for practical attacks for the real-world: 1) automatically constraining the size and shape of the attack so it can be applied with stickers, 2) transform-robustness, i.e., robustness of a attack to environmental physical variations such as viewpoint and lighting changes, and 3) supporting attacks in not only white-box, but also black-box hard-label scenarios, so that the adversary can attack proprietary models. In this work, we propose GRAPHITE, an efficient and general framework for generating attacks that satisfy the above three key requirements. GRAPHITE takes advantage of transform-robustness, a metric based on expectation over transforms (EoT), to automatically generate small masks and optimize with gradient-free optimization. GRAPHITE is also flexible as it can easily trade-off transform-robustness, perturbation size, and query count in black-box settings. On a GTSRB model in a hard-label black-box setting, we are able to find attacks on all possible 1,806 victim-target class pairs with averages of 77.8% transform-robustness, perturbation size of 16.63% of the victim images, and 126K queries per pair. For digital-only attacks where achieving transform-robustness is not a requirement, GRAPHITE is able to find successful small-patch attacks with an average of only 566 queries for 92.2% of victim-target pairs. GRAPHITE is also able to find successful attacks using perturbations that modify small areas of the input image against PatchGuard, a recently proposed defense against patch-based attacks.

preprint2020arXiv

Analyzing the Interpretability Robustness of Self-Explaining Models

Recently, interpretable models called self-explaining models (SEMs) have been proposed with the goal of providing interpretability robustness. We evaluate the interpretability robustness of SEMs and show that explanations provided by SEMs as currently proposed are not robust to adversarial inputs. Specifically, we successfully created adversarial inputs that do not change the model outputs but cause significant changes in the explanations. We find that even though current SEMs use stable co-efficients for mapping explanations to output labels, they do not consider the robustness of the first stage of the model that creates interpretable basis concepts from the input, leading to non-robust explanations. Our work makes a case for future work to start examining how to generate interpretable basis concepts in a robust way.

preprint2020arXiv

Efficient Adversarial Training with Transferable Adversarial Examples

Adversarial training is an effective defense method to protect classification models against adversarial attacks. However, one limitation of this approach is that it can require orders of magnitude additional training time due to high cost of generating strong adversarial examples during training. In this paper, we first show that there is high transferability between models from neighboring epochs in the same training process, i.e., adversarial examples from one epoch continue to be adversarial in subsequent epochs. Leveraging this property, we propose a novel method, Adversarial Training with Transferable Adversarial Examples (ATTA), that can enhance the robustness of trained models and greatly improve the training efficiency by accumulating adversarial perturbations through epochs. Compared to state-of-the-art adversarial training methods, ATTA enhances adversarial accuracy by up to 7.2% on CIFAR10 and requires 12~14x less training time on MNIST and CIFAR10 datasets with comparable model robustness.

preprint2020arXiv

Towards Model-Agnostic Adversarial Defenses using Adversarially Trained Autoencoders

Adversarial machine learning is a well-studied field of research where an adversary causes predictable errors in a machine learning algorithm through precise manipulation of the input. Numerous techniques have been proposed to harden machine learning algorithms and mitigate the effect of adversarial attacks. Of these techniques, adversarial training, which augments the training data with adversarial samples, has proven to be an effective defense with respect to a certain class of attacks. However, adversarial training is computationally expensive and its improvements are limited to a single model. In this work, we take a first step toward creating a model-agnostic adversarial defense. We propose Adversarially-Trained Autoencoder Augmentation (AAA), the first model-agnostic adversarial defense that is robust against certain adaptive adversaries. We show that AAA allows us to achieve a partially model-agnostic defense by training a single autoencoder to protect multiple pre-trained classifiers; achieving adversarial performance on par or better than adversarial training without modifying the classifiers. Furthermore, we demonstrate that AAA can be used to create a fully model-agnostic defense for MNIST and Fashion MNIST datasets by improving the adversarial performance of a never before seen pre-trained classifier by at least 45% with no additional training. Finally, using a natural image corruption dataset, we show that our approach improves robustness to naturally corrupted images,which has been identified as strongly indicative of true adversarial robustness.

preprint2020arXiv

Understanding and Diagnosing Vulnerability under Adversarial Attacks

Deep Neural Networks (DNNs) are known to be vulnerable to adversarial attacks. Currently, there is no clear insight into how slight perturbations cause such a large difference in classification results and how we can design a more robust model architecture. In this work, we propose a novel interpretability method, InterpretGAN, to generate explanations for features used for classification in latent variables. Interpreting the classification process of adversarial examples exposes how adversarial perturbations influence features layer by layer as well as which features are modified by perturbations. Moreover, we design the first diagnostic method to quantify the vulnerability contributed by each layer, which can be used to identify vulnerable parts of model architectures. The diagnostic results show that the layers introducing more information loss tend to be more vulnerable than other layers. Based on the findings, our evaluation results on MNIST and CIFAR10 datasets suggest that average pooling layers, with lower information loss, are more robust than max pooling layers for the network architectures studied in this paper.