Researcher profile

Mengting Xu

Mengting Xu contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 17 - UnverifiedVerification L1Unclaimed author
4works
0followers
3topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

4 published item(s)

preprint2022arXiv

InfoAT: Improving Adversarial Training Using the Information Bottleneck Principle

Adversarial training (AT) has shown excellent high performance in defending against adversarial examples. Recent studies demonstrate that examples are not equally important to the final robustness of models during AT, that is, the so-called hard examples that can be attacked easily exhibit more influence than robust examples on the final robustness. Therefore, guaranteeing the robustness of hard examples is crucial for improving the final robustness of the model. However, defining effective heuristics to search for hard examples is still difficult. In this article, inspired by the information bottleneck (IB) principle, we uncover that an example with high mutual information of the input and its associated latent representation is more likely to be attacked. Based on this observation, we propose a novel and effective adversarial training method (InfoAT). InfoAT is encouraged to find examples with high mutual information and exploit them efficiently to improve the final robustness of models. Experimental results show that InfoAT achieves the best robustness among different datasets and models in comparison with several state-of-the-art methods.

preprint2022arXiv

Scale-Invariant Adversarial Attack for Evaluating and Enhancing Adversarial Defenses

Efficient and effective attacks are crucial for reliable evaluation of defenses, and also for developing robust models. Projected Gradient Descent (PGD) attack has been demonstrated to be one of the most successful adversarial attacks. However, the effect of the standard PGD attack can be easily weakened by rescaling the logits, while the original decision of every input will not be changed. To mitigate this issue, in this paper, we propose Scale-Invariant Adversarial Attack (SI-PGD), which utilizes the angle between the features in the penultimate layer and the weights in the softmax layer to guide the generation of adversaries. The cosine angle matrix is used to learn angularly discriminative representation and will not be changed with the rescaling of logits, thus making SI-PGD attack to be stable and effective. We evaluate our attack against multiple defenses and show improved performance when compared with existing attacks. Further, we propose Scale-Invariant (SI) adversarial defense mechanism based on the cosine angle matrix, which can be embedded into the popular adversarial defenses. The experimental results show the defense method with our SI mechanism achieves state-of-the-art performance among multi-step and single-step defenses.

preprint2021arXiv

Improving the Certified Robustness of Neural Networks via Consistency Regularization

A range of defense methods have been proposed to improve the robustness of neural networks on adversarial examples, among which provable defense methods have been demonstrated to be effective to train neural networks that are certifiably robust to the attacker. However, most of these provable defense methods treat all examples equally during training process, which ignore the inconsistent constraint of certified robustness between correctly classified (natural) and misclassified examples. In this paper, we explore this inconsistency caused by misclassified examples and add a novel consistency regularization term to make better use of the misclassified examples. Specifically, we identified that the certified robustness of network can be significantly improved if the constraint of certified robustness on misclassified examples and correctly classified examples is consistent. Motivated by this discovery, we design a new defense regularization term called Misclassification Aware Adversarial Regularization (MAAR), which constrains the output probability distributions of all examples in the certified region of the misclassified example. Experimental results show that our proposed MAAR achieves the best certified robustness and comparable accuracy on CIFAR-10 and MNIST datasets in comparison with several state-of-the-art methods.

preprint2021arXiv

Towards Evaluating the Robustness of Deep Diagnostic Models by Adversarial Attack

Deep learning models (with neural networks) have been widely used in challenging tasks such as computer-aided disease diagnosis based on medical images. Recent studies have shown deep diagnostic models may not be robust in the inference process and may pose severe security concerns in clinical practice. Among all the factors that make the model not robust, the most serious one is adversarial examples. The so-called "adversarial example" is a well-designed perturbation that is not easily perceived by humans but results in a false output of deep diagnostic models with high confidence. In this paper, we evaluate the robustness of deep diagnostic models by adversarial attack. Specifically, we have performed two types of adversarial attacks to three deep diagnostic models in both single-label and multi-label classification tasks, and found that these models are not reliable when attacked by adversarial example. We have further explored how adversarial examples attack the models, by analyzing their quantitative classification results, intermediate features, discriminability of features and correlation of estimated labels for both original/clean images and those adversarial ones. We have also designed two new defense methods to handle adversarial examples in deep diagnostic models, i.e., Multi-Perturbations Adversarial Training (MPAdvT) and Misclassification-Aware Adversarial Training (MAAdvT). The experimental results have shown that the use of defense methods can significantly improve the robustness of deep diagnostic models against adversarial attacks.