Understanding Catastrophic Overfitting in Single-step Adversarial Training

Hoki Kim, Woojin Lee, Jaewook Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

59 Scopus citations

Abstract

Although fast adversarial training has demonstrated both robustness and efficiency, the problem of "catastrophic overfitting" has been observed. This is a phenomenon in which, during single-step adversarial training, the robust accuracy against projected gradient descent (PGD) suddenly drops to 0% after a few epochs, whereas the robust accuracy against the fast gradient sign method (FGSM) increases to 100%. In this paper, we demonstrate that catastrophic overfitting is closely related to a characteristic of single-step adversarial training: it uses only adversarial examples with the maximum perturbation, rather than all adversarial examples along the adversarial direction, which leads to decision boundary distortion and a highly curved loss surface. Based on this observation, we propose a simple method that not only prevents catastrophic overfitting, but also overrides the belief that it is difficult to prevent multi-step adversarial attacks with single-step adversarial training.
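As a rough illustration of the setting described in the abstract, the sketch below contrasts standard single-step (FGSM) adversarial training, which trains only on the maximum-magnitude perturbation, with a diagnostic that evaluates the loss at smaller magnitudes along the same adversarial direction. This is a minimal PyTorch-style sketch, not the authors' released code or proposed method; `model`, `loader`, `optimizer`, `epsilon`, and `device` are assumed to be supplied by the caller.

```python
import torch
import torch.nn.functional as F

def fgsm_perturbation(model, x, y, epsilon):
    """Sign-gradient direction scaled to the maximum magnitude epsilon."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return epsilon * grad.sign()

def train_epoch_fgsm(model, loader, optimizer, epsilon, device):
    """Standard single-step adversarial training: each update uses only the
    adversarial example at the maximum perturbation size epsilon."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        delta = fgsm_perturbation(model, x, y, epsilon)
        x_adv = torch.clamp(x + delta, 0.0, 1.0)
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()

@torch.no_grad()
def loss_along_direction(model, x, y, delta, num_points=10):
    """Loss at intermediate magnitudes c * delta for c in (0, 1]. A distorted
    decision boundary shows up as non-monotone loss along this direction,
    even when the loss at the maximum perturbation is high."""
    losses = []
    for c in torch.linspace(1.0 / num_points, 1.0, num_points):
        x_c = torch.clamp(x + c * delta, 0.0, 1.0)
        losses.append(F.cross_entropy(model(x_c), y).item())
    return losses
```

The second function is only a probe of the loss surface along the adversarial direction; the paper's actual remedy should be taken from the publication itself.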

Original language: English
Title of host publication: 35th AAAI Conference on Artificial Intelligence, AAAI 2021
Publisher: Association for the Advancement of Artificial Intelligence
Pages: 8119-8127
Number of pages: 9
ISBN (Electronic): 9781713835974
State: Published - 2021
Event: 35th AAAI Conference on Artificial Intelligence, AAAI 2021 - Virtual, Online
Duration: 2 Feb 2021 → 9 Feb 2021

Publication series

Name: 35th AAAI Conference on Artificial Intelligence, AAAI 2021
Volume: 9B

Conference

Conference: 35th AAAI Conference on Artificial Intelligence, AAAI 2021
City: Virtual, Online
Period: 2/02/21 → 9/02/21
