Title
Pyramid Adversarial Training Improves ViT Performance
Abstract
Aggressive data augmentation is a key component of the strong generalization capabilities of the Vision Transformer (ViT). One such data augmentation technique is adversarial training (AT); however, many prior works [28,45] have shown that it often results in poor clean accuracy. In this work, we present pyramid adversarial training (PyramidAT), a simple and effective technique to improve ViT's overall performance. We pair it with a “matched” Dropout and stochastic depth regularization, which adopts the same Dropout and stochastic depth configuration for the clean and adversarial samples. Similar to the improvements on CNNs by AdvProp [61] (which is not directly applicable to ViT), our pyramid adversarial training breaks the trade-off between in-distribution accuracy and out-of-distribution robustness for ViT and related architectures. It yields a 1.82% absolute improvement in ImageNet clean accuracy for the ViT-B model when trained only on ImageNet-1K data, while simultaneously boosting performance on 7 ImageNet robustness metrics by absolute margins ranging from 1.76% to 15.68%. We set a new state of the art for ImageNet-C (41.42 mCE), ImageNet-R (53.92%), and ImageNet-Sketch (41.04%) without extra data, using only the ViT-B/16 backbone and our pyramid adversarial training. Our code is publicly available at pyramidat.github.io.
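The abstract does not spell out how the "pyramid" perturbation is built. The following pure-Python sketch assumes one reading of the idea, which is not stated in the abstract. The perturbation is assembled from several scales. Each scale s has its own learnable offsets, shared across s×s pixel tiles and weighted by a per-scale multiplier m_s. The weighted offsets are summed, clipped to an L∞ ball of radius `eps`, and updated with signed-gradient (PGD-style) ascent steps. The tile sizes, multipliers, `eps`, learning rate, the [0, 1] pixel range, the single-channel image, and every function name are hypothetical. The per-pixel loss gradient (`pixel_grad`) is taken as given rather than computed from a real ViT. This is not the authors' implementation.

```python
"""Minimal pure-Python sketch of a multi-scale ("pyramid") adversarial perturbation.

Assumptions (not taken from the source): single-channel H x W images in [0, 1],
square tiles per scale, an L-infinity budget `eps`, and a precomputed per-pixel
loss gradient. All names here are hypothetical.
"""
from typing import Dict, List

Image = List[List[float]]  # H x W grid of floats


def upsample_tiles(coarse: Image, tile: int) -> Image:
    """Nearest-neighbour upsampling: each coarse value fills a tile x tile block."""
    height, width = len(coarse) * tile, len(coarse[0]) * tile
    return [[coarse[i // tile][j // tile] for j in range(width)] for i in range(height)]


def pyramid_perturbation(deltas: Dict[int, Image], multipliers: Dict[int, float],
                         height: int, width: int) -> Image:
    """Sum m_s * upsample(delta_s) over every scale s (keyed by tile size)."""
    total = [[0.0] * width for _ in range(height)]
    for tile, coarse in deltas.items():
        fine = upsample_tiles(coarse, tile)
        for i in range(height):
            for j in range(width):
                total[i][j] += multipliers[tile] * fine[i][j]
    return total


def apply_perturbation(x: Image, perturbation: Image, eps: float) -> Image:
    """Clip the perturbation to [-eps, eps], add it, then clip pixels to [0, 1]."""
    return [
        [min(1.0, max(0.0, xv + min(eps, max(-eps, pv)))) for xv, pv in zip(x_row, p_row)]
        for x_row, p_row in zip(x, perturbation)
    ]


def sign(v: float) -> float:
    """Return 1.0, -1.0, or 0.0 according to the sign of v."""
    return float((v > 0) - (v < 0))


def pgd_step(deltas: Dict[int, Image], multipliers: Dict[int, float],
             pixel_grad: Image, lr: float) -> Dict[int, Image]:
    """One signed-gradient ascent step on every scale's coarse offsets.

    Chain rule for the (pre-clipping) linear map: the gradient w.r.t. one coarse
    entry is m_s times the sum of pixel gradients inside its tile. The effect of
    the clipping in apply_perturbation is ignored in this sketch.
    """
    updated: Dict[int, Image] = {}
    for tile, coarse in deltas.items():
        new_coarse = []
        for a, row in enumerate(coarse):
            new_row = []
            for b, value in enumerate(row):
                g = multipliers[tile] * sum(
                    pixel_grad[i][j]
                    for i in range(a * tile, (a + 1) * tile)
                    for j in range(b * tile, (b + 1) * tile)
                )
                new_row.append(value + lr * sign(g))  # ascend: move to increase the loss
            new_coarse.append(new_row)
        updated[tile] = new_coarse
    return updated


if __name__ == "__main__":
    # Toy 4 x 4 image with two scales: 2 x 2 tiles and individual pixels (tile size 1).
    x = [[0.5] * 4 for _ in range(4)]
    deltas = {2: [[0.0, 0.0], [0.0, 0.0]], 1: [[0.0] * 4 for _ in range(4)]}
    multipliers = {2: 1.0, 1: 1.0}  # illustrative weights only
    fake_grad = [[1.0, 1.0, -1.0, -1.0] for _ in range(4)]  # stand-in for a real loss gradient
    for _ in range(3):
        deltas = pgd_step(deltas, multipliers, fake_grad, lr=0.01)
    x_adv = apply_perturbation(x, pyramid_perturbation(deltas, multipliers, 4, 4), eps=0.05)
    print(x_adv[0])
```

In a full training step, the adversarial image would go through the model alongside the clean image, using the same Dropout and stochastic depth configuration for both (the "matched" regularization the abstract describes). The clean and adversarial losses would then be combined, for example summed, which is an assumption here. That outer loop is omitted. The coarse scales let the attack shift whole tiles coherently, which a purely per-pixel perturbation cannot do within the same budget.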
Year
2022
DOI
10.1109/CVPR52688.2022.01306
Venue
IEEE Conference on Computer Vision and Pattern Recognition
Keywords
Recognition: detection, categorization, retrieval; Deep learning architectures and techniques; Adversarial attack and defense; Machine learning
DocType
Conference
Volume
2022
Issue
1
Citations
0
PageRank
0.34
References
0
Authors
8
Name              Order  Citations  PageRank
Charles Herrmann  1      0          0.34
Kyle Sargent      2      0          0.34
Jiang Lu          3      755        37.16
Ramin Zabih       4      12976      982.19
Huiwen Chang      5      26         4.73
Ce Liu            6      3347       188.04
Dilip Krishnan    7      0          0.34
Deqing Sun        8      1061       44.84