Title
AutoAttacker: A reinforcement learning approach for black-box adversarial attacks
Abstract
Recent research has shown that machine learning models are susceptible to adversarial examples, which allow attackers to trick a model into producing an incorrect output. Adversarial examples are commonly constructed or discovered using gradient-based methods that require white-box access to the model. In most real-world AI system deployments, complete access to the machine learning model is an unrealistic threat model. However, an attacker can construct adversarial examples even in the black-box case, where we assume solely a query capability to the model, using a variety of approaches, each with its advantages and shortcomings. We introduce AutoAttacker, a novel reinforcement learning framework in which agents learn how to operate around the black-box model by querying it, to effectively extract the underlying decision behaviour, and to undermine it successfully. AutoAttacker is a first-of-its-kind framework that uses reinforcement learning and assumes nothing about the differentiability or structure of the underlying function, and is thus robust towards common defenses such as gradient obfuscation or adversarial training. Finally, without differentiable output, as in binary classification, most methods cease to operate and require either an approximation of the gradient or another approach altogether. Our approach, however, maintains the capability to function when the output descriptiveness diminishes.
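The query-only, label-only setting described in the abstract can be illustrated with a deliberately small sketch. This is not the AutoAttacker architecture, which the abstract does not specify; it is a stateless, bandit-style value learner chosen for brevity. The victim model (`black_box`, a toy linear classifier), the action set (nudging one coordinate by a fixed step), the reward values, and all hyperparameters are assumptions made up for illustration.

```python
import random

# Hypothetical stand-in for the victim model: a toy linear binary classifier.
# The attacker never reads _W or _B; it only calls black_box() and sees a 0/1 label.
_W = [2.0, -1.0, 0.5]
_B = -0.5


def black_box(x):
    """Return only the hard label (no scores, no gradients)."""
    score = sum(w * xi for w, xi in zip(_W, x)) + _B
    return 1 if score > 0 else 0


def attack(x0, episodes=300, max_steps=20, step=0.25, alpha=0.1, seed=0):
    """Learn per-action values from label-only queries.

    Returns the shortest successful perturbation found during training as
    (adversarial_input, n_queries), or None if no episode flipped the label.
    """
    rng = random.Random(seed)
    original = black_box(x0)
    # One action per (coordinate, direction): nudge that coordinate by +/- step.
    actions = [(i, s) for i in range(len(x0)) for s in (+step, -step)]
    q = [0.0] * len(actions)  # stateless action-value estimates
    best = None
    for ep in range(episodes):
        # Exploration rate decays linearly from 1.0 and is floored at 0.05.
        eps = max(0.05, 1.0 - ep / (0.5 * episodes))
        x = list(x0)
        for t in range(1, max_steps + 1):
            if rng.random() < eps:
                a = rng.randrange(len(actions))  # explore
            else:
                a = max(range(len(actions)), key=q.__getitem__)  # exploit
            i, delta = actions[a]
            x[i] += delta
            flipped = black_box(x) != original  # one query per step
            reward = 1.0 if flipped else -0.05  # success bonus vs. query cost
            q[a] += alpha * (reward - q[a])     # incremental value update
            if flipped:
                if best is None or t < best[1]:
                    best = (list(x), t)
                break
    return best
```

Calling `attack([1.0, 0.0, 0.0])` queries only `black_box` and, when it succeeds, returns a perturbed input whose label differs from the original together with the number of queries used in that episode. Because the learner sees nothing but hard labels, the same loop would run unchanged against a non-differentiable model, which is the property the abstract emphasises.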
Year: 2019
DOI: 10.1109/EuroSPW.2019.00032
Venue: 2019 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW)
Keywords: reinforcement-learning, black-box-attack, adversarial-machine-learning
Field: Black box (phreaking), Binary classification, Mistake, Threat model, Computer science, Theoretical computer science, Adversarial machine learning, Artificial intelligence, Obfuscation, Adversarial system, Reinforcement learning
DocType: Conference
ISBN: 978-1-7281-3027-9
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name
Order
Citations
PageRank
Ilias Tsingenopoulos100.34
Davy Preuveneers270565.56
Wouter Joosen32898287.70