Title
Machine Learning Explainability and Robustness: Connected at the Hip
Abstract
This tutorial examines the synergistic relationship between explainability methods for machine learning and a significant problem related to model quality: robustness against adversarial perturbations. We begin with a broad overview of approaches to explainable AI, before narrowing our focus to post-hoc explanation methods for predictive models. We discuss perspectives on what constitutes a "good" explanation in various settings, with an emphasis on axiomatic justifications for various explanation methods. In doing so, we will highlight the importance of an explanation method's faithfulness to the target model, as this property allows one to distinguish between explanations that are unintelligible because of the method used to produce them, and cases where a seemingly poor explanation points to model quality issues. Next, we introduce concepts surrounding adversarial robustness, including adversarial attacks as well as a range of corresponding state-of-the-art defenses. Finally, building on the knowledge presented thus far, we present key insights from the recent literature on the connections between explainability and robustness, showing that many commonly perceived explainability issues may be caused by non-robust model behavior. Accordingly, a careful study of adversarial examples and robustness can lead to models whose explanations better appeal to human intuition and domain knowledge.
Year: 2021
DOI: 10.1145/3447548.3470806
Venue: Knowledge Discovery and Data Mining
Keywords: Explainability, Machine Learning, Robustness
DocType: Conference
Citations: 0
PageRank: 0.34
References: 0
Authors: 6
Name             Order  Citations  PageRank
Anupam Datta     1      1617       87.21
Matt Fredrikson  2      972        48.56
Klas Leino       3      0          3.04
Kaiji Lu         4      0          0.68
Shayak Sen       5      96         8.89
Wang Zifan       6      6          2.78