Title
Interpreting Black Box Models with Statistical Guarantees.
Abstract
While many methods for interpreting machine learning models have been proposed, they are frequently ad hoc, difficult to evaluate, and come with no statistical guarantees on the error rate. This is especially problematic in scientific domains, where interpretations must be accurate and reliable. In this paper, we cast black box model interpretation as a hypothesis testing problem. The task is to discover features by testing whether the model prediction is significantly different from what would be expected if the features were replaced with randomly-sampled counterfactuals. We derive a multiple hypothesis testing framework for finding important features that enables control over the false discovery rate. We propose two testing methods, as well as analogs of one-sided and two-sided tests. In simulation, the methods have high power and compare favorably against existing interpretability methods. When applied to vision and language models, the framework selects features that intuitively explain model predictions.
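The testing idea in the abstract can be illustrated with a minimal sketch. This is not the paper's procedure. It assumes a hypothetical scalar-valued `model`, a hypothetical `sample_counterfactual(j, rng)` that draws a replacement value for feature `j`, a one-sided empirical p-value, and the standard Benjamini-Hochberg step-up rule for false discovery rate control. All names and parameters are illustrative assumptions.

```python
import numpy as np


def counterfactual_p_values(model, x, sample_counterfactual, n_draws=200, seed=0):
    """One-sided empirical p-value per feature (illustrative, not the paper's test).

    model: hypothetical callable mapping a 1-D float array to a scalar.
    sample_counterfactual: hypothetical callable (j, rng) -> replacement value.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    observed = model(x)
    p_values = np.empty(x.shape[0])
    for j in range(x.shape[0]):
        null_outputs = np.empty(n_draws)
        for k in range(n_draws):
            x_cf = x.copy()
            x_cf[j] = sample_counterfactual(j, rng)  # swap in a counterfactual value
            null_outputs[k] = model(x_cf)
        # Share of counterfactual outputs at least as large as the observed one,
        # with an add-one correction so the p-value is never exactly zero.
        p_values[j] = (1 + np.sum(null_outputs >= observed)) / (n_draws + 1)
    return p_values


def benjamini_hochberg(p_values, alpha=0.1):
    """Boolean mask of discoveries under the Benjamini-Hochberg step-up rule."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    selected = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])  # largest rank meeting its threshold
        selected[order[: k + 1]] = True
    return selected
```

Under these assumptions, a feature is flagged when the observed prediction is unusually large compared with predictions on counterfactual inputs. A two-sided analog would compare absolute deviations from the counterfactual outputs instead.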
Year: 2019
Venue: arXiv: Machine Learning
DocType: Journal
Volume: abs/1904.00045
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name            Order  Citations  PageRank
Collin Burns    1      0          1.69
Jesse Thomason  2      139        14.60
Wesley Tansey   3      10         3.08