Interpreting Black Box Models with Statistical Guarantees

Paper Info

Title
Interpreting Black Box Models with Statistical Guarantees

Abstract
While many methods for interpreting machine learning models have been proposed, they are frequently ad hoc, difficult to evaluate, and come with no statistical guarantees on the error rate. This is especially problematic in scientific domains, where interpretations must be accurate and reliable. In this paper, we cast black box model interpretation as a hypothesis testing problem. The task is to discover features by testing whether the model prediction is significantly different from what would be expected if the features were replaced with randomly sampled counterfactuals. We derive a multiple hypothesis testing framework for finding important features that enables control over the false discovery rate. We propose two testing methods, as well as analogs of one-sided and two-sided tests. In simulation, the methods have high power and compare favorably against existing interpretability methods. When applied to vision and language models, the framework selects features that intuitively explain model predictions.
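The testing idea in the abstract can be illustrated with a small sketch. This is not the authors' procedure: it assumes a generic Monte Carlo randomization test in which feature j of a single input is replaced by values drawn from that feature's empirical marginal in a reference dataset (a simplifying stand-in for the paper's counterfactual sampler), and it applies the standard Benjamini-Hochberg procedure as one common way to control the false discovery rate across features. The function names counterfactual_pvalues and benjamini_hochberg, the reference data X_ref, and the toy model are all hypothetical.

```python
import numpy as np


def counterfactual_pvalues(model, x, X_ref, n_samples=200, alternative="greater", seed=0):
    """Monte Carlo p-value for each feature of a single input x.

    Assumed null for feature j: model(x) behaves like predictions made after
    x[j] is replaced by a counterfactual value. Counterfactuals are drawn from
    the empirical marginal of column j in X_ref (a simplifying assumption).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    observed = model(x[None, :])[0]
    pvals = np.empty(x.size)
    for j in range(x.size):
        X_cf = np.tile(x, (n_samples, 1))  # n_samples copies of x
        # Hypothetical counterfactual sampler: resample column j from X_ref.
        X_cf[:, j] = rng.choice(X_ref[:, j], size=n_samples, replace=True)
        null_preds = model(X_cf)
        # Add-one correction keeps the Monte Carlo p-value strictly positive.
        p_upper = (1 + np.sum(null_preds >= observed)) / (n_samples + 1)
        p_lower = (1 + np.sum(null_preds <= observed)) / (n_samples + 1)
        if alternative == "greater":
            pvals[j] = p_upper
        elif alternative == "less":
            pvals[j] = p_lower
        else:  # "two-sided": double the smaller tail, capped at 1
            pvals[j] = min(1.0, 2 * min(p_upper, p_lower))
    return pvals


def benjamini_hochberg(pvals, alpha=0.1):
    """Return a boolean mask of features selected at FDR level alpha."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    selected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest i with p_(i) <= alpha * i / m
        selected[order[: k + 1]] = True
    return selected


# Toy check: the hypothetical model uses only features 0 and 1.
X_ref = np.random.default_rng(1).uniform(-1, 1, size=(500, 5))
model = lambda X: 5 * X[:, 0] + 5 * X[:, 1]
x = np.array([5.0, 5.0, 0.1, -0.2, 0.05])
p = counterfactual_pvalues(model, x, X_ref)
print(benjamini_hochberg(p, alpha=0.1).tolist())  # [True, True, False, False, False]
```

In the toy check, replacing features 2 to 4 never changes the prediction, so their p-values equal 1 and they are not selected. Passing alternative="two-sided" doubles the smaller tail probability, loosely mirroring the one-sided and two-sided variants the abstract mentions.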

Year: 2019
Venue: arXiv: Machine Learning
DocType: Journal
Volume: abs/1904.00045
Citations: 0
PageRank: 0.34
References: 0
Authors: 3

Authors

Name	Order	Citations	PageRank
Collin Burns	1	0	1.69
Jesse Thomason	2	139	14.60
Wesley Tansey	3	10	3.08