Title: Detecting Bias in Black-Box Models Using Transparent Model Distillation
Abstract
Black-box risk scoring models permeate our lives, yet are typically proprietary and opaque. We propose a transparent model distillation approach to detect bias in such models. Model distillation was originally designed to distill knowledge from a large, complex teacher model to a faster, simpler student model without significant loss in prediction accuracy. We add a third restriction - transparency. In this paper we use data sets that contain two labels to train on: the risk score predicted by a black-box model, as well as the actual outcome the risk score was intended to predict. This allows us to compare models that predict each label. For a particular class of student models - interpretable tree-based additive models with pairwise interactions (GA2Ms) - we provide confidence intervals for the difference between the risk score and actual outcome models. This yields a new method for detecting bias in black-box risk scores: assessing whether the contributions of protected features to the risk score are statistically different from their contributions to the actual outcome.
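The core comparison the abstract describes - fit one transparent model to the black-box risk score, another to the actual outcome, and test whether a protected feature's contribution differs between them - can be illustrated with a toy NumPy sketch. This is not the paper's GA2M implementation: the data is synthetic, the "shape function" is a crude centered group-mean for a binary feature, and the confidence interval comes from a simple percentile bootstrap. All variable names (`protected`, `score`, `outcome`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
protected = rng.integers(0, 2, n)   # synthetic binary protected feature
other = rng.normal(size=n)          # a legitimate predictive feature

# Actual outcome depends only on the legitimate feature.
outcome = 0.5 * other + rng.normal(scale=0.1, size=n)
# Hypothetical biased black-box score also leans on the protected feature.
score = 0.5 * other + 0.3 * protected

def contribution(feature, label):
    # One-feature additive "shape function": difference in group means.
    return label[feature == 1].mean() - label[feature == 0].mean()

# Bootstrap a 95% CI for the difference between the protected feature's
# contribution to the score model and to the outcome model.
diffs = []
for _ in range(1000):
    idx = rng.integers(0, n, n)
    diffs.append(contribution(protected[idx], score[idx])
                 - contribution(protected[idx], outcome[idx]))
lo, hi = np.percentile(diffs, [2.5, 97.5])
print(f"95% CI for contribution difference: [{lo:.3f}, {hi:.3f}]")
```

Because the interval excludes zero, the protected feature contributes more to the risk score than to the outcome it is meant to predict, which is exactly the bias signal the paper's method looks for (there with GA2M shape functions in place of group means).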
Year: 2017
Venue: arXiv: Machine Learning
Field: Framingham Risk Score, Black box, Pairwise comparison, Data mining, Data set, Additive model, Distillation, Artificial intelligence, Confidence interval, Mathematics, Machine learning
DocType:
Volume: abs/1710.06169
Citations: 4
Journal:
PageRank: 0.43
References: 21
Authors: 4
Name, Order, Citations, PageRank
Sarah Tan, 1, 9, 3.68
Rich Caruana, 2, 45036, 55.71
Giles Hooker, 3, 591, 3.40
Yin Lou, 4, 506, 28.82