Disentangled Attribution Curves for Interpreting Random Forests and Boosted Trees. - Citegraph

Paper Info

Title
Disentangled Attribution Curves for Interpreting Random Forests and Boosted Trees.

Abstract
Tree ensembles, such as random forests and AdaBoost, are ubiquitous machine learning models known for achieving strong predictive performance across a wide variety of domains. However, this strong performance comes at the cost of interpretability (i.e., users are unable to understand the relationships a trained random forest has learned and why it is making its predictions). In particular, it is challenging to understand how the contribution of a particular feature, or group of features, varies as its value changes. To address this, we introduce Disentangled Attribution Curves (DAC), a method to provide interpretations of tree ensemble methods in the form of (multivariate) feature importance curves. For a given variable, or group of variables, DAC plots the importance of the variable(s) as their value changes. We validate DAC on real data by showing that including DAC as an additional feature increases the accuracy of logistic regression while maintaining interpretability. In simulation studies, DAC is shown to outperform competing methods in the recovery of conditional expectations. Finally, through a case study on the bike-sharing dataset, we demonstrate the use of DAC to uncover novel insights into a dataset.

Year	2019
Venue	arXiv: Machine Learning
DocType	Journal
Volume	abs/1905.07631
Citations	0
PageRank	0.34
References	0
Authors	4

Authors (4 rows)

Name	Order	Citations	PageRank
Summer Devlin	1	0	0.34
Chandan Singh	2	26	4.57
W. James Murdoch	3	32	2.61
Bin Yu	4	1984	241.03
