Title
Veridical Data Science.
Abstract
Veridical data science extracts reliable and reproducible information from data, with an enriched technical language to communicate and evaluate empirical evidence in the context of human decisions and domain knowledge. Building and expanding on principles of statistics, machine learning, and the sciences, we propose the predictability, computability, and stability (PCS) framework for veridical data science. Our framework comprises both a workflow and documentation and aims to provide responsible, reliable, reproducible, and transparent results across the entire data science life cycle. Moreover, we propose the PDR desiderata for interpretable machine learning as part of veridical data science (with PDR standing for predictive accuracy, descriptive accuracy, and relevancy to a human audience and a particular domain problem). The PCS framework will be illustrated through the development of the DeepTune framework for characterizing V4 neurons. DeepTune builds predictive models using DNNs and ridge regression and applies the stability principle to obtain stable interpretations of 18 predictive models. Finally, a general DNN interpretation method based on contextual decomposition (CD) will be discussed, with applications to sentiment analysis and cosmological parameter estimation.
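The stability principle mentioned in the abstract can be illustrated with a minimal sketch. It is not the DeepTune procedure: it assumes synthetic data, uses bootstrap resampling as the data perturbation, and fits closed-form ridge regression with NumPy. The names ridge_fit and stability_check, the penalty lam, and the two-standard-deviation rule are hypothetical choices made only for this illustration.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge regression: solve (X^T X + lam * I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def stability_check(X, y, lam=1.0, n_boot=50, seed=0):
    """Refit ridge on bootstrap resamples (an assumed perturbation scheme)
    and return the per-coefficient mean and standard deviation."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    coefs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        coefs.append(ridge_fit(X[idx], y[idx], lam))
    coefs = np.asarray(coefs)
    return coefs.mean(axis=0), coefs.std(axis=0)


# Synthetic example (assumption: a linear signal with small Gaussian noise).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
true_w = np.array([2.0, -1.0, 0.0, 0.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

mean_w, std_w = stability_check(X, y)

# Illustrative rule: call a coefficient stable when its mean across
# perturbations is more than two standard deviations away from zero.
stable = np.abs(mean_w) > 2 * std_w
print(mean_w, std_w, stable)
```

Coefficients whose estimates keep the same sign and rough magnitude across the perturbed fits are the ones an interpretation could reasonably rely on. In a fuller PCS-style analysis the perturbations would also cover modeling choices, not only the data.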
Year
2020
DOI
10.1145/3336191.3372191
Venue
WSDM '20: The Thirteenth ACM International Conference on Web Search and Data Mining, Houston, TX, USA, February 2020
Keywords
data science, predictability, computability, stability (PCS), interpretable machine learning (PDR), iterative random forests (iRF), contextual decomposition (CD) for deep neural nets (DNNs)
DocType
Conference
ISBN
978-1-4503-6822-3
Citations
0
PageRank
0.34
References
0
Authors
1
Name
Order
Citations
PageRank
Bin Yu11984241.03