Title
LAMP: data provenance for graph based machine learning algorithms through derivative computation
Abstract
Data provenance tracking determines the set of inputs related to a given output. It enables quality control and problem diagnosis in data engineering. Most existing techniques work by tracking program dependencies. They cannot quantitatively assess the importance of related inputs, which is critical for machine learning algorithms, where an output tends to depend on a huge set of inputs of which only some are important. In this paper, we propose LAMP, a provenance computation system for machine learning algorithms. Inspired by automatic differentiation (AD), LAMP quantifies the importance of an input for an output by computing the partial derivative of the output with respect to that input. LAMP separates the original data processing from the more expensive derivative computation, running them in different processes to achieve cost-effectiveness. In addition, it can quantify importance for inputs related to discrete behavior, such as control flow selection. An evaluation on a set of real-world programs and data sets shows that LAMP produces more precise and succinct provenance than program dependence based techniques, with much lower overhead. Our case studies demonstrate the potential of LAMP for problem diagnosis in data engineering.
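The core idea of derivative-based provenance, scoring an input by how much a small change to it moves the output, can be illustrated with a minimal sketch under stated assumptions. It uses forward-mode AD with dual numbers that carry a sparse map of partial derivatives, applied to a PageRank-style propagation in which the edge weights are treated as the inputs. The names `Dual`, `pagerank`, and `provenance` are hypothetical and are not LAMP's API. Unlike the system described above, this sketch computes values and derivatives together in a single process and does not handle discrete control-flow selection.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Dual:
    """A value plus its partial derivatives w.r.t. named inputs (forward-mode AD)."""
    value: float
    grads: dict[str, float] = field(default_factory=dict)

    @staticmethod
    def lift(x: "Dual | float") -> "Dual":
        # Constants have no provenance: an empty gradient map.
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        o = Dual.lift(other)
        grads = dict(self.grads)
        for k, g in o.grads.items():
            grads[k] = grads.get(k, 0.0) + g
        return Dual(self.value + o.value, grads)

    __radd__ = __add__

    def __mul__(self, other):
        o = Dual.lift(other)
        # Product rule: d(a*b) = b*da + a*db
        grads = {k: g * o.value for k, g in self.grads.items()}
        for k, g in o.grads.items():
            grads[k] = grads.get(k, 0.0) + g * self.value
        return Dual(self.value * o.value, grads)

    __rmul__ = __mul__

    def __truediv__(self, other):
        o = Dual.lift(other)
        # Quotient rule: d(a/b) = da/b - a*db/b^2
        grads = {k: g / o.value for k, g in self.grads.items()}
        for k, g in o.grads.items():
            grads[k] = grads.get(k, 0.0) - self.value * g / (o.value ** 2)
        return Dual(self.value / o.value, grads)

    def __rtruediv__(self, other):
        return Dual.lift(other) / self


def pagerank(weights, n, damping=0.85, iters=30):
    """Hypothetical graph algorithm: weighted PageRank over nodes 0..n-1.

    weights maps (src, dst) -> weight; each weight may be a float or a Dual.
    """
    out_weight = [0.0] * n
    for (src, _), w in weights.items():
        out_weight[src] = out_weight[src] + w
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n
        for (src, dst), w in weights.items():
            new[dst] = new[dst] + damping * rank[src] * w / out_weight[src]
        rank = new
    return rank


def provenance(weights, n, target):
    """Rank each input edge weight by |d rank[target] / d weight|, largest first."""
    # Seed every input with a unit derivative w.r.t. itself.
    seeded = {e: Dual(w, {f"{e[0]}->{e[1]}": 1.0}) for e, w in weights.items()}
    out = Dual.lift(pagerank(seeded, n)[target])
    return sorted(out.grads.items(), key=lambda kv: -abs(kv[1]))


if __name__ == "__main__":
    # Hypothetical 4-node graph; the edge weights are the "inputs".
    graph = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
             (2, 0): 1.0, (3, 2): 0.1, (3, 0): 5.0}
    for edge, d in provenance(graph, 4, target=2):
        print(f"{edge}: {d:+.4f}")
```

The sparse gradient map keeps an entry only for inputs that actually flow into a value. Its key set therefore roughly matches what dependence tracking would report, while the magnitudes supply the importance ranking that dependence tracking lacks.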
Year: 2017
DOI: 10.1145/3106237.3106291
Venue: ESEC/SIGSOFT FSE
Keywords: Data Provenance, Machine Learning, Debugging
Field: Data mining, Data set, Data processing, Computer science, Automatic differentiation, Real-time computing, Partial derivative, Theoretical computer science, Information engineering, Artificial intelligence, Computation, Control flow, Algorithm, Machine learning, Debugging
DocType: Conference
ISBN: 978-1-4503-5105-8
Citations: 5
PageRank: 0.43
References: 24
Authors: 7
Name | Order | Citations | PageRank
Shiqing Ma | 1 | 103 | 6.20
Yousra Aafer | 2 | 264 | 13.36
Zhaogui Xu | 3 | 41 | 4.70
Wen-Chuan Lee | 4 | 203 | 20.36
Juan Zhai | 5 | 67 | 8.56
Yingqi Liu | 6 | 72 | 6.79
Xiangyu Zhang | 7 | 42 | 5.14