Title
Explaining Outputs in Modern Data Analytics
Abstract
We report on the design and implementation of a general framework for interactively explaining the outputs of modern data-parallel computations, including iterative data analytics. To produce explanations, existing works adopt a naive backward tracing approach which runs into known issues; naive backward tracing may identify: (i) too much information that is difficult to process, and (ii) not enough information to reproduce the output, which hinders the logical debugging of the program. The contribution of this work is twofold. First, we provide methods to effectively reduce the size of explanations based on the first occurrence of a record in an iterative computation. Second, we provide a general method for identifying explanations that are sufficient to reproduce the target output in arbitrary computations -- a problem for which no viable solution existed until now. We implement our approach on differential dataflow, a modern high-throughput, low-latency dataflow platform. We add a small (but extensible) set of rules to explain each of its data-parallel operators, and we implement these rules as differential dataflow operators themselves. This choice allows our implementation to inherit the performance characteristics of differential dataflow, and results in a system that efficiently computes and updates explanatory inputs even as the inputs of the reference computation change. We evaluate our system with various analytic tasks on real datasets, and we show that it produces concise explanations in tens of milliseconds, while remaining faster -- up to two orders of magnitude -- than even the best implementations that do not support explanations.
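As a rough illustration of the first-occurrence idea mentioned in the abstract, the following toy sketch contrasts a naive backward trace with a trace restricted to first occurrences, using iterative graph reachability as the computation. It is not the paper's implementation and does not use differential dataflow. The function names (first_rounds, naive_explanation, first_occurrence_explanation) and the example graph are hypothetical, and picking the smallest qualifying predecessor is an assumption made only to keep the result deterministic.

```python
from collections import deque


def first_rounds(edges, source):
    """Map each reachable node to the round in which it is first derived."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    first = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in succ.get(u, []):
            if v not in first:  # only the first occurrence is recorded
                first[v] = first[u] + 1
                queue.append(v)
    return first


def naive_explanation(edges, source, target):
    """Naive backward trace: every input edge on any derivation of target."""
    first = first_rounds(edges, source)
    if target not in first:
        return set()
    pred = {}
    for u, v in edges:
        if u in first:  # an edge contributes only if its tail was derived
            pred.setdefault(v, []).append(u)
    seen, stack, out = {target}, [target], set()
    while stack:
        v = stack.pop()
        for u in pred.get(v, []):
            out.add((u, v))
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return out


def first_occurrence_explanation(edges, source, target):
    """Trace back only through derivations that produced a first occurrence."""
    first = first_rounds(edges, source)
    if target not in first:
        return set()
    pred = {}
    for u, v in edges:
        pred.setdefault(v, []).append(u)
    out, v = set(), target
    while v != source:
        # A predecessor derived exactly one round earlier justifies v's
        # first occurrence; min() is an arbitrary deterministic choice.
        u = min(p for p in pred[v] if first.get(p) == first[v] - 1)
        out.add((u, v))
        v = u
    return out


# Hypothetical input: a small graph with a cycle b -> d -> b.
edges = [("a", "b"), ("b", "c"), ("c", "d"),
         ("a", "c"), ("d", "b"), ("b", "d")]
naive = naive_explanation(edges, "a", "d")
concise = first_occurrence_explanation(edges, "a", "d")
# Replaying the computation on the concise explanation alone
# still derives the target.
reproduced = "d" in first_rounds(sorted(concise), "a")
```

On this input the naive trace returns all six edges, while the first-occurrence trace keeps only ("a", "b") and ("b", "d"), which are still enough to re-derive "d" from "a". Reachability is a monotone computation, so this sketch says nothing about the harder general case of sufficiency in arbitrary computations that the abstract refers to.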
Year
2016
DOI
10.14778/2994509.2994530
Venue
PVLDB
Field
Data mining, Data analysis, Computer science, Implementation, Theoretical computer science, Dataflow, Operator (computer programming), Tracing, Database, Debugging, Computation
DocType
Journal
Volume
9
Issue
12
ISSN
2150-8097
Citations
9
PageRank
0.44
References
25
Authors
4
Name             Order  Citations  PageRank
Zaheer Chothia   1      15         2.23
John Liagouris   2      72         9.04
Frank McSherry   3      4289       288.94
Timothy Roscoe   4      3118       299.48