Title
CheckCell: data debugging for spreadsheets
Abstract
Testing and static analysis can help root out bugs in programs, but not in data. This paper introduces data debugging, an approach that combines program analysis and statistical analysis to automatically find potential data errors. Since it is impossible to know a priori whether data are erroneous, data debugging instead locates data that has a disproportionate impact on the computation. Such data is either very important, or wrong. Data debugging is especially useful in the context of data-intensive programming environments that intertwine data with programs in the form of queries or formulas. We present the first data debugging tool, CheckCell, an add-in for Microsoft Excel. CheckCell identifies cells that have an unusually high impact on the spreadsheet's computations. We show that CheckCell is both analytically and empirically fast and effective. We show that it successfully finds injected typographical errors produced by a generative model trained with data entry from 169,112 Mechanical Turk tasks. CheckCell is more precise and efficient than standard outlier detection techniques. CheckCell also automatically identifies a key flaw in the infamous Reinhart and Rogoff spreadsheet.
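The core idea in the abstract, flagging inputs with a disproportionate impact on a computation, can be illustrated with a small sketch. This is not CheckCell's actual algorithm or API: the function names `impact_scores` and `most_suspicious`, the swap-in-a-neighbor scoring rule, and the sample data are all assumptions made for illustration. It scores each input by how far a formula's output moves when that input is replaced by each of the other values in the same range.

```python
from statistics import mean


def impact_scores(values, formula):
    """Score each input by the mean absolute change in the formula's output
    when that input is replaced by each of the other values in the range.

    Hypothetical helper for illustration; not taken from CheckCell.
    Requires at least two values.
    """
    baseline = formula(values)
    scores = []
    for i in range(len(values)):
        deltas = []
        for j, substitute in enumerate(values):
            if j == i:
                continue  # swapping a cell with itself changes nothing
            trial = list(values)
            trial[i] = substitute
            deltas.append(abs(formula(trial) - baseline))
        scores.append(mean(deltas))
    return scores


def most_suspicious(values, formula):
    """Return the index of the input with the largest impact score."""
    scores = impact_scores(values, formula)
    return max(range(len(values)), key=scores.__getitem__)


# Assumed sample data: 140.0 mimics a typo for 14.0 in a column of inputs.
cells = [12.0, 15.0, 11.0, 140.0, 13.0]
suspect = most_suspicious(cells, sum)
print(f"Most suspicious cell index: {suspect}, value: {cells[suspect]}")
```

As the abstract notes, a high-impact value is either very important or wrong, so a flagged cell is a candidate for manual review rather than a confirmed error.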
Year: 2014
DOI: 10.1145/2660193.2660207
Venue: OOPSLA
Keywords: debugging aids, errors, inputs, debugging, data-debugging, spreadsheets
Field: Anomaly detection, Shotgun debugging, Programming language, Computer science, Static analysis, Program analysis, Typographical error, Debugging, Generative model, Algorithmic program debugging
DocType: Conference
Volume: 49
Issue: 10
ISSN: 0362-1340
Citations: 9
PageRank: 0.45
References: 32
Authors: 3
Name                Order   Citations   PageRank
Daniel W. Barowy    1       89          3.87
Dimitar Gochev      2       9           0.45
Emery D. Berger     3       1048        55.87