Abstract

Localizing type errors is challenging in languages with global type inference, as the type checker must make assumptions about what the programmer intended to do. We introduce Nate, a data-driven approach to error localization based on supervised learning. Nate analyzes a large corpus of training data (pairs of ill-typed programs and their "fixed" versions) to automatically learn a model of where the error is most likely to be found. Given a new ill-typed program, Nate executes the model to generate a list of potential blame assignments ranked by likelihood. We evaluate Nate by comparing its precision to the state of the art on a set of over 5,000 ill-typed OCaml programs drawn from two instances of an introductory programming course. We show that when the top-ranked blame assignment is considered, Nate's data-driven model is able to correctly predict the exact sub-expression that should be changed 72% of the time, 28 points higher than OCaml and 16 points higher than the state-of-the-art SHErrLoc tool. Furthermore, Nate's accuracy surpasses 85% when we consider the top two locations and reaches 91% if we consider the top three.
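The abstract describes a pipeline that scores candidate sub-expressions with a learned model, ranks them by likelihood, and measures how often the true fix location appears among the top one, two, or three predictions. The following is a minimal Python sketch of that rank-then-evaluate loop under stated assumptions; it is not Nate's implementation. It assumes each sub-expression has already been converted into a numeric feature vector, and it uses a logistic-regression scorer purely as one example of a supervised classifier. The names `Candidate`, `blame_probability`, `rank_blame`, and `top_k_accuracy`, as well as the features and weights, are hypothetical. It also assumes a prediction counts as correct when it is one of the sub-expressions changed in the fixed version.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Set


@dataclass
class Candidate:
    """A sub-expression of an ill-typed program (hypothetical representation)."""
    node_id: int
    features: List[float]  # hypothetical numeric features for this sub-expression


def blame_probability(weights: Sequence[float], bias: float, features: Sequence[float]) -> float:
    """Logistic score: estimated probability that this sub-expression should change."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def rank_blame(weights: Sequence[float], bias: float, candidates: Sequence[Candidate]) -> List[int]:
    """Return candidate node ids ordered from most to least likely blame location."""
    scored = [(blame_probability(weights, bias, c.features), c.node_id) for c in candidates]
    # list.sort is stable even with reverse=True, so equally scored
    # candidates keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [node_id for _, node_id in scored]


def top_k_accuracy(rankings: Sequence[List[int]], changed: Sequence[Set[int]], k: int) -> float:
    """Fraction of programs where at least one top-k prediction was changed by the fix."""
    if not rankings:
        return 0.0
    hits = sum(
        1
        for ranking, truth in zip(rankings, changed)
        if any(node_id in truth for node_id in ranking[:k])
    )
    return hits / len(rankings)
```

Reporting `top_k_accuracy` for k = 1, 2, and 3 over a held-out set of ill-typed programs gives the kind of top-k measurement described above. In practice the weights would be fit on the training pairs rather than chosen by hand, and a richer classifier could replace the logistic scorer without changing the ranking and evaluation code.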
Field | Value |
---|---|
Year | 2017 |
DOI | 10.1145/3138818 |
Venue | Proceedings of the ACM on Programming Languages |
Keywords | fault localization, type errors |
DocType | Journal |
Volume | 1 |
Issue | OOPSLA |
ISSN | 2475-1421 |
Citations | 2 |
PageRank | 0.35 |
References | 30 |
Authors | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Eric L. Seidel | 1 | 50 | 5.15 |
Huma Sibghat | 2 | 2 | 0.35 |
Kamalika Chaudhuri | 3 | 1503 | 96.90 |
Westley Weimer | 4 | 3510 | 162.27 |
Ranjit Jhala | 5 | 2183 | 111.68 |