Title
Demystifying “bad” error messages in data science libraries
Abstract
Error messages are critical starting points for debugging. Unfortunately, they are widely regarded as cryptic, confusing, and uninformative. Yet it remains unclear why error messages have such a bad reputation, especially given that they are merely short pieces of natural language text. In this paper, we empirically demystify the causes and fixes of "bad" error messages by qualitatively studying 201 Stack Overflow threads and 335 GitHub issues. We specifically focus on error messages encountered in data science development, an increasingly important but understudied domain. We found that the causes of "bad" error messages are far more complicated than poor phrasing or flawed articulation of error message content. Many error messages are inherently and inevitably misleading or uninformative, since libraries do not know user intentions and cannot "see" external errors. Fixes to error-message-related issues mostly involve source code changes, while fixes that update only the message content account for a small portion. In addition, whether an error message is informative or helpful is not always clear-cut; even error messages that clearly pinpoint faults and resolutions can still confuse certain users. These findings call for a more in-depth investigation of how error messages should be evaluated and improved in the future.
Year: 2021
DOI: 10.1145/3468264.3468560
Venue: Foundations of Software Engineering
Keywords: Error message, debugging aid, data science, empirical study
DocType: Conference
Citations: 0
PageRank: 0.34
References: 20
Authors: 6
Name, Order, Citations, PageRank
Yida Tao, 1, 138, 6.29
Zhihui Chen, 2, 0, 0.34
Yepang Liu, 3, 415, 24.58
Jifeng Xuan, 4, 539, 28.76
Zhiwu Xu, 5, 58, 11.32
Shengchao Qin, 6, 711, 62.81