Paper Info

Title
Demystifying “bad” error messages in data science libraries

Abstract
Error messages are critical starting points for debugging. Unfortunately, they are widely perceived as notoriously cryptic, confusing, and uninformative. Yet it remains a mystery why error messages have earned such a bad reputation, especially given that they are merely very short pieces of natural-language text. In this paper, we empirically demystify the causes and fixes of “bad” error messages by qualitatively studying 201 Stack Overflow threads and 335 GitHub issues. We focus specifically on error messages encountered in data science development, an increasingly important but understudied domain. We found that the causes of “bad” error messages are far more complicated than poor phrasing or flawed articulation of error message content. Many error messages are inherently and inevitably misleading or uninformative, since libraries do not know user intentions and cannot “see” external errors. Fixes to error-message-related issues mostly involve source code changes, while updates to message content alone account for only a small portion. In addition, whether an error message is informative or helpful is not always clear-cut; even error messages that clearly pinpoint faults and resolutions can still confuse certain users. These findings call for a more in-depth investigation into how error messages should be evaluated and improved in the future.

Year	2021
DOI	10.1145/3468264.3468560
Venue	Foundations of Software Engineering
Keywords	Error message, debugging aid, data science, empirical study
DocType	Conference
Citations	0
PageRank	0.34
References	20
Authors	6

Authors (6 rows)

Name	Author order	Citations	PageRank
Yida Tao	1	138	6.29
Zhihui Chen	2	0	0.34
Yepang Liu	3	415	24.58
Jifeng Xuan	4	539	28.76
Zhiwu Xu	5	58	11.32
Shengchao Qin	6	711	62.81