Title
Understanding Performance Concerns in the API Documentation of Data Science Libraries
Abstract
The development of efficient data science applications is often impeded by unbearably long execution times and rapid RAM exhaustion. Since API documentation is the primary information source for troubleshooting, we investigate how performance concerns are documented in popular data science libraries. Our quantitative results reveal that data science APIs documented in a performance-related context are prevalent, and that such documentation is infrequently maintained. Our qualitative analyses further reveal that crowd documentation, such as Stack Overflow and GitHub, is highly complementary to official documentation in terms of API coverage, knowledge distribution, and the specific information conveyed through performance-related content. Data science practitioners could benefit from our findings by learning a more targeted search strategy for resolving performance issues. Researchers can be more assured of the advantages of integrating both official and crowd documentation to achieve a holistic view of the performance concerns in data science development.
Year
2020
DOI
10.1145/3324884.3416543
Venue
2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)
Keywords
API documentation, performance, data science, empirical study
DocType
Conference
ISSN
1938-4300
ISBN
978-1-7281-7281-1
Citations
0
PageRank
0.34
References
0
Authors
5
Name            Order  Citations  PageRank
Yida Tao        1      138        6.29
Jiefang Jiang   2      0          0.34
Yepang Liu      3      415        24.58
Zhiwu Xu        4      58         11.32
Shengchao Qin   5      711        62.81