Title
Automated Identification of Libraries from Vulnerability Data: Can We Do Better?
Abstract
Software engineers depend heavily on software libraries and must update their dependencies once vulnerabilities are found in them. Software Composition Analysis (SCA) helps developers identify vulnerable libraries used by an application. A key challenge is identifying the libraries related to a given vulnerability reported in the National Vulnerability Database (NVD), since a report may not explicitly indicate the affected libraries. Recently, researchers have tried to identify the libraries from an NVD report by treating the task as an extreme multi-label learning (XML) problem, characterized by a large number of possible labels and severe data sparsity. The input is an NVD report, and the output is a set of relevant libraries. In this work, we evaluated multiple XML techniques. While previous work evaluated only one traditional XML technique, FastXML, we trained four other traditional XML models (DiSMEC, Parabel, Bonsai, ExtremeText) as well as two deep learning-based models (XML-CNN and LightXML). We compared both their effectiveness and the time cost of training the models and using them for predictions. We find that, other than DiSMEC and XML-CNN, recent XML models outperform the FastXML model by 3%-10% in terms of F1-scores on Top-k (k=1,2,3) predictions. Furthermore, we observe significant improvements in both the training and prediction time of these XML models, with the Bonsai and Parabel models achieving 627x and 589x faster training, respectively, and 12x faster prediction than the FastXML baseline. We discuss the implications of our experimental results and highlight limitations for future work to address.
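As an illustration of the Top-k evaluation mentioned in the abstract, the following is a minimal sketch of how precision, recall, and F1 at k could be computed for a single report. The exact metric definitions used in the paper are not given in this record, so the formulas below are a common convention and an assumption; the function names, library names, and scores are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def top_k_labels(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring labels (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [label for label, _ in ranked[:k]]


def precision_recall_f1_at_k(
    scores: Dict[str, float], truth: Set[str], k: int
) -> Tuple[float, float, float]:
    """Compute precision@k, recall@k, and their harmonic mean for one report.

    Assumed convention: precision divides by k, recall divides by the
    number of true labels. The paper may define these differently.
    """
    predicted = top_k_labels(scores, k)
    hits = sum(1 for label in predicted if label in truth)
    precision = hits / k if k else 0.0
    recall = hits / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical model scores for one NVD report (not data from the paper).
example_scores = {
    "openssl": 0.9,
    "log4j": 0.6,
    "jackson-databind": 0.3,
    "struts": 0.1,
}
# Hypothetical ground-truth libraries for the same report.
example_truth = {"openssl", "jackson-databind"}

for k in (1, 2, 3):
    p, r, f1 = precision_recall_f1_at_k(example_scores, example_truth, k)
    print(f"k={k}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In a full evaluation these per-report values would typically be averaged over all test reports; that aggregation step is omitted here.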
Year
2022
DOI
10.1145/3524610.3527893
Venue
2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)
Keywords
multi-label classification, machine learning, vulnerability report
DocType
Conference
ISSN
2643-7147
ISBN
978-1-6654-5209-0
Citations
0
PageRank
0.34
References
23
Authors
7
Name                  Order  Citations  PageRank
Stefanus A. Haryono   1      1          1.70
Hong Jin Kang         2      0          0.34
Abhishek Sharma       3      0          0.68
Asankhaya Sharma      4      7          1.76
Andrew Santosa        5      0          0.34
Ang Ming Yi           6      0          0.34
David Lo              7      0          1.69