Title
A Systematic Literature Review and Meta-Analysis on Cross Project Defect Prediction
Abstract
Background: Cross project defect prediction (CPDP) has recently gained considerable attention, yet there have been no systematic efforts to analyse the existing empirical evidence. Objective: To synthesise the literature in order to understand the state of the art in CPDP with respect to metrics, models, data approaches, datasets and associated performances. Further, we aim to assess the performance of CPDP models versus within project defect prediction (WPDP) models. Method: We conducted a systematic literature review. Results from primary studies are synthesised (thematic synthesis, meta-analysis) to answer the research questions. Results: We identified 30 primary studies passing quality assessment. Performance measures, except precision, vary with the choice of metrics. Recall, precision, F-measure, and AUC are the most common measures. Models based on Nearest-Neighbour and Decision Tree tend to perform well in CPDP, whereas the popular naïve Bayes yields average performance. Performance of ensembles varies greatly across F-measure and AUC. Data approaches address CPDP challenges using row/column processing, which improves CPDP in terms of recall at the cost of precision. This is observed on multiple occasions, including the meta-analysis of CPDP versus WPDP. NASA and Jureczko datasets seem to favour CPDP over WPDP more frequently. Conclusion: CPDP is still a challenge and requires more research before trustworthy applications can take place. We provide guidelines for further research.
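The recall-versus-precision trade-off mentioned in the abstract can be illustrated with the standard definitions of precision, recall and F-measure. The sketch below is only an illustration: the function name `precision_recall_f1` and all confusion-matrix counts are hypothetical and are not taken from the reviewed studies.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F-measure) from confusion-matrix counts."""
    # Precision: share of modules flagged as defective that really are defective.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of truly defective modules that the model flags.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for a target project with 100 defective modules.
# "baseline": a model trained on unprocessed cross-project data (assumed numbers).
baseline = precision_recall_f1(tp=30, fp=20, fn=70)
# "processed": the same model after a row/column data approach (assumed numbers).
processed = precision_recall_f1(tp=60, fp=90, fn=40)

print("baseline  P=%.2f R=%.2f F=%.2f" % baseline)
print("processed P=%.2f R=%.2f F=%.2f" % processed)
```

With these assumed counts, recall rises while precision falls, which is the pattern the abstract describes. The F-measure can move in either direction depending on the balance between the two.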
Year: 2019
DOI: 10.1109/TSE.2017.2770124
Venue: IEEE Transactions on Software Engineering
Keywords: Object oriented modeling, Systematics, Measurement, Bibliographies, Predictive models, Context modeling, Data models
Field: Decision tree, Data modeling, Data mining, Systematic review, Empirical evidence, Naive Bayes classifier, Computer science, Context model, Cross project, Meta-analysis
DocType: Journal
Volume: 45
Issue: 2
ISSN: 0098-5589
Citations: 22
PageRank: 0.57
References: 0
Authors: 3
Name, Order, Citations, PageRank
Seyedrebvar Hosseini, 1, 22, 0.91
Burak Turhan, 2, 55, 7.54
Dimuthu Gunarathna, 3, 22, 0.57