Abstract | ||
---|---|---|
*Background:*
Cross-project defect prediction (CPDP) has recently gained considerable attention, yet there have been no systematic efforts to analyse the existing empirical evidence.
*Objective:*
To synthesise the literature to understand the state of the art in CPDP with respect to metrics, models, data approaches, datasets and associated performances. Further, we aim to assess the performance of CPDP versus within-project defect prediction (WPDP) models.
*Method:*
We conducted a systematic literature review. Results from primary studies are synthesised (thematic, meta-analysis) to answer research questions.
*Results:*
We identified 30 primary studies passing quality assessment. Performance measures, except precision, vary with the choice of metrics. Recall, precision, f-measure, and AUC are the most common measures. Models based on Nearest-Neighbour and Decision Tree tend to perform well in CPDP, whereas the popular naïve Bayes yields average performance. Performance of ensembles varies greatly across f-measure and AUC. Data approaches address CPDP challenges using row/column processing, which improves CPDP in terms of recall at the cost of precision. This is observed on multiple occasions, including the meta-analysis of CPDP versus WPDP. NASA and Jureczko datasets seem to favour CPDP over WPDP more frequently.
*Conclusion:*
CPDP is still a challenge and requires more research before trustworthy applications can take place. We provide guidelines for further research. |
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/TSE.2017.2770124 | IEEE Transactions on Software Engineering |
Keywords | Field | DocType |
Object oriented modeling,Systematics,Measurement,Bibliographies,Predictive models,Context modeling,Data models | Decision tree,Data modeling,Data mining,Systematic review,Empirical evidence,Naive Bayes classifier,Computer science,Context model,Cross project,Meta-analysis | Journal |
Volume | Issue | ISSN |
45 | 2 | 0098-5589 |
Citations | PageRank | References |
22 | 0.57 | 0 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Seyedrebvar Hosseini | 1 | 22 | 0.91 |
Burak Turhan | 2 | 55 | 7.54 |
Dimuthu Gunarathna | 3 | 22 | 0.57 |