Title | ||
---|---|---|
Augmenting commit classification by using fine-grained source code changes and a pre-trained deep neural language model |
Abstract | ||
---|---|---|
Context: Analyzing software maintenance activities is very helpful in ensuring cost-effective evolution and development activities. The categorization of commits into maintenance tasks supports practitioners in making decisions about resource allocation and managing technical debt. Objective: In this paper, we propose to use a pre-trained language neural model, namely BERT (Bidirectional Encoder Representations from Transformers), for the classification of commits into three categories of maintenance tasks: corrective, perfective and adaptive. The proposed commit classification approach will help the classifier better understand the context of each word in the commit message. Methods: We built a balanced dataset of 1793 labeled commits that we collected from publicly available datasets. We used several popular code change distillers to extract fine-grained code changes that we have incorporated into our dataset as additional features to BERT's word representation features. In our study, a deep neural network (DNN) classifier has been used as an additional layer to fine-tune the BERT model on the task of commit classification. Several models have been evaluated to come up with a deep analysis of the impact of code changes on the classification performance of each commit category. Results and conclusions: Experimental results have shown that the DNN model trained on BERT's word representations and Fixminer code changes (DNN@BERT+Fix_cc) provided the best performance and achieved 79.66% accuracy and a macro-average F1 score of 0.8. Comparison with the state-of-the-art model that combines keywords and code changes (RF@KW+CD_cc) has shown that our model achieved approximately 8% improvement in accuracy. Results have also shown that a DNN model using only BERT's word representation features achieved an improvement of 5% in accuracy compared to the RF@KW+CD_cc model. |
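
The abstract reports accuracy and a macro-average F1 score over the three commit categories. As a reading aid, the sketch below shows how those two metrics are commonly computed for a three-class task. It is not the authors' evaluation code: the function names and the toy label lists are hypothetical and only illustrate the arithmetic.

```python
# Illustrative sketch only; names and toy labels are assumptions,
# not taken from the paper.
LABELS = ("corrective", "perfective", "adaptive")


def accuracy(y_true, y_pred):
    """Fraction of commits whose predicted category matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of the per-category F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, two commits per category.
y_true = ["corrective", "corrective", "perfective",
          "perfective", "adaptive", "adaptive"]
y_pred = ["corrective", "perfective", "perfective",
          "perfective", "adaptive", "corrective"]

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

Because macro averaging weights each category equally, it is a natural companion to accuracy on a balanced dataset such as the one described in the abstract.
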
Year | DOI | Venue |
---|---|---|
2021 | 10.1016/j.infsof.2021.106566 | Information and Software Technology |
Keywords | DocType | Volume |
---|---|---|
Software maintenance, Commit classification, Code changes, Deep neural networks, Pre-trained neural language model | Journal | 135 |
ISSN | Citations | PageRank |
---|---|---|
0950-5849 | 1 | 0.35 |
References | Authors | |
---|---|---|
0 | 4 | |
Name | Order | Citations | PageRank |
---|---|---|---|
Lobna Ghadhab | 1 | 1 | 0.35 |
Ilyes Jenhani | 2 | 81 | 7.13 |
Mohamed Wiem Mkaouer | 3 | 228 | 28.58 |
Montassar Ben Messaoud | 4 | 1 | 0.35 |