Title
Impact of structural weighting on a latent Dirichlet allocation-based feature location technique.
Abstract
Text retrieval-based feature location techniques (FLTs) use information from the terms present in documents in classes and methods. However, relevant terms originating from certain locations (e.g., method names) often comprise only a small part of the entire method lexicon. Feature location techniques should benefit from techniques that make greater use of this information. The primary objective of this study was to investigate how weighting terms from different locations in source code can improve a latent Dirichlet allocation (LDA)-based FLT. We conducted an empirical study of 4 subject software systems and 372 features. For each subject system, we trained 1024 different LDA models with new weighting schemes applied to leading comments, method names, parameters, body comments, and local variables. We conducted both quantitative and qualitative analyses to identify the effects of the weighting schemes on the performance of the LDA-based FLT. We evaluated weighting schemes based on mean reciprocal rank and spread of effectiveness measures. In addition, we conducted a factorial analysis to identify which locations have a main effect on the results of the FLT. We then examined the effects of adding information from class comments, class names, and fields to the top 10 configurations for each system. This resulted in an additional 640 different LDA models for each system. Our results show that applying our weighting schemes has a significant effect on the performance of the LDA-based FLT. Furthermore, we found that adding information from each method's containing class can improve the effectiveness of an LDA-based FLT. Finally, we identified a set of recommendations for identifying better weighting schemes for LDA.
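As a rough illustration of the two ideas the abstract relies on, the sketch below shows one way to weight terms by their source-code location and how mean reciprocal rank is computed. It is not the authors' implementation. The location keys, the function names, and the candidate weights `(1, 2, 4, 8)` are all assumptions made for illustration. Four candidate weights over five locations would give 4^5 = 1024 configurations, which matches the model count in the abstract, but the abstract does not state the actual weight values.

```python
from collections import Counter
from itertools import product

# Hypothetical keys for the five method-level locations named in the abstract.
LOCATIONS = (
    "leading_comments",
    "method_name",
    "parameters",
    "body_comments",
    "local_variables",
)

# Assumed candidate weights; the abstract does not give the real values.
CANDIDATE_WEIGHTS = (1, 2, 4, 8)


def weighted_term_counts(terms_by_location, weights):
    """Count terms, scaling each occurrence by the weight of its location."""
    counts = Counter()
    for location, terms in terms_by_location.items():
        weight = weights.get(location, 1)  # unlisted locations get weight 1
        for term in terms:
            counts[term] += weight
    return counts


def mean_reciprocal_rank(first_relevant_ranks):
    """Average of 1/rank over queries.

    Each entry is the 1-based rank of the first relevant result for one
    query, or None when no relevant result was retrieved (contributes 0).
    """
    if not first_relevant_ranks:
        return 0.0
    total = sum(1.0 / rank for rank in first_relevant_ranks if rank)
    return total / len(first_relevant_ranks)


# Enumerate every assignment of a candidate weight to each location.
configurations = [
    dict(zip(LOCATIONS, combo))
    for combo in product(CANDIDATE_WEIGHTS, repeat=len(LOCATIONS))
]
```

Each weighted term-count vector could then serve as the document for one method when training a topic model, and each configuration could be scored by the mean reciprocal rank of the first relevant method across all feature queries.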
Year
2018
DOI
10.1002/smr.1892
Venue
JOURNAL OF SOFTWARE-EVOLUTION AND PROCESS
Keywords
feature location, program comprehension, static analysis, term weighting, text retrieval
Field
Data mining, Latent Dirichlet allocation, Weighting, Pattern recognition, Source code, Static analysis, Software system, Mean reciprocal rank, Artificial intelligence, Engineering, Program comprehension, Local variable
DocType
Journal
Volume
30
Issue
1
ISSN
2047-7473
Citations
3
PageRank
0.36
References
17
Authors
3
Name               Order  Citations  PageRank
Brian P. Eddy      1      81         6.53
Nicholas A. Kraft  2      684        35.95
Jeff Gray          3      973        116.57