Title
A comparison of stemmers on source code identifiers for software search
Abstract
As the popularity of text-based source code analysis grows, the use of stemmers to strip suffixes has increased. Stemmers have been used to more accurately determine relevance between a keyword query and methods in source code for search, exploration, and bug localization. In this paper, we investigate which traditional stemmers perform best on the domain of software, specifically, Java source code. We compare the stemmers using two case studies: a comparative analysis of the unified word classes in terms of accuracy and completeness, as well as an investigation into the effectiveness of stemming for software search. Our results indicate that relative stemmer effectiveness varies with a software engineering tool such as search, justifying further research into this area.
Year
2011
DOI
10.1109/ICSM.2011.6080817
Venue
ICSM
Keywords
text-based source code analysis, java source code, software search, source code identifiers, source code, relative stemmer effectiveness, comparative analysis, case study, traditional stemmers, bug localization, software engineering tool, adders, software engineering, stemming, java, accuracy
Field
Programming language, Systems engineering, Information retrieval, Identifier, Computer science, Source code, Software, Java, Completeness (statistics), Java source code
DocType
Conference
Citations
7
PageRank
0.48
References
0
Authors
3
Name | Order | Citations | PageRank
Andrew Wiese | 1 | 7 | 0.48
Valerie Ho | 2 | 7 | 0.48
Emily Hill | 3 | 834 | 34.58