Title
Self-Attention Networks for Code Search
Abstract
Context: Developers often search for and reuse code snippets from large-scale codebases when implementing functionality that already exists in previous projects, which can improve the efficiency of software development.
Objective: As the first deep-learning-based code search model, DeepCS outperforms prior models such as Sourcerer and CodeHow. However, it uses two separate LSTMs to represent code snippets and natural language descriptions, which ignores the semantic relations between code snippets and their descriptions. Consequently, the performance of DeepCS hits a bottleneck, and our objective is to break it.
Method: We propose a self-attention joint representation learning model named SAN-CS (Self-Attention Network for Code Search). Unlike DeepCS, we build our code search model directly on self-attention networks. Through a weighted average operation, self-attention networks can fully capture the contextual information of code snippets and their descriptions. We first use two individual self-attention networks to represent code snippets and their descriptions, respectively, and then apply a further self-attention network as a joint representation network, which builds semantic relationships between code snippets and their descriptions. SAN-CS can therefore break the performance bottleneck of DeepCS.
Results: We evaluate SAN-CS on the dataset shared by Gu et al. against two baseline models, DeepCS and CARLCS-CNN. Experimental results demonstrate that SAN-CS achieves significantly better performance than DeepCS and CARLCS-CNN. In addition, SAN-CS is more efficient than DeepCS in both the training and testing phases.
Conclusion: This paper proposes a code search model, SAN-CS, which uses self-attention networks to compute joint attention representations of code snippets and their descriptions. Experimental results verify the effectiveness and efficiency of SAN-CS.
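The following is a minimal PyTorch sketch of the architecture the abstract describes (separate self-attention encoders for code and description, plus a joint self-attention layer over both sequences). It is not the authors' implementation: the class name SANCS, the hyperparameters embed_dim and num_heads, the mean pooling, and the cosine-similarity ranking are illustrative assumptions; SAN-CS's actual layer details may differ.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SANCS(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, num_heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Individual self-attention encoders for code and description.
        self.code_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.desc_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # Extra joint self-attention over the concatenated sequences,
        # intended to capture code-description semantic relations.
        self.joint_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def encode(self, tokens, attn):
        x = self.embed(tokens)      # (batch, seq_len, embed_dim)
        out, _ = attn(x, x, x)      # self-attention: query = key = value
        return out

    def forward(self, code_tokens, desc_tokens):
        code = self.encode(code_tokens, self.code_attn)
        desc = self.encode(desc_tokens, self.desc_attn)
        # Joint representation: self-attention over both sequences together.
        seq = torch.cat([code, desc], dim=1)
        joint, _ = self.joint_attn(seq, seq, seq)
        code_len = code.size(1)
        # Pool each side into a single vector (uniform weighted average).
        code_vec = joint[:, :code_len].mean(dim=1)
        desc_vec = joint[:, code_len:].mean(dim=1)
        return code_vec, desc_vec

# Usage: rank candidate snippets against a query by cosine similarity.
model = SANCS(vocab_size=10000)
code_batch = torch.randint(0, 10000, (4, 50))   # 4 snippets, 50 tokens each
query = torch.randint(0, 10000, (4, 20))        # query repeated per snippet
code_vec, desc_vec = model(code_batch, query)
scores = F.cosine_similarity(code_vec, desc_vec)  # higher = better match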
Year
2021
DOI
10.1016/j.infsof.2021.106542
Venue
Information and Software Technology
Keywords
Code search, Self-attention mechanism, Joint embedding
DocType
Journal
Volume
134
ISSN
0950-5849
Citations
1
PageRank
0.35
References
0
Authors
4
Name            Order   Citations   PageRank
Sen Fang        1       1           0.35
You-Shuai Tan   2       1           0.35
Tao Zhang       3       128         14.89
Yepang Liu      4       415         24.58