Title
Reverse engineering variability from requirement documents based on probabilistic relevance and word embedding.
Abstract
Feature and variability extraction from different artifacts is an indispensable activity to support the systematic integration of single software systems and Software Product Lines (SPLs). Beyond manual variability extraction, a variety of approaches, such as feature location in source code and feature extraction from requirements, have been proposed to automatically identify features and their variation points. Compared with source code, requirements contain more complete variability information and provide traceability links to other artifacts from early development phases. In this paper, we propose a method to automatically extract features and their relationships based on probabilistic relevance and word embedding. Our technique consists of three steps. First, we apply word2vec to obtain a prediction model, which we use to determine the word-level similarity of requirements. Second, based on word-level similarity and the significance of a word in a domain, we compute the requirements-level similarity using probabilistic relevance. Third, we apply hierarchical clustering to group features and define four criteria to detect variation points between the identified features. We perform a case study to evaluate the usability and robustness of our method and to compare its results with those of related approaches. Initial results reveal that our approach identifies the majority of features correctly and also extracts variability information with reasonable accuracy.
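The three steps in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the hand-picked two-dimensional vectors in `VECTORS` stand in for a trained word2vec model, an IDF-weighted best-match score stands in for the probabilistic relevance measure (the abstract does not specify its exact form), and the clustering uses average linkage with a hypothetical similarity threshold. The function names and the toy requirements are illustrative only, and the detection of variation points is not covered.

```python
import math

# Step 1 (assumption): toy 2-D vectors in place of a trained word2vec model.
VECTORS = {
    "open": (1.0, 0.1), "unlock": (0.9, 0.2), "door": (0.8, 0.3),
    "play": (0.1, 1.0), "music": (0.2, 0.9), "radio": (0.3, 0.8),
}

# Hypothetical tokenized requirements.
REQS = [["open", "door"], ["unlock", "door"], ["play", "music"], ["play", "radio"]]


def word_sim(w1, w2):
    """Cosine similarity between the vectors of two words."""
    u, v = VECTORS[w1], VECTORS[w2]
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def idf(word, docs):
    """Smoothed inverse document frequency: rarer words weigh more."""
    n = sum(1 for d in docs if word in d)
    return math.log((len(docs) - n + 0.5) / (n + 0.5) + 1.0)


def req_sim(r1, r2, docs):
    """Step 2 (assumption): IDF-weighted best-match similarity, symmetrized."""
    def directed(a, b):
        weights = {w: idf(w, docs) for w in set(a)}
        matched = sum(wt * max(word_sim(w, x) for x in b)
                      for w, wt in weights.items())
        return matched / sum(weights.values())
    return 0.5 * (directed(r1, r2) + directed(r2, r1))


def cluster(docs, threshold):
    """Step 3: average-linkage agglomerative clustering of requirement indices."""
    clusters = [[i] for i in range(len(docs))]

    def linkage(c1, c2):
        total = sum(req_sim(docs[i], docs[j], docs) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, ia, ib = max(
            (linkage(a, b), ia, ib)
            for ia, a in enumerate(clusters)
            for ib, b in enumerate(clusters) if ia < ib
        )
        if best < threshold:  # stop once the closest pair is too dissimilar
            break
        clusters[ia] = clusters[ia] + clusters[ib]
        del clusters[ib]
    return clusters


groups = cluster(REQS, threshold=0.8)
print(groups)
```

With these toy inputs, the two door-related requirements and the two playback-related requirements end up in separate groups, each of which would be a feature candidate. A real pipeline would train or load word embeddings on a domain corpus and tune the threshold empirically.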
Year
2018
Venue
SPLC
Field
Data mining, Source code, Computer science, Reverse engineering, Software system, Feature extraction, Control engineering, Software product line, Probabilistic logic, Word2vec, Word embedding
DocType
Conference
Citations
1
PageRank
0.35
References
19
Authors
3
Name
Order
Citations
PageRank
Yang Li1659125.00
Sandro Schulze225923.43
Gunter Saake33255639.75