Title
Extracting domain models from natural-language requirements: approach and industrial evaluation.
Abstract
Domain modeling is an important step in the transition from natural-language requirements to precise specifications. For large systems, building a domain model manually is a laborious task. Several approaches exist to assist engineers with this task, whereby candidate domain model elements are automatically extracted using Natural Language Processing (NLP). Despite the existing work on domain model extraction, important facets remain under-explored: (1) there is limited empirical evidence about the usefulness of existing extraction rules (heuristics) when applied in industrial settings; (2) existing extraction rules do not adequately exploit the natural-language dependencies detected by modern NLP technologies; and (3) an important class of rules developed by the information retrieval community for information extraction remains unutilized for building domain models. Motivated by addressing the above limitations, we develop a domain model extractor by bringing together existing extraction rules in the software engineering literature, extending these rules with complementary rules from the information retrieval literature, and proposing new rules to better exploit results obtained from modern NLP dependency parsers. We apply our model extractor to four industrial requirements documents, reporting on the frequency of different extraction rules being applied. We conduct an expert study over one of these documents, investigating the accuracy and overall effectiveness of our domain model extractor.
Year
2016
DOI
10.1145/2976767.2976769
Venue
MoDELS
Keywords
Model Extraction, Natural-Language Requirements, Natural Language Processing, Case Study Research.
Field
Data mining, Domain engineering, Empirical evidence, Computer science, Exploit, Heuristics, Information extraction, Parsing, Domain model, Natural language requirements
DocType
Conference
Citations
15
PageRank
0.65
References
1
Authors
4
Name                Order  Citations  PageRank
Chetan Arora        1      296        29.51
Mehrdad Sabetzadeh  2      988        61.84
Lionel C. Briand    3      8795       481.98
Frank Zimmer        4      137        11.95