Title
Clustering OSS License Statements toward Automatic Generation of License Rules
Abstract
Reusing open source software (OSS) components in one's own software products has become common in modern software development. Automated license identification tools have been proposed to help developers identify OSS licenses, since reusing components can require checking a large number of licenses. Of the existing tools, Ninka identifies the license of each source file most accurately, using regular expressions. When Ninka has no identification rule for a license, it reports the license as "unknown", and developers must then check it manually. Since completely new or derived OSS licenses appear nearly every year, a license identification tool must be maintained by adding regular expressions for the new licenses. The final goal of our study is to construct a method that automatically creates candidate regular expressions to be added to a license identification tool such as Ninka. As a step toward this goal, files reported as having unknown licenses must first be classified by license. In this paper, we propose a hierarchical clustering method that divides unknown licenses into clusters of the same license. We conduct a case study to confirm the usefulness of our clustering method by applying it to 2,838 unknown-license files from Debian v7.8.0. The results confirm that our method can create clusters that are suitable as candidates for generating license rules automatically.
Year: 2016
DOI: 10.1109/IWESEP.2016.20
Venue: 2016 7th International Workshop on Empirical Software Engineering in Practice (IWESEP)
Keywords: OSS license, license identification, hierarchical clustering
DocType: Conference
ISSN: 2333-519X
Citations: 0
PageRank: 0.34
References: 8
Authors: 3
Name              Order  Citations  PageRank
Yunosuke Higashi  1      0          0.34
Yuki Manabe       2      7          2.52
Masao Ohira       3      275        20.89