Title
Proposing And Evaluating Clone Detection Approaches With Preprocessing Input Source Files
Abstract
Many approaches for detecting code clones have been proposed, based on different degrees of normalization (e.g., removal of white space, tokenization, and regularization of identifiers). Different degrees of normalization lead to different granularities of source code being detected as code clones. To investigate how normalization affects code clone detection, this study proposes six approaches that detect code clones after preprocessing the input source files with different degrees of normalization. More precisely, in the preprocessing step each normalization is applied to the input source files, and the files are then partitioned into equivalence classes. Code clones are then detected from the set of files that represent each equivalence class, using a token-based code clone detection tool named CCFinder. The proposed approaches fall into two categories: approaches without normalization and approaches with normalization. The former detect only identical files, without any normalization. The latter detect files that are identical after normalization of differing degrees, such as removal of all lines containing macros. In the case study, we observed that the proposed approaches detect code clones faster than the approach that uses only CCFinder. We also found that the approach without normalization is the fastest among the proposed approaches in many cases.
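The preprocessing described in the abstract (normalize each file, partition the files into equivalence classes, keep one representative per class) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of SHA-256 as the hash function, and the rule that a "macro line" is any line starting with `#` are all assumptions made for illustration, and the subsequent CCFinder run is left out.

```python
import hashlib
from collections import defaultdict
from typing import Callable, Dict, List


def identity(text: str) -> str:
    """No normalization: only byte-identical files share a class."""
    return text


def drop_macro_lines(text: str) -> str:
    """Assumed normalization: remove every line that starts with '#'."""
    kept = [ln for ln in text.splitlines() if not ln.lstrip().startswith("#")]
    return "\n".join(kept)


def partition(files: Dict[str, str],
              normalize: Callable[[str], str] = identity) -> Dict[str, List[str]]:
    """Group file names by the hash of their normalized content."""
    classes: Dict[str, List[str]] = defaultdict(list)
    for name in sorted(files):  # sorted so the representative is deterministic
        normalized = normalize(files[name])
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        classes[digest].append(name)
    return dict(classes)


def representatives(classes: Dict[str, List[str]]) -> List[str]:
    """Pick the first member of each equivalence class."""
    return sorted(members[0] for members in classes.values())


# Hypothetical input: file name -> file content.
files = {
    "a.c": "#define N 10\nint f() { return 1; }\n",
    "b.c": "#define N 20\nint f() { return 1; }\n",
    "c.c": "int f() { return 1; }\n",
    "d.c": "int g() { return 2; }\n",
    "e.c": "int g() { return 2; }\n",
}

without_normalization = representatives(partition(files))
with_macro_removal = representatives(partition(files, drop_macro_lines))
print(without_normalization)
print(with_macro_removal)
```

In this toy input, the approach without normalization merges only the two byte-identical files, while removing macro lines also merges the files that differ only in their `#define` lines. In either case only the representatives would be handed to the clone detector, which is where the reported speed-up would come from.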
Year: 2015
DOI: 10.1587/transinf.2014EDP7292
Venue: IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS
Keywords: code clone, hash function, source code transformation
Field: Tokenization (data security), Data mining, Equivalence partitioning, Normalization (statistics), Identifier, Pattern recognition, Source code, Computer science, Preprocessor, Artificial intelligence, Hash function, Equivalence class
DocType: Journal
Volume: E98D
Issue: 2
ISSN: 1745-1361
Citations: 0
PageRank: 0.34
References: 14
Authors: 4
Name              Order  Citations  PageRank
Eunjong Choi      1      76         11.21
Norihiro Yoshida  2      196        23.33
Y. Higo           3      45         5.73
Katsuro Inoue     4      2424       172.31