Title
CUSTODES: automatic spreadsheet cell clustering and smell detection using strong and weak features.
Abstract
Various techniques have been proposed to detect smells in spreadsheets, which are susceptible to errors. These techniques typically detect spreadsheet smells through a mechanism based on a fixed set of patterns or metric thresholds. Unlike conventional programs, spreadsheets vary greatly in tabulation style. Smell detection based on fixed patterns or metric thresholds, which are insensitive to the varying tabulation styles, can miss many smells in one spreadsheet while reporting many spurious smells in another. In this paper, we propose CUSTODES to effectively cluster spreadsheet cells and detect smells in these clusters. The clustering mechanism can automatically adapt to the tabulation styles of each spreadsheet using strong and weak features. These strong and weak features capture the invariant and variant parts of tabulation styles, respectively. As smelly cells in a spreadsheet are normally in the minority, they can be mechanically detected as outliers of their clusters in feature spaces. We implemented CUSTODES and applied it to 70 spreadsheet files randomly sampled from the EUSES corpus. These spreadsheets contain 1,610 formula cell clusters. Experimental results confirmed that CUSTODES is effective. It successfully detected harmful smells that can induce computation anomalies in spreadsheets with an F-measure of 0.72, outperforming state-of-the-art techniques.
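The idea that smelly cells show up as the minority within a cluster can be illustrated with a minimal sketch. This is not the CUSTODES algorithm: it assumes the cluster is already given, uses a single hypothetical feature (the cell's formula written in relative R1C1-style notation), and flags any cell that disagrees with a clear majority. The function name `find_outliers`, the `min_majority` parameter, and the sample data are all assumptions made for illustration.

```python
from collections import Counter


def find_outliers(cluster, min_majority=0.5):
    """Return addresses of cells whose feature value is in the minority.

    `cluster` maps a cell address to one feature value (hypothetical:
    the formula in relative notation). Nothing is flagged unless one
    value is shared by more than `min_majority` of the cells.
    """
    if not cluster:
        return []
    counts = Counter(cluster.values())
    majority_value, majority_count = counts.most_common(1)[0]
    if majority_count / len(cluster) <= min_majority:
        return []  # no clear majority, so no cell can be called an outlier
    return sorted(addr for addr, value in cluster.items()
                  if value != majority_value)


# Hypothetical cluster: cells in column D that should all compute B * C.
# Relative notation makes copies of the same formula look identical.
cluster = {
    "D2": "RC[-2]*RC[-1]",
    "D3": "RC[-2]*RC[-1]",
    "D4": "RC[-2]*RC[-1]",
    "D5": "120",  # hard-coded constant where a formula is expected
    "D6": "RC[-2]*RC[-1]",
}

print(find_outliers(cluster))  # ['D5']
```

A fixed majority threshold is itself the kind of rigid rule the abstract argues against; the sketch only shows the outlier intuition, not how the paper adapts to each spreadsheet's tabulation style.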
Year: 2016
DOI: 10.1145/2884781.2884796
Venue: ICSE
Keywords: Spreadsheets, cell clustering, smell detection, feature modeling, end-user programming
Field: Data mining, Computer science, Outlier, Feature extraction, Software, Cluster analysis, Spurious relationship, Feature modeling, Computation
DocType: Conference
ISSN: 0270-5257
ISBN: 978-1-4503-3900-1
Citations: 16
PageRank: 0.56
References: 32
Authors: 4
Name            Order   Citations   PageRank
S. C. Cheung    1       2657        162.89
Wanjun Chen     2       16          0.56
Yepang Liu      3       415         24.58
Chang Xu        4       487         36.94