Title
MIC check: a correlation tactic for ESE data
Abstract
Empirical software engineering researchers are concerned with understanding the relationships between outcomes of interest, e.g. defects, and process and product measures. Using correlations to uncover strong relationships is a natural precursor to multivariate modeling. Unfortunately, correlation coefficients can be difficult, and sometimes misleading, to interpret. For example, a strong correlation can occur between variables that stand in a polynomial relationship; this may lead one, mistakenly and ultimately misleadingly, to model a polynomially related variable in a linear regression. Likewise, a non-monotonic functional relationship, or even a non-functional one, might be missed entirely by a correlation coefficient. Outliers can influence standard correlation measures, tied values can unduly influence even robust non-parametric rank correlation measures, and small sample sizes can make correlation measures unstable. A new bivariate measure of association, the Maximal Information Coefficient (MIC) [1], promises to discover simultaneously whether two variables have a) any association, b) a functional relationship, and c) a non-linear relationship. MIC is a very useful complement to standard and rank correlation measures. It characterizes separately the existence of a relationship and its precise nature; thus, it enables more informed choices in modeling non-functional and non-linear relationships, and it provides a more nuanced indicator of potential problems with the values reported by standard and rank correlation measures. We illustrate the use of MIC on a variety of software engineering metrics. We study and explain the distributional properties of MIC and related measures in software engineering data, and illustrate the value of these measures for the empirical software engineering researcher.
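The following is a minimal, standard-library-only Python sketch of three of the pitfalls described above, run on synthetic data (not data from the paper). Pearson's r and Spearman's rho are both essentially zero for a noiseless parabola, a cubic gives a Pearson r of roughly 0.92 that could tempt a linear model, and a single extreme point inflates Pearson's r on otherwise unrelated values while Spearman's rho stays low. The functions equal_freq_bins and approx_mic are hypothetical illustrations, not the MIC estimator of [1]: approx_mic only searches equal-frequency grids with kx * ky <= n ** 0.6 (the 0.6 exponent is an assumption of this sketch), whereas the published method optimises the grid partitions themselves. Treat its scores as illustrative only.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(v):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(ranks(x), ranks(y))


def equal_freq_bins(v, k):
    """Hypothetical helper: assign values to k near-equal-count bins by rank.
    Ties are split by input order, which a true MIC grid would not do."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    bins = [0] * len(v)
    for pos, i in enumerate(order):
        bins[i] = pos * k // len(v)
    return bins


def grid_mi(x, y, kx, ky):
    """Mutual information (bits) of x and y on a kx-by-ky equal-frequency grid."""
    bx, by = equal_freq_bins(x, kx), equal_freq_bins(y, ky)
    n = len(x)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(c / n * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def approx_mic(x, y, alpha=0.6):
    """Hypothetical, crude MIC-like score: the largest I / log2(min(kx, ky))
    over equal-frequency grids with kx * ky <= n ** alpha (assumed bound)."""
    limit = len(x) ** alpha
    best = 0.0
    for kx in range(2, int(limit) + 1):
        for ky in range(2, int(limit // kx) + 1):
            best = max(best, grid_mi(x, y, kx, ky) / math.log2(min(kx, ky)))
    return best


xs = [i / 10 for i in range(-50, 51)]  # 101 evenly spaced points on [-5, 5]
parabola = [v * v for v in xs]         # non-monotonic, noiseless
cubic = [v ** 3 for v in xs]           # monotonic polynomial
line = [2 * v + 1 for v in xs]         # linear reference

# Unrelated values plus one extreme point shared by both variables.
ox = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
oy = [5, 3, 6, 2, 7, 4, 5, 3, 6, 50]

for name, ys in [("parabola", parabola), ("cubic", cubic), ("line", line)]:
    print(f"{name:8s} pearson={pearson(xs, ys):+.3f} "
          f"spearman={spearman(xs, ys):+.3f} approx_mic={approx_mic(xs, ys):.3f}")
print(f"outlier  pearson={pearson(ox, oy):+.3f} spearman={spearman(ox, oy):+.3f}")
```

On this data the parabola, which both correlation coefficients miss, scores above 0.9 under approx_mic, close to the straight line. approx_mic is not reported for the ten-point outlier series because, with n = 10, no grid of at least 2-by-2 satisfies the assumed bound kx * ky <= n ** 0.6.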
Year
2012
DOI
10.1109/MSR.2012.6224295
Venue
MSR
Keywords
empirical software engineering, software measurement, software metrics, linear regression, correlation, rank correlation, software engineering, sample size, monotone function, dataset
Field
Rank correlation, Data mining, Correlation coefficient, Computer science, Correlation, Software metric, Maximal information coefficient, Bivariate analysis, Statistics, Sample size determination, Linear regression
DocType
Conference
ISBN
978-1-4673-1761-0
Citations
3
PageRank
0.37
References
5
Authors
3
Name, Order, Citations, PageRank
Daryl Posnett, 1, 578, 19.11
Premkumar Devanbu, 2, 4956, 357.68
Vladimir Filkov, 3, 1503, 75.32