Title
Correlation Clustering Revisited: The "True" Cost of Error Minimization Problems
Abstract
Correlation Clustering was defined by Bansal, Blum, and Chawla as the problem of clustering a set of elements based on a possibly inconsistent binary similarity function between element pairs. Their setting is agnostic in the sense that a ground truth clustering is not assumed to exist, and the cost of a solution is computed against the input similarity function. This problem has been studied in theory and in practice and was subsequently proven to be APX-hard. In this work we assume that there does exist an unknown correct clustering of the data. In this setting, we argue that it is more reasonable to measure the output clustering's accuracy against the unknown underlying true clustering. We present two main results. The first is a novel method for continuously morphing a general (non-metric) function into a pseudometric. This technique may be useful for other metric embedding and clustering problems. The second is a simple algorithm for randomly rounding a pseudometric into a clustering. Combining the two, we obtain a certificate for the possibility of obtaining a solution with approximation factor strictly less than 2 for our problem. This approximation coefficient could not have been achieved by considering the agnostic version of the problem unless P = NP.
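The difference between the two cost measures contrasted in the abstract can be illustrated with a small sketch. It is not the paper's algorithm. It assumes that both costs count disagreeing element pairs, and the function names `agnostic_cost` and `true_cost`, the label-list representation of a clustering, and the toy data are all hypothetical choices made for illustration.

```python
from itertools import combinations

def agnostic_cost(labels, similar):
    """Count pairs on which the clustering disagrees with the input
    similarity function (assumed: True means 'similar')."""
    n = len(labels)
    return sum(
        1
        for u, v in combinations(range(n), 2)
        if (labels[u] == labels[v]) != similar[(u, v)]
    )

def true_cost(labels, truth):
    """Count pairs on which the clustering disagrees with the
    (normally unknown) ground truth clustering."""
    n = len(labels)
    return sum(
        1
        for u, v in combinations(range(n), 2)
        if (labels[u] == labels[v]) != (truth[u] == truth[v])
    )

# Toy instance: ground truth has clusters {0, 1} and {2, 3}.
truth = [0, 0, 1, 1]

# The input similarity function is the truth with one corrupted pair:
# elements 0 and 2 are wrongly reported as similar.
similar = {
    (u, v): truth[u] == truth[v]
    for u, v in combinations(range(len(truth)), 2)
}
similar[(0, 2)] = True

# A candidate output clustering: {0, 1, 2} and {3}.
output = [0, 0, 0, 1]

print("agnostic cost of output:", agnostic_cost(output, similar))
print("true cost of output:    ", true_cost(output, truth))
print("agnostic cost of truth: ", agnostic_cost(truth, similar))
```

In this toy instance the same output clustering scores differently under the two measures, which is the distinction the abstract draws between evaluating against the input similarity function and evaluating against the underlying true clustering.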
Year
2009
DOI
10.1007/978-3-642-02927-1_4
Venue
ICALP
Keywords
correlation clustering revisited, approximation coefficient, unknown correct clustering, correlation clustering, error minimization problems, agnostic version, input similarity function, binary similarity function, unknown underlying true clustering, output clustering, ground truth clustering, clustering problem, distance function, ground truth, combinatorial optimization, triangle inequality, record linkage
Field
k-medians clustering, Fuzzy clustering, Discrete mathematics, Canopy clustering algorithm, CURE data clustering algorithm, Combinatorics, Data stream clustering, Correlation clustering, Computer science, Algorithm, Constrained clustering, Cluster analysis
DocType
Conference
Volume
5555
ISSN
0302-9743
Citations
11
PageRank
0.74
References
14
Authors
2
Name | Order | Citations | PageRank
Nir Ailon | 1 | 1114 | 70.74
Edo Liberty | 2 | 397 | 24.83