Correlation clustering with noisy input - Citegraph

Paper Info

Title
Correlation clustering with noisy input

Abstract
Correlation clustering is a type of clustering that uses a basic form of input data: For every pair of data items, the input specifies whether they are similar (belonging to the same cluster) or dissimilar (belonging to different clusters). This information may be inconsistent, and the goal is to find a clustering (partition of the vertices) that disagrees with as few pieces of information as possible. Correlation clustering is APX-hard for worst-case inputs. We study the following semi-random noisy model to generate the input: start from an arbitrary partition of the vertices into clusters. Then, for each pair of vertices, the similarity information is corrupted (noisy) independently with probability p. Finally, an adversary generates the input by choosing similarity/dissimilarity information arbitrarily for each corrupted pair of vertices. In this model, our algorithm produces a clustering with cost at most 1 + O(n-1/6) times the cost of the optimal clustering, as long as p ≤ 1/2 - n-1/3. Moreover, if all clusters have size at least1 c1√n then we can exactly reconstruct the planted clustering. If the noise p is small, that is, p ≤ n-δ/60, then we can exactly reconstruct all clusters of the planted clustering that have size at least 3150/δ, and provide a certificate (witness) proving that those clusters are in any optimal clustering. Among other techniques, we use the natural semi-definite programming relaxation followed by an interesting rounding phase. The analysis uses SDP duality and spectral properties of random matrices.

Year	DOI	Venue
2010	10.5555/1873601.1873659	SODA
Keywords	Field	DocType
corrupted pair,dissimilarity information,input data,worst-case input,arbitrary partition,optimal clustering,correlation clustering,similarity information,noise p,noisy input,probability p,random matrices	k-medians clustering,Fuzzy clustering,Discrete mathematics,CURE data clustering algorithm,Combinatorics,Complete-linkage clustering,Correlation clustering,Determining the number of clusters in a data set,Cluster analysis,Mathematics,Single-linkage clustering	Conference
Volume	ISBN	Citations
135	978-0-89871-698-6	30
PageRank	References	Authors
1.09	21	2

Authors (2 rows)

Cited by (30 rows)

References (21 rows)

Name	Order	Citations	PageRank
Claire Mathieu	1	452	25.78
Warren Schudy	2	221	12.08

1