Title
Clustering heterogeneous data with mutual semi-supervision
Abstract
We propose a new methodology for clustering data that comprise multiple domains or parts, in which the separate domains mutually supervise each other within a semi-supervised learning framework. Unlike existing uses of semi-supervised learning, our methodology does not assume that labels are available for part of the data. Instead, each domain separately undergoes an unsupervised learning process while sending supervised information, in the form of data constraints, to the other domains and receiving such constraints from them. The entire process alternates semi-supervised learning stages on the different data domains. These stages are based on Basu et al.'s Hidden Markov Random Fields (HMRF) variant of the K-means algorithm for semi-supervised clustering, which combines the constraint-based and distance-based approaches in a unified model. Our experiments demonstrate successful mutual semi-supervision between the different domains during clustering. This approach is superior to the traditional baselines for heterogeneous-domain clustering, which either convert all domains into a single domain or cluster each domain separately.
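The alternation described in the abstract can be illustrated with a small sketch. It is not the authors' implementation and makes several assumptions. First, the two domains describe the same objects row by row. Second, HMRF-KMeans is replaced by a simplified PCKMeans-style objective: squared Euclidean distance plus a uniform penalty `w` per violated constraint, with greedy ICM assignment and no metric learning. Third, one domain's clustering is turned into must-link and cannot-link constraints for the other by a hypothetical rule. All function names are illustrative.

```python
import random
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the farthest point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return [list(c) for c in centers]


def constrained_kmeans(points, k, must, cannot, w=1.0, iters=20):
    """Simplified stand-in for HMRF-KMeans (PCKMeans-style): squared distance
    plus a uniform penalty w for every violated must-link / cannot-link."""
    centers = farthest_first(points, k)
    labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
    neighbors = {i: [] for i in range(len(points))}
    for i, j in must:
        neighbors[i].append((j, True))
        neighbors[j].append((i, True))
    for i, j in cannot:
        neighbors[i].append((j, False))
        neighbors[j].append((i, False))

    for _ in range(iters):
        # M-step: move each centroid to the mean of its members (keep it if empty).
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        # E-step: greedy ICM pass, reassigning one object at a time.
        changed = False
        for i, p in enumerate(points):
            def cost(c):
                penalty = 0.0
                for j, is_must in neighbors[i]:
                    if is_must and labels[j] != c:
                        penalty += w  # must-link broken
                    elif not is_must and labels[j] == c:
                        penalty += w  # cannot-link broken
                return sq_dist(p, centers[c]) + penalty
            best = min(range(k), key=cost)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels


def constraints_from_labels(labels, n_pairs=None, seed=0):
    """Hypothetical rule: turn a clustering into pairwise constraints
    (same cluster -> must-link, different clusters -> cannot-link)."""
    pairs = list(combinations(range(len(labels)), 2))
    if n_pairs is not None:
        pairs = random.Random(seed).sample(pairs, min(n_pairs, len(pairs)))
    must = [(i, j) for i, j in pairs if labels[i] == labels[j]]
    cannot = [(i, j) for i, j in pairs if labels[i] != labels[j]]
    return must, cannot


def mutual_semi_supervision(domain_a, domain_b, k, rounds=3, n_pairs=None, w=1.0):
    """Alternate constrained clustering of two domains describing the same
    objects (row i of each domain is object i), exchanging constraints."""
    must, cannot = [], []  # domain A starts fully unsupervised
    for r in range(rounds):
        labels_a = constrained_kmeans(domain_a, k, must, cannot, w)
        must, cannot = constraints_from_labels(labels_a, n_pairs, seed=2 * r)
        labels_b = constrained_kmeans(domain_b, k, must, cannot, w)
        must, cannot = constraints_from_labels(labels_b, n_pairs, seed=2 * r + 1)
    return labels_a, labels_b
```

Consider a toy setting where object 4 lies between the two groups in domain B but clearly belongs with objects 0 to 3 in domain A. Clustering B alone puts object 4 with objects 5 to 9. The constraints passed from A (all pairs, `w=1.0`) pull it into the first cluster instead. Passing every pair is only practical for small data, so `n_pairs` samples a subset.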
Year: 2012
DOI: 10.1007/978-3-642-34109-0_4
Venue: SPIRE
Keywords: semi-supervised learning stage, mutual semi-supervision, data constraint, different data domain, heterogeneous data, semi-supervised learning framework, semi-supervised clustering, entire process, semi-supervised learning, different domain, unsupervised learning process, clustering data
Field: Fuzzy clustering, Data mining, CURE data clustering algorithm, Data stream clustering, Correlation clustering, Computer science, Constrained clustering, Conceptual clustering, Cluster analysis, Single-linkage clustering
DocType: Conference
Volume: 7608
ISSN: 0302-9743
Citations: 2
PageRank: 0.37
References: 14
Authors: 2
Name | Order | Citations | PageRank
Artur Abdullin | 1 | 10 | 1.88
Olfa Nasraoui | 2 | 1515 | 164.53