Title
Automatic constraints generation for semisupervised clustering: experiences with documents classification
Abstract
In recent years, semi-supervised clustering has received a great deal of attention. It differs from more traditional unsupervised approaches in that it uses a small amount of supervision to "steer" the clustering. Unfortunately, in the real world such supervision is not always available: the data to be processed are often so large that the cost (in time and human resources) of obtaining user-provided information is prohibitive. To address this issue, this work presents a method that generates the supervision automatically by analyzing the structure of the data itself. The analysis is performed with a partitional clustering algorithm that discovers relationships between pairs of instances, which can then be used as semi-supervision in the clustering process. The methodology has been studied in the document clustering domain, an area where novel approaches for accurate document classification are strongly needed. Experimental results show the validity of this approach.
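The abstract does not spell out how the partitional algorithm turns its output into pairwise relationships, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It assumes plain k-means is run several times with different random seeds, and that a pair of instances becomes a must-link constraint if it always lands in the same cluster and a cannot-link constraint if it never does. The function names `kmeans` and `generate_constraints`, the thresholds `must_thr` and `cannot_thr`, and the toy 2-D vectors standing in for document vectors are all hypothetical.

```python
import random
from itertools import combinations


def kmeans(points, k, seed, iters=50):
    """Lloyd's k-means on lists of numbers; returns one cluster label per point."""
    rng = random.Random(seed)
    # Initialise centroids with k data points sampled without replacement.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each non-empty centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def generate_constraints(points, k, runs=10, must_thr=1.0, cannot_thr=0.0):
    """Hypothetical: turn co-clustering frequencies over several runs into pairwise constraints."""
    together = {pair: 0 for pair in combinations(range(len(points)), 2)}
    for seed in range(runs):
        labels = kmeans(points, k, seed)
        for i, j in together:
            if labels[i] == labels[j]:
                together[(i, j)] += 1
    must_link, cannot_link = [], []
    for pair, count in together.items():
        freq = count / runs
        if freq >= must_thr:        # grouped together often enough -> must-link
            must_link.append(pair)
        elif freq <= cannot_thr:    # grouped together rarely enough -> cannot-link
            cannot_link.append(pair)
        # pairs in between stay unconstrained
    return must_link, cannot_link


# Toy 2-D vectors: indices 0-2 and 3-5 form two well-separated groups.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
must, cannot = generate_constraints(points, k=2, runs=10)
print(len(must), len(cannot))  # → 6 9
```

The thresholds control how conservative the generated supervision is: with the defaults only unanimous pairs become constraints, and loosening them trades reliability for more constraints. For real documents one would feed term-weighted vectors (e.g. TF-IDF) instead of the toy points, and the resulting pairs could then be passed to any constrained clustering algorithm.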
Year: 2016
DOI: 10.1007/s00500-015-1643-3
Venue: Soft Computing - A Fusion of Foundations, Methodologies and Applications
Keywords: Class Label, Side Information, Normalized Mutual Information, Document Cluster, Pairwise Constraint
Field: Data mining, Canopy clustering algorithm, Fuzzy clustering, CURE data clustering algorithm, Clustering high-dimensional data, Data stream clustering, Computer science, Constrained clustering, Artificial intelligence, Conceptual clustering, Cluster analysis, Machine learning
DocType: Journal
Volume: 20
Issue: 6
ISSN: 1432-7643
Citations: 5
PageRank: 0.41
References: 20
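Authors: 5
1. Irene Diaz-Valenzuela (citations: 29, PageRank: 2.92)
2. Vincenzo Loia (citations: 1792, PageRank: 148.86)
3. María J. Martín-Bautista (citations: 133, PageRank: 9.90)
4. Sabrina Senatore (citations: 456, PageRank: 40.85)
5. Amparo Vila (citations: 55, PageRank: 7.21)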