Title
An information-theoretic approach to hierarchical clustering of uncertain data.
Abstract
Uncertain data clustering has become central in mining data whose observed representation is naturally affected by imprecision, staleness, or randomness that is implicit when storing this data from real-world sources. Most existing methods for uncertain data clustering follow a partitional or a density-based clustering approach, whereas little research has been devoted to the hierarchical clustering paradigm. In this work, we push forward research in hierarchical clustering of uncertain data by introducing a well-founded solution to the problem via an information-theoretic approach, following the initial idea described in our earlier work [26]. We propose a prototype-based agglomerative hierarchical clustering method, dubbed U-AHC, which employs a new uncertain linkage criterion for cluster merging. This criterion enables the comparison of (sets of) uncertain objects based on information-theoretic as well as expected-distance measures. To assess our proposal, we have conducted a comparative evaluation with state-of-the-art algorithms for clustering uncertain objects, on both benchmark and real datasets. We also compare with two basic definitions of agglomerative hierarchical clustering that are treated as baseline methods in terms of accuracy and efficiency of the clustering results, respectively. The main experimental findings reveal that U-AHC generally outperforms competing methods in accuracy and, from an efficiency viewpoint, is comparable to the fastest baseline version of agglomerative hierarchical clustering.
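As a rough illustration of prototype-based agglomerative clustering over uncertain objects, the sketch below is not U-AHC itself. It covers only an expected-distance comparison and leaves out the information-theoretic part of the linkage criterion, which the abstract does not specify. It assumes each uncertain object is a one-dimensional distribution summarized by a (mean, variance) pair, that a cluster prototype is the equal-weight mixture of its members, and that clusters are compared by the expected squared distance between independent draws from their prototypes. The names `mixture_prototype`, `expected_sq_distance`, and `agglomerate` are hypothetical.

```python
from itertools import combinations


def mixture_prototype(members):
    """Mean and variance of the equal-weight mixture of (mean, variance) objects."""
    n = len(members)
    mean = sum(m for m, _ in members) / n
    # Second moment of each component is variance + mean^2.
    second_moment = sum(v + m * m for m, v in members) / n
    return mean, second_moment - mean * mean


def expected_sq_distance(p, q):
    """E[(X - Y)^2] for independent X, Y given as (mean, variance) pairs."""
    return (p[0] - q[0]) ** 2 + p[1] + q[1]


def agglomerate(objects, k):
    """Repeatedly merge the two clusters with the closest prototypes until k remain."""
    clusters = [[i] for i in range(len(objects))]
    while len(clusters) > k:
        protos = [mixture_prototype([objects[i] for i in c]) for c in clusters]
        # combinations yields index pairs (a, b) with a < b.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: expected_sq_distance(protos[ab[0]], protos[ab[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Illustrative data (assumed, not from the paper): four uncertain objects.
objects = [(0.0, 0.1), (0.2, 0.1), (5.0, 0.2), (5.3, 0.2)]
result = agglomerate(objects, 2)
```

Recomputing every prototype on each pass keeps the sketch short but costs quadratic work per merge. An implementation aiming for efficiency would presumably update only the merged prototype.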
Year
2017
DOI
10.1016/j.ins.2017.03.030
Venue
Inf. Sci.
Keywords
Clustering, Hierarchical clustering, Uncertain data, Information theory, Probability distributions, Mixture models
Field
Hierarchical clustering, Fuzzy clustering, Canopy clustering algorithm, Data mining, CURE data clustering algorithm, Correlation clustering, Artificial intelligence, Brown clustering, Cluster analysis, Mathematics, Machine learning, Single-linkage clustering
DocType
Journal
Volume
402
Issue
C
ISSN
0020-0255
Citations
3
PageRank
0.38
References
36
Authors
4
Name               Order   Citations   PageRank
Francesco Gullo    1       483         32.63
Giovanni Ponti     2       133         14.31
Andrea Tagarelli   3       475         52.29
Sergio Greco       4       1249        265.35