Title
Comparison of Cluster Representations from Partial Second- to Full Fourth-Order Cross Moments for Data Stream Clustering
Abstract
This paper compares cluster representations for data stream clustering under seven external clustering evaluation measures, ranging from partial second-order statistics to full fourth-order cross moments. Purity and cross entropy are the two external measures adopted in past work to evaluate data stream clustering. They penalize an algorithm when a hypothesized cluster contains points from different target classes (true clusters), but they ignore the case where points of one target class fall into different hypothesized clusters. The seven measures used here address both sides of clustering performance. The geometry represented by the partial second-order statistics of a cluster is a non-oblique (axis-aligned) ellipsoid, which cannot describe the orientation, asymmetry, or peakedness of a cluster. The higher-order cluster representation presented in this paper adds the third and fourth cross moments, allowing the cluster geometry to go beyond an ellipsoid. The higher-order statistics also allow two clusters with different representations to be merged into a multivariate normal cluster, using normality tests based on multivariate skewness and kurtosis. Clustering performance under the seven external measures on one synthetic and two real data streams demonstrates the effectiveness of the higher-order cluster representations.
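The abstract's central mechanism is that third- and fourth-order cross moments let a cluster summary support normality tests when two clusters are candidates for merging. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation. First, it assumes each cluster stores its point count and raw cross-moment sums of orders 1 to 4, which simply add when clusters merge. Second, it assumes the normality tests are Mardia's multivariate skewness and kurtosis tests. Mardia's b1 and b2 are recovered from central moments alone through the identities b1 = Σ S^{il} S^{jm} S^{kn} m_{ijk} m_{lmn} and b2 = Σ S^{ij} S^{kl} m_{ijkl}, where S^{..} denotes entries of the inverse maximum-likelihood covariance. Third, it assumes a significance level alpha = 0.05, and it uses a Wilson-Hilferty approximation in place of the exact chi-square tail. The names `MomentCluster`, `invert`, `mardia`, and `looks_normal` are hypothetical.

```python
import itertools
import math


class MomentCluster:
    """Hypothetical cluster summary: count n plus raw cross-moment sums of orders 1 to 4."""

    def __init__(self, d):
        self.d = d
        self.n = 0
        # sums[k][idx] = sum over points x of x[idx[0]] * ... * x[idx[k-1]]
        self.sums = {k: {idx: 0.0 for idx in itertools.product(range(d), repeat=k)}
                     for k in range(1, 5)}

    def add(self, x):
        """Absorb one d-dimensional point into every raw moment sum."""
        self.n += 1
        for table in self.sums.values():
            for idx in table:
                table[idx] += math.prod(x[i] for i in idx)

    def merge(self, other):
        """Summary of the union of two clusters: raw sums are additive."""
        out = MomentCluster(self.d)
        out.n = self.n + other.n
        for k, table in self.sums.items():
            for idx, value in table.items():
                out.sums[k][idx] = value + other.sums[k][idx]
        return out

    def central(self, idx):
        """Central cross moment E[prod_a (x[idx[a]] - mu[idx[a]])], expanded into raw moments."""
        mu = [self.sums[1][(i,)] / self.n for i in range(self.d)]
        total = 0.0
        for r in range(len(idx) + 1):
            for pos in itertools.combinations(range(len(idx)), r):
                rest = tuple(idx[a] for a in range(len(idx)) if a not in pos)
                raw = self.sums[len(rest)][rest] / self.n if rest else 1.0
                total += (-1) ** r * math.prod(mu[idx[a]] for a in pos) * raw
        return total


def invert(a):
    """Gauss-Jordan inverse with partial pivoting (assumes a nonsingular matrix)."""
    d = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(d)] for i, row in enumerate(a)]
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(d):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[d:] for row in m]


def mardia(c):
    """Mardia's skewness b1 and kurtosis b2, computed from the summary alone."""
    r = range(c.d)
    s_inv = invert([[c.central((i, j)) for j in r] for i in r])  # inverse MLE covariance
    c3 = {idx: c.central(idx) for idx in itertools.product(r, repeat=3)}
    c4 = {idx: c.central(idx) for idx in itertools.product(r, repeat=4)}
    b1 = sum(s_inv[i][l] * s_inv[j][m] * s_inv[k][n] * c3[i, j, k] * c3[l, m, n]
             for i, j, k, l, m, n in itertools.product(r, repeat=6))
    b2 = sum(s_inv[i][j] * s_inv[k][l] * c4[i, j, k, l]
             for i, j, k, l in itertools.product(r, repeat=4))
    return b1, b2


def looks_normal(c, alpha=0.05):
    """Assumed merge rule: True if neither Mardia test rejects normality at level alpha."""
    d, n = c.d, c.n
    b1, b2 = mardia(c)
    # Skewness: n*b1/6 ~ chi-square with d(d+1)(d+2)/6 dof (Wilson-Hilferty tail approximation).
    k = d * (d + 1) * (d + 2) / 6
    stat = max(n * b1 / 6, 0.0)  # guard against tiny negative round-off
    z_skew = ((stat / k) ** (1 / 3) - (1 - 2 / (9 * k))) / math.sqrt(2 / (9 * k))
    p_skew = 0.5 * math.erfc(z_skew / math.sqrt(2))
    # Kurtosis: (b2 - d(d+2)) / sqrt(8 d(d+2) / n) is asymptotically N(0, 1); two-sided test.
    z_kurt = (b2 - d * (d + 2)) / math.sqrt(8 * d * (d + 2) / n)
    p_kurt = math.erfc(abs(z_kurt) / math.sqrt(2))
    return p_skew > alpha and p_kurt > alpha


if __name__ == "__main__":
    import random

    random.seed(0)
    left, right = MomentCluster(2), MomentCluster(2)
    for _ in range(300):
        left.add((random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)))
        right.add((random.gauss(4.0, 1.0), random.gauss(0.0, 1.0)))
    # Two well-separated blobs form a bimodal union, so the kurtosis test should reject the merge.
    print("merge?", looks_normal(left.merge(right)))
```

This sketch stores full d^k tensors because that keeps indexing simple. A practical summary would keep only the unique symmetric entries. Raw-moment sums can also lose precision when the cluster mean is far from the origin, so an implementation might prefer pairwise updates of central moments.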
Year
2008
DOI
10.1109/ICDM.2008.143
Venue
ICDM
Keywords
cluster representation,external clustering evaluation measure,true cluster,multivariate normal cluster,data stream clustering,cluster geometry,cluster representations,full fourth-order cross moments,clustering performance,performance evaluation,partial second,data stream,higher-order cluster representation,gaussian mixture model,uncertainty,multivariate normal,second order,clustering algorithms,data structures,entropy,data mining,merging,covariance matrix,gaussian processes,higher order,probability density function,cross entropy
Field
Data mining,Fuzzy clustering,Computer science,Artificial intelligence,Cluster analysis,Single-linkage clustering,k-medians clustering,Clustering high-dimensional data,Data stream clustering,Complete-linkage clustering,Pattern recognition,Correlation clustering,Machine learning
DocType
Conference
ISSN
1550-4786
Citations
7
PageRank
0.65
References
7
Authors
2
Name, Order, Citations, PageRank
Mingzhou (Joe) Song, 1, 15, 2.27
Lin Zhang, 2, 7, 3.69