Title
Cluster Summarization with Dense Region Detection
Abstract
This paper introduces a new approach that summarizes clusters by finding their dense regions and representing each cluster as a Gaussian Mixture Model (GMM). The GMM summary stores a cluster compactly and allows the original data to be regenerated with high accuracy. A cluster is classically represented by a center and a radius. The proposed approach instead retains information about the shape of the cluster as well as the distribution of its samples. A GMM is a parametric model whose key parameter is the number of Gaussian components, and we propose a method that determines this number automatically. A GMM can summarize a cluster produced by any clustering algorithm. When a new sample arrives, a membership value is calculated from the GMM of each cluster, and the sample is assigned to the closest cluster, that is, the one with the highest membership value. Using GMMs to summarize clusters offers advantages in accuracy, detection rate, memory efficiency and time complexity. We evaluate the proposed method on a variety of datasets, both synthetic and real, the latter taken from the UCI repository. We measure the quality of the summarized clusters with the Dunn, Davies-Bouldin (DB), SD and SSD indices and compare them with those of the well-known ABACUS method. We also apply the proposed algorithm to anomaly detection, measure its false alarm and detection rates, and compare them with those of Negative Selection, Naive models and ABACUS. Furthermore, we compare the memory usage and processing time of the proposed algorithm with those of the other algorithms. The results show that our algorithm outperforms these well-known anomaly detection algorithms in accuracy, detection rate, memory usage and processing time.
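The workflow in the abstract can be illustrated with a minimal, dependency-free Python sketch. The steps are: summarize each cluster with a GMM, choose the number of components automatically, score a new sample against every cluster, and either assign it to the best cluster or flag it as an anomaly. The summary can also regenerate synthetic points. This is not the authors' method, and the following are assumptions:

- The components use diagonal covariances fitted by plain EM with a farthest-point initialisation, in place of the paper's dense-region detection.
- The Bayesian Information Criterion (BIC) stands in for the paper's automatic component-count procedure, which the abstract does not describe.
- The "membership value" is taken to be the GMM log-likelihood.
- The anomaly threshold is a hypothetical fixed value.

All function names (`fit_gmm`, `summarize`, `membership`, `classify`, `regenerate`) are hypothetical.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at point x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def logsumexp(vals):
    """Numerically stable log(sum(exp(v)))."""
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))


def fit_gmm(points, k, iters=50, floor=1e-4):
    """Fit a k-component diagonal GMM by plain EM; returns (weights, means, variances)."""
    n, d = len(points), len(points[0])
    # Deterministic farthest-point initialisation of the component means.
    means = [list(points[0])]
    while len(means) < k:
        far = max(points, key=lambda p: min(sum((a - b) ** 2 for a, b in zip(p, m)) for m in means))
        means.append(list(far))
    mu = [sum(p[j] for p in points) / n for j in range(d)]
    var0 = [max(sum((p[j] - mu[j]) ** 2 for p in points) / n, floor) for j in range(d)]
    variances = [list(var0) for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for p in points:
            logs = [math.log(weights[c]) + log_gauss_diag(p, means[c], variances[c]) for c in range(k)]
            norm = logsumexp(logs)
            resp.append([math.exp(lg - norm) for lg in logs])
        # M-step: re-estimate weights, means and floored variances.
        for c in range(k):
            nc = sum(r[c] for r in resp) + 1e-12
            weights[c] = nc / n
            means[c] = [sum(r[c] * p[j] for r, p in zip(resp, points)) / nc for j in range(d)]
            variances[c] = [max(sum(r[c] * (p[j] - means[c][j]) ** 2 for r, p in zip(resp, points)) / nc, floor)
                            for j in range(d)]
    return weights, means, variances


def membership(x, gmm):
    """Membership value of x for one cluster, assumed here to be the GMM log-likelihood log p(x)."""
    weights, means, variances = gmm
    return logsumexp([math.log(w) + log_gauss_diag(x, m, v) for w, m, v in zip(weights, means, variances)])


def summarize(points, max_k=4):
    """Choose the number of components by BIC (a stand-in criterion); return (k, gmm)."""
    n, d = len(points), len(points[0])
    best = None
    for k in range(1, max_k + 1):
        gmm = fit_gmm(points, k)
        loglik = sum(membership(p, gmm) for p in points)
        n_params = (k - 1) + 2 * k * d  # mixing weights + means + diagonal variances
        bic = -2 * loglik + n_params * math.log(n)
        if best is None or bic < best[0]:
            best = (bic, k, gmm)
    return best[1], best[2]


def classify(x, gmms, threshold):
    """Index of the cluster with the highest membership, or None (anomaly) below threshold."""
    scores = [membership(x, g) for g in gmms]
    best = max(range(len(gmms)), key=scores.__getitem__)
    return best if scores[best] >= threshold else None


def regenerate(gmm, n, seed=0):
    """Draw n synthetic points from a cluster summary."""
    rng = random.Random(seed)
    weights, means, variances = gmm
    out = []
    for _ in range(n):
        c = rng.choices(range(len(weights)), weights=weights)[0]
        out.append([rng.gauss(m, math.sqrt(v)) for m, v in zip(means[c], variances[c])])
    return out


if __name__ == "__main__":
    rng = random.Random(42)
    # Cluster A has two dense regions; cluster B has one.
    cluster_a = [[rng.gauss(cx, 0.3), rng.gauss(0, 0.3)] for cx in (0, 5) for _ in range(100)]
    cluster_b = [[rng.gauss(0, 0.3), rng.gauss(10, 0.3)] for _ in range(100)]
    (k_a, gmm_a), (k_b, gmm_b) = summarize(cluster_a), summarize(cluster_b)
    print("components:", k_a, k_b)
    print(classify([0.1, 0.1], [gmm_a, gmm_b], threshold=-20))    # 0
    print(classify([50.0, 50.0], [gmm_a, gmm_b], threshold=-20))  # None
```

Each cluster is reduced to a handful of weights, means and variances instead of its raw points, which is where the memory savings described in the abstract would come from. Unlike a center-and-radius summary, a two-region cluster is kept as two components. A far-away sample scores low under every cluster's GMM, which is how the same summaries can double as an anomaly detector.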
Year: 2014
DOI: 10.1007/978-3-319-25840-9_5
Venue: Communications in Computer and Information Science
Field: Anomaly detection, Data mining, Automatic summarization, False alarm, Computer science, Gaussian, Constant false alarm rate, Cluster analysis, Time complexity, Mixture model
DocType: Conference
Volume: 553
ISSN: 1865-0929
Citations: 1
PageRank: 0.36
References: 17
Authors: 4
Author 1: Elnaz Bigdeli
Author 2: Mehdi Mohammadi
Author 3: Bijan Raahemi
Author 4: Stan Matwin