Title
A fast and noise resilient cluster-based anomaly detection
Abstract
Clustering is widely applied in anomaly detection and directly affects the accuracy of detection methods. Existing cluster-based anomaly detection methods are mainly based on spherical shape clustering. In this paper, we focus on arbitrary shape clustering methods to increase the accuracy of anomaly detection. However, since the main drawback of arbitrary shape clustering is its high memory complexity, we propose to summarize clusters first. For this, we design an algorithm, called Summarization based on Gaussian Mixture Model (SGMM), to summarize clusters and represent them as Gaussian Mixture Models (GMMs). After the GMMs are constructed, incoming new samples are presented to the GMMs and their membership values are calculated, based on which the new samples are labeled as "normal" or "anomaly." Additionally, to address the issue of noise in the data, samples are not labeled individually; they are clustered first, and then each cluster is labeled collectively. For this, we present a new approach, called Collective Probabilistic Anomaly Detection (CPAD), in which the distance between the cluster of incoming new samples and the existing SGMMs is calculated, and the new cluster is given the same label as the closest cluster. To measure the distance between two GMM-based clusters, we propose a modified version of the Kullback-Leibler measure. We run several experiments to evaluate the performance of the proposed SGMM and CPAD methods and compare them against well-known algorithms including ABACUS, local outlier factor (LOF), and one-class support vector machine (SVM). The performance of SGMM is compared with ABACUS using the Dunn and DB metrics, and the results indicate that SGMM is superior at summarizing clusters. Moreover, the proposed CPAD method is compared with LOF and one-class SVM on three performance criteria: (a) false alarm rate, (b) detection rate, and (c) memory efficiency. The experimental results show that the CPAD method is noise resilient and memory efficient, and that it is more accurate than the other methods.
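The collective labeling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' SGMM or CPAD implementation: it assumes 1-D mixtures stored as hypothetical `(weight, mean, std)` tuples, and it substitutes a plain symmetric Monte Carlo estimate of the Kullback-Leibler divergence for the paper's modified measure, whose definition the abstract does not give. All function names are illustrative.

```python
import math
import random

# A 1-D Gaussian mixture is a list of (weight, mean, std) tuples.
# This representation is an assumption made for illustration only.

def gmm_pdf(x, gmm):
    """Mixture density at x; usable as a membership value for one sample."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        for w, m, s in gmm
    )

def gmm_sample(gmm, n, rng):
    """Draw n samples: pick a component by weight, then sample from it."""
    weights = [w for w, _, _ in gmm]
    components = rng.choices(gmm, weights=weights, k=n)
    return [rng.gauss(m, s) for _, m, s in components]

def kl_monte_carlo(p, q, n=5000, seed=0):
    """Monte Carlo estimate of KL(p || q) using samples drawn from p."""
    rng = random.Random(seed)
    eps = 1e-300  # guards against log(0) when a density underflows
    total = 0.0
    for x in gmm_sample(p, n, rng):
        total += math.log(gmm_pdf(x, p) + eps) - math.log(gmm_pdf(x, q) + eps)
    return total / n

def symmetric_kl(p, q):
    """Symmetrized divergence (a stand-in, not the paper's modified measure)."""
    return kl_monte_carlo(p, q) + kl_monte_carlo(q, p)

def label_new_cluster(new_gmm, labeled_gmms):
    """Give the new cluster the label of the closest summarized cluster."""
    return min(labeled_gmms, key=lambda pair: symmetric_kl(new_gmm, pair[1]))[0]

# Hypothetical summarized clusters and one incoming cluster.
normal = [(0.6, 0.0, 1.0), (0.4, 3.0, 0.5)]
anomaly = [(1.0, 10.0, 1.0)]
incoming = [(0.5, 0.2, 1.1), (0.5, 2.8, 0.6)]

print(label_new_cluster(incoming, [("normal", normal), ("anomaly", anomaly)]))
```

Because the whole incoming cluster is compared to each stored mixture, a few noisy samples shift the estimated divergence only slightly, which is the intuition behind labeling clusters collectively instead of sample by sample.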
Year
2017
DOI
10.1007/s10044-015-0484-0
Venue
Pattern Anal. Appl.
Keywords
Anomaly detection, Arbitrary shape clustering, Gaussian Mixture Model, Distribution distance
Field
Anomaly detection, Local outlier factor, Automatic summarization, Pattern recognition, Support vector machine, Artificial intelligence, Probabilistic logic, Constant false alarm rate, Cluster analysis, Mixture model, Machine learning, Mathematics
DocType
Journal
Volume
20
Issue
1
ISSN
1433-755X
Citations
5
PageRank
0.42
References
31
Authors
4
Name
Order
Citations
PageRank
Elnaz Bigdeli1214.44
Mehdi Mohammadi2109150.02
Bijan Raahemi315522.29
Stan Matwin43025344.20