Title
Capabilities of outlier detection schemes in large datasets, framework and methodologies
Abstract
Outlier detection is concerned with discovering exceptional behaviors of objects. Its theoretical principles and practical implementations lay a foundation for important applications such as credit card fraud detection, discovering criminal behaviors in e-commerce, and discovering computer intrusions. In this paper, we first present a unified model for several existing outlier detection schemes and propose a compatibility theory, which establishes a framework for describing the capabilities of various outlier formulation schemes in terms of how well they match users' intuitions. Under this framework, we show that the density-based scheme is more powerful than the distance-based scheme when a dataset contains patterns with diverse characteristics. The density-based scheme, however, is less effective when the patterns have densities comparable to those of the outliers. We then introduce a connectivity-based scheme that improves on the density-based scheme when a pattern itself has a density similar to that of an outlier. We compare the density-based and connectivity-based schemes in terms of their strengths and weaknesses, and demonstrate applications with different features where each is more effective than the other. Finally, the connectivity-based and density-based schemes are comparatively evaluated on both real-life and synthetic datasets in terms of recall, precision, rank power and implementation-free metrics.
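The abstract contrasts distance-based and density-based outlier formulations. As a minimal sketch, assuming the common textbook form of each scheme (the distance to the k-th nearest neighbour as the distance-based score, and the Local Outlier Factor (LOF) of Breunig et al. as the density-based score) rather than the paper's own unified model, the Python below scores a hypothetical dataset with two patterns of very different densities plus one point just outside the dense pattern. The connectivity-based scheme is not implemented here, and all names and data are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def knn(data: Sequence[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of data[i], excluding i itself.

    Simplification (assumption): exactly k neighbours, ties broken by index,
    whereas the original LOF definition keeps every point within the k-distance.
    """
    others = [j for j in range(len(data)) if j != i]
    others.sort(key=lambda j: (math.dist(data[i], data[j]), j))
    return others[:k]


def knn_distance_scores(data: Sequence[Point], k: int) -> List[float]:
    """Distance-based score: distance from each point to its k-th nearest neighbour."""
    return [math.dist(data[i], data[knn(data, i, k)[-1]]) for i in range(len(data))]


def lof_scores(data: Sequence[Point], k: int) -> List[float]:
    """Density-based score: Local Outlier Factor (assumes no duplicate points)."""
    n = len(data)
    neigh = [knn(data, i, k) for i in range(n)]
    k_dist = [math.dist(data[i], data[neigh[i][-1]]) for i in range(n)]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], math.dist(data[i], data[j])) for j in neigh[i]]
        lrd.append(k / sum(reach))
    # LOF: mean density of the neighbours relative to the point's own density.
    return [sum(lrd[j] for j in neigh[i]) / (k * lrd[i]) for i in range(n)]


# Hypothetical toy data: a dense 5x5 grid (spacing 0.1), a sparse 5x5 grid
# (spacing 2.0) far away, and one isolated point just outside the dense grid.
dense = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)]
sparse = [(20 + x * 2.0, 20 + y * 2.0) for x in range(5) for y in range(5)]
data = dense + sparse + [(1.0, 1.0)]
OUTLIER = len(data) - 1

k = 3
dist_scores = knn_distance_scores(data, k)
lof = lof_scores(data, k)
top_by_distance = max(range(len(data)), key=dist_scores.__getitem__)
top_by_lof = max(range(len(data)), key=lof.__getitem__)
print("distance-based top:", data[top_by_distance])
print("density-based top:", data[top_by_lof])
```

With k = 3, the distance-based score ranks a corner of the sparse grid highest (its third neighbour is about 2.83 away, versus about 0.92 for the isolated point), while LOF ranks the isolated point highest because it is judged relative to the density of its own neighbourhood. This mirrors the abstract's claim about datasets whose patterns have diverse densities.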
Year
2007
DOI
10.1007/s10115-005-0233-6
Venue
Knowl. Inf. Syst.
Keywords
existing outlier detection scheme, compatibility theory, outlier detection, comparable density, distance-based scheme, large datasets, connectivity-based scheme, credit card fraud detection, computer intrusion, various outlier formulation scheme, density-based scheme, unified model, e-commerce
Field
Data mining, Anomaly detection, Credit card fraud, Intrusion, Computer science, Outlier, Credit card, Artificial intelligence, Unified Model, Strengths and weaknesses, Machine learning
DocType
Journal
Volume
11
Issue
1
ISSN
0219-3116
Citations
31
PageRank
1.22
References
23
Authors
4
Name
Order
Citations
PageRank
Jian Tang1526148.30
Zhixiang Chen239633.28
Ada Wai-Chee Fu34646417.59
David W. Cheung41511156.71