Title
Prior-free rare category detection: More effective and efficient solutions
Abstract
Identifying statistically significant anomalies in an unlabeled data set is of key importance in many applications, such as financial security and remote sensing. Rare category detection (RCD) helps address this issue by passing candidate data examples to a labeling oracle (e.g., a human expert) for labeling. A challenging task in RCD is to discover all categories without any prior information about the given data set. The few approaches proposed for this task have quadratic or cubic time complexity w.r.t. the data set size N and require a considerable number of labeling queries, each of which demands time-consuming and expensive effort from a human expert. In this paper, aiming at solutions with lower time complexity and fewer labeling queries, we propose two prior-free (i.e., requiring no prior information about a given data set) RCD algorithms, namely (1) iFRED, which achieves linear time complexity w.r.t. N, and (2) vFRED, which substantially reduces the number of labeling queries. This is done by tabulating each dimension of the data set into bins, then zooming out to shrink each bin down to a position and conducting wavelet analysis on the data density function to quickly locate the position (i.e., a bin) of a rare category, and finally zooming in on the located bin to select candidate data examples for labeling. Theoretical analysis guarantees the effectiveness of our algorithms, and comprehensive experiments on both synthetic and real data sets further verify their effectiveness and efficiency.
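The bin, zoom-out, and zoom-in steps described in the abstract can be illustrated with a one-dimensional toy example. The sketch below is not iFRED or vFRED: it assumes a single feature, equal-width bins, and a sliding Haar-style difference as a stand-in for the paper's wavelet analysis. All function names and the synthetic data are hypothetical and chosen only for illustration.

```python
import math

# Toy 1-D illustration only; NOT the authors' iFRED/vFRED implementation.
# All names below are hypothetical.

def bin_index(v, n_bins, lo, hi):
    """Map a value to its equal-width bin on [lo, hi]."""
    width = (hi - lo) / n_bins
    return min(int((v - lo) / width), n_bins - 1)  # clamp v == hi into last bin

def histogram(values, n_bins, lo, hi):
    """Zoom out: summarise the data as one count (density estimate) per bin."""
    counts = [0] * n_bins
    for v in values:
        counts[bin_index(v, n_bins, lo, hi)] += 1
    return counts

def haar_detail(counts):
    """Sliding (undecimated) level-1 Haar-style detail: scaled adjacent difference."""
    return [(counts[i + 1] - counts[i]) / math.sqrt(2)
            for i in range(len(counts) - 1)]

def locate_peak_bin(counts):
    """Pick the interior bin with the sharpest rise into it and fall out of it."""
    d = haar_detail(counts)
    scores = {i: d[i - 1] - d[i] for i in range(1, len(counts) - 1)}
    return max(scores, key=scores.get)

def candidates_in_bin(values, target, n_bins, lo, hi):
    """Zoom in: return the examples that fall inside the located bin."""
    return [v for v in values if bin_index(v, n_bins, lo, hi) == target]

# Synthetic data (assumed): an even background plus a tight 60-point cluster.
background = [i / 1000 for i in range(1000)]
rare = [0.43 + j * 0.0001 for j in range(60)]
values = background + rare

counts = histogram(values, 50, 0.0, 1.0)
peak = locate_peak_bin(counts)
candidates = candidates_in_bin(values, peak, 50, 0.0, 1.0)
print(peak, len(candidates))
```

In this toy setting the compact cluster produces an abrupt local change in the binned density, which the difference-based score picks out. Examples from the located bin would then be the ones passed to the labeling oracle.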
Year: 2014
DOI: 10.1016/j.eswa.2014.06.026
Venue: Expert Systems with Applications: An International Journal
Keywords: histogram density estimation, prior-free, rare category detection, wavelet analysis
Field: Data mining, Data set, Bin, Computer science, Quadratic equation, Data density, Oracle, Zoom, Artificial intelligence, Time complexity, Machine learning, Wavelet
DocType: Journal
Volume: 41
Issue: 17
ISSN: 0957-4174
Citations: 9
PageRank: 0.57
References: 21
Authors: 5
Name | Order | Citations | PageRank
Zhenguang Liu | 1 | 47 | 5.09
Kevin Chiew | 2 | 116 | 11.06
Qinming He | 3 | 371 | 41.53
Hao Huang | 4 | 89 | 7.77
Butian Huang | 5 | 16 | 1.37