Title
Distributed anomaly detection using 1-class SVM for vertically partitioned data
Abstract
The volume of sensor data collected for monitoring tasks has increased tremendously over the last decade. For example, petabytes of earth science data are collected from modern satellites, in situ sensors and climate models. Similarly, huge amounts of flight operational data are downloaded for commercial airlines. These data sets need to be analyzed to find outliers. Extracting information from such rich data sources with advanced data mining methodologies is challenging, not only because of the massive volume of data but also because the data sets are physically stored at different geographical locations, with only a subset of the features available at any one location. Moving these petabytes of data to a single location may waste a lot of bandwidth. To solve this problem, we present a novel algorithm that can identify outliers in the entire data without moving all the data to a single location. The proposed method centralizes only a very small sample from the data subsets held at the different locations. We analytically prove and experimentally verify that the algorithm offers high accuracy compared to complete centralization at only a fraction of the communication cost. We show that the algorithm is highly relevant to both earth sciences and aeronautics by describing applications in these domains. The performance of the algorithm is demonstrated on two large publicly available data sets: (i) the NASA MODIS satellite images and (ii) a simulated aviation data set generated by the ‘Commercial Modular Aero-Propulsion System Simulation’ (CMAPSS).
© 2011 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 4: 393–406, 2011 (A shorter version of this paper was published in NASA Conference on Intelligent Data Understanding 2010.)
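The communication pattern described in the abstract (features split across locations, with only a small sample centralized) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: a simple z-score rule stands in for the 1-class SVM, and the function names, threshold, sample size and toy data are all assumptions made for illustration.

```python
import random
import statistics


def local_candidates(column, threshold=3.0):
    """Stand-in local detector (NOT a 1-class SVM): row ids whose value lies
    more than `threshold` standard deviations from the site's mean."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0.0:
        return set()
    return {i for i, v in enumerate(column) if abs(v - mean) > threshold * std}


def detect(sites, sample_size, threshold=3.0, seed=0):
    """sites: list of equally long feature columns, one per location.

    Returns (outlier_row_ids, fraction_of_rows_centralized).
    """
    n_rows = len(sites[0])
    rng = random.Random(seed)

    # Step 1: each site reports only the ids of rows it finds suspicious.
    candidates = set().union(*(local_candidates(col, threshold) for col in sites))

    # Step 2: the coordinator requests the candidates plus a small random sample.
    sample_ids = set(rng.sample(range(n_rows), sample_size))
    requested = sorted(candidates | sample_ids)

    # Step 3: every site ships just the requested rows of its own feature.
    shipped = {i: [col[i] for col in sites] for i in requested}

    # Step 4: re-check each candidate against statistics of the centralized sample.
    outliers = set()
    for j in range(len(sites)):
        ref = [shipped[i][j] for i in sample_ids]
        mean, std = statistics.fmean(ref), statistics.pstdev(ref)
        for i in candidates:
            if std > 0.0 and abs(shipped[i][j] - mean) > threshold * std:
                outliers.add(i)

    return outliers, len(requested) / n_rows


# Hypothetical toy data: two sites, one feature each, 2000 aligned rows.
site_a = [float(i % 7 - 3) for i in range(2000)]
site_b = [float(i % 5 - 2) for i in range(2000)]
site_a[123] = 100.0   # planted anomaly visible only at site A
site_b[777] = -80.0   # planted anomaly visible only at site B

outliers, fraction = detect([site_a, site_b], sample_size=50)
print(sorted(outliers), fraction)
```

In this toy setting only the candidate rows and the 50 sampled rows leave their sites, so the centralized fraction stays at a few percent of the rows. How the real method chooses the sample and builds its global model is specified in the paper, not here.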
Year: 2011
DOI: 10.1002/sam.10125
Venue: Statistical Analysis and Data Mining
Keywords: anomaly detection, partitioned data, entire data, different data subsets, simulated aviation data, advanced data mining methodology, flight operational data, 1-class svm, available data set, earth science data, sensor data, rich data source, machines, vector
Field: Anomaly detection, Data mining, Data set, Petabyte, Computer science, Support vector machine, Outlier, Information extraction, Data type, Artificial intelligence, Modular design, Machine learning
DocType: Journal
Volume: 4
Issue: 4
Citations: 9
PageRank: 0.56
References: 18
Authors: 3
Name | Order | Citations | PageRank
Kamalika Das | 1 | 168 | 13.46
Kanishka Bhaduri | 2 | 289 | 18.96
Petr Votava | 3 | 66 | 10.03