Title
A Big Data Online Cleaning Algorithm Based on Dynamic Outlier Detection
Abstract
To effectively clean large-scale, mixed and inaccurate monitoring or collected data, reduce the cost of data caching, and ensure consistent deviation detection on the time-series data of each cycle, a big data online cleaning algorithm based on dynamic outlier detection is proposed. The data cleaning method is improved through density-based local outlier detection: sampled clusters uniformly dilute the Euclidean distance matrix, and some corrections are retained for the next cleaning cycle. This prevents a single sampling from causing an overall cleaning deviation and reduces the amount of calculation once data cleaning has stabilized, greatly improving speed. Finally, a distributed solution for the online cleaning algorithm based on the Hadoop platform is presented.
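The abstract gives no pseudocode, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a simplified local outlier factor (LOF) with exactly k neighbours per point, and it models "retaining corrections into the next cycle" as carrying an evenly spaced subsample of accepted points forward. The names `local_outlier_factors`, `clean_cycle`, `threshold` and `keep` are hypothetical.

```python
import math


def local_outlier_factors(points, k=3):
    """Simplified LOF: scores well above 1 suggest a local outlier.

    Assumption: each point uses exactly k neighbours (ties broken by index),
    which is a simplification of the textbook definition.
    """
    n = len(points)
    # Full Euclidean distance matrix for the (small) working set.
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbours of each point, excluding itself.
    nbrs = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
            for i in range(n)]
    k_dist = [dist[i][nbrs[i][-1]] for i in range(n)]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in nbrs[i]]
        lrd.append(1.0 / max(sum(reach) / len(reach), 1e-12))
    # LOF: mean neighbour density divided by the point's own density.
    return [sum(lrd[j] for j in nbrs[i]) / (len(nbrs[i]) * lrd[i])
            for i in range(n)]


def clean_cycle(batch, retained, k=3, threshold=1.5, keep=20):
    """Clean one cycle's batch against points retained from earlier cycles.

    Returns (clean, outliers, next_retained). The retention rule is an
    illustrative assumption, not taken from the paper.
    """
    points = retained + batch
    scores = local_outlier_factors(points, k)[len(retained):]
    clean = [p for p, s in zip(batch, scores) if s <= threshold]
    outliers = [p for p, s in zip(batch, scores) if s > threshold]
    # Carry an evenly spaced subsample forward so the next cycle's
    # distance matrix stays bounded in size.
    pool = retained + clean
    if len(pool) <= keep:
        next_retained = pool
    else:
        next_retained = [pool[i * len(pool) // keep] for i in range(keep)]
    return clean, outliers, next_retained


# Usage: a 3x3 grid of readings plus one far-away point.
grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
clean, outliers, retained = clean_cycle(grid + [(10.0, 10.0)], retained=[])
```

Capping the retained set with `keep` bounds the size of the distance matrix in every cycle, which is one plausible reading of how the cost per cycle could stay low once cleaning has stabilized.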
Year: 2015
DOI: 10.1109/CyberC.2015.68
Venue: CyberC
Keywords: component, online cleaning, deviation detection, dynamic outlier detection, big data
Field: Anomaly detection, Data mining, Computer science, Euclidean distance, Algorithm, Real-time computing, Data cache, Sampling (statistics), Distributed database, Big data, Euclidean distance matrix
DocType: Conference
Citations: 1
PageRank: 0.41
References: 3
Authors (5)
Name            Order  Citations  PageRank
Yinglong Diao   1      1          1.42
Ke-yan Liu      2      1          2.44
Xiaoli Meng     3      47         4.19
Xueshun Ye      4      1          0.41
Kaiyuan He      5      1          1.09