Abstract | ||
---|---|---|
To effectively clean large-scale, mixed, and inaccurate monitoring or collected data, reduce data-caching cost, and ensure consistent deviation detection on the time-series data of each cycle, an online big-data cleaning algorithm based on dynamic outlier detection is proposed. The method improves density-based local outlier detection by uniformly sampling clusters to thin the Euclidean distance matrix and by carrying some corrections over into the next cleaning cycle. This prevents a single sample from biasing the overall cleaning result and reduces the computation required once cleaning has stabilized, greatly increasing speed. Finally, a distributed solution for the online cleaning algorithm based on the Hadoop platform is presented. |
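
The abstract gives no pseudocode, so what follows is only a minimal Python sketch of the general idea under our own assumptions. It uses standard density-based local outlier factor (LOF) scoring. Each batch is uniformly sampled to form a reference set, so the full Euclidean distance matrix is built only over that sample. A few clean points are then carried into the next cycle, which is our reading of "retaining some corrections". The names `clean_cycle`, `fit_reference`, `lof_score`, and `nearest` are hypothetical, as are the parameters `k`, `sample_size`, `threshold`, and `keep`. The sketch is not the authors' algorithm, and the Hadoop distribution is not shown.

```python
import math
import random


def pairwise_distances(points):
    """Full Euclidean distance matrix, built only for the small reference sample."""
    return [[math.dist(p, q) for q in points] for p in points]


def nearest(row, k, exclude=None):
    """Indices of the k smallest distances in one row, optionally skipping index `exclude`.
    Ties at the k-th distance are not expanded (a simplification of full LOF)."""
    order = sorted((d, j) for j, d in enumerate(row) if j != exclude)
    return [j for _, j in order[:k]]


def fit_reference(ref, k):
    """Precompute k-distance and local reachability density (lrd) for each reference point."""
    if len(ref) <= k:
        raise ValueError("reference sample must contain more than k points")
    dist = pairwise_distances(ref)
    neigh = [nearest(dist[i], k, exclude=i) for i in range(len(ref))]
    kdist = [dist[i][neigh[i][-1]] for i in range(len(ref))]
    lrd = []
    for i in range(len(ref)):
        reach = [max(kdist[o], dist[i][o]) for o in neigh[i]]
        lrd.append(1.0 / max(sum(reach) / k, 1e-12))  # guard against duplicate points
    return kdist, lrd


def lof_score(q, ref, kdist, lrd, k):
    """LOF of point q against the reference sample (about 1 = inlier, much larger = outlier).
    If q is itself in the sample it counts itself as a neighbour; the sketch accepts that."""
    row = [math.dist(q, r) for r in ref]
    neigh = nearest(row, k)
    reach = [max(kdist[o], row[o]) for o in neigh]
    lrd_q = 1.0 / max(sum(reach) / k, 1e-12)
    return sum(lrd[o] for o in neigh) / (k * lrd_q)


def clean_cycle(batch, carried, k=5, sample_size=50, threshold=1.5, keep=20, rng=None):
    """One hypothetical online cleaning cycle.

    batch:   points that arrived in this cycle
    carried: clean points kept from the previous cycle (the assumed "corrections")
    Returns (clean, outliers, carried_for_next_cycle).
    """
    if rng is None:
        rng = random.Random()
    sample = rng.sample(batch, min(sample_size, len(batch)))  # uniform sample of the batch
    ref = carried + sample  # distance matrix is m x m with m = len(ref), not n x n
    kdist, lrd = fit_reference(ref, k)
    clean, outliers = [], []
    for p in batch:
        if lof_score(p, ref, kdist, lrd, k) > threshold:
            outliers.append(p)
        else:
            clean.append(p)
    carried_next = rng.sample(clean, min(keep, len(clean)))  # forwarded to the next cycle
    return clean, outliers, carried_next
```

Feeding each call's third return value into the next call as `carried` gives the online loop. Per cycle, the work is one m×m reference matrix plus n×m scoring distances instead of a full n×n matrix. This is the kind of saving the abstract attributes to sampling.
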
Year | DOI | Venue |
---|---|---|
2015 | 10.1109/CyberC.2015.68 | CyberC |
Keywords | Field | DocType |
---|---|---|
component, online cleaning, deviation detection, dynamic outlier detection, big data | Anomaly detection, Data mining, Computer science, Euclidean distance, Algorithm, Real-time computing, Data cache, Sampling (statistics), Distributed database, Big data, Euclidean distance matrix | Conference |
Citations | PageRank | References |
---|---|---|
1 | 0.41 | 3 |
Authors |
---|
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yinglong Diao | 1 | 1 | 1.42 |
Ke-yan Liu | 2 | 1 | 2.44 |
Xiaoli Meng | 3 | 47 | 4.19 |
Xueshun Ye | 4 | 1 | 0.41 |
Kaiyuan He | 5 | 1 | 1.09 |