Abstract | ||
---|---|---|
We propose a novel sketching approach for streaming data that enables efficient processing of high-volume, high-velocity data, even with limited computing resources. Our approach accounts for the fact that a data stream is generally dynamic, with an underlying distribution that may change continuously over time. Specifically, we propose a hashing (sketching) technique that automatically estimates a histogram from a data stream using a model with adaptive coefficients. Such a model is necessary to preserve histogram similarities as the weight (importance) of the generated histograms varies. To address the dynamic properties of data streams, we develop a novel algorithm that sketches the histograms from a data stream using multiple weighted factors. Results from extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness and efficiency of the proposed method.
|
Year | DOI | Venue |
---|---|---|
2019 | 10.1145/3357384.3357958 | Proceedings of the 28th ACM International Conference on Information and Knowledge Management |

Keywords | Field | DocType |
---|---|---|
concept drift, histogram, sketch, stream, weighted factors | Data mining, Histogram, Pattern recognition, Computer science, Artificial intelligence | Conference |

ISBN | Citations | PageRank |
---|---|---|
978-1-4503-6976-3 | 0 | 0.34 |

References | Authors |
---|---|
0 | 3 |

Name | Order | Citations | PageRank |
---|---|---|---|
Quang-Huy Duong | 1 | 0 | 0.68 |
Heri Ramampiaro | 2 | 154 | 20.46 |
Kjetil Nørvåg | 3 | 1311 | 79.26 |