Title | ||
---|---|---|
D $^2$ 2 HistoSketch: Discriminative and Dynamic Similarity-Preserving Sketching of Streaming Histograms |
Abstract | ||
---|---|---|
Histogram-based similarity has been widely adopted in many machine learning tasks. However, measuring histogram similarity is a challenging task for streaming histograms, where the elements of a histogram are observed one after the other in an online manner. The ever-growing cardinality of histogram elements over the data streams makes any similarity computation inefficient in that case. To tackle this problem, we propose in this paper D$^2$2HistoSketch, a similarity-preserving sketching method for streaming histograms to efficiently approximate their Discriminative and Dynamic similarity. D$^2$2HistoSketch can fast and memory-efficiently maintain a set of compact and fixed-size sketches of streaming histograms to approximate the similarity between histograms. To provide high-quality similarity approximations, D$^2$2HistoSketch considers both discriminative and gradual forgetting weights for similarity measurement, and seamlessly incorporates them in the sketches. Based on both synthetic and real-world datasets, our empirical evaluation shows that our method is able to efficiently and effectively approximate the similarity between streaming histograms while outperforming state-of-the-art sketching methods. Compared to full streaming histograms with both discriminative and gradual forgetting weights in particular, D$^2$2HistoSketch is able to dramatically reduce the classification time (with a 7500x speedup) at the expense of a small loss in accuracy only (about 3.25 percent). |
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/TKDE.2018.2867468 | IEEE Transactions on Knowledge and Data Engineering |
Keywords | DocType | Volume |
Histograms,Streaming media,Task analysis,Maintenance engineering,Machine learning,Weight measurement,Memory management | Journal | 31 |
Issue | ISSN | Citations |
10 | 1041-4347 | 1 |
PageRank | References | Authors |
0.36 | 0 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Dingqi Yang | 1 | 542 | 28.79 |
Bin Li | 2 | 694 | 50.02 |
Laura Rettig | 3 | 4 | 0.74 |
o de troyer | 4 | 1708 | 134.92 |