Title
D $^2$ 2 HistoSketch: Discriminative and Dynamic Similarity-Preserving Sketching of Streaming Histograms
Abstract
Histogram-based similarity has been widely adopted in many machine learning tasks. However, measuring histogram similarity is a challenging task for streaming histograms, where the elements of a histogram are observed one after the other in an online manner. The ever-growing cardinality of histogram elements over the data streams makes any similarity computation inefficient in that case. To tackle this problem, we propose in this paper D$^2$2HistoSketch, a similarity-preserving sketching method for streaming histograms to efficiently approximate their Discriminative and Dynamic similarity. D$^2$2HistoSketch can fast and memory-efficiently maintain a set of compact and fixed-size sketches of streaming histograms to approximate the similarity between histograms. To provide high-quality similarity approximations, D$^2$2HistoSketch considers both discriminative and gradual forgetting weights for similarity measurement, and seamlessly incorporates them in the sketches. Based on both synthetic and real-world datasets, our empirical evaluation shows that our method is able to efficiently and effectively approximate the similarity between streaming histograms while outperforming state-of-the-art sketching methods. Compared to full streaming histograms with both discriminative and gradual forgetting weights in particular, D$^2$2HistoSketch is able to dramatically reduce the classification time (with a 7500x speedup) at the expense of a small loss in accuracy only (about 3.25 percent).
Year
DOI
Venue
2019
10.1109/TKDE.2018.2867468
IEEE Transactions on Knowledge and Data Engineering
Keywords
DocType
Volume
Histograms,Streaming media,Task analysis,Maintenance engineering,Machine learning,Weight measurement,Memory management
Journal
31
Issue
ISSN
Citations 
10
1041-4347
1
PageRank 
References 
Authors
0.36
0
4
Name
Order
Citations
PageRank
Dingqi Yang154228.79
Bin Li269450.02
Laura Rettig340.74
o de troyer41708134.92