Abstract | ||
---|---|---|
In this work, we consider the problem of Cosine Similarity preserving dimensionality reduction (compression) for sparse binary datasets. [18] proposed a compression algorithm for high-dimensional, sparse, binary data that preserves Inner product and Hamming distance. We show that their algorithm also works well for Cosine Similarity. We present a theoretical analysis of the dimension reduction bound and complement it with rigorous experimentation on real-world datasets. We compare our results with the state of the art for this problem, namely SimHash [8], MinHash [21], Circulant Binary Embedding [25], and Densified One Permutation Hashing [20], and show that our approach offers significant savings in compression time and in the number of random bits required for compression, while providing comparable performance. |
Year | DOI | Venue |
---|---|---|
2020 | 10.1109/BigMM50055.2020.00042 | 2020 IEEE Sixth International Conference on Multimedia Big Data (BigMM) |
Keywords | DocType | ISBN |
Cosine Similarity, SimHash, MinHash, Jaccard Similarity | Conference | 978-1-7281-9326-7 |
Citations | PageRank | References |
0 | 0.34 | 12 |
Authors | ||
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Rameshwar Pratap | 1 | 6 | 4.50 |
Karthik Revanuru | 2 | 0 | 0.34 |
Anirudh Ravi | 3 | 0 | 0.34 |
Raghav Kulkarni | 4 | 172 | 19.48 |