Abstract | ||
---|---|---|
List intersection is at the core of information retrieval systems. Existing disk-based intersection algorithms were optimized for hard disk drives (HDDs) since HDDs have dominated the storage market for decades. In particular, those HDDcentric algorithms read every relevant list entirely to memory to minimize expensive random reads by performing sequential reads, although many entries in the list may be useless. Such a tradeoff makes perfect sense on HDDs, because random reads are one to two orders of magnitude slower than sequential reads. However, fast solid state drives (SSDs) have changed this landscape by improving random I/O performance dramatically. More importantly, they are manufactured with multiple flash channels to support parallel I/Os. As a result, the performance gap between random and sequential reads becomes very small on SSDs. This means that HDD-optimized intersection algorithms might not be suitable on SSDs because the total amount of data accessed is unnecessarily high.To understand the impact of SSDs to list intersection, in this work, we tune existing in-memory intersection algorithms to be SSD-aware with the idea of parallel I/O skipping, and experimentally evaluate them on synthetic and real datasets. The results provide insights on how to design efficient SSD-optimized intersection algorithms. |
Year | DOI | Venue |
---|---|---|
2021 | 10.1109/ICDE51399.2021.00161 | 2021 IEEE 37TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2021) |
DocType | ISSN | Citations |
Conference | 1084-4627 | 0 |
PageRank | References | Authors |
0.34 | 0 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jianguo Wang | 1 | 69 | 6.18 |
Chunbin Lin | 2 | 0 | 0.34 |
Yannis Papakonstantinou | 3 | 5657 | 837.56 |
Steven Swanson | 4 | 1434 | 82.33 |