Title
Evaluating List Intersection On Ssds For Parallel I/O Skipping
Abstract
List intersection is at the core of information retrieval systems. Existing disk-based intersection algorithms were optimized for hard disk drives (HDDs) since HDDs have dominated the storage market for decades. In particular, those HDDcentric algorithms read every relevant list entirely to memory to minimize expensive random reads by performing sequential reads, although many entries in the list may be useless. Such a tradeoff makes perfect sense on HDDs, because random reads are one to two orders of magnitude slower than sequential reads. However, fast solid state drives (SSDs) have changed this landscape by improving random I/O performance dramatically. More importantly, they are manufactured with multiple flash channels to support parallel I/Os. As a result, the performance gap between random and sequential reads becomes very small on SSDs. This means that HDD-optimized intersection algorithms might not be suitable on SSDs because the total amount of data accessed is unnecessarily high.To understand the impact of SSDs to list intersection, in this work, we tune existing in-memory intersection algorithms to be SSD-aware with the idea of parallel I/O skipping, and experimentally evaluate them on synthetic and real datasets. The results provide insights on how to design efficient SSD-optimized intersection algorithms.
Year
DOI
Venue
2021
10.1109/ICDE51399.2021.00161
2021 IEEE 37TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2021)
DocType
ISSN
Citations 
Conference
1084-4627
0
PageRank 
References 
Authors
0.34
0
4
Name
Order
Citations
PageRank
Jianguo Wang1696.18
Chunbin Lin200.34
Yannis Papakonstantinou35657837.56
Steven Swanson4143482.33