Title
Streaming Data Reorganization at Scale with DeltaFS Indexed Massive Directories
Abstract
Complex storage stacks providing data compression, indexing, and analytics help leverage the massive amounts of data generated today to derive insights. It is challenging to perform this computation, however, while fully utilizing the underlying storage media. This is because, while storage servers with large core counts are widely available, single-core performance and memory bandwidth per core grow slower than the core count per die. Computational storage offers a promising solution to this problem by utilizing dedicated compute resources along the storage processing path. We present DeltaFS Indexed Massive Directories (IMDs), a new approach to computational storage. DeltaFS IMDs harvest available (i.e., not dedicated) compute, memory, and network resources on the compute nodes of an application to perform computation on data. We demonstrate the efficiency of DeltaFS IMDs by using them to dynamically reorganize the output of a real-world simulation application across 131,072 CPU cores. DeltaFS IMDs speed up reads by 1,740× while only slightly slowing down the writing of data during simulation I/O for in situ data processing.
Year
2020
DOI
10.1145/3415581
Venue
ACM Transactions on Storage
Keywords
In situ processing, computational storage
DocType
Journal
Volume
16
Issue
4
ISSN
1553-3077
Citations
0
PageRank
0.34
References
0
Authors
8
Name, Order, Citations, PageRank
Qing Zheng, 1, 0, 0.34
Charles D. Cranor, 2, 0, 0.34
Ankush Jain, 3, 0, 0.34
Gregory R. Ganger, 4, 4560383.16
Garth A. Gibson, 5, 84961.69
George Amvrosiadis, 6, 11110.40
Bradley W. Settlemyer, 7, 0, 0.34
Gary Grider, 8, 0, 0.34