Title
Lustre, Hadoop, Accumulo
Abstract
Data processing systems impose multiple views on data as it is processed by the system. These views include spreadsheets, databases, matrices, and graphs. A wide variety of technologies can be used to store and process data through these different steps. The Lustre parallel file system, the Hadoop distributed file system, and the Accumulo database are all designed to address the largest and most challenging data storage problems. There have been many ad-hoc comparisons of these technologies. This paper describes the foundational principles of each technology, provides simple models for assessing their capabilities, and compares the various technologies on a hypothetical common cluster. These comparisons indicate that Lustre provides 2x more storage capacity, is less likely to lose data during 3 simultaneous drive failures, and provides higher bandwidth on general purpose workloads. Hadoop can provide 4x greater read bandwidth on special purpose workloads. Accumulo provides 10^5 times lower latency on random lookups than either Lustre or Hadoop, but Accumulo's bulk bandwidth is 10x lower. Significant recent work has been done to enable mix-and-match solutions that allow Lustre, Hadoop, and Accumulo to be combined in different ways.
Year: 2015
DOI: 10.1109/HPEC.2015.7322476
Venue: 2015 IEEE High Performance Extreme Computing Conference (HPEC)
Keywords: Insider, Lustre, Hadoop, Accumulo, Big Data, Parallel Performance
Field: Distributed File System, Metadata, File system, Computer science, Computer data storage, Server, Data processing system, Distributed database, Lustre (mineralogy), Operating system, Distributed computing
DocType:
Journal:
Volume: abs/1507.02357
ISSN: 2377-6943
Citations: 3
PageRank: 0.47
References: 15
Authors: 14
Name              Order  Citations  PageRank
Jeremy Kepner     1      606        61.58
William Arcand    2      175        17.77
David Bestor      3      181        19.08
Bill Bergeron     4      168        16.57
Chansup Byun      5      180        19.21
Lauren Edwards    6      37         2.62
Vijay Gadepally   7      449        50.53
Matthew Hubbell   8      192        20.93
Peter Michaleas   9      201        20.93
Julie Mullen      10     138        15.22
Andrew Prout      11     182        18.78
Antonio Rosa      12     170        17.67
Charles Yee       13     147        15.14
Albert Reuther    14     335        37.32