Title
Analysis of HDFS Under HBase: A Facebook Messages Case Study
Abstract
We present a multilayer study of the Facebook Messages stack, which is based on HBase and HDFS. We collect and analyze HDFS traces to identify potential improvements, which we then evaluate via simulation. Messages represents a new HDFS workload: whereas HDFS was built to store very large files and receive mostly-sequential I/O, 90% of files are smaller than 15MB and I/O is highly random. We find hot data is too large to easily fit in RAM and cold data is too large to easily fit in flash; however, cost simulations show that adding a small flash tier improves performance more than equivalent spending on RAM or disks. HBase's layered design offers simplicity, but at the cost of performance; our simulations show that network I/O can be halved if compaction bypasses the replication layer. Finally, although Messages is read-dominated, several features of the stack (i.e., logging, compaction, replication, and caching) amplify write I/O, causing writes to dominate disk I/O.
Year
2014
Venue
FAST
Keywords
new hdfs workload, compaction bypass, hot data, small flash tier, cost simulation, cold data, large file, replication layer, facebook messages, facebook messages case study, equivalent spending
Field
Workload, Computer science, Real-time computing, Operating system
DocType
Conference
Volume
39
Issue
3
Citations
50
PageRank
1.83
References
16
Authors
7
Name                      Order  Citations  PageRank
Tyler Harter              1      225        12.32
Dhruba Borthakur          2      2022       80.76
Siying Dong               3      51         2.52
Amitanand S. Aiyer        4      457        19.60
Liyin Tang                5      73         2.73
Andrea C. Arpaci-Dusseau  6      3133       307.84
Remzi H. Arpaci-Dusseau   7      3120       383.86