Title
The Read Amplification Analysis of NoSQL Database on Top of OSDs: A Case Study of HBase
Abstract
NoSQL databases have shown great improvements in the storage and computation of large-scale datasets. In the conventional deployment architecture, they achieve very good performance because storage and computation run on the same node. Nevertheless, many NoSQL users maintain their own storage pool (e.g., an OSD server pool) that provides different interfaces for different NoSQL databases. This new deployment scenario brings many benefits, such as higher scalability, better flexibility, and easier data maintenance. However, physically separating the NoSQL application nodes from the storage nodes can degrade system performance. To better understand this scenario, we take HBase, a widely used NoSQL database, as a case study. HBase exploits a layered storage architecture with two main layers: a distributed database atop a distributed file system (HDFS). We run experiments with the YCSB benchmark and a newly developed benchmark to evaluate a deployment in which the RegionServer and the HDFS server are separated, and to verify the read amplification of network I/O between them. Based on our observations, we propose the OSD Agent, a novel approach consisting of an HBase Local Storage Scanner and an HBase Local Storage Compactor, which reduces the amplification by filtering the data blocks on the OSD server pool so that only the requested data is returned. In simulation, the OSD Agent reduces network traffic amplification from 36x-92x to 1.7x for HBase read operations, and from 1.2x-21x to 1.2x for HBase scans with a column filter. Moreover, the OSD Agent makes read performance less sensitive to network bandwidth. Overall, the OSD Agent substantially improves the performance of NoSQL databases deployed on top of OSDs.
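The read amplification described above can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions, not the authors' OSD Agent: it assumes a 64 KiB block (HBase's default HFile block size), a simplified cell serialization (row key + column + value, no headers or checksums), and hypothetical helpers `build_block`, `baseline_get`, and `agent_get`. Amplification is taken as bytes sent over the network divided by bytes the client requested.

```python
from dataclasses import dataclass

BLOCK_SIZE = 64 * 1024  # assumption: HBase's default HFile block size (64 KiB)


@dataclass(frozen=True)
class Cell:
    row: bytes
    column: bytes
    value: bytes

    def wire_size(self) -> int:
        # Simplified serialization (assumption): key bytes plus value bytes only.
        return len(self.row) + len(self.column) + len(self.value)


def build_block(value_size: int) -> list[Cell]:
    """Hypothetical helper: pack cells with distinct row keys into one block
    until adding another cell would exceed BLOCK_SIZE."""
    block: list[Cell] = []
    used = 0
    i = 0
    while True:
        cell = Cell(f"row{i:06d}".encode(), b"cf:q", b"x" * value_size)
        if used + cell.wire_size() > BLOCK_SIZE:
            return block
        block.append(cell)
        used += cell.wire_size()
        i += 1


def baseline_get(block: list[Cell], row: bytes) -> tuple[int, int]:
    """Separated deployment: the storage node ships the whole block and the
    RegionServer picks out the row. Returns (bytes on wire, bytes requested)."""
    wire = sum(c.wire_size() for c in block)
    requested = sum(len(c.value) for c in block if c.row == row)
    return wire, requested


def agent_get(block: list[Cell], row: bytes) -> tuple[int, int]:
    """Hypothetical storage-side filter: only matching cells cross the network."""
    hits = [c for c in block if c.row == row]
    return sum(c.wire_size() for c in hits), sum(len(c.value) for c in hits)


def amplification(wire: int, requested: int) -> float:
    # Network read amplification = bytes transferred / bytes requested.
    return wire / requested


if __name__ == "__main__":
    block = build_block(value_size=1000)
    target = block[len(block) // 2].row
    for name, fn in [("whole-block read", baseline_get), ("storage-side filter", agent_get)]:
        wire, req = fn(block, target)
        print(f"{name}: {wire} B on wire for {req} B requested -> {amplification(wire, req):.1f}x")
```

With 1000-byte values, a single-row get under the whole-block model moves roughly 65x more bytes than requested, while the storage-side filter stays close to 1x; the small remainder is key overhead, which is one plausible reason a real filter would not reach exactly 1x. The numbers come only from these assumed sizes and are not the paper's measurements.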
Year
2018
DOI
10.1109/BIGCOM.2018.00040
Venue
2018 4th International Conference on Big Data Computing and Communications (BIGCOM)
Keywords
NoSQL, OSD, HBase, Read Amplification
Field
Distributed File System, Computer science, Server, Block (data storage), NoSQL, Distributed database, Benchmark (computing), Database, Performance improvement, Scalability
DocType
Conference
ISBN
978-1-5386-8022-3
Citations
0
PageRank
0.34
References
0
Authors
7
Name
Order
Citations
PageRank
Shiyong Liu142.44
Zhongwen Guo229933.99
Chen Liu33426.22
Xupeng Wang499.33
Guohua Wang501.69
Zhijin Qiu601.01
Xukun Qin751.25