Title
Reducing replication bandwidth for distributed document databases
Abstract
With the rise of large-scale, Web-based applications, users are increasingly adopting a new class of document-oriented database management systems (DBMSs) that allow for rapid prototyping while also achieving scalable performance. As with other distributed storage systems, replication is important for document DBMSs in order to guarantee availability. The network bandwidth required to keep replicas synchronized is expensive and is often a performance bottleneck. There is therefore a strong need to reduce replication bandwidth, especially in geo-replication scenarios where wide-area network (WAN) bandwidth is limited. This paper presents a deduplication system called sDedup that reduces the amount of data transferred over the network for replicated document DBMSs. sDedup uses similarity-based deduplication to remove redundancy in replication data by delta encoding against similar documents selected from the entire database. It exploits key characteristics of document-oriented workloads, including small item sizes, temporal locality, and the incremental nature of document edits. Our experimental evaluation of sDedup with three real-world datasets shows that it achieves up to a 38× reduction in data sent over the network, significantly outperforming traditional chunk-based deduplication techniques while incurring negligible performance overhead.
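The idea of similarity-based deduplication with delta encoding can be illustrated with a minimal sketch. This is not the sDedup implementation: the feature choice (smallest hashes of word shingles), the function names, and the parameters `k` and `width` are all assumptions made for illustration, and Python's `difflib` stands in for a real delta compressor.

```python
import difflib
import hashlib


def sketch(text, k=8, width=4):
    """Return the k smallest hashes of overlapping word shingles.

    Hypothetical feature choice for illustration only.
    """
    words = text.split()
    count = max(1, len(words) - width + 1)
    shingles = {" ".join(words[i:i + width]) for i in range(count)}
    hashes = sorted(int(hashlib.md5(s.encode()).hexdigest(), 16) for s in shingles)
    return set(hashes[:k])


def most_similar(new_doc, database):
    """Return the id of the stored document sharing the most features, or None."""
    target = sketch(new_doc)
    best_id, best_overlap = None, 0
    for doc_id, text in database.items():
        overlap = len(target & sketch(text))
        if overlap > best_overlap:
            best_id, best_overlap = doc_id, overlap
    return best_id


def encode_delta(source, target):
    """Describe target as copy ranges from source plus literal inserts."""
    ops = []
    matcher = difflib.SequenceMatcher(None, source, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no op: the skipped source range is simply not copied.
    return ops


def apply_delta(source, ops):
    """Rebuild the target on the replica from the source document and the delta."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            parts.append(source[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)
```

In this sketch the primary would send only the id of the chosen source document and the delta operations; the replica, which already stores the source document, rebuilds the new document with `apply_delta`. When a new document is a small edit of a stored one, the literal text carried in the delta is much shorter than the document itself.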
Year
2015
DOI
10.1145/2806777.2806840
Venue
ACM Symposium on Cloud Computing (SoCC)
Field
Data deduplication, Bottleneck, Locality of reference, Computer science, Distributed data store, Computer network, Redundancy (engineering), Bandwidth (signal processing), Delta encoding, Database, Distributed computing, Scalability
DocType
Conference
Citations
5
PageRank
0.47
References
29
Authors
5
Name                 Order  Citations  PageRank
Lianghong Xu         1      114        5.76
Andrew Pavlo         2      1614       122.03
Sudipta Sengupta     3      2603       180.16
Jin Li               4      7          0.94
Gregory R. Ganger    5      4560       383.16