Title
Hippo: An enhancement of pipeline-aware in-memory caching for HDFS
Abstract
In the age of big data, distributed computing frameworks tend to coexist and collaborate in pipelines under a single scheduler. While a variety of techniques for reducing I/O latency have been proposed, few of them target the performance of the pipeline as a whole. This paper proposes memory management logic called “Hippo”, which targets distributed systems and, in particular, “pipelined” applications that may span differing big data frameworks. Though individual frameworks may have their own internal memory management primitives, Hippo provides a generic framework that operates agnostically of these high-level operations. To increase the hit ratio of the in-memory cache, this paper discusses the granularity of caching and how Hippo leverages the job dependency graph to make memory retention and prefetching decisions. Our evaluations demonstrate that job dependency information is essential to improving cache performance, and that a global cache policy maker, in most cases, significantly outperforms explicit caching by users.
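The dependency-aware retention idea in the abstract can be sketched concretely. The following is a minimal illustration, not Hippo's published implementation: it assumes a simple heuristic in which a dataset's retention priority is the number of pending jobs that still read it, and all class and method names (DependencyAwareCache, addDependency, jobFinished, cache) are hypothetical.

import java.util.*;

// A minimal sketch (not Hippo's published code) of a global,
// dependency-aware cache policy: datasets needed by more pending
// downstream jobs are retained in memory with higher priority.
public class DependencyAwareCache {
    // Hypothetical job dependency graph: dataset -> pending jobs that read it.
    private final Map<String, Set<String>> pendingReaders = new HashMap<>();
    private final Map<String, Long> cached = new HashMap<>(); // dataset -> size
    private final long capacity; // cache capacity in bytes
    private long used = 0;

    public DependencyAwareCache(long capacityBytes) {
        this.capacity = capacityBytes;
    }

    // Register that a pending job will read a dataset.
    public void addDependency(String dataset, String job) {
        pendingReaders.computeIfAbsent(dataset, k -> new HashSet<>()).add(job);
    }

    // Mark a job finished, removing its edges from the dependency graph.
    public void jobFinished(String job) {
        pendingReaders.values().forEach(readers -> readers.remove(job));
    }

    // Retention score: how many pending jobs still need this dataset.
    private int score(String dataset) {
        return pendingReaders.getOrDefault(dataset, Collections.emptySet()).size();
    }

    // Admit a dataset, evicting the least-needed entries when space is short.
    public void cache(String dataset, long sizeBytes) {
        if (sizeBytes > capacity) return; // can never fit
        while (used + sizeBytes > capacity) {
            String victim = Collections.min(cached.keySet(),
                    Comparator.comparingInt(this::score));
            if (score(victim) >= score(dataset)) return; // keep current contents
            used -= cached.remove(victim);
        }
        cached.put(dataset, sizeBytes);
        used += sizeBytes;
    }
}

In this sketch, an entry whose pending-reader count drops to zero becomes the first eviction candidate, mirroring the abstract's claim that a global policy informed by the job dependency graph can outperform per-user explicit caching.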
Year
2014
DOI
10.1109/ICCCN.2014.6911847
Venue
ICCCN
Keywords
distributed computing frameworks, HDFS, distributed systems, memory management logic, cache storage, in-memory cache, pipelined applications, pipeline-aware in-memory caching, pipeline performance, job dependency, job dependency graph, big data frameworks, internal memory management primitives, Hippo, I/O latency reduction, big data, caching granularity, prefetching decisions, memory retention, pipeline processing
Field
Interleaved memory, Shared memory, Cache, Computer science, Latency (engineering), Cache-only memory architecture, Computer network, Memory management, Dependency graph, Big data, Distributed computing
DocType
Conference
Citations
2
PageRank
0.40
References
4
Authors
4

Name         Order  Citations  PageRank
Lan Wei      1      2          0.40
Wenbo Lian   2      2          0.40
Kuien Liu    3      2          0.40
Yongji Wang  4      606        75.34