Title
On availability of intermediate data in cloud computations
Abstract
This paper takes a renewed look at the problem of managing intermediate data that is generated during dataflow computations (e.g., MapReduce, Pig, Dryad, etc.) within clouds. We discuss salient features of this intermediate data and outline requirements for a solution. Our experiments show that existing local write-remote read solutions, traditional distributed file systems (e.g., HDFS), and support from transport protocols (e.g., TCP-Nice) cannot guarantee both data availability and minimal interference, which are our key requirements. We present design ideas for a new intermediate data storage system.
Year
2009
Venue
HotOS
Keywords
file system, minimal interference, new intermediate data storage, local write-remote, data availability, key requirement, renewed look, dataflow computation, cloud computation, present design idea, intermediate data, transport protocol, cloud computing, data storage
Field
Data availability, Computer data storage, Computer science, Dataflow, Interference (wave propagation), Cloud computing, Computation, Salient, Distributed computing
DocType
Conference
Citations
28
PageRank
4.91
References
7
Authors
4
Name              Order  Citations  PageRank
Steven Y. Ko      1      471        45.08
Imranul Hoque     2      134        10.20
Brian Cho         3      199        15.57
Indranil Gupta    4      1837       143.92