Title
CoScan: cooperative scan sharing in the cloud
Abstract
We present CoScan, a scheduling framework that eliminates redundant processing in workflows that scan large batches of data in a map-reduce computing environment. CoScan merges Pig programs from multiple users at runtime to reduce I/O contention while adhering to soft deadline requirements in scheduling. This includes support for join workflows that operate on multiple data sources. Our solution maps well to workflows at many Internet companies which reuse data from a common set of inputs. Experiments on the PigMix data analytics benchmark exhibit orders of magnitude reduction in resource contention with minimal impact on latency.
Year
2011
DOI
10.1145/2038916.2038927
Venue
SoCC
Keywords
resource contention, internet company, multiple user, pigmix data, scheduling framework, analytics benchmark exhibit order, coscan merges pig program, reuse data, I/O contention, multiple data source, cloud computing, program analysis, data confidentiality
Field
Data analysis, Reuse, Scheduling (computing), Computer science, Latency (engineering), Computer network, Real-time computing, Program analysis, Workflow, The Internet, Cloud computing, Distributed computing
DocType
Conference
Citations
26
PageRank
1.12
References
25
Authors
4
Name | Order | Citations | PageRank
Xiaodan Wang | 1 | 326 | 32.31
Chris Olston | 2 | 3576 | 316.59
Anish Das Sarma | 3 | 2028 | 104.57
Randal Burns | 4 | 1955 | 115.15