Title
Comet: batched stream processing for data intensive distributed computing
Abstract
Performance and resource optimization is an important research problem in data-intensive distributed computing. We present a new batched stream processing model that captures query correlations to expose I/O and computation redundancies for optimization. The model is inspired by our empirical study of a trace from a large-scale production data processing cluster, which reveals significant redundancies caused by strong temporal and spatial correlations among queries. We have developed Comet, a query processing system that embraces the batched stream processing model for optimizations, and integrated it with DryadLINQ. With its roots in query optimization for database systems, Comet enables a set of new heuristics and opportunities tailored for distributed computing in DryadLINQ. Comet's optimizations are effective. On a 40-machine cluster, a micro-benchmark shows a 42% reduction in total machine time and an over 40% reduction in total I/O. Our simulation on a real trace covering over 19 million machine hours shows an estimated I/O saving of over 50%.
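The following is a minimal Python sketch of the kind of I/O redundancy the abstract describes. It assumes that queries in one batch read overlapping segments of an incrementally bulk-appended input. A shared scan then reads each segment once and feeds every query that needs it, instead of letting each query scan its inputs independently. `Query`, `run_naive`, `run_batched`, and the segment names are hypothetical illustrations, not Comet's or DryadLINQ's API. The sketch models only shared-scan I/O, not Comet's other optimizations.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Hypothetical query model (not Comet's API): a query reads a set of input
# segments and folds every record it sees into a running result.
@dataclass
class Query:
    name: str
    segments: Set[str]                  # ids of the input segments this query reads
    combine: Callable[[int, int], int]  # folds one record into the running result
    init: int = 0

def run_naive(queries: List[Query], storage: Dict[str, List[int]]):
    """Each query scans its own inputs independently; returns (results, segment reads)."""
    reads = 0
    results = {}
    for q in queries:
        acc = q.init
        for seg in sorted(q.segments):
            reads += 1
            for record in storage[seg]:
                acc = q.combine(acc, record)
        results[q.name] = acc
    return results, reads

def run_batched(queries: List[Query], storage: Dict[str, List[int]]):
    """Group the batch's queries by shared segments so each segment is read once."""
    readers = defaultdict(list)
    for q in queries:
        for seg in q.segments:
            readers[seg].append(q)
    acc = {q.name: q.init for q in queries}
    reads = 0
    for seg in sorted(readers):
        reads += 1
        for record in storage[seg]:     # one scan of this segment...
            for q in readers[seg]:      # ...feeds every query that reads it
                acc[q.name] = q.combine(acc[q.name], record)
    return acc, reads

# Hypothetical daily segments and a batch of three correlated queries.
storage = {"day1": [1, 2, 3], "day2": [4, 5], "day3": [6]}
queries = [
    Query("sum_recent", {"day2", "day3"}, lambda a, r: a + r),
    Query("sum_all", {"day1", "day2", "day3"}, lambda a, r: a + r),
    Query("count_day2", {"day2"}, lambda a, r: a + 1),
]
naive_results, naive_reads = run_naive(queries, storage)
batched_results, batched_reads = run_batched(queries, storage)
print(naive_reads, batched_reads)  # 6 3
```

Both strategies return the same results, because each query still sees its segments in the same sorted order. The batched run, however, reads three segments instead of six, since the overlapping inputs are scanned only once.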
Year
2010
DOI
10.1145/1807128.1807139
Venue
SoCC
Keywords
batch computation, traditional batch processing model, incrementally bulk-appended data stream, effective query optimizations, batched stream processing, I/O reduction, I/O saving, 40-node cluster, large-scale production data-processing cluster, query processing system, database system, distributed computing, empirical study, query optimization, data processing, spatial correlation, batch process, resource management, resource manager, stream processing
Field
Resource management, Data stream mining, Data processing, Computer science, Parallel computing, Real-time computing, Comet, Batch processing, Stream processing, Computation
DocType
Conference
Citations
72
PageRank
5.27
References
28
Authors
7
Name | Order | Citations | PageRank
Bingsheng He | 1 | 2810 | 179.09
Mao Yang | 2 | 496 | 30.94
Zhenyu Guo | 3 | 512 | 39.61
Rishan Chen | 4 | 326 | 17.81
Bing Su | 5 | 81 | 6.31
Wei Lin | 6 | 229 | 24.46
Lidong Zhou | 7 | 2136 | 147.82