Title
Autopipelining for Data Stream Processing
Abstract
Stream processing applications use online analytics to ingest high-rate data sources, process them on the fly, and generate live results in a timely manner. The data flow graph representation of these applications makes stream computing tasks easy to specify, and also lends itself to runtime exploitation of parallelism on multicore processors. While data flow graphs naturally contain a rich set of parallelization opportunities, exploiting them is challenging due to the combinatorial number of possible configurations. Furthermore, the best configuration is dynamic in nature; it can differ across multiple runs of the application, and even during different phases of the same run. In this paper, we propose an autopipelining solution that can take advantage of multicore processors to improve the throughput of streaming applications in an effective and transparent way. The solution is effective in the sense that it provides good utilization of resources by dynamically finding and exploiting sources of pipeline parallelism in streaming applications. It is transparent in the sense that it does not require any hints from the application developers. As part of our solution, we describe a lightweight runtime profiling scheme to learn the resource usage of the operators comprising the application, an optimization algorithm to locate the best places in the data flow graph to explore additional parallelism, and an adaptive control scheme to find the right level of parallelism. We have implemented our solution in an industrial-strength stream processing system. Our experimental evaluation, based on microbenchmarks, synthetic workloads, and real-world applications, confirms that our design is effective in optimizing the throughput of stream processing applications without requiring any changes to the application code.
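To make the notion of pipeline parallelism in the abstract concrete, the following is a minimal sketch and not the paper's implementation. It assumes a linear chain of operators (plain Python callables) and a hypothetical `cut` index at which a queue and a second thread are inserted; the names `run_fused`, `run_pipelined`, and `cut` are illustrative only. The paper's profiling, placement, and adaptive control logic are not modeled here.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def run_fused(source, operators):
    """Run every operator on each tuple in the caller's thread (no pipelining)."""
    out = []
    for item in source:
        for op in operators:
            item = op(item)
        out.append(item)
    return out


def run_pipelined(source, operators, cut):
    """Split the operator chain at index `cut` (hypothetical parameter).

    Operators before the cut run in one thread, the rest in another,
    connected by a bounded FIFO queue so tuple order is preserved.
    """
    q = queue.Queue(maxsize=64)
    out = []

    def upstream():
        for item in source:
            for op in operators[:cut]:
                item = op(item)
            q.put(item)
        q.put(_DONE)

    def downstream():
        while True:
            item = q.get()
            if item is _DONE:
                break
            for op in operators[cut:]:
                item = op(item)
            out.append(item)

    threads = [threading.Thread(target=upstream),
               threading.Thread(target=downstream)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


if __name__ == "__main__":
    ops = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
    print(run_fused(range(5), ops))
    print(run_pipelined(range(5), ops, cut=1))
```

Both functions produce the same results in the same order; only the threading structure differs. Note that CPython threads will not show a multicore speedup for CPU-bound operators, so this sketch illustrates the structure of a pipeline cut rather than its performance benefit.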
Year
2013
DOI
10.1109/TPDS.2012.333
Venue
IEEE Trans. Parallel Distrib. Syst.
Keywords
stream processing application,optimisation,data flow graph representation,parallel processing,stream computing task specification,data flow graphs,resources utilization,synthetic workloads,manufacturing data processing,parallelization runtime exploitation,microbenchmarks,application developer,multicore processor,resource allocation,application code,optimization algorithm,data flow graph,multiprocessing systems,adaptive control,data stream processing application,online analytics,adaptive control scheme,autopipelining,high-rate data sources,industrial-strength stream processing system,data stream processing,high-rate data source,autopipelining solution,stream processing,pipeline parallelism,real-world application,multicore processors,formal specification,runtime profiling scheme,pipeline processing,parallelization,instruction sets,multicore processing,throughput
Field
Profiling (computer programming),Computer science,Instruction set,Parallel computing,Stream,Data-flow analysis,Real-time computing,Data parallelism,Stream processing,Multi-core processor,Data flow diagram,Distributed computing
DocType
Journal
Volume
24
Issue
12
ISSN
1045-9219
Citations
3
PageRank
0.38
References
0
Authors
2
Name | Order | Citations | PageRank
Yuzhe Tang | 1 | 147 | 21.06
Bugra Gedik | 2 | 2397 | 108.79