Title
Thrill: High-performance algorithmic distributed batch data processing with C++
Abstract
We present the design and a first performance evaluation of Thrill - a prototype of a general purpose big data processing framework with a convenient data-flow style programming interface. Thrill is somewhat similar to Apache Spark and Apache Flink with at least two main differences. First, Thrill is based on C++ which enables performance advantages due to direct native code compilation, a more cache-friendly memory layout, and explicit memory management. In particular, Thrill uses template meta-programming to compile chains of subsequent local operations into a single binary routine without intermediate buffering and with minimal indirections. Second, Thrill uses arrays rather than multisets as its primary data structure which enables additional operations like sorting, prefix sums, window scans, or combining corresponding fields of several arrays (zipping). We compare Thrill with Apache Spark and Apache Flink using five kernels from the HiBench suite. Thrill is consistently faster and often several times faster than the other frameworks. At the same time, the source codes have a similar level of simplicity and abstraction.
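The two ideas in the abstract, fusing a chain of local operations at compile time and using ordered arrays so that operations like prefix sums are well defined, can be illustrated with a small single-machine sketch. This is not Thrill's API or implementation: `Chain`, `FromVector`, `Map`, `Filter`, `Collect`, and `Example` are hypothetical names chosen for illustration, and the sketch assumes a C++14 compiler. Each step wraps the previous one in a new lambda type, so the compiler can inline the whole chain into one loop that never materializes an intermediate array.

```cpp
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical illustration only; this is not Thrill's API.
// A Chain holds a "source": a callable that pushes items into a consumer.
template <typename Source>
class Chain {
public:
    explicit Chain(Source source) : source_(std::move(source)) {}

    // Append a map step. The result is a new Chain type whose source
    // applies f to each item before handing it downstream.
    template <typename F>
    auto Map(F f) const {
        auto src = source_;
        auto next = [src, f](auto&& emit) {
            src([&](const auto& x) { emit(f(x)); });
        };
        return Chain<decltype(next)>(next);
    }

    // Append a filter step that forwards only items satisfying p.
    template <typename P>
    auto Filter(P p) const {
        auto src = source_;
        auto next = [src, p](auto&& emit) {
            src([&](const auto& x) { if (p(x)) emit(x); });
        };
        return Chain<decltype(next)>(next);
    }

    // Run the fused chain once and store the final items in order.
    template <typename T>
    std::vector<T> Collect() const {
        std::vector<T> out;
        source_([&](const auto& x) { out.push_back(x); });
        return out;
    }

private:
    Source source_;
};

// Start a chain from a vector. The vector is captured by reference,
// so it must outlive the chain.
template <typename T>
auto FromVector(const std::vector<T>& data) {
    auto src = [&data](auto&& emit) {
        for (const T& x : data) emit(x);
    };
    return Chain<decltype(src)>(src);
}

// Square, keep even values, then take a prefix sum. The prefix sum is
// meaningful only because the result is an ordered array.
std::vector<int> Example() {
    const std::vector<int> input = {1, 2, 3, 4, 5, 6};
    auto chain = FromVector(input)
                     .Map([](int x) { return x * x; })
                     .Filter([](int x) { return x % 2 == 0; });
    std::vector<int> out = chain.Collect<int>();  // 4, 16, 36
    std::partial_sum(out.begin(), out.end(), out.begin());
    return out;  // 4, 20, 56
}
```

In this sketch only `Collect` allocates storage; `Map` and `Filter` just build a nested callable. A distributed framework would additionally have to partition the array across workers and exchange data for operations such as sorting or prefix sums, which the sketch does not attempt.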
Year: 2016
DOI: 10.1109/BigData.2016.7840603
Venue: 2016 IEEE International Conference on Big Data (Big Data)
Keywords: C++, big data tool, distributed data processing
DocType: Conference
Volume: abs/1608.05634
ISBN: 978-1-4673-9006-4
Citations: 10
PageRank: 0.88
References: 12
Authors: 10
Name               Order  Citations  PageRank
Timo Bingmann      1      24         5.83
Michael Axtmann    2      22         3.72
Emanuel Jöbstl     3      10         1.22
Sebastian Lamm     4      34         4.10
Huyen Chau Nguyen  5      10         0.88
Alexander Noe      6      17         3.35
Sebastian Schlag   7      38         5.18
Matthias Stumpp    8      10         0.88
Tobias Sturm       9      14         1.39
Peter Sanders      10     361        29.35