Title
MillWheel: fault-tolerant stream processing at internet scale
Abstract
MillWheel is a framework for building low-latency data-processing applications that is widely used at Google. Users specify a directed computation graph and application code for individual nodes, and the system manages persistent state and the continuous flow of records, all within the envelope of the framework's fault-tolerance guarantees. This paper describes MillWheel's programming model as well as its implementation. The case study of a continuous anomaly detector in use at Google serves to motivate how many of MillWheel's features are used. MillWheel's programming model provides a notion of logical time, making it simple to write time-based aggregations. MillWheel was designed from the outset with fault tolerance and scalability in mind. In practice, we find that MillWheel's unique combination of scalability, fault tolerance, and a versatile programming model lends itself to a wide variety of problems at Google.
Year
2013
DOI
10.14778/2536222.2536229
Venue
PVLDB
Keywords
application code, internet scale, fault-tolerance guarantee, computation graph, continuous flow, individual node, versatile programming model, fault-tolerant stream processing, programming model, case study, continuous anomaly detector, fault tolerance
Field
Data mining, Graph, Programming paradigm, Computer science, Fault tolerance, Stream processing, Detector, Database, Computation, The Internet, Scalability, Distributed computing
DocType
Journal
Volume
6
Issue
11
ISSN
2150-8097
Citations
186
PageRank
6.19
References
25
Authors
10
Name              Order  Citations  PageRank
Tyler Akidau      1      281        10.63
Alex Balikov      2      186        6.19
Kaya Bekiroğlu    3      186        6.53
Slava Chernyak    4      280        9.90
Josh Haberman     5      186        6.19
Reuven Lax        6      280        9.57
Sam McVeety       7      280        9.57
Daniel Mills      8      661        28.07
Paul Nordstrom    9      186        6.53
Sam Whittle       10     282        9.94