Title
Realtime Data Processing at Facebook.
Abstract
Realtime data processing powers many use cases at Facebook, including realtime reporting of the aggregated, anonymized voice of Facebook users, analytics for mobile applications, and insights for Facebook page administrators. Many companies have developed their own systems; we have a realtime data processing ecosystem at Facebook that handles hundreds of Gigabytes per second across hundreds of data pipelines. Many decisions must be made while designing a realtime stream processing system. In this paper, we identify five important design decisions that affect their ease of use, performance, fault tolerance, scalability, and correctness. We compare the alternative choices for each decision and contrast what we built at Facebook to other published systems. Our main decision was targeting seconds of latency, not milliseconds. Seconds is fast enough for all of the use cases we support and it allows us to use a persistent message bus for data transport. This data transport mechanism then paved the way for fault tolerance, scalability, and multiple options for correctness in our stream processing systems Puma, Swift, and Stylus. We then illustrate how our decisions and systems satisfy our requirements for multiple use cases at Facebook. Finally, we reflect on the lessons we learned as we built and operated these systems.
Year
DOI
Venue
2016
10.1145/2882903.2904441
SIGMOD/PODS'16: International Conference on Management of Data San Francisco California USA June, 2016
Field
DocType
ISBN
Data mining,Use case,Computer science,Correctness,Gigabyte,Usability,Fault tolerance,Stream processing,Analytics,Database,Scalability
Conference
978-1-4503-3531-7
Citations 
PageRank 
References 
17
0.72
17
Authors
10
Name
Order
Citations
PageRank
Guoqiang Jerry Chen1452.72
Janet L. Wiener23813600.46
Shridhar Iyer3170.72
Anshul Jaiswal4211.46
Ran Lei5211.46
Nikhil Simha6170.72
Wei Wang7170.72
Kevin Wilfong8211.46
Tim Williamson9181.07
Serhat Yilmaz10221.48