Title
Providing streaming joins as a service at Facebook
Abstract
AbstractStream processing applications reduce the latency of batch data pipelines and enable engineers to quickly identify production issues. Many times, a service can log data to distinct streams, even if they relate to the same real-world event (e.g., a search on Facebook's search bar). Furthermore, the logging of related events can appear on the server side with different delay, causing one stream to be significantly behind the other in terms of logged event times for a given log entry. To be able to stitch this information together with low latency, we need to be able to join two different streams where each stream may have its own characteristics regarding the degree in which its data is out-of-order. Doing so in a streaming fashion is challenging as a join operator consumes lots of memory, especially with significant data volumes. This paper describes an end-to-end streaming join service that addresses the challenges above through a streaming join operator that uses an adaptive stream synchronization algorithm that is able to handle the different distributions we observe in real-world streams regarding their event times. This synchronization scheme paces the parsing of new data and reduces overall operator memory footprint while still providing high accuracy. We have integrated this into a streaming SQL system and have successfully reduced the latency of several batch pipelines using this approach.
Year
DOI
Venue
2018
10.14778/3229863.3229869
Hosted Content
Field
DocType
Volume
Joins,Computer science,Database
Journal
11
Issue
ISSN
Citations 
12
2150-8097
4
PageRank 
References 
Authors
0.40
0
14
Name
Order
Citations
PageRank
Gabriela Jacques-Silva117111.81
Ran Lei2211.46
Luwei Cheng3778.38
Guoqiang Jerry Chen4452.72
Kuen Ching540.40
Tanji Hu640.74
Yuan Mei740.40
Kevin Wilfong8211.46
Rithin Shetty940.74
Serhat Yilmaz10221.48
Anirban Banerjee117511.29
Benjamin Heintz1240.74
Shridar Iyer1340.40
Anshul Jaiswal14211.46