Title
Flint: batch-interactive data-intensive processing on transient servers.
Abstract
Cloud providers now offer transient servers, which they may revoke at any time, for significantly lower prices than on-demand servers, which they cannot revoke. The low price of transient servers is particularly attractive for executing an emerging class of workload, which we call Batch-Interactive Data-Intensive (BIDI), that is becoming increasingly important for data analytics. BIDI workloads require large sets of servers to cache massive datasets in memory to enable low-latency operation. In this paper, we illustrate the challenges of executing BIDI workloads on transient servers, where revocations (akin to failures) are the common case. To address these challenges, we design Flint, which is based on Spark and includes automated checkpointing and server selection policies that (i) support batch and interactive applications and (ii) dynamically adapt to application characteristics. We evaluate a prototype of Flint using EC2 spot instances and show that it yields cost savings of up to 90% compared to using on-demand servers, while increasing running time by less than 2%.
Year
2016
DOI
10.1145/2901318.2901319
Venue
EuroSys
Field
Spark (mathematics), Virtual machine, Workload, Computer science, Cache, Server, Real-time computing, Latency (engineering), Multi-core processor, Operating system, Cloud computing, Distributed computing
DocType
Conference
Citations
24
PageRank
0.86
References
27
Authors
5
Name                Order  Citations  PageRank
Prateek Sharma      1      201        14.12
Tian Guo            2      66         12.57
Xin He              3      46         2.25
David E. Irwin      4      899        98.12
Prashant J. Shenoy  5      6386       521.30