Pado: A Data Processing Engine for Harnessing Transient Resources in Datacenters. - Citegraph

Paper Info

Title
Pado: A Data Processing Engine for Harnessing Transient Resources in Datacenters.

Abstract
Datacenters are under-utilized, primarily due to unused resources on over-provisioned nodes of latency-critical jobs. Such idle resources can be used to run batch data analytic jobs to increase datacenter utilization, but these transient resources must be evicted whenever latency-critical jobs require them again. Resource evictions often lead to cascading recomputations, which are usually handled by checkpointing intermediate results on stable storage of eviction-free reserved resources. However, checkpointing has a major shortcoming: the substantial overhead of transferring data back and forth. In this work, we step away from such approaches and focus on observing the job structure and the relationships between computations of the job. We carefully mark the computations that are most likely to cause a large number of recomputations upon evictions, to run them reliably using reserved resources. This lets us retain corresponding intermediate results effortlessly without any additional checkpointing. We design Pado, a general data processing engine, which carries out our idea with several optimizations that minimize the number of additional reserved nodes. Evaluation results show that Pado outperforms Spark 2.0.0 by up to 5.1×, and checkpoint-enabled Spark by up to 3.8×.
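
The placement idea in the abstract can be illustrated with a toy sketch. This is not Pado's actual algorithm, which the abstract does not spell out. It assumes a hypothetical heuristic: walk the job DAG in topological order and mark an operator as reserved when losing its inputs would force at least `threshold` upstream tasks to rerun, where outputs of reserved operators are assumed to survive evictions. The names `lost_work`, `place_operators`, `parents`, `tasks`, and `threshold` are all invented for this illustration.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def lost_work(parents, tasks, reserved, op):
    """Count upstream tasks that must rerun to rebuild the inputs of `op`.

    The upstream walk stops at reserved operators, whose outputs are
    assumed (in this toy model) to survive evictions.
    """
    seen, stack = set(), list(parents.get(op, []))
    while stack:
        node = stack.pop()
        if node in seen or node in reserved:
            continue
        seen.add(node)
        stack.extend(parents.get(node, []))
    return sum(tasks[n] for n in seen)


def place_operators(parents, tasks, threshold):
    """Assign each operator to 'reserved' or 'transient' resources.

    Hypothetical heuristic, not taken from the paper: an operator is
    reserved when its recomputation cost reaches `threshold`.
    """
    graph = {op: parents.get(op, []) for op in tasks}
    reserved = set()
    # Visit producers before consumers so earlier decisions cut later costs.
    for op in TopologicalSorter(graph).static_order():
        if lost_work(parents, tasks, reserved, op) >= threshold:
            reserved.add(op)
    return {op: ("reserved" if op in reserved else "transient") for op in tasks}


# Toy job: read -> map -> reduce -> write, with per-operator task counts.
parents = {"map": ["read"], "reduce": ["map"], "write": ["reduce"]}
tasks = {"read": 100, "map": 100, "reduce": 10, "write": 10}
placement = place_operators(parents, tasks, threshold=150)
```

In this toy job only `reduce` lands on reserved resources: rebuilding its inputs would rerun 200 upstream tasks, while `write` depends only on the retained output of `reduce`.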

Year	2017
DOI	10.1145/3064176.3064181
Venue	EuroSys
Field	Data processing, Spark (mathematics), Idle, Computer science, Real-time computing, Operating system, Computation, Distributed computing
DocType	Conference
Citations	10
PageRank	0.62
References	17
Authors	8

Authors (8 rows)

Name	Order	Citations	PageRank
Youngseok Yang	1	14	1.70
Geon-Woo Kim	2	10	0.62
Won Wook Song	3	11	0.98
Yunseong Lee	4	15	2.72
Andrew Chung	5	44	3.57
Zhengping Qian	6	350	17.04
Brian Cho	7	199	15.57
Byung-Gon Chun	8	3832	234.37
