Title
Don't Get Caught in the Cold, Warm-up Your JVM: Understand and Eliminate JVM Warm-up Overhead in Data-Parallel Systems.
Abstract
Many widely used, latency-sensitive, data-parallel distributed systems, such as HDFS, Hive, and Spark, choose to use the Java Virtual Machine (JVM), despite debate on the overhead of doing so. This paper analyzes the extent and causes of JVM performance overhead in these systems. Surprisingly, we find that the warm-up overhead, i.e., class loading and interpretation of bytecode, is frequently the bottleneck. For example, even an I/O-intensive, 1 GB read on HDFS spends 33% of its execution time in JVM warm-up, and Spark queries spend an average of 21 seconds in warm-up. These findings reveal a contradiction between the principle of parallelization, i.e., speeding up long-running jobs by splitting them into short parallel tasks, and the need for long tasks to amortize JVM warm-up overhead. We solve this problem by designing HotTub, a new JVM that amortizes the warm-up overhead over the lifetime of a cluster node instead of over a single job, by reusing a pool of already-warm JVMs across multiple applications. The speed-up is significant: HotTub yields up to 1.8X speedups for Spark queries, while deviating from the JVM specification only in edge cases.
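To make the warm-up cost the abstract describes more concrete, the following minimal Java sketch (not from the paper; the `WarmupDemo` class, the `loadDelta` helper, and the choice of probe class are illustrative) uses the standard `ClassLoadingMXBean` to show that the first use of a class forces class loading, while a reused, already-warm JVM performs no additional loading:

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

public class WarmupDemo {
    // Number of classes the JVM loads as a side effect of resolving
    // `className`. On a cold JVM this is >= 1; once warm, it is 0.
    static long loadDelta(String className) {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        long before = bean.getTotalLoadedClassCount();
        try {
            Class.forName(className); // triggers loading, linking, initialization
        } catch (ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
        return bean.getTotalLoadedClassCount() - before;
    }

    public static void main(String[] args) {
        // First use pays the warm-up cost: loading, verification, and
        // interpreted execution until the JIT compiler kicks in.
        long cold = loadDelta("java.util.zip.Adler32");
        // A second use in the same (now warm) JVM skips that work,
        // which is what HotTub exploits by reusing warm JVMs.
        long warm = loadDelta("java.util.zip.Adler32");
        System.out.println("cold loads=" + cold + " warm loads=" + warm);
    }
}
```

This only demonstrates the class-loading half of warm-up; the interpretation overhead eliminated by JIT compilation is the other component the paper measures.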
Year
2016
Venue
OSDI
Field
Bottleneck, Spark, Reuse, Latency (engineering), Computer science, Parallel computing, Real-time computing, Execution time, Big data, Bytecode, Operating system, Java virtual machine
DocType
Conference
Citations
7
PageRank
0.43
References
29
Authors
6
Name             Order  Citations  PageRank
David Lion       1      13         2.22
Adrian Chiu      2      7          0.77
Hailong Sun      3      680        64.83
Xin Zhuang       4      20         2.09
Nikola Grcevski  5      38         3.75
Ding Yuan        6      526        20.80