Title
Investigating Automatic Parameter Tuning for SQL-on-Hadoop Systems
Abstract
SQL-on-Hadoop engines such as Hive provide a declarative interface for processing large-scale data over computing frameworks such as Hadoop. The underlying frameworks expose a large number of configuration parameters that can significantly impact performance but are hard to tune. Automatic parameter tuning has become a lively research area, and several sophisticated tuning advisors have been proposed for Hadoop. In this paper, we conduct an experimental study to explore the impact of Hadoop parameter tuning on Hive. We reveal that the performance of Hive queries does not necessarily improve when Hadoop-focused tuning advisors are used out of the box, at least when following the current approach of applying the same tuning setup uniformly to the evaluation of the entire query. After extending the Hive query processing engine, we propose an alternative tuning approach and show experimentally that current Hadoop tuning advisors can then provide good and robust performance for Hive queries, as well as improved cluster resource utilization. We share our observations with the community in the hope of raising awareness of this problem and initiating new research on automatic parameter tuning for SQL-on-Hadoop systems. (C) 2021 Elsevier Inc. All rights reserved.
Year
2021
DOI
10.1016/j.bdr.2021.100204
Venue
BIG DATA RESEARCH
Keywords
SQL-on-Hadoop, Parameter tuning, Self-tuning
DocType
Journal
Volume
25
ISSN
2214-5796
Citations
1
PageRank
0.35
References
0
Authors
4