Major technical advancements in apache hive - Citegraph

Paper Info

Title
Major Technical Advancements in Apache Hive

Abstract
Apache Hive is a widely used data warehouse system for Apache Hadoop, and has been adopted by many organizations for various big data analytics applications. Working closely with many users and organizations, we have identified several shortcomings of Hive in its file formats, query planning, and query execution, which are key factors determining the performance of Hive. To ensure that Hive continues to meet the requirements of processing increasingly high volumes of data in a scalable and efficient way, we set two goals for our work on advancing Hive, one related to storage and one to runtime performance. First, we aim to maximize the effective storage capacity and to accelerate data accesses to the data warehouse by updating the existing file formats. Second, we aim to significantly improve cluster resource utilization and runtime performance of Hive by developing a highly optimized query planner and a highly efficient query execution engine. In this paper, we present a community-based effort on technical advancements in Hive. Our performance evaluation shows that these advancements provide significant improvements in storage efficiency and query execution performance. This paper also shows how academic research lays a foundation for Hive to improve its daily operations.

Year: 2014
DOI: 10.1145/2588555.2595630
Venue: SIGMOD Conference
DocType: Conference
Keywords: systems, mapreduce, data warehouse, hadoop, hive, databases
Field: File format, Data warehouse, Data mining, Computer science, Planner, Storage efficiency, Big data, Database, Scalability
Citations: 29
PageRank: 0.96
References: 31
Authors: 10

Authors

Name	Order	Citations	PageRank
Yin Huai	1	579	21.77
Ashutosh Chauhan	2	30	1.32
Alan F. Gates	3	41	2.53
Günther Hagleitner	4	30	1.32
Eric N. Hanson	5	917	376.11
Owen O'Malley	6	34	1.37
Jitendra Pandey	7	29	0.96
Yuan Yuan	8	96	3.82
Rubao Lee	9	872	41.41
Xiaodong Zhang	10	5378	355.72