Paper Info

Title
Index-based join operations in Hive

Abstract
Indexing techniques are crucial for the efficiency and scalability of query processing over big data. Hive is a batch-oriented big data management engine that is well suited for OLAP and data analysis applications. For very “selective” queries, whose output sizes are a small fraction of the contributing data, the brute-force approach suffers from poor performance due to redundant disk I/Os or the initiation of extra map operations. We make a first attempt at an index-based join technique to speed up this process, and integrate it into Hive by mapping our design to the conceptual optimization flow. To evaluate performance, we create and run test queries on datasets generated using the TPC-H benchmark. Our results indicate significant performance gains on relatively large data and/or highly selective queries having a two-way join and a single join condition.

Year	DOI	Venue
2013	10.1109/BigData.2013.6691768	BigData Conference
Keywords	Field	DocType
join operation,hive,data analysis applications,selective queries,tpc-h benchmark,indexing,batch oriented big data management engine,index based join operations,data mining,data olap applications,hadoop,indexing techniques,query processing,map and reduce functions	Data mining,Computer science,Search engine indexing,Big data management,Online analytical processing,Big data,Database,Scalability,Speedup	Conference
ISSN	Citations	PageRank
2639-1589	4	0.44
References	Authors
11	3

Authors (3 rows)

Name	Order	Citations	PageRank
Mahsa Mofidpoor	1	4	0.44
Nematollaah Shiri	2	280	28.31
Thiruvengadam Radhakrishnan	3	117	32.44
