Abstract | ||
---|---|---|
Hadoop has shown great power in processing vast amounts of data in parallel. Hive, the database on Hadoop, enables more experts to process relational data by providing an SQL-like interface. However, Hive does not provide an efficient approach for join, a common but expensive operator in relational databases. Given the importance of join, this paper proposes a novel hybrid algorithm, HJA, which automatically chooses the relatively better method among several candidates: divide and memory copy merge, Partition Join (PJ), and naïve Hive join. Experiments show that HJA achieves the best performance in most situations. |
Year | DOI | Venue |
---|---|---|
2011 | 10.1109/SKG.2011.13 | SKG |
Keywords | Field | DocType |
efficient approach,partition join,relational data,hybrid join algorithm,memory copy,relational database,map reduce,novel hybrid algorithm,great power,expensive operator,best performance,vast data,sql,hybrid algorithm,semantics,relational databases,parallel processing | SQL,Hash join,Data mining,Hybrid algorithm,Recursive join,Relational database,Computer science,Nested set model,Sort-merge join,Theoretical computer science,Block nested loop,Database | Conference |
Citations | PageRank | References |
1 | 0.35 | 3 |
Authors | ||
7 |
Name | Order | Citations | PageRank |
---|---|---|---|
Weisong Hu | 1 | 62 | 5.76 |
Lili Ma | 2 | 32 | 5.37 |
Xiaowei Liu | 3 | 23 | 5.75 |
Hongwei Qi | 4 | 20 | 4.00 |
Li Zha | 5 | 1 | 0.35 |
Huaming Liao | 6 | 57 | 5.09 |
Yuezhuo Zhang | 7 | 12 | 1.59 |