Title
Performance of Left Outer Join on Hadoop with Right Side within Single Node Memory Size
Abstract
In this paper we compare the performance of different implementations of the join operation in Hadoop in a scenario where the right side of the join fits within the memory of a single node. We present results for several implementations, both in pure Map Reduce and in Pig, all based on HDFS. We also compare the distributed performance of those implementations with a single-node implementation in MySQL. Results show that the Pig implementations fall short of the pure Map Reduce versions by a bigger margin than expected. Moreover, we notice that Map tasks seem to be the element that influences performance the most, especially for the potentially more efficient methods. So far, we have achieved the best performance using a singleton pattern join. However, there are reasons to believe that this performance can still be improved with better control of the number of Map tasks.
Year: 2012
DOI: 10.1109/WAINA.2012.20
Venue: AINA Workshops
Keywords: mapreduce,hdfs,pig implementation,single node memory size,map reduce,parallel programming,storage management,pig,right side,bigger margin,single node implementation,left outer join,best performance,semantic expansion,performance result,better control,pure map reduce,join,mysql,map task,hadoop,pure map reduce version,sql,bioinformatics,indexes,java,semantics,ontologies,indexation,computer architecture
Field: SQL,Ontology (information science),Computer science,Parallel computing,Implementation,Storage management,Java,Singleton pattern,Semantics
DocType: Conference
ISBN: 978-1-4673-0867-0
Citations: 4
PageRank: 0.52
References: 3
Authors: 5
Name                        Order  Citations  PageRank
Byambajargal Byambajav      1      4          0.52
Tomasz Wiktor Wlodarczyk    2      77         10.73
Chunming Rong               3      870        100.18
Paea LePendu                4      294        21.32
Nigam Shah                  5      212        20.11