Title
Optimizing the Join Operation on Hive to Accelerate Cross-Matching in Astronomy
Abstract
Cross-matching in astronomy is a basic procedure for comprehensively analyzing the relations among different celestial objects. Its aim is to search for celestial objects in different catalogs and determine whether they are the same object. In essence, cross-matching can be expressed as a join query. Since celestial catalogs usually contain billions of stars, the join operator must be carefully designed and optimized for efficiency. In this paper, we focus on performing cross-matching with MapReduce-based join operators. The challenge is how to optimize these join operators to satisfy the specific requirements of cross-matching. We therefore propose an optimized method and investigate its efficiency through theoretical analysis and experiments. Our study shows that the method yields a remarkable improvement over previous work, especially when the data is very large.
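The abstract frames cross-matching as a join that pairs objects from two catalogs when they plausibly refer to the same source. To make that concrete, here is a minimal in-memory sketch of a generic grid-partitioned reduce-side join, with the map, shuffle and reduce phases simulated by a dictionary. It is not the optimized Hive join the paper proposes. The function names (`cross_match`, `to_unit_vector`, `cell_of`), the `(id, ra_deg, dec_deg)` record layout and the 3-D grid partitioning scheme are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict
from itertools import product


def to_unit_vector(ra_deg, dec_deg):
    """Convert equatorial coordinates in degrees to a 3-D unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))


def cell_of(vec, side):
    """Integer index of the cubic grid cell (edge length `side`) holding `vec`."""
    return tuple(math.floor(c / side) for c in vec)


def cross_match(cat_a, cat_b, radius_deg):
    """Return sorted (id_a, id_b) pairs separated by at most `radius_deg`.

    cat_a, cat_b: iterables of hypothetical (id, ra_deg, dec_deg) records.
    """
    # Angular radius theta <=> chord length 2*sin(theta/2) on the unit sphere,
    # so the angular test becomes a Euclidean test and RA wrap-around and the
    # poles need no special handling.
    side = 2.0 * math.sin(math.radians(radius_deg) / 2.0)

    # "Map" + "shuffle": key every record by grid cell.
    # Each cell collects (A candidates, B records).
    shuffle = defaultdict(lambda: ([], []))
    for obj_id, ra, dec in cat_b:
        vec = to_unit_vector(ra, dec)
        shuffle[cell_of(vec, side)][1].append((obj_id, vec))
    # A records are replicated to their own cell and its 26 neighbours:
    # two points within `side` of each other differ by at most one cell index
    # per axis, so every true match meets in exactly one cell (B's home cell).
    for obj_id, ra, dec in cat_a:
        vec = to_unit_vector(ra, dec)
        cx, cy, cz = cell_of(vec, side)
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            shuffle[(cx + dx, cy + dy, cz + dz)][0].append((obj_id, vec))

    # "Reduce": inside each cell, compare every A candidate with every B record.
    matches = []
    for a_records, b_records in shuffle.values():
        for a_id, a_vec in a_records:
            for b_id, b_vec in b_records:
                if math.dist(a_vec, b_vec) <= side:
                    matches.append((a_id, b_id))
    return sorted(matches)


if __name__ == "__main__":
    cat_a = [("a1", 10.0, 20.0), ("a2", 150.0, -30.0), ("a3", 359.9999, 0.0)]
    cat_b = [("b1", 10.0001, 20.0001), ("b2", 150.5, -30.0), ("b3", 0.0001, 0.0)]
    # 1 arcsecond radius; a3/b3 straddle RA = 0 and still match.
    print(cross_match(cat_a, cat_b, radius_deg=1.0 / 3600))
    # [('a1', 'b1'), ('a3', 'b3')]
```

Only catalog A is replicated, so each matching pair is compared in exactly one cell and emitted once. Replicating into 27 cells is the usual price of partitioning a spatial join. In a real MapReduce or Hive job that replication becomes extra shuffle volume, which is the kind of cost a join optimization for very large catalogs has to contend with.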
Year
2014
DOI
10.1109/IPDPSW.2014.193
Venue
IPDPS Workshops
Keywords
optimisation, string matching, join, cross-matching, astronomy, mapreduce, astronomy computing, celestial object relations, join query statement, astronomy cross-matching, join operation optimization, query processing, distributed processing
Field
Hash join, Astronomy, Recursive join, Computer science, Parallel computing, Theoretical computer science, Sort-merge join, Operator (computer programming), Distributed computing
DocType
Conference
Citations
2
PageRank
0.43
References
10
Authors (6)
Liang Li
Dixin Tang
Taoying Liu
Hong Liu
Wei Li
Chenzhou Cui