Title
CoS-HDFS: co-locating geo-distributed spatial data in Hadoop Distributed File System.
Abstract
Given recent advances in ubiquitous positioning technologies, it is now common to query terabytes of spatial data. These massive data are usually geo-distributed across multiple data centers to ensure their availability, yet at least one replica of the data is stored close to where the data are generated. Spatial queries are complex and computationally intensive, and therefore distributed computation platforms, such as Hadoop, are now used to reduce their execution time. However, Hadoop is agnostic to the characteristics of spatial data: it randomly partitions and places the data stored in its distributed file system, which degrades the performance of spatial queries. In this paper, we propose CoS-HDFS, an extension to the Hadoop Distributed File System (HDFS) that takes into account the spatial characteristics of the data and accordingly co-locates them on HDFS nodes that span multiple data centers. We integrate CoS-HDFS with SpatialHadoop, a MapReduce framework that natively supports spatial data, to make use of its implementation of spatial indexes, operations, and query interfaces. We experimentally demonstrate a significant reduction in network usage and total execution time for spatial join queries on the TIGER dataset.
Year: 2016
DOI: 10.1145/3006299.3006314
Venue: Proceedings of the 3rd IEEE/ACM International Conference on Big Data Computing, Applications and Technologies
Keywords: HDFS, Spatial Data, Co-location, Geo-distribution
Field: Distributed File System, Spatial analysis, Data mining, Replica, Computer science, Terabyte, Spatial query, Distributed database, Database, Spatial database, Computation
DocType: Conference
ISBN: 978-1-5090-4468-9
Citations: 1
PageRank: 0.36
References: 15
Authors: 3
Name, Order, Citations, PageRank
Mariam Malak Fahmy, 1, 1, 0.36
Iman Elghandour, 2, 56, 4.72
Magdy Nagi, 3, 9, 3.95