Title
Efficient parallel processing of range queries through replicated declustering
Abstract
A common technique used to minimize I/O in data intensive applications is data declustering over parallel servers. This technique involves distributing data among several disks so as to parallelize query retrieval and thus, improve performance. We focus on optimizing access to large spatial data, and the most common type of queries on such data, i.e., range queries. An optimal declustering scheme is one in which the processing for all range queries is balanced uniformly among the available disks. It has been shown that single copy based declustering schemes are non-optimal for range queries. In this paper, we integrate replication in conjunction with parallel disk declustering for efficient processing of range queries. We note that replication is largely used in database applications for several purposes like load balancing, fault tolerance and availability of data. We propose theoretical foundations for replicated declustering and propose a class of replicated declustering schemes, periodic allocations, which are shown to be strictly optimal for a number of disks. We propose a framework for replicated declustering, using a limited amount of replication and provide extensions to apply it on real data, which include arbitrary grids and a large number of disks. Our framework also provides an effective indexing scheme that enables fast identification of data of interest in parallel servers. In addition to optimal processing of single queries, we show that this framework is effective for parallel processing of multiple queries. We present experimental results comparing the proposed replication scheme to other techniques for both single queries and multiple queries, on synthetic and real data sets.
Year
DOI
Venue
2006
10.1007/s10619-006-9362-5
Distributed and Parallel Databases
Keywords
Field
DocType
Declustering,Replication,Parallel access,Range queries,Periodic allocation,Optimal parallel processing,Replicated declustering
Spatial analysis,Data set,Load balancing (computing),Computer science,Parallel computing,Range query (data structures),Server,Search engine indexing,Fault tolerance,Periodic graph (geometry),Distributed computing
Journal
Volume
Issue
ISSN
20
2
0926-8782
Citations 
PageRank 
References 
14
0.61
44
Authors
4
Name
Order
Citations
PageRank
Hakan Ferhatosmanoglu1135289.79
Ali Şaman Tosun219010.03
Guadalupe Canahuate39611.31
Aravind Ramachandran4261.44