Title
A scalable framework for continuous query evaluations over multidimensional, scientific datasets
Abstract
Efficient access to voluminous multidimensional datasets is essential for scientific applications. Fast-evolving datasets present unique challenges during retrieval. Keeping data up to date can be expensive and may involve repeated data queries, excessive data movement, and redundant data preprocessing. This paper focuses on the efficient manipulation of query results when the dataset is continuously evolving. Our approach provides an automated and scalable tracking and caching mechanism to evaluate continuous queries over data stored in a distributed storage system. We have designed and developed a distributed updatable cache that ensures that query output contains the most recent data arrivals. We have also developed a dormant cache framework to relieve the strain on caching capacity caused by intensive memory requirements. The data to be stored in the dormant cache are selected using the cached continuous query scheduling algorithm that we have designed and developed. This approach is evaluated in the context of Galileo, our distributed data storage framework. This paper includes an empirical evaluation performed on an Amazon Web Services cluster and a private cluster. Our performance benchmarks demonstrate the efficacy of our approach. Copyright (c) 2015 John Wiley & Sons, Ltd.
Year
2016
DOI
10.1002/cpe.3651
Venue
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE
Keywords
continuous query, Galileo, query caching, time series data
Field
Query optimization, Data mining, Web search query, Computer science, Cache, Scheduling (computing), Parallel computing, Distributed data store, Web query classification, Data pre-processing, Distributed computing, Scalability
DocType
Journal
Volume
28
Issue
SP8
ISSN
1532-0626
Citations
1
PageRank
0.36
References
8
Authors
3
Name | Order | Citations | PageRank
Cameron Tolooee | 1 | 1 | 0.69
Matthew Malensek | 2 | 93 | 10.44
Sangmi Lee Pallickara | 3 | 170 | 24.46