Title
Active semantic caching to optimize multidimensional data analysis in parallel and distributed environments
Abstract
In this paper, we present a multi-query optimization framework based on the concept of active semantic caching. The framework permits the identification and transparent reuse of data and computation in the presence of multiple queries (or query batches) that specify user-defined operators and aggregations originating from scientific data-analysis applications. We show how query scheduling techniques, coupled with intelligent cache replacement policies, can further improve the performance of query processing by leveraging the active semantic caching operators. We also propose a methodology for functionally decomposing complex queries in terms of primitives so that multiple reuse sites are exposed to the query optimizer, to increase the amount of reuse. The optimization framework and the database system implemented with it are designed to be efficient irrespective of the underlying parallel and/or distributed machine configuration. We present experimental results highlighting the performance improvements obtained by our methods using real scientific data-analysis applications on multiple parallel and distributed processing configurations (e.g., single symmetric multiprocessor (SMP) machine, cluster of SMP nodes, and a Grid computing configuration).
Year: 2007
DOI: 10.1016/j.parco.2007.03.001
Venue: Parallel Computing
Keywords: functionally decomposing complex query, multiple query, scientific data analysis, query optimization, multi-query optimization framework, multiple reuse site, multidimensional data analysis, active semantic, query scheduling technique, active semantic caching, optimization framework, query batch, multiple parallel, parallel databases, query processing, query optimizer, grid computing, scientific data, distributed environment, database system
Field: Query optimization, Grid computing, Computer science, Scheduling (computing), Cache, Reuse, Multidimensional analysis, Parallel computing, Multiprocessing, Theoretical computer science, Distributed computing, Computation
DocType: Journal
Volume: 33
Issue: 7-8
ISSN: Parallel Computing
Citations: 5
PageRank: 0.42
References: 40
Authors: 4
Name              Order  Citations  PageRank
Henrique Andrade  1      18         1.08
Tahsin M. Kurç    2      1423       149.77
Alan Sussman      3      1211       174.52
Joel Saltz        4      8          1.62