Title
Efficient evaluation of threshold queries of derived fields in a numerical simulation database.
Abstract
In this paper, we present a method for the efficient evaluation of threshold queries of derived fields for large numerical simulation datasets stored in a cluster of relational databases. The datasets produced by these simulations are in the TB and even PB ranges. Data-intensive computations that examine entire time-steps of the simulation data are impractical to perform locally by the user, taking days or months to iterate over the entire dataset. The integrated method for the evaluation of threshold queries that we have developed achieves scalability through data-parallel execution of the computations on the nodes of an analysis database cluster. We extend the scientific analysis environment with the introduction of an application-aware cache for query results, building on the concept of semantic caching. The cache has little overhead and improves query performance by over an order of magnitude for queries that hit the cache. Caching the results of threshold queries preserves both the I/O and computation effort used to obtain them. In the case of computational turbulence, this allows scientists to quickly focus on the most intense events and interesting regions in any time-step or the dataset as a whole, which greatly speeds up the rate of scientific exploration and discovery.
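The idea of caching threshold-query results can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `ThresholdCache`, the in-memory dictionary layout, and the use of the vector magnitude as the derived field are all assumptions made for illustration. The sketch relies on one property of threshold queries: a result cached for a threshold t already contains every point that satisfies any threshold at or above t, so such queries can be answered by filtering the cached result without touching the raw data.

```python
import math
from typing import Callable, Dict, List, Tuple

Point = Tuple[int, int, int]
Vector = Tuple[float, float, float]
Result = List[Tuple[Point, float]]


def magnitude(v: Vector) -> float:
    """Hypothetical derived field: Euclidean norm of a stored vector."""
    return math.sqrt(sum(c * c for c in v))


class ThresholdCache:
    """Illustrative cache for threshold queries (assumed design, in memory)."""

    def __init__(self, data: Dict[int, Dict[Point, Vector]],
                 derive: Callable[[Vector], float]) -> None:
        self.data = data          # {timestep: {grid point: stored vector}}
        self.derive = derive      # function computing the derived field
        # Per time-step: (lowest threshold evaluated so far, its result).
        self.cache: Dict[int, Tuple[float, Result]] = {}
        self.hits = 0
        self.misses = 0

    def query(self, timestep: int, threshold: float) -> Result:
        """Return (point, value) pairs whose derived value >= threshold."""
        entry = self.cache.get(timestep)
        if entry is not None and entry[0] <= threshold:
            # Hit: the cached result is a superset, so filtering suffices.
            self.hits += 1
            return [(p, v) for p, v in entry[1] if v >= threshold]

        # Miss: scan the time-step and evaluate the derived field.
        self.misses += 1
        result: Result = []
        for point, vec in self.data[timestep].items():
            value = self.derive(vec)
            if value >= threshold:
                result.append((point, value))
        result.sort()
        # Keep the result for the lowest threshold seen for this time-step.
        self.cache[timestep] = (threshold, result)
        return result


# Tiny made-up dataset: one time-step with three grid points.
demo_data = {
    0: {
        (0, 0, 0): (3.0, 4.0, 0.0),
        (1, 0, 0): (1.0, 0.0, 0.0),
        (0, 1, 0): (0.0, 6.0, 8.0),
    }
}
demo = ThresholdCache(demo_data, magnitude)
first = demo.query(0, 4.0)    # scans the data
second = demo.query(0, 6.0)   # answered from the cached result
```

In this sketch the second query reuses both the I/O and the computation of the first. A query with a lower threshold than any cached one falls back to a full scan and replaces the cached entry.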
Year: 2015
Venue: EDBT
Field: Data mining, Relational database, Computer simulation, Computer science, Cache, Database, Computation, Scalability
DocType: Conference
Citations: 0
PageRank: 0.34
References: 27
Authors: 3
Name
Order
Citations
PageRank
Kalin Kanov1113.06
Randal Burns21955115.15
Cristian Constantin Lalescu311.72