Title
LifeRaft: Data-Driven, Batch Processing for the Exploration of Scientific Databases
Abstract
Workloads that comb through vast amounts of data are gaining importance in the sciences. These workloads consist of "needle in a haystack" queries that are long-running and data-intensive, so query throughput limits performance. To maximize throughput for data-intensive queries, we put forth LifeRaft: a query processing system that batches queries with overlapping data requirements. Rather than scheduling queries in arrival order, LifeRaft executes queries concurrently against an ordering of the data that maximizes data sharing among queries. This decreases I/O and increases cache utility. However, such batch processing can increase query response time by starving interactive workloads. LifeRaft addresses starvation using techniques inspired by head scheduling in disk drives. Depending upon the workload saturation and queuing times, the system adaptively and incrementally trades off between processing queries in arrival order and data-driven batch processing. Evaluating LifeRaft in the SkyQuery federation of astronomy databases reveals a two-fold improvement in query throughput.
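The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `next_bucket`, the `alpha` blending parameter, and the normalised scoring formula are all assumptions made for illustration. The sketch picks the next data partition ("bucket") to read by blending how many pending queries would share that read against how long the oldest of those queries has waited.

```python
from collections import defaultdict


def next_bucket(pending, now, alpha):
    """Pick the next data bucket to read (illustrative sketch only).

    pending: dict mapping query id -> (arrival_time, set of bucket ids needed)
    now:     current time, in the same units as arrival_time
    alpha:   hypothetical knob in [0, 1]; 0 favours data sharing only,
             1 favours the oldest waiting query only
    """
    # Collect, per bucket, the waiting time of every query that needs it.
    waiting = defaultdict(list)
    for arrival, buckets in pending.values():
        for bucket in buckets:
            waiting[bucket].append(now - arrival)
    if not waiting:
        return None

    # Normalise both terms to [0, 1] so that alpha blends comparable values.
    max_share = max(len(ages) for ages in waiting.values())
    max_age = max(max(ages) for ages in waiting.values()) or 1.0

    def score(bucket):
        ages = waiting[bucket]
        share = len(ages) / max_share  # how many queries one read would serve
        age = max(ages) / max_age      # how long the oldest query has waited
        return (1 - alpha) * share + alpha * age

    # Iterate in sorted order so that ties are broken deterministically.
    return max(sorted(waiting), key=score)


# Hypothetical workload: one old query on bucket "A", two newer ones on "B".
pending = {"q1": (0, {"A"}), "q2": (5, {"B"}), "q3": (6, {"B"})}
print(next_bucket(pending, now=10, alpha=0.0))  # sharing only
print(next_bucket(pending, now=10, alpha=1.0))  # age only
```

With `alpha` near 0 the sketch behaves like pure data-driven batching, and with `alpha` near 1 it approaches arrival-order service. An adaptive system would adjust such a knob according to load and queuing times, as the abstract describes.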
Year: 2009
Venue: CoRR (arXiv Computing Research Repository)
Keywords: batch process
DocType: Conference
Volume: abs/0909.1760
Citations: 8
PageRank: 0.69
References: 22
Authors: 3
Name            Order  Citations  PageRank
Xiaodan Wang    1      326        32.31
Randal Burns    2      1955       115.15
Tanu Malik      3      304        35.97