Title
Adaptive Caching in Big SQL using the HDFS Cache.
Abstract
The memory and storage hierarchy in database systems is currently undergoing a radical evolution in the context of Big Data systems. SQL-on-Hadoop systems share data with other applications in the Big Data ecosystem by storing their data in HDFS, using open file formats. However, they do not provide automatic caching mechanisms for storing data in memory. In this paper, we describe the architecture of IBM Big SQL and its use of the HDFS cache as an alternative to the traditional buffer pool, allowing in-memory data to be shared with other Big Data applications. We design novel adaptive caching algorithms for Big SQL tailored to the challenges of such an external cache scenario. Our experimental evaluation shows that only our adaptive algorithms perform well for diverse workload characteristics, and are able to adapt to evolving data access patterns. Finally, we discuss our experiences in addressing the new challenges imposed by external caching and summarize our insights about how to direct ongoing architectural evolution of external caching mechanisms.
Year
DOI
Venue
2016
10.1145/2987550.2987553
SoCC
Keywords
Field
DocType
SQL-on-Hadoop, HDFS Caching
SQL,File format,IBM,Cache,Computer science,Cache algorithms,Smart Cache,Big data,Data access,Database
Conference
Citations 
PageRank 
References 
12
0.77
17
Authors
6
Name
Order
Citations
PageRank
Avrilia Floratou121316.29
Nimrod Megiddo24244668.46
Navneet Potti3384.23
F. Ozcan4996.71
Uday Kale5120.77
Jan Schmitz-Hermes6120.77