Title
Choosing A Cloud DBMS: Architectures and Tradeoffs
Abstract
As analytic (OLAP) applications move to the cloud, DBMSs have shifted from a pure shared-nothing design with locally attached storage to a hybrid design that combines shared storage (e.g., AWS S3) with shared-nothing query execution. This paper sheds light on the resulting tradeoffs, which have not been properly identified in previous work. To this end, it evaluates the TPC-H benchmark across a variety of DBMS offerings running in a cloud environment (AWS) on fast 10Gb+ networks: database-as-a-service offerings (Redshift, Athena), query engines (Presto, Hive), and a traditional cloud-agnostic OLAP database (Vertica). While these comparisons cannot be apples-to-apples in all cases due to cloud configuration restrictions, we nonetheless identify patterns and design choices that are advantageous. These include prioritizing low-cost object stores like S3 for data storage; using system-agnostic yet still performant columnar formats like ORC, which allow easy switching to other systems for different workloads; and making features that benefit subsequent runs, such as query precompilation and caching remote data to faster storage, optional rather than required, because they disadvantage ad hoc queries.
Year
2019
DOI
10.14778/3352063.3352133
Venue
PVLDB
Field
Computer science, Computer data storage, Online analytical processing, Database, Cloud computing
DocType
Journal
Volume
12
Issue
12
ISSN
2150-8097
Citations
1
PageRank
0.35
References
0
Authors
9
Name, Order, Citations, PageRank
Junjay Tan, 1, 1, 0.69
Thanaa Ghanem, 2, 1, 0.69
Matthew Perron, 3, 3, 1.45
Xiangyao Yu, 4, 27016.17
Michael Stonebraker, 5, 124634310.17
David J. DeWitt, 6, 1, 0.35
Marco Serafini, 7, 1, 0.69
Ashraf Aboulnaga, 8, 128991.33
Tim Kraska, 9, 2226133.57