Title
Efficient and scalable statistics gathering for large databases in Oracle 11g
Abstract
Large tables are often decomposed into smaller pieces called partitions in order to improve query performance and ease the data management. Query optimizers rely on both the statistics of the entire table and the statistics of the individual partitions to select a good execution plan for a SQL statement. In Oracle 10g, we scan the entire table twice, one pass for gathering the table level statistics and the other pass for gathering the partition level statistics. A consequence of this gathering method is that, when the data in some partitions change, not only do we need to scan the changed partitions to gather the partition level statistics, but also we have to scan the entire table again to gather the table level statistics. Oracle 11g adopts a one-pass distinct sampling based method which can accurately derive the table level statistics from the partition level statistics. When data change, Oracle only re-gathers the statistics for the changed partitions and then derives the table level statistics without touching the unchanged partitions. To the best of our knowledge, although the one-pass distinct sampling has been researched in academia for some years, Oracle is the first commercial database that implements the technique. We have performed extensive experiments on both benchmark data and real customer data. Our experiments illustrate the this new method is highly accurate and has significantly better performance than the old method used in Oracle 10g.
Year
DOI
Venue
2008
10.1145/1376616.1376721
SIGMOD Conference
Keywords
Field
DocType
changed partition,entire table,table level statistic,data change,large databases,partition level statistic,one-pass distinct sampling,scalable statistic,benchmark data,partitions change,large table,data management,sampling,query optimization
SQL,Data mining,Computer science,Oracle,Theoretical computer science,Sampling (statistics),Partition (number theory),Statistics,Data management,Database,Scalability
Conference
Citations 
PageRank 
References 
14
0.78
18
Authors
7
Name
Order
Citations
PageRank
Sunil Chakkappen1303.07
Thierry Cruanes2846.20
Benoit Dageville3615.13
Linan Jiang4542.97
Uri Shaft51050107.01
Hong Su6181.16
Mohamed Zait7937.80