Title
External sorting with on-the-fly compression
Abstract
Evaluating a query can involve manipulation of large volumes of temporary data. When the volume of data becomes too great, activities such as joins and sorting must use disk, and cost minimisation involves complex trade-offs. In this paper, we explore the effect of compression on the cost of external sorting. Reduction in the volume of data potentially allows costs to be reduced - through reductions in disk traffic and numbers of temporary files - but on-the-fly compression can be slow and many compression methods do not allow random access to individual records. We investigate a range of compression techniques for this problem, and develop successful methods based on common letter sequences. Our experiments show that, for a given memory limit, the overheads of compression outweigh the benefits for smaller data volumes, but for large files compression can yield substantial gains, of one-third of costs in the best case tested. Even when the data is stored uncompressed, our results show that incorporation of compression can significantly accelerate query processing.
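As a rough illustration of the idea summarised in the abstract, the sketch below forms sorted runs under a memory limit, writes each run to a compressed temporary file, and then merges the runs while decompressing them lazily. It is a minimal sketch under stated assumptions, not the authors' implementation: gzip stands in for the paper's compression methods based on common letter sequences, records are assumed to be newline-free strings, and the names `write_run`, `read_run`, `external_sort` and `max_in_memory` are hypothetical.

```python
import gzip
import heapq
import os
import tempfile


def write_run(batch, directory):
    """Sort one in-memory batch and write it as a gzip-compressed run file."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run.gz")
    os.close(fd)
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for record in sorted(batch):
            f.write(record + "\n")
    return path


def read_run(path):
    """Stream records back from a compressed run, one line at a time."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")


def external_sort(records, max_in_memory=1000):
    """Yield records in sorted order.

    At most max_in_memory records are buffered during run formation;
    gzip is an assumed stand-in for the paper's compression scheme.
    """
    with tempfile.TemporaryDirectory() as directory:
        runs, batch = [], []
        for record in records:
            batch.append(record)
            if len(batch) >= max_in_memory:
                runs.append(write_run(batch, directory))
                batch = []
        if batch:
            runs.append(write_run(batch, directory))
        # heapq.merge pulls from each run lazily, so runs are
        # decompressed incrementally rather than loaded whole.
        yield from heapq.merge(*(read_run(path) for path in runs))
```

Calling `list(external_sort(lines, max_in_memory=100_000))` would sort an iterable of strings this way. Whether the compressed version is faster than an uncompressed one depends on the data volume and the cost of the compressor, which is the trade-off the abstract describes.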
Year
2003
DOI
10.1007/3-540-45073-4_10
Venue
BNCOD
Keywords
compression technique, temporary data, large files compression, smaller data volume, compression method, large volume, cost minimisation, disk traffic, on-the-fly compression, query processing, random access
Field
Data mining, Compression (physics), Joins, Computer science, Sorting, Minimisation (psychology), External sorting, Data compression, Database, Random access, Uncompressed video
DocType
Conference
Volume
2712
ISSN
0302-9743
ISBN
3-540-40536-4
Citations
2
PageRank
0.37
References
20
Authors
2
Name
Order
Citations
PageRank
John Yiannis 11448.07
Justin Zobel 26882880.46