Title
Perfect Hashing for Data Management Applications
Abstract
Perfect hash functions can potentially be used to compress data in connection with a variety of data management tasks. Though there has been considerable work on how to con- struct good perfect hash functions, there is a gap between theory and practice among all previous methods on min- imal perfect hashing. On one side, there are good the- oretical results without experimentally proven practicality for large key sets. On the other side, there are the the- oretically analyzed time and space usage algorithms that assume that truly random hash functions are available for free, which is an unrealistic assumption. In this paper we attempt to bridge this gap between theory and practice, us- ing a number of techniques from the literature to obtain a novel scheme that is theoretically well-understood and at the same time achieves an order-of-magnitude increase in per- formance compared to previous "practical" methods. This improvement comes from a combination of a novel, theo- retically optimal perfect hashing scheme that greatly sim- plifies previous methods, and the fact that our algorithm is designed to make good use of the memory hierarchy. We demonstrate the scalability of our algorithm by considering a set of over one billion URLs from the World Wide Web of average length 64, for which we construct a minimal perfect hash function on a commodity PC in a little more than 1 hour. Our scheme produces minimal perfect hash functions using slightly more than 3 bits per key. For perfect hash functions in the range {0, . . . ,2n −1} the space usage drops to just over 2 bits per key (i.e., one bit more than optimal for representing the key). This is significantly below of what has been achieved previously for very large values of n.
Year
Venue
Keywords
2007
Clinical Orthopaedics and Related Research
data management,world wide web,hash function,data structure
Field
DocType
Volume
Combinatorics,Double hashing,Universal hashing,Algorithm,Theoretical computer science,Hash function,Perfect hash function,K-independent hashing,Mathematics,Dynamic perfect hashing,Hash table,Linear hashing
Journal
abs/cs/070
Citations 
PageRank 
References 
1
0.37
28
Authors
3
Name
Order
Citations
PageRank
Fabiano C. Botelho117411.06
Rasmus Pagh2134486.08
Nivio Ziviani31598154.65