Title
External perfect hashing for very large key sets
Abstract
We present a simple and efficient external perfect hashing scheme (referred to as EPH algorithm) for very large static key sets. We use a number of techniques from the literature to obtain a novel scheme that is theoretically well-understood and at the same time achieves an order-of-magnitude increase in the size of the problem to be solved compared to previous "practical" methods. We demonstrate the scalability of our algorithm by constructing minimum perfect hash functions for a set of 1.024 billion URLs from the World Wide Web of average length 64 characters in approximately 62 minutes, using a commodity PC. Our scheme produces minimal perfect hash functions using approximately 3.8 bits per key. For perfect hash functions in the range {0,...,2n - 1} the space usage drops to approximately 2.7 bits per key. The main contribution is the first algorithm that has experimentally proven practicality for sets in the order of billions of keys and has time and space usage carefully analyzed without unrealistic assumptions.
Year
DOI
Venue
2007
10.1145/1321440.1321532
CIKM
Keywords
Field
DocType
billion urls,average length,space usage,large key set,novel scheme,perfect hash function,minimum perfect hash function,large static key set,eph algorithm,world wide web,minimal perfect hash function,large,hash function,functions,perfect,hash
Data mining,Double hashing,Computer science,Universal hashing,Algorithm,Theoretical computer science,Perfect hash function,Hash function,Consistent hashing,Dynamic perfect hashing,Linear hashing,Hash table
Conference
Citations 
PageRank 
References 
27
1.22
31
Authors
2
Name
Order
Citations
PageRank
Fabiano C. Botelho117411.06
Nivio Ziviani21598154.65