Abstract | ||
---|---|---|
We present a simple and efficient external perfect hashing scheme (referred to as EPH algorithm) for very large static key sets. We use a number of techniques from the literature to obtain a novel scheme that is theoretically well-understood and at the same time achieves an order-of-magnitude increase in the size of the problem to be solved compared to previous "practical" methods. We demonstrate the scalability of our algorithm by constructing minimum perfect hash functions for a set of 1.024 billion URLs from the World Wide Web of average length 64 characters in approximately 62 minutes, using a commodity PC. Our scheme produces minimal perfect hash functions using approximately 3.8 bits per key. For perfect hash functions in the range {0,...,2n - 1} the space usage drops to approximately 2.7 bits per key. The main contribution is the first algorithm that has experimentally proven practicality for sets in the order of billions of keys and has time and space usage carefully analyzed without unrealistic assumptions. |
Year | DOI | Venue |
---|---|---|
2007 | 10.1145/1321440.1321532 | CIKM |
Keywords | Field | DocType |
billion urls,average length,space usage,large key set,novel scheme,perfect hash function,minimum perfect hash function,large static key set,eph algorithm,world wide web,minimal perfect hash function,large,hash function,functions,perfect,hash | Data mining,Double hashing,Computer science,Universal hashing,Algorithm,Theoretical computer science,Perfect hash function,Hash function,Consistent hashing,Dynamic perfect hashing,Linear hashing,Hash table | Conference |
Citations | PageRank | References |
27 | 1.22 | 31 |
Authors | ||
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Fabiano C. Botelho | 1 | 174 | 11.06 |
Nivio Ziviani | 2 | 1598 | 154.65 |