Title
Minimal perfect hashing: A competitive method for indexing internal memory
Abstract
A perfect hash function (PHF) is an injective function that maps keys from a set S to unique values. Since no collisions occur, each key can be retrieved from a hash table with a single probe. A minimal perfect hash function (MPHF) is a PHF with the smallest possible range, that is, the hash table size is exactly the number of keys in S. MPHFs are widely used for memory efficient storage and fast retrieval of items from static sets. Differently from other hashing schemes, MPHFs completely avoid the problem of wasted space and wasted time to deal with collisions. Until recently, the amount of space to store an MPHF description for practical implementations found in the literature was O(logn) bits per key and therefore similar to the overhead of space of other hashing schemes. Recent results on MPHFs presented in the literature changed this scenario: an MPHF can now be described by approximately 2.6 bits per key. The objective of this paper is to show that MPHFs are, after the new recent results, a good option to index internal memory when static key sets are involved and both successful and unsuccessful searches are allowed. We have shown that MPHFs provide the best tradeoff between space usage and lookup time when compared with other open addressing and chaining hash schemes such as linear hashing, quadratic hashing, double hashing, dense hashing, cuckoo hashing, sparse hashing, hopscotch hashing, chaining with move to front heuristic and exact fit. We considered lookup time for successful and unsuccessful searches in two scenarios: (i) the MPHF description fits in the CPU cache and (ii) the MPHF description does not fit entirely in the CPU cache. Considering lookup time, the minimal perfect hashing outperforms the other hashing schemes in the two scenarios and, in the first scenario, the performance is better even when the compared methods leave more than 80% of the hash table entries free. Considering space overhead (the amount of used space other than the key-value pairs), the minimal perfect hashing is within a factor of O(logn) bits lower than the other hashing schemes for both scenarios.
Year
DOI
Venue
2011
10.1016/j.ins.2009.12.003
Inf. Sci.
Keywords
Field
DocType
hash table size,competitive method,hash table,lookup time,internal memory,perfect hash function,hash table entry,unsuccessful search,mphf description,minimal perfect hash function,cpu cache,chaining hash scheme,minimal perfect hashing,indexation,comparative method,hash chain,hash function
Hopscotch hashing,Double hashing,Universal hashing,Theoretical computer science,Consistent hashing,Mathematics,Dynamic perfect hashing,Open addressing,Linear hashing,Hash table
Journal
Volume
Issue
ISSN
181
13
0020-0255
Citations 
PageRank 
References 
9
0.56
35
Authors
4
Name
Order
Citations
PageRank
Fabiano C. Botelho117411.06
Anísio Lacerda217216.18
Guilherme Vale Menezes3282.00
Nivio Ziviani41598154.65