Title
Distributed perfect hashing for very large key sets.
Abstract
A perfect hash function (PHF) h: S → [0, m -- 1] for a key set S ⊆ U of size n, where m ≥ n and U is a key universe, is an injective function that maps the keys of S to unique values. A minimal perfect hash function (MPHF) is a PHF with m = n, the smallest possible range. Minimal perfect hash functions are widely used for memory efficient storage and fast retrieval of items from static sets. In this paper we present a distributed and parallel version of a simple, highly scalable and near-space optimal perfect hashing algorithm for very large key sets, recently presented in [4]. The sequential implementation of the algorithm constructs a MPHF for a set of 1.024 billion URLs of average length 64 bytes collected from the Web in approximately 50 minutes using a commodity PC. The parallel implementation proposed here presents the following performance using 14 commodity PCs: (i) it constructs a MPHF for the same set of 1.024 billion URLs in approximately 4 minutes; (ii) it constructs a MPHF for a set of 14.336 billion 16-byte random integers in approximately 50 minutes with a performance degradation of 20%; (iii) one version of the parallel algorithm distributes the description of the MPHF among the participating machines and its evaluation is done in a distributed way, faster than the centralized function.
Year
Venue
Keywords
2008
Infoscale
billion urls,minimal perfect hash function,large key set,centralized function,key universe,injective function,perfect hash function,billion 16-byte random integer,static set,parallel algorithm
Field
DocType
Citations 
Integer,Byte,Injective function,Parallel algorithm,Computer science,Perfect hash function,Dynamic perfect hashing,Scalability,Distributed computing
Conference
3
PageRank 
References 
Authors
0.37
11
4
Name
Order
Citations
PageRank
Fabiano C. Botelho117411.06
Daniel Galinkin230.37
Wagner Meira Jr32039188.06
Nivio Ziviani41598154.65