Abstract | ||
---|---|---|
Signature files and inverted files are well-known index structures. In this paper we undertake a direct comparison of the two for answering partially specified queries against a large lexicon stored in main memory. Using n-grams to index lexicon terms, a bit-sliced signature file can be compressed to a smaller size than an inverted file if each n-gram sets only one bit in the term signature. With a signature width less than half the number of unique n-grams in the lexicon, the signature file method is about as fast as the inverted file method, and significantly smaller. Greater flexibility in memory usage and faster index generation make signature files appropriate for searching large lexicons or other collections in an environment where memory is at a premium. |
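
The indexing scheme summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of CRC32 as the hash, the bigram size, the absence of word-boundary padding, and the restriction to `*` as the only wildcard are all assumptions made for illustration. Each n-gram sets exactly one bit of the term signature, the signatures are stored as bit slices (one bitmap per signature position), and candidates are verified against the pattern to discard false drops.

```python
from fnmatch import fnmatchcase
import zlib


def ngrams(text, n=2):
    """Return the set of character n-grams of text (empty if text is shorter than n)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def bit_for(gram, width):
    # Assumed hash: CRC32 modulo the signature width, so each n-gram sets one bit.
    return zlib.crc32(gram.encode("utf-8")) % width


def build_slices(lexicon, width, n=2):
    """Build a bit-sliced signature file: slices[b] has bit t set if term t sets bit b."""
    slices = [0] * width
    for term_id, term in enumerate(lexicon):
        for gram in ngrams(term, n):
            slices[bit_for(gram, width)] |= 1 << term_id
    return slices


def search(pattern, lexicon, slices, n=2):
    """Answer a partially specified query such as 'comp*ion' (only '*' is assumed)."""
    width = len(slices)
    candidates = (1 << len(lexicon)) - 1  # start with every term as a candidate
    for fragment in pattern.split("*"):
        for gram in ngrams(fragment, n):
            # AND together the slices selected by the query's n-grams.
            candidates &= slices[bit_for(gram, width)]
    # Hash collisions can produce false drops, so verify each surviving candidate.
    return [term for term_id, term in enumerate(lexicon)
            if (candidates >> term_id) & 1 and fnmatchcase(term, pattern)]


lexicon = ["compression", "comparison", "compact", "signature", "lexicon"]
slices = build_slices(lexicon, width=16)
matches = search("comp*ion", lexicon, slices)
```

A narrower `width` makes the index smaller but lets more n-grams share a bit, so more candidates reach the verification step; the final answers stay the same because every candidate is checked against the pattern.
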
Year | DOI | Venue |
---|---|---|
2005 | 10.1016/j.ipm.2003.12.003 | Inf. Process. Manage. |
Keywords | Field | DocType |
dictionaries,signature file,signature width,performance evaluation,term signature,signature file method,personal digital assistants pdas,compression,index lexicon term,indexing methods,inverted file method,large lexicon,faster index generation time,personal digital assistants,inverted file,bit-sliced signature file,indexation,generation time | Inverted index,Data mining,Indexation,Information retrieval,Computer science,Search engine indexing,Lexicon,Lexico,Signature file,Statistical analysis | Journal |
Volume | Issue | ISSN |
41 | 3 | Information Processing and Management |
Citations | PageRank | References |
4 | 0.37 | 36 |
Authors | ||
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ben Carterette | 1 | 1544 | 83.86 |
Fazli Can | 2 | 581 | 94.63 |