Complete inverted files for efficient text retrieval and analysis - Citegraph

Paper Info

Title
Complete inverted files for efficient text retrieval and analysis

Abstract
Given a finite set of texts S = {w1, … , wk} over some fixed finite alphabet Σ, a complete inverted file for S is an abstract data type that provides the functions find(w), which returns the longest prefix of w that occurs (as a subword of a word) in S; freq(w), which returns the number of times w occurs in S; and locations(w), which returns the set of positions where w occurs in S. A data structure that implements a complete inverted file for S that occupies linear space and can be built in linear time, using the uniform-cost RAM model, is given. Using this data structure, the time for each of the above query functions is optimal. To accomplish this, techniques from the theory of finite automata and the work on suffix trees are used to build a deterministic finite automaton that recognizes the set of all subwords of the set S. This automaton is then annotated with additional information and compacted to facilitate the desired query functions. The result is a data structure that is smaller and more flexible than the suffix tree.
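
To make the three query functions in the abstract concrete, here is a minimal brute-force sketch of the interface in Python. It is not the linear-time automaton-based structure the paper describes; it only illustrates what find, freq, and locations are meant to return. The class name `NaiveInvertedFile`, the representation of a position as a (text index, start offset) pair, and the handling of the empty query are assumptions made for this illustration.

```python
from typing import List, Set, Tuple


class NaiveInvertedFile:
    """Brute-force reference for the find/freq/locations interface.

    Hypothetical illustration only: queries scan every text, so they are
    far slower than the structure described in the abstract.
    """

    def __init__(self, texts: List[str]) -> None:
        self.texts = list(texts)

    def locations(self, w: str) -> Set[Tuple[int, int]]:
        # Assumption: a position is (text index, start offset), and the
        # empty query has no locations. Overlapping matches are included.
        if not w:
            return set()
        found: Set[Tuple[int, int]] = set()
        for i, text in enumerate(self.texts):
            start = text.find(w)
            while start != -1:
                found.add((i, start))
                start = text.find(w, start + 1)
        return found

    def freq(self, w: str) -> int:
        # Number of occurrences of w across all texts.
        return len(self.locations(w))

    def find(self, w: str) -> str:
        # Longest prefix of w that occurs as a subword of some text.
        for end in range(len(w), 0, -1):
            prefix = w[:end]
            if any(prefix in text for text in self.texts):
                return prefix
        return ""


index = NaiveInvertedFile(["abcbc", "bcd"])
print(index.find("bcbx"))              # bcb
print(index.freq("bc"))                # 3
print(sorted(index.locations("bc")))   # [(0, 1), (0, 3), (1, 0)]
```

Each query here costs time proportional to the total length of the texts, which is the cost the paper's structure is designed to avoid.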

Year	DOI	Venue
1987	10.1145/28869.28873	J. ACM
Keywords	DocType	Volume
finite automaton,text retrieval,complete inverted file,finite set,deterministic finite automaton,string matching,data structure,fixed finite alphabet,query function,inverted file,DAWG,suffix tree,abstract data type,efficient text retrieval	Journal	34
Issue	ISSN	Citations
3	0004-5411	112
PageRank	References	Authors
10.96	12	5

Authors (5 rows)

Name	Order	Citations	PageRank
A. Blumer	1	112	10.96
J. Blumer	2	112	10.96
David Haussler	3	8327	3068.93
R. McConnell	4	112	10.96
A. Ehrenfeucht	5	1823	497.83
