Title
Building a complete inverted file for a set of text files in linear time
Abstract
Given a finite set of texts S &equil; {&ohgr;1, ..., &ohgr;k} over some fixed finite alphabet &Sgr;, a complete inverted file for S is an abstract data type that provides the functions find(&ohgr;), which returns the longest prefix of &ohgr; which occurs in S; freq(&ohgr;), which returns the number of times &ohgr; occurs in S; and locations(&ohgr;) which returns the set of positions at which &ohgr; occurs. We give a data structure to implement a complete inverted file for S which occupies linear space and can be built in linear time, using the uniform cost RAM model. Using this data structure, the time for each of the above query functions is optimal. To accomplish this, we use techniques from the theory of finite automata to build a deterministic finite automaton which recognizes the set of all sub words of the set S. This automaton is then annotated with additional information and compacted to facilitate the desired query functions.
Year
DOI
Venue
1984
10.1145/800057.808700
STOC
Keywords
Field
DocType
query function,finite automaton,complete inverted file,set s.,abstract data type,linear time,finite set,text file,deterministic finite automaton,data structure,linear space,fixed finite alphabet,algorithm,finite automata,compactness
Inverted index,Discrete mathematics,Data structure,Combinatorics,Finite set,Deterministic automaton,Deterministic finite automaton,Linear space,Finite-state machine,Time complexity,Mathematics
Conference
ISBN
Citations 
PageRank 
0-89791-133-4
9
4.72
References 
Authors
10
5
Name
Order
Citations
PageRank
A. Blumer194.72
J. Blumer23310.50
A. Ehrenfeucht31823497.83
David Haussler483273068.93
R. McConnell594.72