Title
Constant interaction-time scatter/gather browsing of very large document collections
Abstract
The Scatter/Gather document browsing method uses fast document clustering to produce table-of-contents-like outlines of large document collections. Previous work [1] developed linear-time document clustering algorithms to establish the feasibility of this method over moderately large collections. However, even linear-time algorithms are too slow to support interactive browsing of very large collections such as Tipster, the DARPA standard text retrieval evaluation collection. We present a scheme that supports constant interaction-time Scatter/Gather of arbitrarily large collections after near-linear time preprocessing. This involves the construction of a cluster hierarchy. A modification of Scatter/Gather employing this scheme, and an example of its use over the Tipster collection are presented.
Year: 1993
DOI: 10.1145/160688.160706
Venue: SIGIR
Keywords: large document collection, interactive browsing, constant interaction-time scatter, evaluation collection, tipster collection, constant interaction-time, linear-time algorithm, darpa standard text retrieval, linear-time document, cluster hierarchy, large collection, document clustering, aggregation, linear time, graph theory, clustering, structural analysis, table of contents, hypertext
Field: Graph theory, Data mining, Hypertext, Information retrieval, Computer science, Document clustering, Preprocessor, Cluster analysis, Hierarchy, Arbitrarily large, Text retrieval
DocType: Conference
ISBN: 0-89791-605-0
Citations: 119
PageRank: 62.26
References: 6
Authors: 3
Name; Order; Citations, PageRank
Douglas R. Cutting; 1; 1030423.10
David R. Karger; 2; 193672233.64
Jan O. Pedersen; 3; 63011177.07