Abstract | | |
---|---|---|
The Scatter/Gather document browsing method uses fast document clustering to produce table-of-contents-like outlines of large document collections. Previous work [1] developed linear-time document clustering algorithms to establish the feasibility of this method over moderately large collections. However, even linear-time algorithms are too slow to support interactive browsing of very large collections such as Tipster, the DARPA standard text retrieval evaluation collection. We present a scheme that supports constant interaction-time Scatter/Gather of arbitrarily large collections after near-linear time preprocessing. This involves the construction of a cluster hierarchy. A modification of Scatter/Gather employing this scheme, and an example of its use over the Tipster collection are presented. |
Year | DOI | Venue |
---|---|---|
1993 | 10.1145/160688.160706 | SIGIR |
Keywords | Field | DocType |
large document collection,interactive browsing,constant interaction-time scatter,evaluation collection,tipster collection,constant interaction-time,linear-time algorithm,darpa standard text retrieval,linear-time document,cluster hierarchy,large collection,document clustering,aggregation,linear time,graph theory,clustering,structural analysis,table of contents,hypertext | Graph theory,Data mining,Hypertext,Information retrieval,Computer science,Document clustering,Preprocessor,Cluster analysis,Hierarchy,Arbitrarily large,Text retrieval | Conference |
ISBN | Citations | PageRank |
0-89791-605-0 | 119 | 62.26 |
References | Authors | |
6 | 3 | |
Name | Order | Citations | PageRank |
---|---|---|---|
Douglas R. Cutting | 1 | 1030 | 423.10 |
David R. Karger | 2 | 19367 | 2233.64 |
Jan O. Pedersen | 3 | 6301 | 1177.07 |