Constant interaction-time scatter/gather browsing of very large document collections - Citegraph

Paper Info

Title
Constant interaction-time scatter/gather browsing of very large document collections

Abstract
The Scatter/Gather document browsing method uses fast document clustering to produce table-of-contents-like outlines of large document collections. Previous work [1] developed linear-time document clustering algorithms to establish the feasibility of this method over moderately large collections. However, even linear-time algorithms are too slow to support interactive browsing of very large collections such as Tipster, the DARPA standard text retrieval evaluation collection. We present a scheme that supports constant interaction-time Scatter/Gather of arbitrarily large collections after near-linear time preprocessing. This involves the construction of a cluster hierarchy. A modification of Scatter/Gather employing this scheme and an example of its use over the Tipster collection are presented.
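
The abstract's central idea, paying a preprocessing cost to build a cluster hierarchy so that each browsing step touches only a bounded number of nodes, can be illustrated with a minimal sketch. Everything named here (`Node`, `expand`, the parameter `m`, the toy tree) is hypothetical and not taken from the paper, whose actual procedure may differ. The sketch assumes the hierarchy is already built and that every internal node has at least two children.

```python
from dataclasses import dataclass, field
import heapq


@dataclass
class Node:
    """A node of a precomputed cluster hierarchy (hypothetical structure)."""
    label: str
    size: int                                  # documents under this node
    children: list = field(default_factory=list)


def expand(focus, m):
    """Expand the selected nodes until about m nodes are in hand.

    The largest node is repeatedly replaced by its children. The work
    depends on m and the branching factor, not on the collection size.
    """
    # Max-heap keyed on size; the counter breaks ties so nodes are never compared.
    heap = [(-n.size, i, n) for i, n in enumerate(focus)]
    heapq.heapify(heap)
    counter = len(heap)
    leaves = []                                # nodes that cannot be expanded
    while heap and len(heap) + len(leaves) < m:
        _, _, node = heapq.heappop(heap)
        if not node.children:
            leaves.append(node)
            continue
        for child in node.children:
            heapq.heappush(heap, (-child.size, counter, child))
            counter += 1
    return leaves + [n for _, _, n in heap]


# Toy hierarchy over 8 documents (illustrative data only).
a = Node("A", 5, [Node("A1", 3), Node("A2", 2)])
b = Node("B", 3, [Node("B1", 2), Node("B2", 1)])
root = Node("root", 8, [a, b])

coarse = sorted(n.label for n in expand([root], 3))
fine = sorted(n.label for n in expand([root], 10))
```

In a full browsing step, the bounded set returned by `expand` would then be clustered into a handful of groups for display. Since that set has at most roughly `m` members, the clustering cost would also be independent of the collection size under these assumptions.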

Year	1993
DOI	10.1145/160688.160706
Venue	SIGIR
DocType	Conference
ISBN	0-89791-605-0
Citations	119
PageRank	62.26
References	6
Authors	3
Keywords	large document collection,interactive browsing,constant interaction-time scatter,evaluation collection,tipster collection,constant interaction-time,linear-time algorithm,darpa standard text retrieval,linear-time document,cluster hierarchy,large collection,document clustering,aggregation,linear time,graph theory,clustering,structural analysis,table of contents,hypertext
Field	Graph theory,Data mining,Hypertext,Information retrieval,Computer science,Document clustering,Preprocessor,Cluster analysis,Hierarchy,Arbitrarily large,Text retrieval

Authors (3 rows)

Name	Order	Citations	PageRank
Douglas R. Cutting	1	1030	423.10
David R. Karger	2	19367	2233.64
Jan O. Pedersen	3	6301	1177.07
