Abstract |
---|
Methods exist for constant-time clustering of corpus subsets selected via Scatter/Gather browsing [3]. In this paper we expand on those techniques, giving an algorithm for almost-constant-time clustering of arbitrary corpus subsets. This algorithm is never slower than clustering the document set from scratch, and for medium-sized and large sets it is significantly faster. It can cluster arbitrary subsets of large corpora (obtained, for instance, by a Boolean search) quickly enough for interactive use. |
Year | DOI | Venue |
---|---|---|
1997 | 10.1145/258525.258535 | SIGIR |

Field | DocType | Volume |
---|---|---|
Data mining, Correlation clustering, Computer science, Cluster analysis | Conference | 31 |

Issue | ISSN | ISBN |
---|---|---|
SI | 0163-5840 | 0-89791-836-3 |

Citations | PageRank | References |
---|---|---|
29 | 10.28 | 4 |
Authors |
---|
2 |

Name | Order | Citations | PageRank |
---|---|---|---|
Craig Silverstein | 1 | 29 | 10.28 |
Jan O. Pedersen | 2 | 6301 | 1177.07 |