Abstract |
---|
We present a parallel implementation of the Buckshot document clustering algorithm. We demonstrate that this parallel approach is highly efficient both in terms of load balancing and minimization of communication. In a series of experiments using the 2 GB of SGML data from TREC disks 4 and 5, our parallel approach was shown to be scalable in terms of the number of processors efficiently used and the number of clusters created. |
Year | DOI | Venue |
---|---|---|
2002 | 10.1145/584792.584919 | CIKM |
Keywords | Field | DocType |
sgml data,load balancing,trec disk,parallel implementation,buckshot algorithm,efficient document clustering,parallel approach,buckshot document,association rules,text mining,load balance,document clustering | Data mining,CURE data clustering algorithm,SGML,Correlation clustering,Computer science,Document clustering,Load balancing (computing),Algorithm,Association rule learning,Minification,Scalability | Conference |
ISBN | Citations | PageRank |
1-58113-492-4 | 8 | 0.85 |
References | Authors |
5 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Eric C. Jensen | 1 | 696 | 46.72 |
Steven M. Beitzel | 2 | 696 | 46.72 |
Angelo J. Pilotto | 3 | 8 | 1.19 |
Nazli Goharian | 4 | 460 | 49.93 |
Ophir Frieder | 5 | 3300 | 419.55 |