Title
Parallelizing the buckshot algorithm for efficient document clustering
Abstract
We present a parallel implementation of the Buckshot document clustering algorithm. We demonstrate that this parallel approach is highly efficient in both load balancing and minimizing communication. In a series of experiments using the 2 GB of SGML data from TREC disks 4 and 5, our parallel approach scaled well in both the number of processors used efficiently and the number of clusters created.
Year
2002
DOI
10.1145/584792.584919
Venue
CIKM
Keywords
sgml data, load balancing, trec disk, parallel implementation, buckshot algorithm, efficient document clustering, parallel approach, buckshot document, association rules, text mining, load balance, document clustering
Field
Data mining, CURE data clustering algorithm, SGML, Correlation clustering, Computer science, Document clustering, Load balancing (computing), Algorithm, Association rule learning, Minification, Scalability
DocType
Conference
ISBN
1-58113-492-4
Citations
8
PageRank
0.85
References
5
Authors
5
Name                 Order   Citations   PageRank
Eric C. Jensen       1       696         46.72
Steven M. Beitzel    2       696         46.72
Angelo J. Pilotto    3       8           1.19
Nazli Goharian       4       460         49.93
Ophir Frieder        5       3300        419.55