Title
Communication-Efficient Exact Clustering Of Distributed Streaming Data
Abstract
A widely used approach to clustering a single data stream is the two-phase approach, in which the online phase creates and maintains micro-clusters and the offline phase generates the macro-clustering from them. We use this approach to propose a distributed framework for clustering streaming data. Every remote-site process generates and maintains micro-clusters that summarize the cluster information of its local data stream. Either the remote sites send their local micro-clusterings to the coordinator, or the coordinator invokes remote methods to retrieve them. Having received all the local micro-clusterings, the coordinator generates the global clustering with the macro-clustering method. Our theoretical and empirical results show that the global clustering generated by our distributed framework is similar to the clustering generated by the underlying centralized algorithm on the same data set. By using the local micro-clustering approach, our framework achieves high scalability and communication efficiency.
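The flow described in the abstract (local micro-clustering at each remote site, then macro-clustering at the coordinator) can be illustrated with a minimal sketch. The abstract does not specify the micro-cluster structure or the macro-clustering algorithm, so everything concrete here is an assumption: a micro-cluster is reduced to a count and a linear sum, the `radius` absorption threshold is hypothetical, and weighted k-means stands in for the macro-clustering method. The class and function names are illustrative only and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Additive summary (count, linear sum) of the points absorbed so far."""
    n: int
    ls: list  # per-dimension linear sum

    def centroid(self):
        return [s / self.n for s in self.ls]

    def absorb(self, point):
        self.n += 1
        self.ls = [s + x for s, x in zip(self.ls, point)]


class RemoteSite:
    """Online phase: summarize a local stream into micro-clusters."""

    def __init__(self, radius):
        self.radius = radius  # hypothetical absorption threshold
        self.micro_clusters = []

    def process(self, point):
        # Find the nearest existing micro-cluster, if any.
        best = min(self.micro_clusters,
                   key=lambda mc: math.dist(mc.centroid(), point),
                   default=None)
        if best is not None and math.dist(best.centroid(), point) <= self.radius:
            best.absorb(point)
        else:
            self.micro_clusters.append(MicroCluster(1, list(point)))

    def local_micro_clustering(self):
        # Only these summaries would cross the network, never the raw points.
        return [(mc.n, mc.centroid()) for mc in self.micro_clusters]


def macro_cluster(summaries, k, iterations=20):
    """Offline phase (assumed): weighted k-means over (weight, centroid) pairs."""
    # Seed with the k heaviest summaries, an arbitrary deterministic choice.
    centers = [c for _, c in sorted(summaries, key=lambda s: -s[0])[:k]]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for w, c in summaries:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(centers[i], c))
            groups[nearest].append((w, c))
        for i, g in enumerate(groups):
            if g:  # keep the old center if no summary was assigned to it
                total = sum(w for w, _ in g)
                centers[i] = [sum(w * c[d] for w, c in g) / total
                              for d in range(len(centers[i]))]
    return centers


# Two remote sites observe disjoint parts of the same two-cluster data set.
site_a, site_b = RemoteSite(radius=1.0), RemoteSite(radius=1.0)
for p in [(0.0, 0.0), (0.2, 0.0), (10.0, 10.0)]:
    site_a.process(p)
for p in [(0.0, 0.4), (10.0, 10.4), (10.2, 10.0)]:
    site_b.process(p)

# The coordinator merges the local micro-clusterings and macro-clusters them.
summaries = site_a.local_micro_clustering() + site_b.local_micro_clustering()
centers = macro_cluster(summaries, k=2)
```

In this toy run the coordinator receives four summaries instead of six raw points, and because counts and linear sums are additive, the weighted centers coincide with the means of the underlying points in each group. Real streams would need more careful handling (for example, bounding the number of micro-clusters per site), which this sketch leaves out.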
Year: 2012
DOI: 10.1007/978-3-642-39640-3_31
Venue: COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2013, PT V
Field: Cluster (physics), Data mining, Data stream clustering, Serialization, Data stream, Computer science, Streaming data, Cluster analysis, Database, Distributed computing, Scalability
DocType: Journal
Volume: 7975
ISSN: 0302-9743
Citations: 0
PageRank: 0.34
References: 20
Authors: 2
Name | Order | Citations | PageRank
Dang-Hoan Tran | 1 | 7 | 1.52
Kai-uwe Sattler | 2 | 1144 | 126.81