Abstract |
---|
A widely used approach to clustering a single data stream is the two-phase approach, in which the online phase creates and maintains micro-clusters while the offline phase generates the macro-clustering from the micro-clusters. Building on this approach, we propose a distributed framework for clustering streaming data. Every remote-site process generates and maintains micro-clusters that summarize the cluster information in its local data stream. Either the remote sites send their local micro-clusterings to the coordinator, or the coordinator invokes remote methods to fetch the local micro-clusterings from the remote sites. Having received all the local micro-clusterings, the coordinator generates the global clustering with the macro-clustering method. Our theoretical and empirical results show that the global clustering generated by our distributed framework is similar to the clustering generated by the underlying centralized algorithm on the same data set. By using the local micro-clustering approach, our framework achieves high scalability and communication efficiency. |
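
The two-phase scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `MicroCluster`, `RemoteSite`, and `macro_cluster`, the fixed `radius` absorption threshold, the count-plus-linear-sum summary, and the use of weighted k-means with farthest-first initialization as the macro-clustering method are all assumptions made for illustration, since the abstract does not specify them. Reading `site.micro_clusters` directly stands in for the network transfer between a remote site and the coordinator.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Additive summary of nearby points: count and per-dimension linear sum."""
    n: int
    ls: list

    def centroid(self):
        return [s / self.n for s in self.ls]

    def absorb(self, point):
        self.n += 1
        self.ls = [s + x for s, x in zip(self.ls, point)]


class RemoteSite:
    """Online phase: maintain micro-clusters over the local stream."""

    def __init__(self, radius):
        self.radius = radius  # hypothetical absorption threshold
        self.micro_clusters = []

    def process(self, point):
        nearest = min(self.micro_clusters,
                      key=lambda mc: math.dist(mc.centroid(), point),
                      default=None)
        if nearest is not None and math.dist(nearest.centroid(), point) <= self.radius:
            nearest.absorb(point)
        else:
            self.micro_clusters.append(MicroCluster(1, list(point)))


def macro_cluster(micro_clusters, k, iterations=20):
    """Offline phase at the coordinator: weighted k-means over centroids."""
    points = [(mc.centroid(), mc.n) for mc in micro_clusters]
    # Farthest-first initialization (an assumption): start from the heaviest
    # micro-cluster, then repeatedly add the centroid farthest from all centers.
    centers = [max(points, key=lambda pw: pw[1])[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda pw: min(math.dist(pw[0], c) for c in centers))[0])
    for _ in range(iterations):
        sums = [[0.0] * len(centers[0]) for _ in range(k)]
        weights = [0] * k
        for p, w in points:
            j = min(range(k), key=lambda i: math.dist(centers[i], p))
            weights[j] += w
            sums[j] = [s + w * x for s, x in zip(sums[j], p)]
        # Keep the previous center if a cluster received no micro-clusters.
        centers = [[s / weights[j] for s in sums[j]] if weights[j] else centers[j]
                   for j in range(k)]
    return centers


# Two remote sites, each seeing points drawn around (0, 0) and (10, 10).
sites = [RemoteSite(radius=1.0), RemoteSite(radius=1.0)]
rng = random.Random(1)
for i in range(200):
    cx = 0.0 if i % 2 == 0 else 10.0
    point = (cx + rng.uniform(-0.3, 0.3), cx + rng.uniform(-0.3, 0.3))
    sites[(i // 2) % 2].process(point)

# The coordinator pools the local micro-clusterings and runs the macro step.
pooled = [mc for site in sites for mc in site.micro_clusters]
centers = sorted(macro_cluster(pooled, k=2))
```

In this sketch each site ships only its micro-cluster summaries rather than its 100 raw points, which is the source of the communication savings the abstract refers to.
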
Year | DOI | Venue |
---|---|---|
2012 | 10.1007/978-3-642-39640-3_31 | COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2013, PT V |
Field | DocType | Volume |
Cluster (physics),Data mining,Data stream clustering,Serialization,Data stream,Computer science,Streaming data,Cluster analysis,Database,Distributed computing,Scalability | Journal | 7975 |
ISSN | Citations | PageRank |
0302-9743 | 0 | 0.34 |
References | Authors | |
20 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Dang-Hoan Tran | 1 | 7 | 1.52 |
Kai-Uwe Sattler | 2 | 1144 | 126.81 |