Title
Online Clustering of Distributed Streaming Data Using Belief Propagation Techniques
Abstract
Extracting patterns from streaming data generated by geographically dispersed devices is a major challenge in data mining. Data become available to the decision maker sequentially and in a distributed fashion, and storage and communication constraints force the decision maker to rely only on recently received data. Together, these factors make tracking the evolution of the data nontrivial. We consider a set of distributed nodes that communicate directly with a central location, and we address the problem of clustering distributed streaming data through a two-level clustering approach that uses belief propagation techniques to perform stream clustering at both levels. At the node level, a batch of data arrives at each time slot, and the goal is to maintain at each slot a set of salient data points (local exemplars) that best represents the data received up to that slot. At each epoch, the local exemplars from the distributed nodes are sent to the central location, which performs a second-level clustering on them to derive a global data synopsis for the whole system. The local exemplars that emerge from the second-level clustering procedure are fed back to the nodes with appropriately modified weights that reflect their importance in the global clustering. Our experiments demonstrate that the two-level belief propagation-based clustering approach with feedback is well suited to handling data from different nodes: it achieves the same clustering quality as clustering the raw data sent from the nodes to the central location.
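The abstract does not spell out the message-passing rules, so the following Python sketch illustrates the two-level idea under stated assumptions. It uses affinity propagation (which appears in this record's Field list) as the belief propagation clustering step and weights each point's similarities by how many raw points it represents. A node keeps its exemplars by re-clustering them together with each new batch. The sketch also applies a hypothetical feedback rule in which a local exemplar chosen at the central location takes the total weight of its global cluster. The names weighted_affinity_propagation, summarize, Node and central_epoch, the median preference, and the synthetic data are illustrative choices, not taken from the paper.

```python
import numpy as np


def weighted_affinity_propagation(X, w, damping=0.7, max_iter=300):
    """Affinity propagation on weighted points.

    Assumption: row i of the similarity matrix (including the preference on the
    diagonal) is multiplied by w[i], so a point of weight w counts like w copies.
    Returns (exemplar indices, label of each point as an index into exemplars).
    """
    n = len(X)
    if n == 1:
        return np.array([0]), np.array([0])
    S = -((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(S, np.median(S[~np.eye(n, dtype=bool)]))  # shared preference
    S = S * w[:, None]
    R, A, rows = np.zeros((n, n)), np.zeros((n, n)), np.arange(n)
    for _ in range(max_iter):
        # Responsibility: r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # Availability: a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # and a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = np.diag(A_new).copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, self_avail)
        A = damping * A + (1 - damping) * A_new
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:  # no point elected itself: keep the strongest candidate
        exemplars = np.array([np.argmax(np.diag(A + R))])
    labels = np.argmax(S[:, exemplars], axis=1)
    labels[exemplars] = np.arange(exemplars.size)  # exemplars label themselves
    return exemplars, labels


def summarize(X, w):
    """Cluster weighted points; each exemplar inherits its cluster's total weight."""
    ex, labels = weighted_affinity_propagation(X, w)
    return X[ex], np.bincount(labels, weights=w, minlength=ex.size)


class Node:
    """Hypothetical node that keeps weighted local exemplars across time slots."""

    def __init__(self, dim):
        self.X, self.w = np.empty((0, dim)), np.empty(0)

    def slot(self, batch):
        # Re-cluster the new batch together with the exemplars kept so far.
        X = np.vstack([self.X, batch])
        w = np.concatenate([self.w, np.ones(len(batch))])
        self.X, self.w = summarize(X, w)


def central_epoch(nodes):
    """Second-level clustering of all local exemplars, followed by feedback."""
    X = np.vstack([nd.X for nd in nodes])
    w = np.concatenate([nd.w for nd in nodes])
    ex, labels = weighted_affinity_propagation(X, w)
    gw = np.bincount(labels, weights=w, minlength=ex.size)
    # Hypothetical feedback rule: a local exemplar chosen as a global exemplar
    # takes the total weight of its global cluster at its home node.
    offsets = np.cumsum([0] + [len(nd.w) for nd in nodes])
    for j, g in enumerate(ex):
        i = np.searchsorted(offsets, g, side="right") - 1
        nodes[i].w[g - offsets[i]] = gw[j]
    return X[ex], gw


rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
nodes = [Node(dim=2) for _ in range(3)]
for _ in range(3):  # three time slots, 20 points per node per slot
    for nd in nodes:
        nd.slot(centers[rng.integers(0, 3, 20)] + rng.normal(0, 0.3, (20, 2)))
G, gw = central_epoch(nodes)
print(len(G), "global exemplars, total weight", gw.sum())
```

Because each exemplar carries the total weight of the points it summarizes, the global weights sum to the number of raw points received (180 in this run). This gives a simple check that no data mass is lost between the node level and the central level.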
Year
2011
DOI
10.1109/MDM.2011.63
Venue
MDM), 2011 12th IEEE International Conference
Keywords
belief maintenance, data handling, data mining, decision making, pattern clustering, storage management, belief propagation techniques, clustering quality, communication constraints, data evolution, data synopsis global, decision maker, distributed nodes, distributed streaming data, geographically dispersed devices, global clustering, local exemplars, online clustering, pattern extraction, salient data, second level clustering procedure, second-level clustering, storage constraints, stream clustering, two-level belief propagation-based clustering approach, two-level clustering approach
Field
Data mining, Canopy clustering algorithm, Fuzzy clustering, CURE data clustering algorithm, Data stream clustering, Correlation clustering, Affinity propagation, Computer science, Constrained clustering, Cluster analysis
DocType
Conference
Volume
1
ISBN
978-0-7695-4436-6
Citations
7
PageRank
0.49
References
15
Authors
2
Name | Order | Citations | PageRank
Maria Halkidi | 1 | 1304 | 72.90
Iordanis Koutsopoulos | 2 | 14 | 2.42