Title
BibClus: A Clustering Algorithm of Bibliographic Networks by Message Passing on Center Linkage Structure
Abstract
Multi-type objects with multi-type relations are ubiquitous in real-world networks such as bibliographic networks. Such networks are also called heterogeneous information networks. However, little research has addressed clustering in heterogeneous information networks. One recent algorithm, NetClus, was proposed within the last two years. NetClus is applied to a heterogeneous information network with a star network schema and considers the relations between center objects and all attribute objects linked to them, but it ignores relations among the center objects themselves, such as citation relations, which also contain rich information. Hence, we argue that the star network schema cannot characterize all possible relations unless it integrates the linkage structure among center objects, which we call the Center Linkage Structure, and no sufficiently practical method for doing so has been available. In this paper, we present a novel algorithm, BibClus, for clustering heterogeneous objects with center linkage structure, taking a bibliographic information network as an example. In BibClus, we build a probabilistic model, a pairwise hidden Markov random field (P-HMRF), to characterize the center linkage structure, and convert it to a factor graph. We further combine the EM algorithm with factor graph theory and design an efficient message-passing procedure to infer marginal probabilities and estimate parameters at each EM iteration. We also study how factor functions with different functional forms and constraints affect clustering performance. To evaluate the proposed method, we conducted thorough experiments on a real dataset that we crawled from the ACM Digital Library. The experimental results show that BibClus is effective and achieves much higher quality than the recently proposed NetClus algorithm in both recall and precision.
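As a rough illustration of the message-passing inference mentioned in the abstract, the sketch below runs sum-product on a tiny pairwise model in which papers are nodes and citation links are edges. It is not the authors' P-HMRF or their factor functions: the Potts-style pairwise factor, the `coupling` parameter, the node names, and the unary values are all assumptions chosen for illustration. On a tree-shaped citation graph the resulting marginals are exact, which the brute-force helper can confirm.

```python
from itertools import product


def sum_product_marginals(unary, edges, coupling, n_iters=20):
    """Sum-product on a pairwise model with a Potts-style link factor.

    unary:    dict node -> list of K positive potentials (hypothetical
              per-cluster evidence, e.g. from a paper's attributes)
    edges:    list of undirected (u, v) links between center objects
    coupling: factor value when linked nodes share a cluster (1.0 otherwise);
              this functional form is an assumption, not the paper's
    """
    K = len(next(iter(unary.values())))
    neighbors = {n: [] for n in unary}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    # msgs[(u, v)][k] is the message from u to v about v taking cluster k.
    msgs = {}
    for u, v in edges:
        msgs[(u, v)] = [1.0 / K] * K
        msgs[(v, u)] = [1.0 / K] * K

    for _ in range(n_iters):
        new_msgs = {}
        for (u, v) in msgs:
            out = []
            for b in range(K):
                total = 0.0
                for a in range(K):
                    # Product of messages into u from all neighbors except v.
                    incoming = 1.0
                    for w in neighbors[u]:
                        if w != v:
                            incoming *= msgs[(w, u)][a]
                    pair = coupling if a == b else 1.0
                    total += unary[u][a] * pair * incoming
                out.append(total)
            z = sum(out)
            new_msgs[(u, v)] = [x / z for x in out]
        msgs = new_msgs

    # Belief at each node: unary potential times all incoming messages.
    marginals = {}
    for n in unary:
        belief = list(unary[n])
        for w in neighbors[n]:
            belief = [belief[k] * msgs[(w, n)][k] for k in range(K)]
        z = sum(belief)
        marginals[n] = [x / z for x in belief]
    return marginals


def brute_force_marginals(unary, edges, coupling):
    """Exact marginals by enumerating every joint assignment (tiny graphs only)."""
    nodes = sorted(unary)
    K = len(unary[nodes[0]])
    totals = {n: [0.0] * K for n in nodes}
    for labels in product(range(K), repeat=len(nodes)):
        assign = dict(zip(nodes, labels))
        weight = 1.0
        for n in nodes:
            weight *= unary[n][assign[n]]
        for u, v in edges:
            if assign[u] == assign[v]:
                weight *= coupling
        for n in nodes:
            totals[n][assign[n]] += weight
    return {n: [x / sum(t) for x in t] for n, t in totals.items()}


if __name__ == "__main__":
    # Hypothetical example: four papers, two clusters, p2 linked to the rest.
    unary = {"p1": [0.9, 0.1], "p2": [0.6, 0.4], "p3": [0.5, 0.5], "p4": [0.2, 0.8]}
    edges = [("p1", "p2"), ("p2", "p3"), ("p2", "p4")]
    for node, probs in sorted(sum_product_marginals(unary, edges, 2.0).items()):
        print(node, [round(p, 4) for p in probs])
```

In an EM-style procedure of the kind the abstract describes, marginals like these would play the role of the E-step posteriors, with the parameters behind the unary and pairwise factors re-estimated in the M-step; that outer loop is omitted here.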
Year: 2011
DOI: 10.1109/ICDM.2011.27
Venue: ICDM
Keywords: star network schema,bibliographic information network,heterogeneous information network,bibliographic network,rich information,center object,new algorithm,clustering algorithm,bibliographic networks,novel algorithm,em algorithm,center linkage structure,message passing,digital libraries,clustering,graph theory,probabilistic model,factor graph,digital library,iterative methods,functional form,markov processes
Field: Data mining,Star network,Computer science,Theoretical computer science,Artificial intelligence,Cluster analysis,Message passing,Graph theory,Factor graph,Expectation–maximization algorithm,Precision and recall,Machine learning,A* search algorithm
DocType: Conference
Citations: 2
PageRank: 0.36
References: 12
Authors
2
Name
Order
Citations
PageRank
Xiaoran Xu1514.34
Zhi-Hong Deng218523.33