Title
Affinity-based similarity measure for Web document clustering
Abstract
Compared with regular documents, the major distinguishing characteristic of Web documents is their dynamic hyper-structure. Thus, in addition to the terms or keywords used for regular document clustering, Web document clustering can incorporate dynamic information such as hyperlinks and the access patterns extracted from user query logs. In this paper, we extend the concept of document clustering to Web document clustering by introducing an affinity-based similarity measure, which uses user access patterns to determine the similarities among Web documents via a probabilistic model. Several comparison experiments are conducted on a real data set, and the experimental results demonstrate that the proposed similarity measure outperforms the cosine coefficient and the Euclidean distance method under different document clustering algorithms.
Year
2004
DOI
10.1109/IRI.2004.1431469
Venue
IRI
Keywords
Euclidean distance method, document retrieval, information retrieval, affinity-based similarity measure, hyperlinks, user access patterns, internet, data mining, user query logs, web document clustering, document handling, cosine coefficient, probabilistic model, document clustering, Euclidean distance
Field
Fuzzy clustering, Data mining, Similarity measure, Information retrieval, Document clustering, Computer science, Euclidean distance, Hyperlink, Statistical model, Cluster analysis, The Internet
DocType
Conference
ISBN
0-7803-8819-4
Citations
0
PageRank
0.34
References
10
Authors
4
Name | Order | Citations | PageRank
Mei-Ling Shyu | 1 | 1863 | 141.25
Shu-ching Chen | 2 | 3 | 1.75
Min Chen | 3 | 244 | 14.75
Stuart Harvey Rubin | 4 | 73 | 20.96