Title
A similarity reinforcement algorithm for heterogeneous web pages
Abstract
Many machine learning and data mining algorithms rely crucially on similarity metrics. However, most early approaches, such as the Vector Space Model and Latent Semantic Indexing, used only a single type of relationship to measure the similarity of data objects. In this paper, we first use an Intra- and Inter-Type Relationship Matrix (IITRM) to represent a set of heterogeneous data objects and their inter-relationships. We then propose a novel similarity-calculating algorithm over the IITRM that iteratively integrates information from heterogeneous sources. The algorithm is based on the intuition that intra-type relationships should affect inter-type relationships, and vice versa, and it can help detect latent relationships among heterogeneous data objects. Experimental results on the MSN logs dataset show that our algorithm outperforms traditional cosine similarity.
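The abstract gives only the intuition behind the iteration, not the update rule, so the following is a minimal sketch of one generic mutual-reinforcement scheme rather than the paper's algorithm. It assumes two object types (for example queries and pages), an intra-type similarity matrix for each, and an inter-type relationship matrix linking them. The function names, the damping weight `alpha`, and the row normalization are all illustrative assumptions.

```python
import numpy as np

def row_normalize(m):
    """Scale each row to sum to 1; all-zero rows are left as zeros."""
    sums = m.sum(axis=1, keepdims=True)
    return np.divide(m, sums, out=np.zeros_like(m, dtype=float), where=sums > 0)

def reinforce(sim_a, sim_b, rel_ab, alpha=0.5, iterations=10):
    """Hypothetical reinforcement loop (NOT the paper's exact update rule).

    sim_a  : n x n initial intra-type similarity of type-A objects
    sim_b  : m x m initial intra-type similarity of type-B objects
    rel_ab : n x m inter-type relationship weights (e.g. click counts)
    alpha  : assumed weight given to similarity propagated across types
    """
    sim_a0 = np.asarray(sim_a, dtype=float)
    sim_b0 = np.asarray(sim_b, dtype=float)
    rel = np.asarray(rel_ab, dtype=float)
    a_to_b = row_normalize(rel)      # n x m
    b_to_a = row_normalize(rel.T)    # m x n
    cur_a, cur_b = sim_a0.copy(), sim_b0.copy()
    for _ in range(iterations):
        # Two A objects become more similar when they relate to similar B objects,
        # and two B objects become more similar when they relate to similar A objects.
        new_a = (1 - alpha) * sim_a0 + alpha * (a_to_b @ cur_b @ a_to_b.T)
        new_b = (1 - alpha) * sim_b0 + alpha * (b_to_a @ cur_a @ b_to_a.T)
        np.fill_diagonal(new_a, 1.0)  # every object stays fully similar to itself
        np.fill_diagonal(new_b, 1.0)
        cur_a, cur_b = new_a, new_b
    return cur_a, cur_b

# Toy data (made up): three queries with no content overlap, two pages.
queries_sim = np.eye(3)
pages_sim = np.eye(2)
clicks = np.array([[1, 0],   # query 0 clicked page 0
                   [1, 0],   # query 1 clicked page 0
                   [0, 1]])  # query 2 clicked page 1
q_sim, p_sim = reinforce(queries_sim, pages_sim, clicks)
```

In this toy run, queries 0 and 1 start with zero similarity but acquire a nonzero one because they relate to the same page, while query 2 stays dissimilar to both. That is the kind of latent relationship the abstract refers to, although the paper's actual formulation may differ.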
Year
2005
DOI
10.1007/978-3-540-31849-1_13
Venue
APWeb
Keywords
similarity reinforcement algorithm,latent semantic index,heterogeneous data object,similarity metrics,data mining,heterogeneous web page,msn log,traditional cosine similarity,heterogeneous source,type relationship matrix,data object,new algorithm,web pages,vector space model,machine learning,latent semantic indexing
Field
Data mining,Cosine similarity,Web page,Computer science,Matrix (mathematics),Artificial intelligence,Vector space model,Data mining algorithm,Semantic similarity,Algorithm,Data objects,Latent semantic analysis,Database,Machine learning
DocType
Conference
Volume
3399
ISSN
0302-9743
ISBN
3-540-25207-X
Citations
2
PageRank
0.47
References
21
Authors
10
Name           Order  Citations  PageRank
Ning Liu       1      207        17.03
Jun Yan        2      1798       85.25
Fengshan Bai   3      184        20.65
Benyu Zhang    4      2135       90.41
Wensi Xi       5      912        50.23
Weiguo Fan     6      2055       133.38
Zheng Chen     7      5019       256.89
Lei Ji         8      59         4.49
Chenyong Hu    9      53         4.71
Wei-ying Ma    10     14587      1003.11