Title
Exploring Similarity among Web Pages Using the Hyperlink Structure
Abstract
Hyperlinks inside HTML pages contain a wealth of information about the relationships among web pages. Given a set of web pages, we can explore the hyperlink relationships among these pages. This paper first provides formal definitions of hyperlink relations. We then use the notations to define similarity between two web pages and between two sets of web pages. For each one of them, we provide several definitions of similarity using forward and backward links. The similarity measure gives us a number between 0 and 1. We also demonstrate how to use the similarity measure to study clustering within a set of pages and to determine the "diversity" of a set of web pages.
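The abstract does not reproduce the paper's formal definitions, so the following is only an illustrative sketch of what a link-based similarity in the range 0 to 1 could look like. It assumes Jaccard overlap of forward-link sets and backward-link sets, averaged with equal weight; that choice, and the names `jaccard`, `invert_links`, and `link_similarity`, are hypothetical and are not taken from the paper.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections, in [0, 1]; 0.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def invert_links(forward):
    """Build backward links (page -> pages linking to it) from forward links."""
    backward = {}
    for source, targets in forward.items():
        for target in targets:
            backward.setdefault(target, set()).add(source)
    return backward


def link_similarity(page_a, page_b, forward, backward):
    """Assumed measure: mean of forward-link and backward-link Jaccard overlap."""
    fwd = jaccard(forward.get(page_a, ()), forward.get(page_b, ()))
    bwd = jaccard(backward.get(page_a, ()), backward.get(page_b, ()))
    return (fwd + bwd) / 2


# Toy link graph (made-up page names): page -> set of pages it links to.
forward = {"a": {"x", "y"}, "b": {"y", "z"}}
backward = invert_links(forward)
score = link_similarity("x", "y", forward, backward)
```

Any measure built this way stays between 0 and 1 because each Jaccard term does; the paper's own definitions may weight or combine forward and backward links differently.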
Year
2004
DOI
10.1109/ITCC.2004.1286477
Venue
ITCC (1)
Keywords
hyperlink structure,set of web page,wealth of information,web page,exploring similarity,formal definitions of hyperlink relation,similarity measure,hyperlink relationships among,web pages,html page,internet,search engines,data mining,html,computer science,information retrieval,search engine,world wide web
Field
Static web page,Web search engine,World Wide Web,HITS algorithm,Web page,Information retrieval,Computer science,Doorway page,Focused crawler,Hyperlink,HTML
DocType
Conference
Volume
1
ISBN
0-7695-2108-8
Citations
1
PageRank
0.35
References
5
Authors
7