Title
Clustering of Web Search Results Based on Combination of Links and In-Snippets
Abstract
Search engines are a common tool for retrieving information on the Web, but the results they return are still far from satisfactory. Users often have to scan a long result list to find the information they actually want. Much work has focused on post-processing search results to make them easier to examine, and clustering is one of the common approaches. Term-based clustering was the first approach applied to search results, but its quality is poor when the processed pages contain little text. Link-based clustering can overcome this problem, but the quality of its clusters depends heavily on the number of in-links and out-links that pages have in common. In this paper, we propose that the short text attached to an in-link is valuable information that helps achieve high clustering quality. To distinguish it from a general snippet, we call it an in-snippet. Based on the in-snippet, we propose a new clustering method that combines links and in-snippets. In our method, the similarity between pages consists of two parts: link similarity and term similarity. We designed an algorithm to implement the clustering. To prevent bias from human judgments, the experimental datasets are collected from the Open Directory Project (DMOZ). Because DMOZ is a human-edited directory, the datasets drawn from it are of higher quality and larger scale. We use entropy and F-measure to evaluate the quality of the final clusters. Compared with the link-based and the pure term-based algorithms, our method achieves better clustering quality.
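The abstract does not give the formulas, so the following Python is only a minimal sketch of how a two-part page similarity and an entropy check might look. Everything named here is an assumption made for illustration and is not taken from the paper: the cosine measures, the linear weight `alpha`, the page dictionary layout (`links`, `in_snippets`), and the function names.

```python
import math
from collections import Counter


def set_cosine(a, b):
    """Cosine similarity between two sets viewed as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def term_cosine(text_a, text_b):
    """Cosine similarity between whitespace-tokenized term-frequency vectors."""
    ta = Counter(text_a.lower().split())
    tb = Counter(text_b.lower().split())
    dot = sum(ta[t] * tb[t] for t in ta.keys() & tb.keys())
    norm_a = math.sqrt(sum(v * v for v in ta.values()))
    norm_b = math.sqrt(sum(v * v for v in tb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def combined_similarity(page_a, page_b, alpha=0.5):
    """Weighted sum of link similarity and in-snippet term similarity.

    Assumed layout: each page is a dict with a set of linked URLs under
    "links" and the concatenated in-snippet text under "in_snippets".
    The linear combination with weight alpha is a hypothetical choice.
    """
    link_sim = set_cosine(page_a["links"], page_b["links"])
    term_sim = term_cosine(page_a["in_snippets"], page_b["in_snippets"])
    return alpha * link_sim + (1 - alpha) * term_sim


def cluster_entropy(labels):
    """Entropy (in bits) of the reference labels inside one cluster.

    A value of 0 means every page in the cluster shares one label.
    """
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Calling `combined_similarity` on two page dictionaries yields a score between 0 and 1 that a standard clustering routine could consume. Applying `cluster_entropy` to the DMOZ category labels of each resulting cluster gives one of the two quality measures the abstract mentions.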
Year: 2011
DOI: 10.1109/WISA.2011.28
Venue: WISA
Keywords: poor quality, common way, high clustering quality, term-based clustering, higher quality, link-based clustering, common tool, post processing search result, new clustering method, web search results, clustering quality, web pages, vectors, link analysis, clustering algorithms, algorithm design and analysis, internet, entropy, information retrieval, search engine, algorithm design, clustering, search engines
Field: Fuzzy clustering, Data mining, CURE data clustering algorithm, Clustering high-dimensional data, Data stream clustering, Information retrieval, Link analysis, Computer science, Brown clustering, Snippet, Cluster analysis
DocType: Conference
Citations: 1
PageRank: 0.38
References: 11
Authors (3)
Name        Order  Citations  PageRank
Nan Yang    1      1          0.38
Yue Liu     2      441        84.32
Gang Yang   3      32         9.38