Title
Measuring semantic similarity between words by removing noise and redundancy in web snippets
Abstract
Semantic similarity measures play important roles in many Web-related tasks such as Web browsing and query suggestion. Because taxonomy-based methods cannot deal with continually emerging words, Web-based methods have recently been proposed to solve this problem. Because of the noise and redundancy hidden in Web data, robustness and accuracy remain challenges. In this paper, we propose a method integrating page counts and snippets returned by Web search engines. Semantic snippets and the number of search results are first used to remove noise and redundancy in the Web snippets (a ‘Web snippet’ comprises the title, summary, and URL of a Web page returned by a search engine). After that, a method integrating page counts, semantic snippets, and the number of already displayed search results is proposed. The proposed method does not need any human-annotated knowledge (e.g., ontologies) and can easily be applied to Web-related tasks (e.g., query suggestion). A correlation coefficient of 0.851 against the Rubenstein–Goodenough benchmark dataset shows that the proposed method outperforms the existing Web-based methods by a wide margin. Moreover, the proposed semantic similarity measure significantly improves the quality of query suggestion compared with some page-count-based methods. Copyright © 2011 John Wiley & Sons, Ltd.
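For context on the page-count-based methods the abstract compares against, the sketch below shows one well-known baseline of that kind, a Jaccard coefficient computed from search-engine page counts. It is not the method proposed in this paper, which additionally uses snippets. The function name, the cutoff value, and the page counts are all illustrative assumptions.

```python
def web_jaccard(count_p, count_q, count_pq, min_cooccurrence=5):
    """Jaccard coefficient over page counts (a generic baseline, not the paper's method).

    count_p, count_q: page counts for the queries "P" and "Q" alone.
    count_pq: page count for the conjunctive query "P AND Q".
    min_cooccurrence: assumed cutoff; very small co-occurrence counts are
    treated as noise and mapped to a similarity of 0.
    """
    if count_pq <= min_cooccurrence:
        return 0.0
    return count_pq / (count_p + count_q - count_pq)


# Hypothetical page counts, chosen only to illustrate the arithmetic.
score = web_jaccard(count_p=1000, count_q=500, count_pq=300)
print(score)  # 300 / (1000 + 500 - 300) = 0.25
```

A measure like this depends only on counts, so it cannot tell whether the two words actually appear in related contexts on the matching pages; that limitation is what motivates combining page counts with snippet text.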
Year: 2011
DOI: 10.1002/cpe.1816
Venue: Concurrency and Computation: Practice and Experience
Keywords: semantic similarity, Web snippet, Web search engine, Web data, page count, Web page, Web-related task, query suggestion, Web browsing, search result, proposed method
DocType: Journal
Volume: 23
Issue: 18
ISSN: 1532-0626
Citations: 20
PageRank: 1.00
References: 26
Authors: 4
Name, Order, Citations, PageRank
Zheng Xu, 1, 352, 19.51
Xiangfeng Luo, 2, 1251, 124.38
Jie Yu, 3, 20, 1.00
Weimin Xu, 4, 61, 7.98