Title
Leveraging integrated information to extract query subtopics for search result diversification
Abstract
Search result diversification aims to diversify search results to cover different query subtopics, i.e., pieces of relevant information. State-of-the-art diversification methods often model diversity explicitly in terms of query subtopics, so their performance depends closely on subtopic quality. Most existing studies extract query subtopics only from unstructured data such as document collections. However, a large amount of information is also available in structured data, which complements the information in unstructured data. Structured data can provide valuable domain knowledge but is currently under-utilized. In this article, we study how to leverage integrated information from both structured and unstructured data to extract high-quality subtopics for search result diversification. We first discuss how to extract subtopics from structured data. We then propose three methods for integrating structured and unstructured data: the first uses the structured data to guide subtopic extraction from the unstructured data, the second uses the unstructured data to guide the extraction, and the third extracts subtopics separately from the two data sources and then combines them. Experimental results in both the Enterprise and Web search domains show that the proposed methods are effective in extracting high-quality subtopics from the integrated information, which can lead to better diversification performance.
Year: 2014
DOI: 10.1007/s10791-013-9228-1
Venue: Inf. Retr.
Keywords: Web search, Enterprise search, Diversification, Query subtopics, Structured data, Unstructured data
Field: Data mining, Existential quantification, Domain knowledge, Information retrieval, Computer science, Enterprise search, Unstructured data, Diversification (marketing strategy), Data model
DocType: Journal
Volume: 17
Issue: 1
ISSN: 1386-4564
Citations: 1
PageRank: 0.34
References: 32
Authors: 4
Name | Order | Citations | PageRank
Wei Zheng | 1 | 51 | 10.41
Hui Fang | 2 | 918 | 63.03
Conglei Yao | 3 | 49 | 6.41
Min WANG | 4 | 1662 | 192.58