Title
Mining subtopics from text fragments for a web query
Abstract
Web search queries are often ambiguous or faceted, and identifying the major underlying senses and facets of queries has received much attention in recent years. We refer to this task as query subtopic mining. In this paper, we propose using the text surrounding query terms in the top retrieved documents to mine and rank subtopics. We first extract text fragments containing query terms from different parts of documents. We then group similar text fragments into clusters and generate a readable subtopic for each cluster. Based on the cluster and a language model trained on a query log, we calculate three features and combine them into a relevance score for each subtopic. Finally, subtopics are ranked by balancing relevance and novelty. Our evaluation experiments with the NTCIR-9 INTENT Chinese Subtopic Mining test collection show that our method significantly outperforms a query-log-based method proposed by Radlinski et al. (2010) and a search-result-clustering-based method proposed by Zeng et al. (2004) in terms of precision, I-rec, D-nDCG and D#-nDCG, the official evaluation metrics of the NTCIR-9 INTENT task. Moreover, our generated subtopics are significantly more readable than those generated by the search result clustering method.
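The final step of the abstract, ranking subtopics by balancing relevance and novelty, can be illustrated with a small sketch. The abstract does not give the ranking formula, so the code below substitutes a generic greedy, MMR-style re-ranking and should not be read as the authors' method. The names `jaccard` and `rank_subtopics`, the `trade_off` weight, and the precomputed relevance scores are all hypothetical. Whitespace tokenization is also an assumption made for readability; it would not suit the Chinese test collection used in the paper.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity between two strings (whitespace tokens; an assumption)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def rank_subtopics(relevance, trade_off=0.5, similarity=jaccard):
    """Greedy MMR-style ranking (illustrative, not the paper's formula).

    relevance: dict mapping subtopic string -> relevance score (hypothetical input)
    trade_off: weight on relevance versus novelty (assumed value)
    """
    remaining = dict(relevance)
    ranked = []
    while remaining:
        def gain(subtopic):
            # Redundancy is the highest similarity to any subtopic already ranked.
            redundancy = max((similarity(subtopic, r) for r in ranked), default=0.0)
            return trade_off * remaining[subtopic] - (1 - trade_off) * redundancy

        best = max(remaining, key=gain)
        ranked.append(best)
        del remaining[best]
    return ranked
```

With `trade_off` close to 1 the ordering follows relevance alone; lower values push near-duplicate subtopics further down the list.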
Year
2013
DOI
10.1007/s10791-013-9221-8
Venue
Inf. Retr.
Keywords
mining subtopics, readable subtopic, query subtopic mining, ntcir-9 intent chinese subtopic, extract text, query log, ntcir-9 intent task, query term, web search query, web query, group similar text fragment, search result
Field
Web search query, Data mining, Ranking, Information retrieval, Query expansion, Computer science, Web query classification, Ranking (information retrieval), Novelty, Cluster analysis, Language model
DocType
Journal
Volume
16
Issue
4
ISSN
1573-7659
Citations
10
PageRank
0.48
References
36
Authors
7
Name | Order | Citations, PageRank
Qinglei Wang | 1 | 100.48
Yanan Qian | 2 | 1116.75
Ruihua Song | 3 | 113859.33
Zhicheng Dou | 4 | 70641.96
Fan Zhang | 5 | 22969.82
Tetsuya Sakai | 6 | 1460139.97
Qinghua Zheng | 7 | 1261160.88