Title
Automatic web query classification using labeled and unlabeled training data
Abstract
Accurate topical categorization of user queries allows for increased effectiveness, efficiency, and revenue potential in general-purpose web search systems. Such categorization becomes critical if the system is to return results not just from a general web collection but from topic-specific databases as well. Maintaining sufficient categorization recall is very difficult as web queries are typically short, yielding few features per query. We examine three approaches to topical categorization of general web queries: matching against a list of manually labeled queries, supervised learning of classifiers, and mining of selectional preference rules from large unlabeled query logs. Each approach has its advantages in tackling the web query classification recall problem, and combining the three techniques allows us to classify a substantially larger proportion of queries than any of the individual techniques. We examine the performance of each approach on a real web query stream and show that our combined method accurately classifies 46% of queries, outperforming the recall of the best single approach by nearly 20%, with a 7% improvement in overall effectiveness.
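The abstract does not say how the three approaches are combined, so the following is only a minimal sketch of one plausible scheme: a cascade that returns the first label any component produces, plus a coverage measure (the share of queries that receive any label). All names, categories, labeled queries, and rules in it are hypothetical illustrations and are not taken from the paper. The supervised component is a trivial stand-in, not a trained classifier.

```python
from typing import Optional

# Hypothetical data, used only to illustrate the control flow.
LABELED_QUERIES = {"nfl scores": "sports", "pizza recipes": "food"}

# Hypothetical mined rules: a context word that tends to signal a category.
PREFERENCE_RULES = {"lyrics": "music", "roster": "sports"}


def exact_match(query: str) -> Optional[str]:
    """Look the whole query up in a manually labeled list."""
    return LABELED_QUERIES.get(query)


def supervised(query: str) -> Optional[str]:
    """Stand-in for a trained classifier; here a trivial keyword test."""
    return "sports" if "football" in query.split() else None


def preference_rules(query: str) -> Optional[str]:
    """Apply 'word -> category' rules to each term of the query."""
    for term in query.split():
        if term in PREFERENCE_RULES:
            return PREFERENCE_RULES[term]
    return None


def classify(query: str) -> Optional[str]:
    """Return the first label produced by any component, else None."""
    normalized = query.lower().strip()
    for component in (exact_match, supervised, preference_rules):
        label = component(normalized)
        if label is not None:
            return label
    return None


def coverage(queries, classifier) -> float:
    """Fraction of queries that receive any label from the classifier."""
    labeled = sum(1 for q in queries if classifier(q) is not None)
    return labeled / len(queries)
```

In this toy setup, each component labels queries the others miss, so the cascade covers more queries than any single component. That is the recall effect the abstract describes, although the figures it reports come from a real query stream and not from anything like this sketch.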
Year
2005
DOI
10.1145/1076034.1076138
Venue
SIGIR
Keywords
real web query stream, automatic web query classification, accurate topical categorization, unlabeled training data, general-purpose web search system, general web collection, general web query, web query classification recall, user query, large unlabeled query log, sufficient categorization recall, web query, query classification, supervised learning
Field
Query optimization, Data mining, Web search query, Categorization, Query language, Query expansion, Information retrieval, Computer science, Web query classification, Supervised learning, Spatial query
DocType
Conference
ISBN
1-59593-034-5
Citations
35
PageRank
2.52
References
3
Authors
7
Name               Order  Citations  PageRank
Steven M. Beitzel  1      696        46.72
Eric C. Jensen     2      696        46.72
Ophir Frieder      3      3300       419.55
David Grossman     4      525        34.73
David D. Lewis     5      4560       737.43
Abdur Chowdhury    6      2013       160.59
Aleksandr Kolcz    7      35         2.52