Title
Improving Automatic Query Classification via Semi-Supervised Learning
Abstract
Accurate topical classification of user queries allows for increased effectiveness and efficiency in general-purpose web search systems. Such classification becomes critical if the system is to return results not just from a general web collection but from topic-specific back-end databases as well. Maintaining sufficient classification recall is very difficult as web queries are typically short, yielding few features per query. This feature sparseness, coupled with the high query volumes typical for a large-scale search service, makes manual and supervised learning approaches alone insufficient. We use an application of computational linguistics to develop an approach for mining the vast amount of unlabeled data in web query logs to improve automatic topical web query classification. We show that our approach in combination with manual matching and supervised learning allows us to classify a substantially larger proportion of queries than any single technique. We examine the performance of each approach on a real web query stream and show that our combined method accurately classifies 46% of queries, outperforming the recall of the best single approach by nearly 20%, with a 7% improvement in overall effectiveness.
Year: 2005
DOI: 10.1109/ICDM.2005.80
Venue: ICDM
Keywords: high query volume, real web query stream, improving automatic query classification, general-purpose web search system, general web collection, semi-supervised learning, best single approach, web query log, user query, accurate topical classification, web query, automatic topical web query, computational linguistics, information retrieval, semi supervised learning, learning artificial intelligence, query classification, classification, supervised learning, data mining, search engines, internet
Field: Data mining, Query language, Semi-supervised learning, Computer science, Web query classification, Artificial intelligence, Query optimization, Web search query, Information retrieval, Query expansion, Sargable, Supervised learning, Machine learning
DocType: Conference
ISSN: 1550-4786
ISBN: 0-7695-2278-5
Citations: 51
PageRank: 4.51
References: 14
Authors: 6
Name | Order | Citations | PageRank
--- | --- | --- | ---
Steven M. Beitzel | 1 | 696 | 46.72
Eric C. Jensen | 2 | 696 | 46.72
Ophir Frieder | 3 | 51 | 4.85
David D. Lewis | 4 | 4560 | 737.43
Abdur Chowdhury | 5 | 2013 | 160.59
Aleksander Kołcz | 6 | 628 | 66.65