Title
Using word-sense disambiguation methods to classify web queries by intent
Abstract
Three methods are proposed to classify queries by intent (CQI), e.g., navigational, informational, commercial, etc. Following mixed-initiative dialog systems, search engines should distinguish navigational queries where the user is taking the initiative from other queries where there are more opportunities for system initiatives (e.g., suggestions, ads). The query intent problem has a number of useful applications for search engines, affecting how many (if any) advertisements to display, which results to return, and how to arrange the results page. Click logs are used as a substitute for annotation. Clicks on ads are evidence for commercial intent; other types of clicks are evidence for other intents. We start with a simple Naïve Bayes baseline that works well when there is plenty of training data. When training data is less plentiful, we back off to nearby URLs in a click graph, using a method similar to Word-Sense Disambiguation. Thus, we can infer that designer trench is commercial because it is close to www.saksfifthavenue.com, which is known to be commercial. The baseline method was designed for precision and the backoff method was designed for recall. Both methods are fast and do not require crawling webpages. We recommend a third method, a hybrid of the two, that does no harm when there is plenty of training data, and generalizes better when there isn't, as a strong baseline for the CQI task.
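The hybrid idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy click log, the click-graph edges, the `min_seen` threshold, and the simple URL-vote backoff are all assumptions made for illustration, and the paper's actual backoff method is only described here as "similar to Word-Sense Disambiguation".

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled clicks: (query, url, label); ad clicks count as "commercial".
LABELED = [
    ("buy trench coat", "www.saksfifthavenue.com", "commercial"),
    ("trench coat sale", "www.saksfifthavenue.com", "commercial"),
    ("cheap flights", "www.expedia.com", "commercial"),
    ("seattle weather", "www.weather.com", "other"),
    ("seattle history", "en.wikipedia.org", "other"),
]
# Hypothetical click-graph edges for queries with no labeled clicks of their own.
GRAPH = {"designer trench": ["www.saksfifthavenue.com"]}

LABELS = ("commercial", "other")
word_counts = {lab: Counter() for lab in LABELS}
label_counts = Counter()
url_counts = defaultdict(Counter)
for query, url, lab in LABELED:
    word_counts[lab].update(query.split())
    label_counts[lab] += 1
    url_counts[url][lab] += 1
vocab = {w for lab in LABELS for w in word_counts[lab]}

def naive_bayes(query):
    """Return the label with the highest add-one-smoothed log posterior."""
    def score(lab):
        total = sum(word_counts[lab].values()) + len(vocab)
        s = math.log(label_counts[lab] / sum(label_counts.values()))
        for w in query.split():
            s += math.log((word_counts[lab][w] + 1) / total)
        return s
    return max(LABELS, key=score)

def backoff(query):
    """Vote with the labels of URLs adjacent to the query in the click graph."""
    votes = Counter()
    for url in GRAPH.get(query, []):
        votes.update(url_counts[url])
    return votes.most_common(1)[0][0] if votes else None

def hybrid(query, min_seen=1):
    """Use Naive Bayes when every query word was seen in training; else back off."""
    seen = all(
        sum(word_counts[lab][w] for lab in LABELS) >= min_seen
        for w in query.split()
    )
    if seen:
        return naive_bayes(query)
    # Fall back to Naive Bayes if the click graph has no evidence either.
    return backoff(query) or naive_bayes(query)

print(hybrid("designer trench"))
print(hybrid("seattle weather"))
```

In this toy setup, "designer trench" contains a word never seen in training, so the sketch backs off to the labels attached to its neighboring URL, mirroring the abstract's www.saksfifthavenue.com example. Queries whose words are all covered by the training data go through the Naive Bayes path unchanged.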
Year
2009
Venue
EMNLP
Keywords
training data,click graph,query intent problem,cqi task,word-sense disambiguation method,bayes baseline,search engine,web query,commercial intent,backoff method,strong baseline
Field
Dialog box,Web page,Computer science,Natural language processing,Artificial intelligence,Annotation,Crawling,Search engine,Naive Bayes classifier,Information retrieval,Recall,Machine learning,Word-sense disambiguation
DocType
Conference
Volume
D09-1
Citations
5
PageRank
0.46
References
26
Authors
2
Name
Order
Citations
PageRank
Emily Pitler157327.65
Ken Church2373.74