Abstract | ||
---|---|---|
Three methods are proposed to classify queries by intent (CQI), e.g., navigational, informational, commercial, etc. Following mixed-initiative dialog systems, search engines should distinguish navigational queries where the user is taking the initiative from other queries where there are more opportunities for system initiatives (e.g., suggestions, ads). The query intent problem has a number of useful applications for search engines, affecting how many (if any) advertisements to display, which results to return, and how to arrange the results page. Click logs are used as a substitute for annotation. Clicks on ads are evidence for commercial intent; other types of clicks are evidence for other intents. We start with a simple Naïve Bayes baseline that works well when there is plenty of training data. When training data is less plentiful, we back off to nearby URLs in a click graph, using a method similar to Word-Sense Disambiguation. Thus, we can infer that the query "designer trench" is commercial because it is close to www.saksfifthavenue.com, which is known to be commercial. The baseline method was designed for precision and the backoff method was designed for recall. Both methods are fast and do not require crawling webpages. We recommend a third method, a hybrid of the two, that does no harm when there is plenty of training data, and generalizes better when there isn't, as a strong baseline for the CQI task. |
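
The backoff idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a plain relative-frequency estimate stands in for the Naïve Bayes baseline, and the toy click log, the `min_clicks` threshold, and all function names are hypothetical choices made for illustration only.

```python
from collections import defaultdict

# Hypothetical toy click log: (query, clicked URL, was the click on an ad?)
CLICK_LOG = [
    ("saks", "www.saksfifthavenue.com", True),
    ("saks", "www.saksfifthavenue.com", True),
    ("saks", "www.saksfifthavenue.com", False),
    ("designer trench", "www.saksfifthavenue.com", False),
    ("weather", "www.weather.com", False),
    ("weather", "www.weather.com", False),
    ("weather", "www.weather.com", False),
]


def build_counts(click_log):
    """Count [ad clicks, all clicks] per query and per URL, plus query-URL edges."""
    query_counts = defaultdict(lambda: [0, 0])
    url_counts = defaultdict(lambda: [0, 0])
    edges = defaultdict(lambda: defaultdict(int))  # query -> url -> click count
    for query, url, is_ad in click_log:
        for counts in (query_counts[query], url_counts[url]):
            counts[0] += int(is_ad)
            counts[1] += 1
        edges[query][url] += 1
    return query_counts, url_counts, edges


def direct_estimate(query, query_counts, min_clicks=3):
    """Ad-click fraction for the query itself, or None if it has too few clicks."""
    if query not in query_counts:
        return None
    ad, total = query_counts[query]
    return ad / total if total >= min_clicks else None


def backoff_estimate(query, url_counts, edges):
    """Click-weighted average ad-click fraction of the URLs adjacent to the query."""
    if query not in edges:
        return None
    weighted_sum, weight = 0.0, 0
    for url, clicks in edges[query].items():
        ad, total = url_counts[url]
        weighted_sum += clicks * ad / total
        weight += clicks
    return weighted_sum / weight


def hybrid_estimate(query, counts, min_clicks=3):
    """Use the direct estimate when data is plentiful, else back off to URLs."""
    query_counts, url_counts, edges = counts
    direct = direct_estimate(query, query_counts, min_clicks)
    if direct is not None:
        return direct
    return backoff_estimate(query, url_counts, edges)


COUNTS = build_counts(CLICK_LOG)
for q in ("saks", "designer trench", "weather", "unseen query"):
    print(q, hybrid_estimate(q, COUNTS))
```

In this toy log, "designer trench" has only one click of its own, so the sketch falls back to the ad-click rate of the URL it is connected to, while well-observed queries such as "saks" are scored directly.
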
Year | Venue | Keywords |
---|---|---|
2009 | EMNLP | training data,click graph,query intent problem,cqi task,word-sense disambiguation method,bayes baseline,search engine,web query,commercial intent,backoff method,strong baseline |
Field | DocType | Volume |
---|---|---|
Dialog box,Web page,Computer science,Natural language processing,Artificial intelligence,Annotation,Crawling,Search engine,Naive Bayes classifier,Information retrieval,Recall,Machine learning,Word-sense disambiguation | Conference | D09-1 |
Citations | PageRank | References |
---|---|---|
5 | 0.46 | 26 |
Authors | ||
---|---|---|
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Emily Pitler | 1 | 573 | 27.65 |
Ken Church | 2 | 37 | 3.74 |