Title
Cross-lingual query classification: a preliminary study
Abstract
The non-English Web is growing at breakneck speed, but available language processing tools are mostly English-based. Taxonomies are a case in point: while there are plenty of commercial and non-commercial taxonomies for the English Web, taxonomies for other languages are either not available or of very limited quality. Given that building taxonomies in all non-English languages is prohibitively expensive, it is natural to ask whether existing English taxonomies can be leveraged, possibly via machine translation, to enable information processing tasks in other languages. Preliminary results presented in this paper indicate that the answer is affirmative with respect to query classification, a task that is essential both for understanding user intent, and thus providing better search results, and for better targeting of search-based advertising, the economic underpinning of commercial Web search engines. We propose a robust method for classifying non-English queries against an English taxonomy and classifier using widely available, off-the-shelf machine translation systems. In particular, we show that by viewing the search results in the query's original language as independent sources of information, we can alleviate the impact of poor-quality or erroneous machine translations. Experiments on Chinese queries yield encouraging results.
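The idea in the abstract of treating original-language search results as independent evidence can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how evidence is combined, so the simple score summation below is an assumption, and `toy_translate`, `toy_classify`, `TOY_MT`, and `KEYWORDS` are hypothetical stand-ins for an off-the-shelf machine translation system and an English taxonomy classifier.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical interfaces: a real system would call an MT service and a
# trained English classifier. Neither is specified in the abstract.
Translator = Callable[[str], str]
Classifier = Callable[[str], Dict[str, float]]


def classify_cross_lingual(
    query: str,
    result_snippets: List[str],
    translate: Translator,
    classify_english: Classifier,
    query_weight: float = 1.0,
) -> str:
    """Sum class scores over the translated query and translated snippets."""
    scores: Counter = Counter()
    # Evidence 1: the machine-translated query itself.
    for label, score in classify_english(translate(query)).items():
        scores[label] += query_weight * score
    # Evidence 2: each snippet retrieved in the query's original language is
    # translated and classified on its own, so a single poor translation
    # contributes only one vote among many.
    for snippet in result_snippets:
        for label, score in classify_english(translate(snippet)).items():
            scores[label] += score
    if not scores:
        return "unknown"
    return max(scores, key=scores.get)


# Toy stand-ins (assumptions for illustration only).
TOY_MT = {
    "苹果": "apple",  # ambiguous on its own: fruit or company
    "苹果 手机 价格": "apple phone price",
    "苹果 新款 手机 发布": "apple new phone release",
}
KEYWORDS = {"Electronics": {"phone", "release"}, "Food": {"fruit", "recipe"}}


def toy_translate(text: str) -> str:
    # Unknown inputs translate to the empty string, mimicking a failed MT call.
    return TOY_MT.get(text, "")


def toy_classify(text: str) -> Dict[str, float]:
    # Score each class by the number of its keywords present in the text.
    tokens = set(text.lower().split())
    return {
        label: float(len(tokens & kws))
        for label, kws in KEYWORDS.items()
        if tokens & kws
    }


snippets = ["苹果 手机 价格", "苹果 新款 手机 发布"]
label = classify_cross_lingual("苹果", snippets, toy_translate, toy_classify)
```

In this toy setup the translated query alone ("apple") matches no class keyword, while the translated snippets supply the evidence that decides the class. A real system would likely weight or normalize the per-source scores rather than sum them directly.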
Year
2008
DOI
10.1145/1460027.1460046
Venue
CIKM-iNEWS
Keywords
non-english web, non-english language, commercial web search engine, cross-lingual query classification, better search result, english taxonomy, erroneous machine translation, non-english query, machine translation, available language processing tool, english web, preliminary study, web search engine, query classification, english language, information processing
Field
Web search query, Query language, Relevance feedback, Information processing, Information retrieval, Query expansion, Computer science, Machine translation, Web query classification, Natural language processing, Artificial intelligence, Classifier (linguistics)
DocType
Conference
Citations
3
PageRank
0.39
References
10
Authors
5
Name                 Order  Citations  PageRank
Xuerui Wang          1      1735       123.38
Andrei Broder        2      7357       920.20
Evgeniy Gabrilovich  3      4573       224.48
Vanja Josifovski     4      2265       148.84
Bo Pang              5      5795       451.00